Optimizing 'airoboros-l2-?b-gpt4-2.0' for Limited Resources: Seeking Guidance
Hey everyone,
I'm facing a challenging issue and could really use your help. Here's the situation:
My Setup:
CPU: AMD Ryzen 5 3600
RAM: 8GB (with a 30GB swap file)
GPU: Nvidia RTX 3060 Ti
OS: Linux Lite (Ubuntu 22.04 base)
Nvidia drivers: Version 470 (CUDA 11.4)
The Problem:
I'm working with the 'airoboros-l2-13b-gpt4-2.0' and 'airoboros-l2-7b-gpt4-m2.0' models using vLLM.
I keep encountering CUDA out-of-memory errors.
Recently, I ran into a mysterious "Magic no. error."
What I've Tried So Far:
I tweaked the 'config.json' file.
Adjusted parameters like 'hidden_size', 'num_hidden_layers', and 'num_attention_heads' to reduce model size.
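For reference, here's roughly what that edit looked like (a minimal sketch; the stock 13B values in the comment come from the model's own config.json, but the reduced value below is just an illustration, not what I actually used):

```python
# Sketch of the config.json tweak described above. Stock LLaMA-2 13B values
# are hidden_size=5120, num_hidden_layers=40, num_attention_heads=40; the
# reduction below is an example value only.
import json

with open("config.json") as f:
    cfg = json.load(f)

cfg["num_hidden_layers"] = 20  # example: halve the layer count

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```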
Where I Need Help:
Understanding the Problem: Can someone help me break down these CUDA out-of-memory errors and the "Magic no. error"?
Optimizing 'config.json': I experimented, but maybe there are better settings for my hardware.
First Principles Approach: Let's start from scratch. How can we ensure the model runs efficiently on my setup? (See the vLLM sketch after this list.)
Monitoring GPU Resources: What tools or techniques can I use to keep track of GPU memory usage? (See the monitoring sketch after this list.)
Community Knowledge: Share your experiences. Let's build a collaborative space where we all learn together.
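On the first-principles point, here's a minimal sketch of the memory-capping knobs vLLM's LLM constructor exposes; the Hugging Face model id and the exact values are my assumptions for an 8 GB card, not tested settings:

```python
# Capping vLLM's memory use on a small GPU (all values are illustrative).
from vllm import LLM, SamplingParams

llm = LLM(
    model="jondurbin/airoboros-l2-7b-gpt4-m2.0",  # assumed HF repo id
    dtype="half",                 # fp16 weights instead of fp32
    gpu_memory_utilization=0.85,  # fraction of VRAM vLLM may claim
    max_model_len=2048,           # shorter context -> smaller KV cache
    swap_space=4,                 # GiB of CPU RAM for preempted KV blocks
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```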
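As for monitoring, 'watch -n 1 nvidia-smi' in a second terminal is the quickest option. If you'd rather poll from Python, here's a small sketch using the pynvml package (pip install pynvml); it reports the same numbers nvidia-smi does:

```python
# Poll GPU memory usage once per second via NVML.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
try:
    while True:
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"used {mem.used / 2**20:.0f} MiB / {mem.total / 2**20:.0f} MiB total")
        time.sleep(1)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```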
If you've faced similar challenges or have experience with optimizing models for limited GPU resources, your insights would be greatly appreciated.
Your assistance could not only help me but also benefit anyone working with resource-intensive models. Together, we'll conquer this challenge and make the most of our hardware.
Thanks for your help in advance. I'm looking forward to our discussion!
The "Magic no." error is a llama.cpp thing if I recall correctly (I'm only ~70% sure on this).
While trying to do this on iOS I hacked around it by deleting that check from llama.cpp entirely, but had limited success.
The best people to ask about this are the folks working on llama.cpp or MLC AI, as they have definitely run into it and solved it before.
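For context, the "magic" is just the first few bytes of the model file, which llama.cpp checks before loading anything else; if they don't match a known format, it bails with that error. A rough sketch of what the check amounts to (the file path is a placeholder, and the magic constants are the ones I remember from the GGML/GGUF formats):

```python
# Read the 4-byte magic at the start of a llama.cpp model file.
import struct

with open("model.bin", "rb") as f:  # placeholder path
    raw = f.read(4)
(magic,) = struct.unpack("<I", raw)

# Current GGUF files literally start with the bytes b'GGUF'; older GGML-era
# files used little-endian uint32 magics such as 0x67676a74 ('ggjt').
if raw == b"GGUF" or magic in (0x67676A74, 0x67676D6C):
    print(f"magic {raw!r}: looks like a llama.cpp model file")
else:
    print(f"unrecognized magic {raw!r}: truncated download or wrong format?")
```

If the magic doesn't match, re-downloading the file or converting it to the format your llama.cpp build expects is usually the fix rather than deleting the check.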