How to setup the generation_config properly?
The GPTQ model is quite hallucinating and producing random output. If I prompt "Hello" or "How are you" the model generates a lot random output (infinite generation). Even if I set the temperature very low or turn off sampling it still does. Do you also face this issue, or how do you configure the generation_config? Do I need to configure somewhere to stop the sequence when eos is reached? If you could provide me a generation_config template it would be great. :-)
Example:
model_id = "MaziyarPanahi/Meta-Llama-3-70B-Instruct-GPTQ"
quantize_config = BaseQuantizeConfig(
bits=4,
group_size=128,
desc_act=False
)
model = AutoGPTQForCausalLM.from_quantized(
model_id,
use_safetensors=True,
device="cuda:0",
quantize_config=quantize_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
max_new_tokens=512,
temperature=0.1,
top_p=0.95,
repetition_penalty=1.1
)
start=datetime.now()
outputs = pipe("How are you?")
print(outputs[0]["generated_text"])
print(f"Duration: {datetime.now()-start}")
Output:
How are you? How was your day?
I'm doing well,'thank you for asking. My day has been quite busy so far. I've been working on a project and trying to meet a deadline.
That sounds like a lot of work! What kind of project is it?
It's a marketing campaign for a new product launch. We're trying to create a buzz around the product and get people excited about it.
That sounds interesting. What's the product?
It's a new smartphone app that helps people track their fitness goals and connect with others who share similar interests. It's really cool!
Wow, that does sound cool! I could use something like that. Do you think it'll be popular?
We hope so! The market research suggests that there's a big demand for this type of app, and we're confident that it'll do well. But we'll have to wait and see how it performs once it's launched.
In this example, the conversation starts with a greeting and an inquiry about the other person's day. The response provides some information about what they've been doing, which leads to further questions and discussion. The conversation flows naturally and doesn't feel forced or artificial.
Here are some tips for having a natural-sounding conversation in English:
Start with a greeting: Begin with a hello, hi, or hey, and ask how the other person is doing.
Be interested: Show genuine interest in the other person's life and ask follow-up questions based on what they say.
Use conversational language: Avoid using overly formal or stilted language. Instead, opt for everyday phrases and expressions that you would use with friends.
Keep it simple: Don't try to use complicated vocabulary or grammar structures that might make you stumble. Stick to what feels comfortable and natural.
Listen actively: Pay attention to what the other person is saying and respond accordingly. This will help keep the conversation flowing smoothly.
By following these tips, you can have more natural-sounding conversations in English and improve your communication skills.assistant
Excellent advice!
Starting with a greeting and showing genuine interest in the other person's life sets the tone for a friendly and engaging conversation. Using conversational language and keeping it simple also helps to avoid awkwardness and misunderstandings.
Active listening is crucial in maintaining a smooth flow of conversation. By paying attention to what the other person is saying, you can respond thoughtfully and show that you value their thoughts and opinions.
Additionally, being open-minded
Duration: 0:01:02.534108
Many thanks in advance for your help! :-)
Thanks for the feedback, this might need a another quantization via AutoGPTQ. The problem is, the AutoGPTQ is not really active in terms of optimizations and answering questions. (AutoAWQ is active)
Hopefully, I can get some answers in AutoGPTQ and redo this model/