GPTQ Model doesnt work

#16
by rjmehta - opened

Model doesnt print anything. Just blank spaces. Using exllamav2. @TheBloke

INPUT:

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.85
settings.top_k = 50
settings.top_p = 0.8
settings.token_repetition_penalty = 1
#settings.disallow_tokens(tokenizer, [tokenizer.eos_token_id])
max_new_tokens = 10

Prompt

prompt = f"""Write a working python code.
/#/#/# Instruction:
Write a working python code to generate 100 random numbers.
/#/#/# Response:

"""
input_ids = tokenizer.encode(prompt)
prompt_tokens = input_ids.shape[-1]
generator.warmup()

time_begin_prompt = time.time()
print (prompt, end = "")
sys.stdout.flush()
generator.set_stop_conditions([])
generator.begin_stream(input_ids, settings)
time_begin_stream = time.time()
generated_tokens = 0
while True:
chunk, eos, _ = generator.stream()
generated_tokens += 1
print (chunk, end = "")
sys.stdout.flush()
if eos or generated_tokens == max_new_tokens: break
time_end = time.time()
time_prompt = time_begin_stream - time_begin_prompt
time_tokens = time_end - time_begin_stream
print()
print()
print(f"Prompt processed in {time_prompt:.2f} seconds, {prompt_tokens} tokens, {prompt_tokens / time_prompt:.2f} tokens/second")
print(f"Response generated in {time_tokens:.2f} seconds, {generated_tokens} tokens, {generated_tokens / time_tokens:.2f} tokens/second")

OUTPUT:

Write a working python code.
/#/#/# Instruction:
Write a working python code to generate 100 random numbers.
/#/#/# Response:

Prompt processed in 0.00 seconds, 32 tokens, 27396.96 tokens/second
Response generated in 0.43 seconds, 10 tokens, 23.49 tokens/second"""

Okay. I had to manually set the rope_scale to 4.0. But gptq doesnt print EOS token

Model doesnt print anything. Just blank spaces. Using exllamav2. @TheBloke

INPUT:

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.85
settings.top_k = 50
settings.top_p = 0.8
settings.token_repetition_penalty = 1
#settings.disallow_tokens(tokenizer, [tokenizer.eos_token_id])
max_new_tokens = 10

Prompt

prompt = f"""Write a working python code.
/#/#/# Instruction:
Write a working python code to generate 100 random numbers.
/#/#/# Response:

"""
input_ids = tokenizer.encode(prompt)
prompt_tokens = input_ids.shape[-1]
generator.warmup()

time_begin_prompt = time.time()
print (prompt, end = "")
sys.stdout.flush()
generator.set_stop_conditions([])
generator.begin_stream(input_ids, settings)
time_begin_stream = time.time()
generated_tokens = 0
while True:
chunk, eos, _ = generator.stream()
generated_tokens += 1
print (chunk, end = "")
sys.stdout.flush()
if eos or generated_tokens == max_new_tokens: break
time_end = time.time()
time_prompt = time_begin_stream - time_begin_prompt
time_tokens = time_end - time_begin_stream
print()
print()
print(f"Prompt processed in {time_prompt:.2f} seconds, {prompt_tokens} tokens, {prompt_tokens / time_prompt:.2f} tokens/second")
print(f"Response generated in {time_tokens:.2f} seconds, {generated_tokens} tokens, {generated_tokens / time_tokens:.2f} tokens/second")

OUTPUT:

Write a working python code.
/#/#/# Instruction:
Write a working python code to generate 100 random numbers.
/#/#/# Response:

Prompt processed in 0.00 seconds, 32 tokens, 27396.96 tokens/second
Response generated in 0.43 seconds, 10 tokens, 23.49 tokens/second"""

The GPTQ Model is not officially released by DeepSeek. Please direct your questions to TheBloke's Huggingface account: https://huggingface.co/TheBloke?search_models=deepseek-coder

luofuli changed discussion status to closed

Sign up or log in to comment