Nethermind/Mpt-Instruct-DotNet-XS

Upsides:

Similar in quality (slightly worse) to the 7b Nethermind/Mpt-Instruct-DotNet-S for C# code generation and explanation,
1b parameters (2.6 GB, fine-tuned in bfloat16),
6x smaller,
4x+ faster
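
The size figure above can be sanity-checked with simple arithmetic: bfloat16 stores each parameter in 2 bytes, so the checkpoint size divided by 2 approximates the parameter count. A minimal sketch, assuming "GB" means 10^9 bytes; the helper name `estimate_param_count` is hypothetical.

```python
BYTES_PER_BFLOAT16 = 2  # bfloat16 is a 16-bit format


def estimate_param_count(checkpoint_gb, bytes_per_param=BYTES_PER_BFLOAT16):
    """Estimate the parameter count from a checkpoint size in GB (assumed 10^9 bytes)."""
    return checkpoint_gb * 1e9 / bytes_per_param


if __name__ == "__main__":
    # 2.6 GB is the size quoted above for the bfloat16 checkpoint.
    print(f"{estimate_param_count(2.6) / 1e9:.2f} billion parameters")
```

This is only a rough estimate; it ignores any non-weight data stored alongside the checkpoint.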

Downsides:

Sometimes suffers from repetitive, non-terminating responses to general discussion questions
Slightly worse at code generation than the 7b model
No GGML/llama.cpp support for running on CPU yet
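
One possible mitigation for the repetition issue is to pass repetition controls to `generate`. The sketch below is an assumption and has not been evaluated for this model: the helper name `build_generation_kwargs` and the values 1.2 and 4 are illustrative, while `repetition_penalty` and `no_repeat_ngram_size` are standard `transformers` generation arguments. The 1024 and 512 limits mirror the ones in the usage example.

```python
def build_generation_kwargs(prompt_length, max_length=1024, max_new_tokens=512):
    """Build keyword arguments for model.generate() that discourage repetition.

    The penalty values are illustrative assumptions, not tuned settings.
    """
    # Keep prompt plus completion within max_length, but always allow one token.
    budget = max(1, min(max_new_tokens, max_length - prompt_length))
    return {
        "max_new_tokens": budget,
        "do_sample": False,          # greedy decoding, as in the usage example
        "repetition_penalty": 1.2,   # assumed value; values above 1.0 penalize repeats
        "no_repeat_ngram_size": 4,   # assumed value; blocks repeated 4-grams
    }


if __name__ == "__main__":
    print(build_generation_kwargs(prompt_length=40))
```

The result could then be passed as `model.generate(input_tokens, **build_generation_kwargs(input_tokens.shape[1]))`. Note that a hard n-gram block may hurt code output, where repeated tokens are often legitimate.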

Based on mosaicml/mpt-1b-redpajama-200b-dolly

Same data sources as in Nethermind/Mpt-Instruct-DotNet-S

Usage example:

import torch
import transformers
from transformers import AutoTokenizer

out_name = "Nethermind/Mpt-Instruct-DotNet-XS"
model = transformers.AutoModelForCausalLM.from_pretrained(
    out_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.to('cuda:0')
model.eval()

# Load the GPT-NeoX tokenizer and reuse its EOS token for padding.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer.pad_token = tokenizer.eos_token

INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
PROMPT_FOR_GENERATION_FORMAT = """{system}
{instruction_key}
{instruction}
{response_key}
""".format(
    system="{system}",
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    response_key=RESPONSE_KEY,
)


def output_loop(input_tokens, max_length=1024, max_new_tokens=512):
    # Greedy decoding, capped so that prompt plus completion fits in max_length tokens.
    prompt_length = input_tokens.shape[1]
    new_tokens = max(1, min(max_new_tokens, max_length - prompt_length))
    print(prompt_length, new_tokens)
    return model.generate(input_tokens.to('cuda:0'), max_new_tokens=new_tokens, do_sample=False)


def give_answer(
    instruction="Create a loop over [0, 6, 7, 77] that prints its contents",
    system="Below is an instruction that describes a task. Write a response that appropriately completes the request.",
):
    # Fill the prompt template, tokenize it, and generate a completion.
    question = PROMPT_FOR_GENERATION_FORMAT.format(system=system, instruction=instruction)
    tokenized_question = tokenizer.encode(question, return_tensors='pt')
    outputs = output_loop(tokenized_question)
    answer = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    print(answer)
    return answer

give_answer("What is the main difference between a struct and a class in C#?")

outputs:

A struct is a value type, which means it can only hold a few values. It is often used as a placeholder for other data types. A class, on the other hand, is a reference type, which means it can hold references to other data types.
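
Because `generate` returns the prompt tokens followed by the new tokens, the decoded text from `give_answer` contains the full prompt as well as the answer. A minimal sketch of stripping the prompt, assuming the template from the usage example; the helper name `extract_response` is hypothetical.

```python
RESPONSE_KEY = "### Response:"


def extract_response(decoded_text, response_key=RESPONSE_KEY):
    """Return the text after the first response marker, or the whole text if absent."""
    before, marker, after = decoded_text.partition(response_key)
    # partition() returns an empty marker when the key is not found.
    return after.strip() if marker else decoded_text.strip()


if __name__ == "__main__":
    decoded = (
        "Below is an instruction that describes a task.\n"
        "### Instruction:\n"
        "What is the main difference between a struct and a class in C#?\n"
        "### Response:\n"
        "A struct is a value type."
    )
    print(extract_response(decoded))
```

It could be applied to each element of the list returned by `give_answer`, for example `[extract_response(a) for a in give_answer(...)]`.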

Generation time on an RTX 4090 by number of new tokens:

2 sec for 128 tokens
5 sec for 256 tokens
11 sec for 512 tokens
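
These timings can be converted to throughput by dividing the token count by the elapsed seconds. A small sketch using only the figures above; the helper name `tokens_per_second` is hypothetical.

```python
# Timings quoted above: new tokens -> seconds on an RTX 4090.
TIMINGS = {128: 2, 256: 5, 512: 11}


def tokens_per_second(timings):
    """Map each token count to its average generation rate in tokens per second."""
    return {tokens: tokens / seconds for tokens, seconds in timings.items()}


if __name__ == "__main__":
    for tokens, rate in tokens_per_second(TIMINGS).items():
        print(f"{tokens} tokens: {rate:.1f} tokens/sec")
```

The rate falls from 64 tokens/sec at 128 tokens to roughly 46.5 tokens/sec at 512 tokens, so longer completions are slower per token.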

Code generation prompt:

Generate code to answer the question.

How would you retrieve and analyse the fee history for the last 100 blocks and determine the average gas price?

Example of code output:

public async Task<decimal> GetFeeHistoryGasPriceAverage()
{
  // Get the fee history
  ResultWrapper<FeeHistoryResults> result = await _ethRpc.eth_feeHistory(100, BlockParameter.Latest,
      new double[] { 50, 75, 90 });
  // Check if the API call succeeded
  if (result.Result!= Result.Success)
  {
     throw new Exception("Failed to retrieve fee history");
  }    
  // Get the gas price average
  decimal averageGasPrice = result.Data.BaseFeePerGas.Average();

  return averageGasPrice;
}