softmax committed on
Commit
cc1f1b4
1 Parent(s): 42a4564

Create README.md

Files changed (1)
  1. README.md +51 -0
README.md ADDED
@@ -0,0 +1,51 @@
+ ---
+ base_model: meta-llama/Llama-2-70b-chat-hf
+ inference: true
+ model_type: llama
+ quantized_by: softmax
+ tags:
+ - nm-vllm
+ - marlin
+ - int4
+ ---
+
+ ## Llama-2-70b-chat-hf
+ This repo contains model files for [Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) optimized for [nm-vllm](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.
+
+ This model was quantized with [GPTQ](https://arxiv.org/abs/2210.17323) and saved in the Marlin format for efficient 4-bit inference. Marlin is a highly optimized inference kernel for 4-bit models.
+
+ ## Inference
+ Install [nm-vllm](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory usage:
+ ```bash
+ pip install "nm-vllm[sparse]"
+ ```
+
+ Run in a Python pipeline for local inference:
+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ # Load the Marlin-format checkpoint into the serving engine.
+ model_id = "softmax/Llama-2-70b-chat-hf-marlin"
+ model = LLM(model_id)
+
+ # Format the conversation with the model's own chat template.
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ messages = [
+     {"role": "user", "content": "What is synthetic data in machine learning?"},
+ ]
+ formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ # Generate at most 200 new tokens and print the first completion.
+ sampling_params = SamplingParams(max_tokens=200)
+ outputs = model.generate(formatted_prompt, sampling_params=sampling_params)
+ print(outputs[0].outputs[0].text)
+
+ # Example output (cut off at max_tokens=200):
+ """
+ Synthetic data, also known as artificial data or simulated data, is data that is artificially generated using various methods, rather than being collected from real-world sources. Synthetic data can be used to augment or substitute real-world data in machine learning applications, and can be particularly useful when real-world data is limited, expensive, or difficult to obtain.
+
+ There are several ways to generate synthetic data, including:
+
+ 1. Data augmentation: This involves transforming existing data, such as images or time series data, to create new data that can be used to augment a training set. For example, an image recognition model can be trained on a dataset of images that have been rotated, scaled, and flipped to create new images that the model has not seen before.
+ 2. Generative models: These models use algorithms to generate new data that resembles real-world data. Generative adversarial networks (GAN
+ """
+ ```
+
+ ## Quantization
+ For details on how this model was quantized and converted to the Marlin format, see this [notebook](https://github.com/neuralmagic/nm-vllm/blob/c2f8ec48464511188dcca6e49f841ebf67b97153/examples-neuralmagic/marlin_quantization_and_deploy/Performantly_Quantize_LLMs_to_4_bits_with_Marlin_and_nm_vllm.ipynb).
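
The inference example in the README relies on `tokenizer.apply_chat_template` to turn the message list into a single prompt string. As a rough illustration of what that step produces, the sketch below builds a Llama-2 chat prompt by hand. It is not part of this commit: the helper name `format_llama2_chat` is hypothetical, and the `[INST]` / `<<SYS>>` layout is assumed from the commonly published Llama-2 chat format, so the template shipped with the tokenizer remains the authoritative source.

```python
from typing import Dict, List

# Special tokens assumed for Llama-2; the real values come from the tokenizer.
BOS, EOS = "<s>", "</s>"


def format_llama2_chat(messages: List[Dict[str, str]]) -> str:
    """Approximate the Llama-2 chat prompt layout for a list of messages.

    Hypothetical helper for illustration only; prefer
    tokenizer.apply_chat_template in real code.
    """
    messages = list(messages)

    # An optional leading system message is folded into the first user turn.
    if messages and messages[0]["role"] == "system":
        if len(messages) < 2 or messages[1]["role"] != "user":
            raise ValueError("a system message must be followed by a user message")
        system = messages[0]["content"]
        merged = {
            "role": "user",
            "content": f"<<SYS>>\n{system}\n<</SYS>>\n\n{messages[1]['content']}",
        }
        messages = [merged] + messages[2:]

    parts = []
    for i, msg in enumerate(messages):
        # Turns must alternate user, assistant, user, ...
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError(f"expected a {expected} message at position {i}")
        content = msg["content"].strip()
        if msg["role"] == "user":
            parts.append(f"{BOS}[INST] {content} [/INST]")
        else:
            parts.append(f" {content} {EOS}")
    return "".join(parts)


prompt = format_llama2_chat(
    [{"role": "user", "content": "What is synthetic data in machine learning?"}]
)
print(prompt)  # <s>[INST] What is synthetic data in machine learning? [/INST]
```

Printing `formatted_prompt` from the README example and comparing it with this string is a quick way to check which template the tokenizer actually applies.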