---
base_model: Universal-NER/UniNER-7B-all
tags:
- named entity recognition
- ner
model-index:
- name: daisd-ai/UniNER-W4A16
  results: []
license: cc-by-nc-4.0
inference: false
---

## Introduction

This model is a quantized version of [Universal-NER/UniNER-7B-all](https://huggingface.co/Universal-NER/UniNER-7B-all).

## Quantization

The quantization was applied using [LLM Compressor](https://github.com/vllm-project/llm-compressor) with 512 random examples from the [Universal-NER/Pile-NER-definition](https://huggingface.co/datasets/Universal-NER/Pile-NER-definition) dataset.

The quantization recipe:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]
```

An end-to-end sketch of the calibration run is included at the end of this card.

## Inference

We added a chat template to the tokenizer, so the model can be used directly with vLLM without any additional preprocessing compared to the original model. Each entity type is queried in its own three-turn conversation (text, acknowledgement, question), matching UniNER's prompt format.

Example:

```python
import json

from vllm import LLM, SamplingParams

# Load the model and its tokenizer (which carries the chat template)
llm = LLM(model="daisd-ai/UniNER-W4A16")
tokenizer = llm.get_tokenizer()
sampling_params = SamplingParams(temperature=0, max_tokens=256)

# Define the text and the entity types to extract
text = "Some long text with multiple entities"
entities_types = ["entity type 1", "entity type 2"]

# Build one prompt per entity type using the chat template
prompts = []
for entity_type in entities_types:
    messages = [
        {
            "role": "user",
            "content": f"Text: {text}",
        },
        {"role": "assistant", "content": "I've read this text."},
        {"role": "user", "content": f"What describes {entity_type} in the text?"},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    prompts.append(prompt)

# Run inference
outputs = llm.generate(prompts, sampling_params)
outputs = [output.outputs[0].text for output in outputs]

# Each answer is a JSON-encoded list of entity mentions; parse it into
# a deduplicated Python list, falling back to [] on malformed output
results = []
for lst in outputs:
    try:
        entities = list(set(json.loads(lst)))
    except Exception:
        entities = []
    results.append(entities)
```
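
## Reproducing the quantization

The recipe above is applied with LLM Compressor's `oneshot` entry point. The sketch below is illustrative rather than the exact script used to produce this model: the `oneshot` call and modifier imports follow LLM Compressor's public examples, while the shuffling seed, `max_seq_length`, and the flattening of the dataset's conversation turns into plain text are assumptions.

```python
from datasets import load_dataset
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

MODEL_ID = "Universal-NER/UniNER-7B-all"
NUM_CALIBRATION_SAMPLES = 512  # matches the number of examples mentioned above

# Sample 512 random calibration examples; flattening the conversation
# turns into a single "text" column is an assumption for illustration
ds = load_dataset("Universal-NER/Pile-NER-definition", split="train")
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))
ds = ds.map(lambda ex: {"text": "\n".join(turn["value"] for turn in ex["conversations"])})

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

oneshot(
    model=MODEL_ID,
    dataset=ds,
    recipe=recipe,
    max_seq_length=2048,  # assumed, not confirmed by this card
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    output_dir="UniNER-W4A16",
)
```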