Fixed tokenizer.json so it matches Llama-3.1-8B-Instruct's tokenizer.json
#5
opened by Joseph717171
No description provided.
Since you guys trained on top of Llama-3.1-8B-Instruct, I found it odd that your tokenizer.json files were different: Llama-3.1-SuperNova-Lite/tokenizer.json is missing some things that are present in meta-llama/Meta-Llama-3.1-8B-Instruct/tokenizer.json. This PR fixes that.
a="/Users/jsarnecki/opt/Workspace/arcee-ai/Llama-3.1-SuperNova-Lite/tokenizer.json"
b="/Users/jsarnecki/opt/Workspace/meta-llama/Meta-Llama-3.1-8B-Instruct/tokenizer.json"
diff "$a" "$b"
2332,2335c2332,2394
<     "type": "ByteLevel",
<     "add_prefix_space": true,
<     "trim_offsets": false,
<     "use_regex": true
---
>     "type": "Sequence",
>     "processors": [
>       {
>         "type": "ByteLevel",
>         "add_prefix_space": true,
>         "trim_offsets": false,
>         "use_regex": true
>       },
>       {
>         "type": "TemplateProcessing",
>         "single": [
>           {
>             "SpecialToken": {
>               "id": "<|begin_of_text|>",
>               "type_id": 0
>             }
>           },
>           {
>             "Sequence": {
>               "id": "A",
>               "type_id": 0
>             }
>           }
>         ],
>         "pair": [
>           {
>             "SpecialToken": {
>               "id": "<|begin_of_text|>",
>               "type_id": 0
>             }
>           },
>           {
>             "Sequence": {
>               "id": "A",
>               "type_id": 0
>             }
>           },
>           {
>             "SpecialToken": {
>               "id": "<|begin_of_text|>",
>               "type_id": 1
>             }
>           },
>           {
>             "Sequence": {
>               "id": "B",
>               "type_id": 1
>             }
>           }
>         ],
>         "special_tokens": {
>           "<|begin_of_text|>": {
>             "id": "<|begin_of_text|>",
>             "ids": [
>               128000
>             ],
>             "tokens": [
>               "<|begin_of_text|>"
>             ]
>           }
>         }
>       }
>     ]
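For context (not part of this PR), here is a minimal sketch of how the practical effect of the missing post-processor can be checked, assuming the tokenizers Python package is installed; the two file paths are placeholders for the same local checkouts diffed above. With only the ByteLevel post-processor, encoding does not prepend <|begin_of_text|> (token id 128000); with Meta's Sequence + TemplateProcessing post-processor, it does.

from tokenizers import Tokenizer

# Placeholder paths: point these at the same two tokenizer.json files diffed above.
supernova = Tokenizer.from_file("arcee-ai/Llama-3.1-SuperNova-Lite/tokenizer.json")
reference = Tokenizer.from_file("meta-llama/Meta-Llama-3.1-8B-Instruct/tokenizer.json")

for name, tok in (("SuperNova-Lite (pre-fix)", supernova),
                  ("Meta-Llama-3.1-8B-Instruct", reference)):
    # encode() applies the post_processor; only the TemplateProcessing variant
    # prepends the <|begin_of_text|> BOS token (id 128000).
    ids = tok.encode("Hello world").ids
    print(f"{name}: first ids = {ids[:3]}, starts with BOS = {ids[0] == 128000}")

After this PR, both files should produce identical ids.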
Crystalcareai changed pull request status to open
Crystalcareai changed pull request status to merged