EXTENDED LENGTH FRANKENSTEIN
Is it possible to merge this fun model with yarn 70b for extended context? Or maybe to create a new frankenmodel with yarn?
You can try and find out. You only need enough RAM to load two 70B models at a time, and you can even swap to disk (it would just be slower).
Do you have any suggestions on which layers and models to use?
UPDATE: I don't think that mergekit works with yarn.
ValueError: `rope_scaling` must be a dictionary with with two fields, `type` and `factor`, got {'factor': 8.0, 'finetuned': True, 'original_max_position_embeddings': 4096, 'type': 'yarn'}
And after modifying the yarn config:
ValueError: `rope_scaling`'s type field must be one of ['linear', 'dynamic'], got yarn
Is it maybe possible to somehow merge llama and yi models, or are they too different?
I finally made it: merged the 70B 32k model with itself. It actually works!
https://huggingface.co/ChuckMcSneed/DoubleGold-v0.1-123b-32k
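In case it helps anyone trying something similar: a self-merge like this is a passthrough merge in mergekit, where the same model shows up in several overlapping layer_range slices. The snippet below just writes a config in that style; the model id and layer ranges are placeholders for illustration, not the actual DoubleGold recipe:

```python
# Sketch of a mergekit passthrough self-merge config (frankenmerge style).
# Model id and layer ranges are placeholders, NOT the DoubleGold recipe;
# a 70B Llama-2 model has 80 transformer layers, and stacking overlapping
# ranges of the same model is what inflates the parameter count.
from pathlib import Path

MODEL = "NousResearch/Yarn-Llama-2-70b-32k"  # assumed base, swap in your own

config = f"""\
merge_method: passthrough
dtype: float16
slices:
  - sources:
      - model: {MODEL}
        layer_range: [0, 40]
  - sources:
      - model: {MODEL}
        layer_range: [20, 60]
  - sources:
      - model: {MODEL}
        layer_range: [40, 80]
"""

Path("selfmerge.yml").write_text(config)
print(config)
# Then run it with mergekit, e.g.:  mergekit-yaml selfmerge.yml ./merged-output
```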