shanearora committed
Commit: 16fd943
Parent: 7d31d70

Update README.md

Files changed (1)
  1. README.md (+13, -26)
README.md CHANGED
@@ -9,9 +9,9 @@ language:
 
 <img src="https://allenai.org/olmo/olmo-7b-animation.gif" alt="OLMo Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
 
- # Model Card for OLMo 1.7-7B-hf
+ # Model Card for OLMo 7B April 2024
 
- OLMo 1.7 7B is the latest version of the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model rocking a 24 point increase in MMLU, among other evaluations improvements, from an improved version of the Dolma dataset and staged training.
+ OLMo 7B April 2024 is an updated version of the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model, with a 24 point increase in MMLU among other evaluation improvements, thanks to an improved version of the Dolma dataset and staged training.
 **This version is for direct use with HuggingFace Transformers** from v4.40 on.
 
 OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
@@ -26,27 +26,22 @@ The core models released in this batch are the following:
 | [OLMo 1B](https://huggingface.co/allenai/OLMo-1B) | 3 Trillion |16 | 2048 | 16 | 2048 |
 | [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) | 2.5 Trillion | 32 | 4096 | 32 | 2048 |
 | [OLMo 7B Twin 2T](https://huggingface.co/allenai/OLMo-7B-Twin-2T) | 2 Trillion | 32 | 4096 | 32 | 2048 |
- | [OLMo 1.7-7B](https://huggingface.co/allenai/OLMo-1.7-7B) | 2.05 Trillion | 32 | 4096 | 32 | 4096 |
+ | [OLMo 7B April 2024](https://huggingface.co/allenai/OLMo-7B-0424-hf) | 2.05 Trillion | 32 | 4096 | 32 | 4096 |
 
- *Note: OLMo 1.7-7B also includes QKV clipping.*
-
-
- [Coming soon] We are releasing many checkpoints for these models, for every 1000 training steps.
- The naming convention is `step1000-tokens4B`.
+ *Note: OLMo 7B April 2024 also includes QKV clipping.*
 
 To load a specific model revision with HuggingFace, simply add the argument `revision`:
 ```bash
- olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf", revision="step1000-tokens4B")
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-0424-hf", revision="step1000-tokens4B")
 ```
 
 All revisions/branches are listed in the file `revisions.txt`.
 Or, you can access all the revisions for the models via the following code snippet:
 ```python
 from huggingface_hub import list_repo_refs
- out = list_repo_refs("allenai/OLMo-1.7-7B-hf")
+ out = list_repo_refs("allenai/OLMo-7B-0424-hf")
 branches = [b.name for b in out.branches]
 ```
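For reference, the two snippets above can be combined to enumerate the available checkpoint branches and then load one of them. A minimal sketch, assuming Hub access and Transformers v4.40+ (the `step1000-tokens4B` branch name is the example used above):

```python
# Minimal sketch (assumes Hub access and transformers >= 4.40): list the
# checkpoint branches of the repo, then load one of them by name.
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

out = list_repo_refs("allenai/OLMo-7B-0424-hf")
branches = [b.name for b in out.branches]
print(len(branches), branches[:5])  # e.g. intermediate training checkpoints plus `main`

# Pass a branch name as `revision` to load that intermediate checkpoint.
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-0424-hf", revision="step1000-tokens4B"
)
```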
- A few revisions were lost due to an error, but the vast majority are present.
 
 ### Model Description
 
@@ -75,13 +70,11 @@ A few revisions were lost due to an error, but the vast majority are present.
 
 ### Inference
 
- Install Transformers [from source](https://huggingface.co/docs/transformers/en/installation#install-from-source), or update to the next version when this [PR](https://github.com/huggingface/transformers/pull/29890) is integrated.
-
- Now, proceed as usual with HuggingFace:
+ Proceed as usual with HuggingFace:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
- olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")
- tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-0424-hf")
+ tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-0424-hf")
 message = ["Language modeling is "]
 inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
 # optional verifying cuda
@@ -94,20 +87,14 @@ print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
 Alternatively, with the pipeline abstraction:
 ```python
 from transformers import pipeline
- olmo_pipe = pipeline("text-generation", model="allenai/OLMo-1.7-7B-hf")
+ olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B-0424-hf")
 print(olmo_pipe("Language modeling is "))
 >> 'Language modeling is a branch of natural language processing that aims to...'
 ```
 
- Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
+ Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-0424-hf", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
 The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
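As a reference for the quantization note above, a minimal sketch of 8-bit loading and of passing only the input ids to the GPU, assuming `bitsandbytes` is installed and a CUDA device is available (generation settings are illustrative):

```python
# Minimal sketch (assumes `bitsandbytes` and a CUDA GPU are available).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-0424-hf", torch_dtype=torch.float16, load_in_8bit=True
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-0424-hf")

inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
# As recommended above, pass only the input ids, moved to the GPU.
response = olmo.generate(
    inputs.input_ids.to("cuda"), max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```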
 
- Note, you may see the following error if `ai2-olmo` is not installed correctly, which is caused by internal Python check naming. We'll update the code soon to make this error clearer.
- ```bash
- raise ImportError(
- ImportError: This modeling file requires the following packages that were not found in your environment: hf_olmo. Run `pip install hf_olmo`
- ```
-
 ### Fine-tuning
 Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
 1. Fine-tune with the OLMo repository:
@@ -225,7 +212,7 @@ Optimizer settings comparison with peer models.
 
 
 
- ## Environmental Impact
+ <!-- ## Environmental Impact
 
 OLMo 7B variants were either trained on MI250X GPUs at the LUMI supercomputer, or A100-40GB GPUs provided by MosaicML.
 A summary of the environmental impact. Further details are available in the paper.
@@ -233,7 +220,7 @@ A summary of the environmental impact. Further details are available in the paper.
 | | GPU Type | Power Consumption From GPUs | Carbon Intensity (kg CO₂e/KWh) | Carbon Emissions (tCO₂eq) |
 |-----------|------------|-----------------------------|--------------------------------|---------------------------|
 | OLMo 7B Twin | MI250X ([LUMI supercomputer](https://www.lumi-supercomputer.eu)) | 135 MWh | 0* | 0* |
- | OLMo 7B | A100-40GB ([MosaicML](https://www.mosaicml.com)) | 104 MWh | 0.656 | 75.05 |
+ | OLMo 7B | A100-40GB ([MosaicML](https://www.mosaicml.com)) | 104 MWh | 0.656 | 75.05 | -->
 
 ## Bias, Risks, and Limitations