Update README - Add model details

#14

by Citaman - opened Mar 17

base: refs/heads/main ← from: refs/pr/14

+71 -2

Files changed (3)

.gitattributes +1 -0
README.md +67 -2
model_logo.png +3 -0

.gitattributes CHANGED

@@ -482,3 +482,4 @@ ckpt/tensor00761_000 filter=lfs diff=lfs merge=lfs -text
 ckpt/tensor00762_000 filter=lfs diff=lfs merge=lfs -text
 ckpt/tensor00763_000 filter=lfs diff=lfs merge=lfs -text
 ckpt/tensor00764_000 filter=lfs diff=lfs merge=lfs -text
+model_logo.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -2,8 +2,73 @@
 license: apache-2.0
 ---
 # Grok-1
-This repository contains the weights of the Grok-1 open-weights model.
 Make sure to download the `int8` checkpoint to the `checkpoints` directory and run
@@ -18,4 +83,4 @@ You should be seeing output from the language model.
 Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.
-p.s. we're hiring: https://x.ai/career

 license: apache-2.0
 ---
 # Grok-1
+---
+_This repository contains the weights of the Grok-1 open-weights model._
+**To get started with the model, follow the instructions at** `github.com/xai-org/grok-1`.
+![The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.](./model_logo.png)
+<small>The cover image was generated using [Midjourney](https://midjourney.com) based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.</small>
+---
+                         ╔══════════════════════════╗
+                         ║                 _______  ║
+                         ║            /\   |_   _|  ║
+                         ║  __  __   /  \    | |    ║
+                         ║  \ \/ /  / /\ \   | |    ║
+                         ║   >  <  / ____ \ _| |_   ║
+                         ║  /_/\_\/_/    \_\_____|  ║
+                         ║                          ║
+                         ║  Understand the Universe ║
+                         ║      [https://x.ai]      ║
+                         ╚════════════╗╔════════════╝
+                             ╔════════╝╚═════════╗
+                             ║ xAI Grok-1 (314B) ║
+                             ╚════════╗╔═════════╝
+                ╔═════════════════════╝╚═════════════════════╗
+                ║ 314B parameter Mixture of Experts model    ║
+                ║ - Base model (not finetuned)               ║
+                ║ - 8 experts (2 active)                     ║
+                ║ - 86B active parameters                    ║
+                ║ - Apache 2.0 license                       ║
+                ║ - Code: https://github.com/xai-org/grok-1  ║
+                ║ - Happy coding!                            ║
+                ╚════════════════════════════════════════════╝
+## Model Configuration Details
+**Vocabulary Size**: 131,072
+**Special Tokens**:
+- Pad Token: 0
+- End of Sequence Token: 2
+**Sequence Length**: 8,192
+### **Model Architecture**: MoE
+- **Embedding Size**: 6,144
+    - Rotary Embedding (RoPE)
+- **Layers**: 64
+- **Experts**: 8
+- **Selected Experts**: 2
+- **Widening Factor**: 8
+- **Key Size**: 128
+- **Query Heads**: 48
+- **Key Value Heads**: 8
+- **Activation Sharding**: Data-wise, Model-wise
+- **Tokenizer**: SentencePiece tokenizer
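
Review note: as a reading aid for the architecture list above, the sketch below derives a few quantities that follow arithmetically from the listed values: the total query width, the key/value width, and how many query heads share each key/value head. It also shows one generic way to pick 2 of 8 experts. The class name `GrokConfig`, the function `select_experts`, and the softmax-over-selected-logits gating are assumptions made for illustration; none of them is taken from the xAI code.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class GrokConfig:
    """Values copied from the README list; the class itself is hypothetical."""
    emb_size: int = 6144
    num_layers: int = 64
    num_experts: int = 8
    num_selected_experts: int = 2
    key_size: int = 128
    num_q_heads: int = 48
    num_kv_heads: int = 8

    @property
    def query_width(self) -> int:
        # Query heads times per-head key size.
        return self.num_q_heads * self.key_size

    @property
    def kv_width(self) -> int:
        # Key/value heads times per-head key size.
        return self.num_kv_heads * self.key_size

    @property
    def q_heads_per_kv_head(self) -> int:
        # How many query heads share one key/value head.
        return self.num_q_heads // self.num_kv_heads


def select_experts(router_logits, k):
    """Generic top-k gating (an assumption, not xAI's router).

    Returns (expert_index, weight) pairs; weights are a softmax over
    the k selected logits and therefore sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


cfg = GrokConfig()
print(cfg.query_width, cfg.kv_width, cfg.q_heads_per_kv_head)
```

With the listed values, the query width (48 x 128) equals the embedding size, and each of the 8 key/value heads serves 6 query heads.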
+### **Inference Configuration**:
+- Batch Size per Device: 0.125
+- Tokenizer: `./tokenizer.model`
+- Local Mesh: 1x8
+- Between Hosts: 1x1
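
Review note: one way to read the fractional per-device batch size is that a single sequence is spread across the whole mesh. If the 1x8 local mesh means 8 devices on one host and 1x1 between hosts means a single host, then 0.125 sequences per device multiplies out to a global batch of one. This interpretation and the helper name `global_batch_size` are assumptions; the README does not spell out the semantics.

```python
from fractions import Fraction
from math import prod


def global_batch_size(bs_per_device: str, local_mesh, between_hosts) -> int:
    """Multiply the per-device batch size by the total device count.

    bs_per_device is passed as a string so the fraction is exact.
    Raises ValueError if the result is not a whole number of sequences.
    """
    devices = prod(local_mesh) * prod(between_hosts)
    total = Fraction(bs_per_device) * devices
    if total.denominator != 1:
        raise ValueError(f"non-integer global batch: {total}")
    return int(total)


print(global_batch_size("0.125", (1, 8), (1, 1)))
```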
+## Inference Details
 Make sure to download the `int8` checkpoint to the `checkpoints` directory and run
 Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.
+**p.s. we're hiring: https://x.ai/careers**

model_logo.png ADDED

Git LFS Details

SHA256: 5fc985296d2a853cce201117ba2d8be3d3b2f046b64eddd4d0eb5fdcf8aea71c
Pointer size: 132 Bytes
Size of remote file: 2.34 MB
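
The 132-byte pointer size can be sanity-checked against the Git LFS pointer format, which stores a version line, the SHA-256 object id, and the file size in bytes. This is a minimal sketch, assuming the standard v1 pointer layout. The byte count `2_340_000` is a hypothetical stand-in, since the page only reports the rounded figure 2.34 MB.

```python
def lfs_pointer(sha256_hex: str, size_bytes: int) -> str:
    """Build the text of a Git LFS v1 pointer file."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


# SHA-256 as shown in the Git LFS details above.
SHA256 = "5fc985296d2a853cce201117ba2d8be3d3b2f046b64eddd4d0eb5fdcf8aea71c"
# Hypothetical placeholder: the exact byte count is not given on the page.
HYPOTHETICAL_SIZE = 2_340_000

pointer = lfs_pointer(SHA256, HYPOTHETICAL_SIZE)
print(len(pointer.encode("utf-8")))
```

Under this layout any seven-digit byte count yields a 132-byte pointer, which is consistent with a remote file of roughly 2.34 MB.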