Max output tokens for Llama 3.1

I do not see any literature relating to the number of maximum output tokens supported by these models. Does anyone have any additional information?

Probably 8192. This is the base value from which they scaled the context.

In config.json:
"max_position_embeddings": 131072
So it's 128K.
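
You can also read that value straight from the Hub. A quick sketch (the repo id is an assumption here, and the meta-llama repos are gated, so you need access):

```python
# Sketch: reading max_position_embeddings programmatically.
# The repo id is illustrative; swap in the flavor you're using.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-405B-Instruct")
print(cfg.max_position_embeddings)  # 131072 across the Llama 3.1 family
```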

That's the context window, not necessarily the max output tokens it's trained to produce as a response.

@abhirup-sainapse it's 4096. I couldn't find it documented anywhere, so I did a binary search to figure out where it raises an exception. I don't know why it's so hard to find the max output tokens and/or parameters for most models.
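
Roughly what I did, as a sketch. `try_generation` is a placeholder for whatever inference API you're hitting; the only assumption is that it fails when `max_tokens` is set above the server's limit:

```python
# Sketch of binary-searching the largest accepted max_tokens value.
# try_generation() is a placeholder -- wire it up to your own endpoint.

def try_generation(max_tokens: int) -> bool:
    """Return True if a request with this max_tokens value is accepted.

    Placeholder: replace the body with a real call (e.g. to an
    OpenAI-compatible endpoint serving Llama 3.1) and return False
    when it rejects the request for max_tokens being too large.
    """
    raise NotImplementedError("plug in your inference endpoint here")


def find_max_output_tokens(low: int = 1, high: int = 131072) -> int:
    """Binary-search the largest max_tokens the endpoint will accept."""
    best = low
    while low <= high:
        mid = (low + high) // 2
        if try_generation(mid):
            best = mid      # accepted: try a larger value
            low = mid + 1
        else:
            high = mid - 1  # rejected: try a smaller value
    return best
```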

@chrischain, my understanding of how LLMs work is that they iteratively predict the next token. This would mean the model is not trained to produce multiple tokens at once; rather, each pass through the model generates one token within the context length, given all previous tokens. Wouldn't this mean that max output tokens is always [total context] - [input tokens]? (See the sketch below.)

Or are you saying that the post-training dataset only includes examples where the assistant responds with at most X tokens, so the model is more likely to output the eos_token at or before that point?
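
To make the budget arithmetic in my first paragraph concrete, here's a quick sketch. The repo id is an assumption (the meta-llama repos are gated, so you need access), and only the tokenizer is needed:

```python
# Sketch of the "output budget = context window - input tokens" view.
from transformers import AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed repo id, for illustration
CONTEXT_WINDOW = 131072  # max_position_embeddings from config.json

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

prompt = "Summarize the following document: ..."
input_tokens = len(tokenizer(prompt).input_ids)

# Purely by the context-window math, the remaining generation budget would be:
max_new_tokens = CONTEXT_WINDOW - input_tokens
print(f"input tokens: {input_tokens}, theoretical output budget: {max_new_tokens}")
```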

@icahill, with an instruction-tuned model (like this one) the training data is typically structured in a multi-turn conversation format. In that case, the max output tokens would be the maximum response size it saw during training. Since the context was scaled up from 8192 (via RoPE), we can safely assume the maximum prompt it saw was 4096 tokens and the maximum output it saw was 4096 tokens.
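
In practice that means capping `max_new_tokens` at 4096 when calling the model. A minimal sketch (the 8B Instruct repo id is just for illustration, and this assumes a recent transformers version that accepts chat-format messages in the text-generation pipeline):

```python
# Sketch: capping the response at the presumed 4096-token trained limit.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed repo id; swap in your flavor
)

messages = [{"role": "user", "content": "Explain RoPE scaling in two sentences."}]
out = pipe(messages, max_new_tokens=4096)  # cap at the presumed max trained response length

# With chat-format input, generated_text holds the full conversation;
# the last message is the assistant's reply.
print(out[0]["generated_text"][-1]["content"])
```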

Thanks @chrischain, that makes sense.

Is the 4096 figure true for all the Llama 3.1 model flavors, or only for the 405B Instruct?
