Update README.md
README.md
Amazingly quick to run inference on Ampere GPUs like the 3090 Ti in INT8. In VLLM I left i
Averaged over a second, that's 22.5k t/s prompt processing and 1.5k t/s generation.

Averaged over an hour, that's 81M input tokens and 5.5M output tokens. Peak generation speed I see is around 2.6k/2.8k t/s.
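For reference, the hourly totals follow directly from the per-second rates; a quick sanity check (the quoted 5.5M output tokens is a touch above a flat 1.5k t/s, consistent with generation sometimes peaking higher):

```python
# Sanity check of the throughput arithmetic, using the rates quoted above.
prompt_tps = 22_500   # prompt-processing tokens per second
gen_tps = 1_500       # generation tokens per second
seconds_per_hour = 3_600

input_per_hour = prompt_tps * seconds_per_hour   # 81,000,000 -> matches the 81M figure
output_per_hour = gen_tps * seconds_per_hour     # 5,400,000  -> ~5.4M; the text rounds to 5.5M

print(f"{input_per_hour:,} input t/h, {output_per_hour:,} output t/h")
```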
Quantized on H100. On 3090 Ti I was OOMing.

Creation script:

```python