adamo1139 committed on
Commit
f485ece
1 Parent(s): 20f6d69

Update README.md

  1. README.md +2 -0
README.md CHANGED
@@ -5,6 +5,8 @@ Amazingly quick to inference on Ada GPUs like 3090 Ti. in INT8. In VLLM I left i
  Averaged over a second, that's 22.5k t/s prompt processing and 1.5k t/s generation.
  Averaged over an hour that's 81M input tokens and 5.5M output tokens. Peak generation speed I see is around 2.6k/2.8k t/s.
 
+ Quantized on H100. On 3090 Ti I was OOMing.
+
  Creation script:
 
  ```python