federicopascual committed
Commit e02aba1
Parent(s): ab93dff
Updated model card

README.md CHANGED
@@ -19,8 +19,6 @@ It is trained on the following three financial communication corpora. The total c
 - Corporate Reports 10-K & 10-Q: 2.5B tokens
 - Earnings Call Transcripts: 1.3B tokens
 - Analyst Reports: 1.1B tokens
-- Demo.org Proprietary Reports
-- Additional purchased data from Factset

 The entire training is done using an **NVIDIA DGX-1** machine. The server has 4 Tesla P100 GPUs, providing a total of 128 GB of GPU memory. This machine enables us to train the BERT models using a batch size of 128. We utilize the Horovod framework for multi-GPU training. Overall, the total time taken to perform pretraining for one model is approximately **2 days**.
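For context on the multi-GPU setup the card describes, here is a minimal sketch of a Horovod-distributed masked-LM pretraining step using the Hugging Face `transformers` BERT classes. It is an illustration under stated assumptions, not the authors' training script: the model and optimizer choices, the learning-rate scaling, and the `train_step` helper are hypothetical, and only the Horovod usage and the 4-GPU / batch-size-128 figures come from the card.

```python
import torch
import horovod.torch as hvd
from transformers import BertConfig, BertForMaskedLM

hvd.init()                               # one worker process per GPU
torch.cuda.set_device(hvd.local_rank())  # pin this worker to its own GPU

# BERT-Base configuration trained from scratch with the masked-LM objective
# (assumed here; the card does not state the exact configuration used).
model = BertForMaskedLM(BertConfig()).cuda()

# Base learning rate is illustrative; scaling it by the worker count is a common
# Horovod convention, not a value taken from the card.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4 * hvd.size())

# Average gradients across all GPUs each step and start every worker from identical state.
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

def train_step(batch):
    """One pretraining step; `batch` holds input_ids, attention_mask and labels tensors."""
    optimizer.zero_grad()
    outputs = model(**{k: v.cuda(non_blocking=True) for k, v in batch.items()})
    outputs.loss.backward()
    optimizer.step()
    return outputs.loss.item()
```

Launched with `horovodrun -np 4 python pretrain.py`, a per-GPU batch of 32 would give the effective batch size of 128 mentioned above (assuming that figure is the global batch size).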