---
license: apache-2.0
datasets:
- briefai/LongShort-Dataset
language:
- en
pipeline_tag: text-generation
tags:
- pytorch
- mistral
- Gen-AI
- Finance
- KPI Extraction
---
# LongShort-Mistral-7B

🤗 [Huggingface Model Card](https://huggingface.co/briefai/LongShort-Mistral-7B)

### Model Description

LongShort-Mistral-7B is a large language model fine-tuned on earnings call documents to extract financial KPIs. It is based on the Mistral-7B-Instruct architecture.
- Model creator: [Brief AI](https://huggingface.co/briefai)
- Original model: [Mistral-7B-Instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)

### Dataset Description
- Data Source: Factiva
- Data Description: 28K+ earnings call documents
- Data Scope: 1K+ public companies
- Fine-Tuning Data: collection of 60K+ samples

## Prompt template: LongShort-Mistral-7B

```
[INST]Given the context, answer the question.

### Question:
Extract all the finance-based performance indicators and evaluation metrics.

### Context:
{context}

### Answer:
[/INST]
```
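
For illustration, the template can be filled programmatically before tokenization. This is a minimal sketch; `build_prompt` and the sample context are illustrative, not part of the released code:

```python
# Hypothetical helper: fill the LongShort prompt template with a document chunk.
PROMPT_TEMPLATE = """[INST]Given the context, answer the question.

### Question:
Extract all the finance-based performance indicators and evaluation metrics.

### Context:
{context}

### Answer:
[/INST]"""

def build_prompt(context: str) -> str:
    # The template contains no other braces, so str.format is safe here.
    return PROMPT_TEMPLATE.format(context=context)

# Toy earnings-call snippet (illustrative only).
print(build_prompt("Q3 revenue was $4.2B, up 12% year over year; EPS came in at $1.05."))
```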

## Basics
*This section provides information about the model type, version, license, funders, release date, developers, and contact information.*
*It is useful for anyone who wants to reference the model.*

**Developed by:** [Brief AI Team](https://huggingface.co/briefai)

**Model Type:** Transformer-based Large Language Model

**Version:** 1.0.0

**Languages:** English

**License:** Apache 2.0

**Release Date Estimate:** Wednesday, November 29, 2023

**Send Questions to:** [email protected]

**Cite as:** Brief AI LongShort Language Model

**Funded by:** UChicago Data Science Institute

**Mentored by:** Nick Kadochnikov

## Technical Specifications
*This section includes details about the model objective and architecture, and the compute infrastructure.*
*It is useful for people interested in model development.*

Please see [the LongShort training README](https://github.com/brief-ai-uchicago/LongShort-Dataset) for full details on replicating training.

### Model Architecture and Objective

* Modified from Mistral-7B-Instruct

**Objective:** Financial KPI extraction from earnings call documents.

### Hardware and Software - Compute Infrastructure

* 4 NVIDIA L4 GPUs & 48 vCPUs

* Environment: PyTorch (pytorch-2.0 w/ CUDA-11.8; see [GitHub link](https://github.com/pytorch/pytorch))

* CPU: GCP G2 Standard 48 (Platform: Intel Cascade Lake) (Accelerator Optimized)

* CPU memory: 192GB RAM

* GPU memory: 30GB per GPU

## Training
*This section provides information about the training.*
*It is useful for people who want to learn more about the model inputs and training footprint.*

The following `bitsandbytes` quantization config was used during training (the equivalent `BitsAndBytesConfig` is sketched below):

* quant_method: bitsandbytes
* load_in_8bit: False
* load_in_4bit: True
* llm_int8_threshold: 6.0
* llm_int8_skip_modules: None
* llm_int8_enable_fp32_cpu_offload: False
* llm_int8_has_fp16_weight: False
* bnb_4bit_quant_type: nf4
* bnb_4bit_use_double_quant: True
* bnb_4bit_compute_dtype: float16

### Framework versions
* PEFT 0.4.0
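
For reference, the config above corresponds to the following `transformers` `BitsAndBytesConfig`. This is a minimal sketch assuming recent `transformers` and `bitsandbytes` releases; it is not the exact training script:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with nested (double) quantization and fp16 compute,
# mirroring the training config listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```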

### Training Data
*This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*

Details for the dataset can be found in the [LongShort Dataset](https://github.com/brief-ai-uchicago/LongShort-Dataset) repository.

Training data includes:

- 5,000 earnings call documents

## How to use

This model can be easily used and deployed with the Hugging Face ecosystem; it requires `transformers` and `accelerate` to be installed. The model can be downloaded from:

[LongShort-Mistral-7B](https://huggingface.co/briefai/LongShort-Mistral-7B)
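
A minimal loading sketch, assuming a CUDA-capable GPU, that `transformers`, `accelerate`, and `bitsandbytes` are installed, and that the repo hosts weights loadable via `AutoModelForCausalLM` (if it is a PEFT adapter, `peft` must also be installed). The 4-bit config mirrors the training setup above, and the sample context is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "briefai/LongShort-Mistral-7B"

# Load in 4-bit NF4 so the 7B model fits on a single ~24GB GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place weights on available devices
)

# Build a prompt following the template above (toy context, illustrative only).
prompt = (
    "[INST]Given the context, answer the question.\n\n"
    "### Question:\nExtract all the finance-based performance indicators and evaluation metrics.\n\n"
    "### Context:\nQ3 revenue was $4.2B, up 12% year over year.\n\n"
    "### Answer:\n[/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```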

## Intended Use

This model was created to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as pre-trained base models that can be further fine-tuned for specific tasks. The use cases below are not exhaustive.

### Direct Use

- Text generation

- Exploring characteristics of language generated by a language model

- Examples: Cloze tests, counterfactuals, generations with reframings

### Downstream Use

- Tasks that leverage language models include: Information Extraction, Question Answering, Summarization

#### Out-of-scope Uses

Using the model in high-stakes settings is out of scope for this model. The model is not designed for critical decisions nor for uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but may not be correct.

Out-of-scope uses include:

- Usage for evaluating or scoring individuals, such as for employment, education, or credit

- Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct

#### Misuse

Intentionally using the model for harm, violating human rights, or engaging in other kinds of malicious activity is a misuse of this model. This includes:

- Spam generation

- Disinformation and influence operations

- Disparagement and defamation

- Harassment and abuse

- Deception

- Unconsented impersonation and imitation

- Unconsented surveillance

- Generating content without attribution to the model, as specified in the [RAIL License, Use Restrictions](https://huggingface.co/spaces/bigscience/license)

## Intended Users

### Direct Users

- General Public

- Researchers

- Students

- Educators

- Engineers/developers

- Non-commercial entities

- Financial Industry

# Risks and Limitations
*This section identifies foreseeable harms and misunderstandings.*

The model may:

- Overrepresent some viewpoints and underrepresent others

- Contain stereotypes

- Contain personal information

- Generate:

  - Hateful, abusive, or violent language

  - Discriminatory or prejudicial language

  - Content that may not be appropriate for all settings, including sexual content

- Make errors, including producing incorrect information as if it were factual

- Generate irrelevant or repetitive outputs

- Induce users into attributing human traits to it, such as sentience or consciousness

# Evaluation
*This section describes the evaluation protocols and provides the results.*

Result: LongShort-Llama-2-13B gives 43.4% accuracy on a validation set of 10% of the original training dataset.

**Train-time Evaluation:**

Final checkpoint after 300 epochs:

- Training Loss: 1.228

# Recommendations
*This section provides information on warnings and potential mitigations.*

- Indirect users should be made aware when the content they are working with was created by the LLM.

- Users should be aware of [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.

- Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.

# Model Card Authors
Vishal Parameshwaran, Garima Sohi, Jose Gerala, Sanchit Narayan Kumar