HachiML committed on
Commit
13df84c
1 Parent(s): 7eec734

Upload MistsForConditionalGeneration

README.md ADDED
@@ -0,0 +1,199 @@
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
config.json ADDED
@@ -0,0 +1,126 @@
+ {
+   "architectures": [
+     "MistsForConditionalGeneration"
+   ],
+   "auto_map": {
+     "AutoConfig": "configuration_mists.MistsConfig",
+     "AutoModel": "modeling_mists.MistsForConditionalGeneration"
+   },
+   "ignore_index": -100,
+   "model_type": "mists",
+   "pad_token_id": 32769,
+   "projector_hidden_act": "gelu",
+   "text_config": {
+     "_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+     "architectures": [
+       "MistralForCausalLM"
+     ],
+     "max_position_embeddings": 32768,
+     "model_type": "mistral",
+     "rms_norm_eps": 1e-05,
+     "rope_theta": 1000000.0,
+     "sliding_window": null,
+     "torch_dtype": "bfloat16",
+     "vocab_size": 32832
+   },
+   "time_series_config": {
+     "_name_or_path": "HachiML/MOMENT-1-large-embedding-v0.1",
+     "architectures": [
+       "MomentEmbeddingModel"
+     ],
+     "auto_map": {
+       "AutoConfig": "HachiML/MOMENT-1-large-embedding-v0.1--configuration_moment.MomentConfig",
+       "AutoModel": "HachiML/MOMENT-1-large-embedding-v0.1--modeling_moment.MomentEmbeddingModel"
+     },
+     "mask_ratio": 0.0,
+     "model_type": "moment",
+     "patch_len": 8,
+     "patch_stride_len": 8,
+     "revin_affine": false,
+     "t5_config": {
+       "add_cross_attention": false,
+       "attn_implementation": null,
+       "bad_words_ids": null,
+       "begin_suppress_tokens": null,
+       "bos_token_id": null,
+       "chunk_size_feed_forward": 0,
+       "classifier_dropout": 0.0,
+       "cross_attention_hidden_size": null,
+       "d_ff": 2816,
+       "d_kv": 64,
+       "d_model": 1024,
+       "decoder_start_token_id": 0,
+       "dense_act_fn": "gelu_new",
+       "diversity_penalty": 0.0,
+       "do_sample": false,
+       "dropout_rate": 0.1,
+       "early_stopping": false,
+       "encoder_no_repeat_ngram_size": 0,
+       "eos_token_id": 1,
+       "exponential_decay_length_penalty": null,
+       "feed_forward_proj": "gated-gelu",
+       "finetuning_task": null,
+       "forced_bos_token_id": null,
+       "forced_eos_token_id": null,
+       "id2label": {
+         "0": "LABEL_0",
+         "1": "LABEL_1"
+       },
+       "initializer_factor": 1.0,
+       "is_decoder": false,
+       "is_encoder_decoder": true,
+       "is_gated_act": true,
+       "label2id": {
+         "LABEL_0": 0,
+         "LABEL_1": 1
+       },
+       "layer_norm_epsilon": 1e-06,
+       "length_penalty": 1.0,
+       "max_length": 20,
+       "min_length": 0,
+       "n_positions": 512,
+       "no_repeat_ngram_size": 0,
+       "num_beam_groups": 1,
+       "num_beams": 1,
+       "num_decoder_layers": 24,
+       "num_heads": 16,
+       "num_layers": 24,
+       "num_return_sequences": 1,
+       "output_attentions": false,
+       "output_hidden_states": false,
+       "output_past": true,
+       "output_scores": false,
+       "pad_token_id": 0,
+       "prefix": null,
+       "problem_type": null,
+       "pruned_heads": {},
+       "relative_attention_max_distance": 128,
+       "relative_attention_num_buckets": 32,
+       "remove_invalid_values": false,
+       "repetition_penalty": 1.0,
+       "return_dict": true,
+       "return_dict_in_generate": false,
+       "sep_token_id": null,
+       "suppress_tokens": null,
+       "task_specific_params": null,
+       "temperature": 1.0,
+       "tf_legacy_loss": false,
+       "tie_encoder_decoder": false,
+       "tie_word_embeddings": false,
+       "tokenizer_class": null,
+       "top_k": 50,
+       "top_p": 1.0,
+       "torch_dtype": null,
+       "torchscript": false,
+       "typical_p": 1.0,
+       "use_bfloat16": false,
+       "use_cache": true,
+       "vocab_size": 32128
+     },
+     "torch_dtype": "float32"
+   },
+   "time_series_hidden_size": 1024,
+   "time_series_token_index": 32768,
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2"
+ }
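The config above sets `time_series_token_index` to 32768 and `pad_token_id` to 32769, with `text_config.vocab_size` at 32832. A minimal sketch of a sanity check that such special ids are distinct and fit inside the embedding table follows; `check_special_token_ids` is a hypothetical helper and not part of this repository, and the dict is only an excerpt of the file.

```python
from typing import List

# Excerpt of the values from config.json (not the full file).
config = {
    "pad_token_id": 32769,
    "time_series_token_index": 32768,
    "text_config": {"vocab_size": 32832},
}


def check_special_token_ids(cfg: dict) -> List[str]:
    """Return a list of problems; an empty list means the ids are consistent."""
    vocab_size = cfg["text_config"]["vocab_size"]
    problems = []
    for key in ("pad_token_id", "time_series_token_index"):
        token_id = cfg[key]
        # Every id must index a row of the text embedding table.
        if not 0 <= token_id < vocab_size:
            problems.append(f"{key}={token_id} is outside [0, {vocab_size})")
    # The padding id and the time-series placeholder must not collide.
    if cfg["pad_token_id"] == cfg["time_series_token_index"]:
        problems.append("pad token and time-series placeholder share an id")
    return problems


print(check_special_token_ids(config))
```

A check like this is cheap to run after resizing a tokenizer, since an out-of-range id would otherwise only surface as an indexing error at embedding time.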
configuration_mists.py ADDED
@@ -0,0 +1,60 @@
+ import warnings
+
+ from transformers import PretrainedConfig
+ from transformers import CONFIG_MAPPING
+
+ from .configuration_moment import MomentConfig
+
+ class MistsConfig(PretrainedConfig):
+     model_type = "mists"
+
+     def __init__(
+         self,
+         time_series_config=None,
+         text_config=None,
+         ignore_index=-100,
+         time_series_token_index=32000,
+         projector_hidden_act="gelu",  # used by the projector
+         # time_series_feature_select_strategy="default",  # TODO: for the model's forward pass (how to obtain the embedding from the image model's hidden_state). To be supported in the future.
+         # time_series_feature_layer=-2,  # for the model's forward pass  # TODO: for the model's forward pass (how to obtain the embedding from the image model's hidden_state). To be supported in the future.
+         time_series_hidden_size=1024,  # used by the projector
+         **kwargs,
+     ):
+
+         self.ignore_index = ignore_index
+         self.time_series_token_index = time_series_token_index
+         self.projector_hidden_act = projector_hidden_act
+         self.time_series_hidden_size = time_series_hidden_size
+
+         # Added in anticipation of the Moment model being registered in Transformers in the future.
+         # It is not registered yet, so CONFIG_MAPPING does not work for it.
+         if isinstance(time_series_config, dict):
+             time_series_config["model_type"] = (
+                 time_series_config["model_type"] if "model_type" in time_series_config else "moment"
+             )
+             # time_series_config = CONFIG_MAPPING[time_series_config["model_type"]](**time_series_config)
+             time_series_config = MomentConfig(**time_series_config)
+         elif time_series_config is None:
+             time_series_config = MomentConfig()
+
+         self.time_series_config = time_series_config
+
+         if isinstance(text_config, dict):
+             text_config["model_type"] = text_config["model_type"] if "model_type" in text_config else "mistral"
+             text_config = CONFIG_MAPPING[text_config["model_type"]](**text_config)
+         elif text_config is None:
+             text_config = CONFIG_MAPPING["mistral"]()
+
+         self.text_config = text_config
+
+         super().__init__(**kwargs)
+
+
+     def to_dict(self):
+         output = super().to_dict()
+         return output
+
+
+
+
+
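The constructor in `configuration_mists.py` accepts each sub-config as a dict, as an already-built config object, or as `None`. A minimal sketch of that normalization pattern in plain Python follows. `SubConfig`, `REGISTRY`, and `normalize` are hypothetical stand-ins (for `PretrainedConfig` subclasses and `CONFIG_MAPPING`), chosen so the example runs without `transformers` installed. Unlike the file above, the sketch copies the incoming dict instead of writing `model_type` back into it.

```python
from dataclasses import dataclass, field


@dataclass
class SubConfig:
    """Hypothetical stand-in for a PretrainedConfig subclass."""
    model_type: str
    extra: dict = field(default_factory=dict)


# Hypothetical stand-in for CONFIG_MAPPING: model_type -> constructor.
REGISTRY = {
    "mistral": lambda **kw: SubConfig("mistral", kw),
    "moment": lambda **kw: SubConfig("moment", kw),
}


def normalize(config, default_type: str):
    """Turn a dict / None / ready-made object into a config object."""
    if isinstance(config, dict):
        params = dict(config)  # copy so the caller's dict is not mutated
        model_type = params.pop("model_type", default_type)
        return REGISTRY[model_type](**params)
    if config is None:
        return REGISTRY[default_type]()
    return config  # already a config object: pass through unchanged


text = normalize({"vocab_size": 32832}, "mistral")
series = normalize(None, "moment")
print(text.model_type, text.extra, series.model_type)
```

Accepting all three input forms is what lets the same constructor serve both hand-written Python code and a config reloaded from a serialized JSON dict.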
configuration_moment.py ADDED
@@ -0,0 +1,103 @@
+ """Moment model configuration"""
+
+ from transformers import PretrainedConfig
+ from transformers import logging
+
+
+ DEFAULT_T5_CONFIG = {
+     # "_name_or_path": "google/flan-t5-large",
+     # "architectures": [
+     #     "T5ForConditionalGeneration"
+     # ],
+     "classifier_dropout": 0.0,
+     "d_ff": 2816,
+     "d_kv": 64,
+     "d_model": 1024,
+     "decoder_start_token_id": 0,
+     "dense_act_fn": "gelu_new",
+     "dropout_rate": 0.1,
+     "eos_token_id": 1,
+     "feed_forward_proj": "gated-gelu",
+     "initializer_factor": 1.0,
+     "is_encoder_decoder": False,
+     "is_gated_act": True,
+     "layer_norm_epsilon": 1e-06,
+     # "model_type": "t5",
+     "n_positions": 512,
+     "num_decoder_layers": 24,
+     "num_heads": 16,
+     "num_layers": 24,
+     "output_past": True,
+     "pad_token_id": 0,
+     "relative_attention_max_distance": 128,
+     "relative_attention_num_buckets": 32,
+     "tie_word_embeddings": False,
+     # "transformers_version": "4.33.3",
+     "use_cache": False,
+     "vocab_size": 32128
+ }
+
+
+ class MomentConfig(PretrainedConfig):
+     model_type = "moment"
+
+     def __init__(
+         self,
+         t5_config: dict = DEFAULT_T5_CONFIG,
+         d_model: int = None,
+         seq_len: int = 512,
+         patch_len: int = 16,
+         patch_stride_len: int = 16,
+         dropout: float = 0.1,
+         revin_num_features: int = 1,
+         revin_eps: float = 1e-5,
+         revin_affine: bool = True,
+         add_positional_embedding: bool = True,
+         value_embedding_bias: bool = False,
+         orth_gain: float = 1.41,
+         mask_ratio: float = 0.15,
+         freeze_embedder: bool = True,
+         freeze_encoder: bool = True,
+         freeze_head: bool = False,
+         enable_gradient_checkpointing: bool = True,
+         randomly_initialize_backbone: bool = False,
+         **kwargs
+     ):
+         self.t5_config = self._init_t5_config(t5_config)
+         self.d_model = d_model
+         self.seq_len = seq_len
+         self.patch_len = patch_len
+         self.patch_stride_len = patch_stride_len
+         self.dropout = dropout
+         self.revin_num_features = revin_num_features
+         self.revin_eps = revin_eps
+         self.revin_affine = revin_affine
+         self.add_positional_embedding = add_positional_embedding
+         self.value_embedding_bias = value_embedding_bias
+         self.orth_gain = orth_gain
+         self.mask_ratio = mask_ratio
+         self.freeze_embedder = freeze_embedder
+         self.freeze_encoder = freeze_encoder
+         self.freeze_head = freeze_head
+         self.enable_gradient_checkpointing = enable_gradient_checkpointing
+         self.randomly_initialize_backbone = randomly_initialize_backbone
+
+         self._validation_config()
+
+         super().__init__(**kwargs)
+
+     def _init_t5_config(self, config: dict):
+         if config is None:
+             return DEFAULT_T5_CONFIG.copy()  # copy so instances never share the module-level dict
+         else:
+             # Update a copy of DEFAULT_T5_CONFIG with the given config
+             updated_config = DEFAULT_T5_CONFIG.copy()
+             updated_config.update(config)
+             return updated_config
+
+     def _validation_config(self):
+         """
+         Validate configuration.
+         """
+         if self.d_model is None:
+             self.d_model = self.t5_config["d_model"]
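`_init_t5_config` overlays user-supplied keys on `DEFAULT_T5_CONFIG`, and `_validation_config` falls back to the backbone's `d_model` when none is given. A minimal sketch of both steps with a trimmed default dict follows; `DEFAULTS`, `merge_t5_config`, and `resolve_d_model` are illustrative names, not repository code.

```python
from typing import Optional

# Trimmed illustration; the real DEFAULT_T5_CONFIG has many more keys.
DEFAULTS = {"d_model": 1024, "num_layers": 24, "use_cache": False}


def merge_t5_config(overrides: Optional[dict]) -> dict:
    """Overlay overrides on a shallow copy of the defaults."""
    merged = DEFAULTS.copy()  # never hand out the shared module-level dict
    merged.update(overrides or {})
    return merged


def resolve_d_model(d_model: Optional[int], t5_config: dict) -> int:
    """Use the explicit d_model if given, else the backbone's value."""
    return t5_config["d_model"] if d_model is None else d_model


cfg = merge_t5_config({"use_cache": True})
print(cfg["use_cache"], DEFAULTS["use_cache"], resolve_d_model(None, cfg))
```

The copy is shallow, which is sufficient as long as every value in the defaults is a scalar, as is the case for the keys listed in `DEFAULT_T5_CONFIG`.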
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "pad_token_id": 32769,
+   "transformers_version": "4.41.2"
+ }
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8f88677cb60be3cb89350315abb985a642bee90396299aa3d3f82ef772a11147
+ size 4792435960
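Each `.safetensors` entry in this commit is a Git LFS pointer: a three-line text file that stands in for the real binary. A minimal sketch of parsing one follows; `parse_lfs_pointer` is a hypothetical helper, and it assumes the `key value` line layout of the pointer for `model-00001-of-00007.safetensors`.

```python
# Pointer text for the first shard, with the diff "+" markers removed.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8f88677cb60be3cb89350315abb985a642bee90396299aa3d3f82ef772a11147
size 4792435960
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and return version, hash algorithm, digest, size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The digest and size are what a client would compare against the downloaded shard to confirm the file arrived intact.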
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a3ad6610a957266f58adc61346050dc674ae55fc05a8867ac849dbf38755f50
+ size 4832008160
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f10483b67b6c3e56cee90466407722f76321756c46febe5c3ad8c8cc4e3b527
+ size 4999813904
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3afbdc43e06689904c6cc49ba6c4a3a29e9f4d92a05d830155c6029b67edea94
+ size 4999813920
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3da9ed77da78e03ef944b56ebb4739ad5b81ba6c0fe75ef442a047926a5cd134
+ size 4832008200
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f7124c572aefe9e4c91da380eab0cae51ec510d01149a3abbbad12d8e7e981cb
+ size 4999813920
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:13a24dcca20fe1ed97f769c727572f66b82d719331a5b7dcaf7a4a3ab1392719
+ size 1007731448
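The seven pointer files record the on-disk size of each shard, while `model.safetensors.index.json` reports a separate `total_size` of 30463555584. A minimal sketch that totals the shards and compares the two follows. The parameter estimate assumes every tensor is stored as 4-byte `float32` (the top-level `torch_dtype` in `config.json`), which this diff does not confirm tensor by tensor.

```python
# Shard sizes in bytes, copied from the seven LFS pointers.
SHARD_SIZES = [
    4792435960, 4832008160, 4999813904, 4999813920,
    4832008200, 4999813920, 1007731448,
]
# "total_size" from the metadata block of model.safetensors.index.json.
INDEX_TOTAL_SIZE = 30463555584
BYTES_PER_FLOAT32 = 4  # assumption: every tensor is stored as float32

on_disk = sum(SHARD_SIZES)
overhead = on_disk - INDEX_TOTAL_SIZE  # bytes on disk not counted by the index
approx_params = INDEX_TOTAL_SIZE // BYTES_PER_FLOAT32

print(f"on disk: {on_disk / 1e9:.2f} GB, not counted by index: {overhead} bytes")
print(f"roughly {approx_params / 1e9:.2f}B parameters if all float32")
```

The files on disk are slightly larger than the index total. A small gap of this kind would be consistent with per-file metadata such as headers, though the commit itself does not state the cause.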
model.safetensors.index.json ADDED
@@ -0,0 +1,525 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "metadata": {
3
+ "total_size": 30463555584
4
+ },
5
+ "weight_map": {
6
+ "language_model.lm_head.weight": "model-00007-of-00007.safetensors",
7
+ "language_model.model.embed_tokens.weight": "model-00001-of-00007.safetensors",
8
+ "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
9
+ "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
10
+ "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
11
+ "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
12
+ "language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
13
+ "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
14
+ "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
15
+ "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
16
+ "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
17
+ "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
18
+ "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
19
+ "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
20
+ "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
21
+ "language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
22
+ "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
23
+ "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
24
+ "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
25
+ "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
26
+ "language_model.model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
27
+ "language_model.model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
28
+ "language_model.model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
29
+ "language_model.model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
30
+ "language_model.model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
31
+ "language_model.model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
32
+ "language_model.model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
33
+ "language_model.model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
34
+ "language_model.model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
35
+ "language_model.model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
36
+ "language_model.model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
37
+ "language_model.model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
38
+ "language_model.model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
39
+ "language_model.model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
40
+ "language_model.model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
41
+ "language_model.model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
42
+ "language_model.model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
43
+ "language_model.model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
44
+ "language_model.model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
45
+ "language_model.model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
46
+ "language_model.model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
47
+ "language_model.model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
48
+ "language_model.model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
49
+ "language_model.model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
50
+ "language_model.model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
51
+ "language_model.model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
52
+ "language_model.model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
53
+ "language_model.model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
54
+ "language_model.model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
55
+ "language_model.model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
56
+ "language_model.model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
57
+ "language_model.model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
58
+ "language_model.model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
59
+ "language_model.model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
60
+ "language_model.model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
61
+ "language_model.model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
62
+ "language_model.model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
63
+ "language_model.model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
64
+ "language_model.model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
65
+ "language_model.model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
66
+ "language_model.model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
67
+ "language_model.model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
68
+ "language_model.model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
69
+ "language_model.model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
70
+ "language_model.model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
71
+ "language_model.model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
72
+ "language_model.model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
73
+ "language_model.model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
74
+ "language_model.model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
75
+ "language_model.model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
76
+ "language_model.model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
77
+ "language_model.model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
78
+ "language_model.model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
79
+ "language_model.model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
80
+ "language_model.model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
81
+ "language_model.model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
82
+ "language_model.model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
83
+ "language_model.model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
84
+ "language_model.model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
85
+ "language_model.model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
86
+ "language_model.model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
87
+ "language_model.model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
88
+ "language_model.model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
89
+ "language_model.model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
90
+ "language_model.model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
91
+ "language_model.model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
92
+ "language_model.model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
93
+ "language_model.model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
94
+ "language_model.model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
95
+ "language_model.model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
96
+ "language_model.model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
97
+ "language_model.model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
98
+ "language_model.model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
99
+ "language_model.model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
100
+ "language_model.model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
101
+ "language_model.model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
102
+ "language_model.model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
103
+ "language_model.model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
104
+ "language_model.model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
105
+ "language_model.model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
106
+ "language_model.model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
107
+ "language_model.model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
108
+ "language_model.model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
109
+ "language_model.model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
110
+ "language_model.model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
111
+ "language_model.model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
112
+ "language_model.model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
113
+ "language_model.model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
114
+ "language_model.model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
115
+ "language_model.model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
116
+ "language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
117
+ "language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
118
+ "language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
119
+ "language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
120
+ "language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
121
+ "language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
122
+ "language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
123
+ "language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
124
+ "language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
125
+ "language_model.model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
126
+ "language_model.model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
127
+ "language_model.model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
128
+ "language_model.model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
129
+ "language_model.model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
130
+ "language_model.model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
131
+ "language_model.model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
132
+ "language_model.model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
133
+ "language_model.model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
134
+ "language_model.model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
135
+ "language_model.model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
136
+ "language_model.model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
137
+ "language_model.model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+ "language_model.model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.28.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.28.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.28.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.28.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.28.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
+ "language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
+ "language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
+ "language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
+ "language_model.model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors",
+ "language_model.model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
+ "language_model.model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
+ "language_model.model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
+ "language_model.model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+ "language_model.model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
+ "language_model.model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+ "language_model.model.norm.weight": "model-00007-of-00007.safetensors",
+ "multi_modal_projector.linear_1.bias": "model-00001-of-00007.safetensors",
+ "multi_modal_projector.linear_1.weight": "model-00001-of-00007.safetensors",
+ "multi_modal_projector.linear_2.bias": "model-00001-of-00007.safetensors",
+ "multi_modal_projector.linear_2.weight": "model-00001-of-00007.safetensors",
+ "multi_modal_projector.mask_embedding": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.0.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.0.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.0.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.0.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.0.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.0.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.0.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.0.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.0.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.1.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.1.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.1.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.1.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.1.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.1.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.1.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.1.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.1.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.10.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.10.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.10.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.10.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.10.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.10.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.10.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.10.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.10.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.11.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.11.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.11.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.11.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.11.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.11.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.11.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.11.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.11.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.12.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.12.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.12.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.12.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.12.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.12.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.12.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.12.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.12.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.13.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.13.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.13.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.13.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.13.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.13.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.13.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.13.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.13.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.14.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.14.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.14.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.14.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.14.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.14.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.14.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.14.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.14.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.15.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.15.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.15.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.15.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.15.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.15.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.15.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.15.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.15.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.16.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.16.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.16.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.16.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.16.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.16.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.16.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.16.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.16.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.17.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.17.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.17.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.17.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.17.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.17.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.17.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.17.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.17.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.18.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.18.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.18.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.18.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.18.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.18.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.18.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.18.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.18.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.19.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.19.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.19.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.19.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.19.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.19.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.19.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.19.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.19.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.2.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.2.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.2.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.2.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.2.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.2.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.2.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.2.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.2.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.20.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.20.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.20.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.20.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.20.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.20.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.20.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.20.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.20.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.21.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.21.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.21.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.21.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.21.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.21.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.21.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.21.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.21.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.22.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.22.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.22.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.22.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.22.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.22.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
+ "time_series_tower.encoder.block.22.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
445
+ "time_series_tower.encoder.block.22.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
446
+ "time_series_tower.encoder.block.22.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
447
+ "time_series_tower.encoder.block.23.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
448
+ "time_series_tower.encoder.block.23.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
449
+ "time_series_tower.encoder.block.23.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
450
+ "time_series_tower.encoder.block.23.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
451
+ "time_series_tower.encoder.block.23.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
452
+ "time_series_tower.encoder.block.23.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
453
+ "time_series_tower.encoder.block.23.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
454
+ "time_series_tower.encoder.block.23.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
455
+ "time_series_tower.encoder.block.23.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
456
+ "time_series_tower.encoder.block.3.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
457
+ "time_series_tower.encoder.block.3.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
458
+ "time_series_tower.encoder.block.3.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
459
+ "time_series_tower.encoder.block.3.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
460
+ "time_series_tower.encoder.block.3.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
461
+ "time_series_tower.encoder.block.3.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
462
+ "time_series_tower.encoder.block.3.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
463
+ "time_series_tower.encoder.block.3.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
464
+ "time_series_tower.encoder.block.3.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
465
+ "time_series_tower.encoder.block.4.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
466
+ "time_series_tower.encoder.block.4.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
467
+ "time_series_tower.encoder.block.4.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
468
+ "time_series_tower.encoder.block.4.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
469
+ "time_series_tower.encoder.block.4.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
470
+ "time_series_tower.encoder.block.4.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
471
+ "time_series_tower.encoder.block.4.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
472
+ "time_series_tower.encoder.block.4.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
473
+ "time_series_tower.encoder.block.4.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
474
+ "time_series_tower.encoder.block.5.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
475
+ "time_series_tower.encoder.block.5.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
476
+ "time_series_tower.encoder.block.5.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
477
+ "time_series_tower.encoder.block.5.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
478
+ "time_series_tower.encoder.block.5.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
479
+ "time_series_tower.encoder.block.5.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
480
+ "time_series_tower.encoder.block.5.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
481
+ "time_series_tower.encoder.block.5.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
482
+ "time_series_tower.encoder.block.5.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
483
+ "time_series_tower.encoder.block.6.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
484
+ "time_series_tower.encoder.block.6.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
485
+ "time_series_tower.encoder.block.6.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
486
+ "time_series_tower.encoder.block.6.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
487
+ "time_series_tower.encoder.block.6.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
488
+ "time_series_tower.encoder.block.6.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
489
+ "time_series_tower.encoder.block.6.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
490
+ "time_series_tower.encoder.block.6.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
491
+ "time_series_tower.encoder.block.6.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
492
+ "time_series_tower.encoder.block.7.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
493
+ "time_series_tower.encoder.block.7.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
494
+ "time_series_tower.encoder.block.7.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
495
+ "time_series_tower.encoder.block.7.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
496
+ "time_series_tower.encoder.block.7.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
497
+ "time_series_tower.encoder.block.7.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
498
+ "time_series_tower.encoder.block.7.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
499
+ "time_series_tower.encoder.block.7.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
500
+ "time_series_tower.encoder.block.7.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
501
+ "time_series_tower.encoder.block.8.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
502
+ "time_series_tower.encoder.block.8.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
503
+ "time_series_tower.encoder.block.8.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
504
+ "time_series_tower.encoder.block.8.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
505
+ "time_series_tower.encoder.block.8.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
506
+ "time_series_tower.encoder.block.8.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
507
+ "time_series_tower.encoder.block.8.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
508
+ "time_series_tower.encoder.block.8.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
509
+ "time_series_tower.encoder.block.8.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
510
+ "time_series_tower.encoder.block.9.layer.0.SelfAttention.k.weight": "model-00001-of-00007.safetensors",
511
+ "time_series_tower.encoder.block.9.layer.0.SelfAttention.o.weight": "model-00001-of-00007.safetensors",
512
+ "time_series_tower.encoder.block.9.layer.0.SelfAttention.q.weight": "model-00001-of-00007.safetensors",
513
+ "time_series_tower.encoder.block.9.layer.0.SelfAttention.v.weight": "model-00001-of-00007.safetensors",
514
+ "time_series_tower.encoder.block.9.layer.0.layer_norm.weight": "model-00001-of-00007.safetensors",
515
+ "time_series_tower.encoder.block.9.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00007.safetensors",
516
+ "time_series_tower.encoder.block.9.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00007.safetensors",
517
+ "time_series_tower.encoder.block.9.layer.1.DenseReluDense.wo.weight": "model-00001-of-00007.safetensors",
518
+ "time_series_tower.encoder.block.9.layer.1.layer_norm.weight": "model-00001-of-00007.safetensors",
519
+ "time_series_tower.encoder.embed_tokens.weight": "model-00001-of-00007.safetensors",
520
+ "time_series_tower.encoder.final_layer_norm.weight": "model-00001-of-00007.safetensors",
521
+ "time_series_tower.patch_embedding.mask_embedding": "model-00001-of-00007.safetensors",
522
+ "time_series_tower.patch_embedding.position_embedding.pe": "model-00001-of-00007.safetensors",
523
+ "time_series_tower.patch_embedding.value_embedding.weight": "model-00001-of-00007.safetensors"
524
+ }
525
+ }
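Note on the index above: every `time_series_tower.*` entry shown here points at `model-00001-of-00007.safetensors`. The following is a minimal sketch of how such an index could be inverted to list parameters per shard. It assumes the usual sharded-checkpoint layout with a top-level `weight_map` object; the helper name `group_by_shard` and the `language_model.lm_head.weight` entry are hypothetical and used only for illustration.

```python
import json
from collections import defaultdict


def group_by_shard(index: dict) -> dict:
    """Invert a checkpoint index: shard file name -> sorted parameter names."""
    shards = defaultdict(list)
    for param_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(param_name)
    return {shard: sorted(names) for shard, names in shards.items()}


# Tiny stand-in for the real index file. The first two entries appear in the
# diff above; the last one is a made-up entry for illustration only.
example_index = json.loads("""
{
  "weight_map": {
    "time_series_tower.encoder.final_layer_norm.weight": "model-00001-of-00007.safetensors",
    "time_series_tower.patch_embedding.value_embedding.weight": "model-00001-of-00007.safetensors",
    "language_model.lm_head.weight": "model-00007-of-00007.safetensors"
  }
}
""")

grouped = group_by_shard(example_index)
for shard, names in sorted(grouped.items()):
    print(shard, len(names))
```

Grouping by shard makes it easy to see which files must be downloaded to load a single sub-module such as the time series tower.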
modeling_mists.py ADDED
@@ -0,0 +1,403 @@
1
+ from dataclasses import dataclass
2
+ from typing import List, Optional, Tuple, Union
3
+
4
+ import torch
5
+ import torch.utils.checkpoint
6
+ from torch import nn
7
+
8
+ from transformers import PreTrainedModel
9
+ from transformers.activations import ACT2FN
10
+ from transformers import Cache
11
+ from transformers.modeling_outputs import ModelOutput
12
+ from transformers.utils import (
13
+ add_start_docstrings,
14
+ add_start_docstrings_to_model_forward,
15
+ logging,
16
+ replace_return_docstrings,
17
+ )
18
+ from transformers import AutoModel, AutoModelForCausalLM
19
+
20
+ from .modeling_moment import MomentEmbeddingModel
21
+ from .configuration_mists import MistsConfig
22
+
23
+
24
+ @dataclass
25
+ # Copied from transformers.models.idefics.modeling_idefics.IdeficsCausalLMOutputWithPast with Idefics->Mists
26
+ class MistsCausalLMOutputWithPast(ModelOutput):
27
+ loss: Optional[torch.FloatTensor] = None
28
+ logits: torch.FloatTensor = None
29
+ past_key_values: Optional[List[torch.FloatTensor]] = None
30
+ hidden_states: Optional[Tuple[torch.FloatTensor]] = None
31
+ attentions: Optional[Tuple[torch.FloatTensor]] = None
32
+ time_series_hidden_states: Optional[Tuple[torch.FloatTensor]] = None
33
+
34
+
35
+ class MistsMultiModalProjector(nn.Module):
36
+ def __init__(self, config: MistsConfig):
37
+ super().__init__()
38
+
39
+ # The output of the time series tower is not uniform. A learnable padding vector is applied according to input_mask so that the features coming from the time series tower have a uniform shape.
40
+ self.mask_embedding = nn.Parameter(torch.randn(1, 1, config.time_series_hidden_size))
41
+
42
+ # mlp
43
+ self.linear_1 = nn.Linear(config.time_series_hidden_size, config.text_config.hidden_size, bias=True)
44
+ self.act = ACT2FN[config.projector_hidden_act]
45
+ self.linear_2 = nn.Linear(config.text_config.hidden_size, config.text_config.hidden_size, bias=True)
46
+
47
+ def forward(self, time_series_features, input_mask):
48
+ masked_features = time_series_features * input_mask.unsqueeze(-1) + self.mask_embedding * (1 - input_mask.unsqueeze(-1))
49
+ hidden_states = self.linear_1(masked_features)
50
+ hidden_states = self.act(hidden_states)
51
+ hidden_states = self.linear_2(hidden_states)
52
+ return hidden_states
53
+
54
+
55
+ class MistsPreTrainedModel(PreTrainedModel):
56
+ config_class = MistsConfig
57
+ base_model_prefix = "model"
58
+ supports_gradient_checkpointing = True
59
+ _no_split_modules = ["T5Block"]
60
+ _skip_keys_device_placement = "past_key_values"
61
+ _supports_flash_attn_2 = True
62
+ _supports_sdpa = True
63
+ _supports_cache_class = True
64
+ _supports_static_cache = True
65
+
66
+ def _init_weights(self, module):
67
+ # Important: this currently ports Mistral's initialization code as-is.
68
+ # Reference: https://github.com/huggingface/transformers/blob/25245ec26dc29bcf6102e1b4ddd0dfd02e720cf5/src/transformers/models/mistral/modeling_mistral.py#L762
69
+ # Pre-training with this initialization as-is is not advisable; only fine-tuning and inference are supported.
70
+ std = self.config.text_config.initializer_range
71
+ if isinstance(module, nn.Linear):
72
+ module.weight.data.normal_(mean=0.0, std=std)
73
+ if module.bias is not None:
74
+ module.bias.data.zero_()
75
+ elif isinstance(module, nn.Embedding):
76
+ module.weight.data.normal_(mean=0.0, std=std)
77
+ if module.padding_idx is not None:
78
+ module.weight.data[module.padding_idx].zero_()
79
+
80
+
81
+ class MistsForConditionalGeneration(MistsPreTrainedModel):
82
+ def __init__(self, config: MistsConfig):
83
+ super().__init__(config)
84
+
85
+ self.time_series_tower = MomentEmbeddingModel(config.time_series_config)
86
+ self.multi_modal_projector = MistsMultiModalProjector(config)
87
+ self.vocab_size = config.text_config.vocab_size
88
+ self.language_model = AutoModelForCausalLM.from_config(
89
+ config.text_config, attn_implementation=config._attn_implementation
90
+ )
91
+ self.pad_token_id = self.config.pad_token_id if self.config.pad_token_id is not None else -1
92
+ self.post_init()
93
+
94
+ def get_time_series_tower(self):
95
+ time_series_tower = getattr(self, 'time_series_tower', None)
96
+ if type(time_series_tower) is list:
97
+ time_series_tower = time_series_tower[0]
98
+ return time_series_tower
99
+
100
+ def get_input_embeddings(self):
101
+ return self.language_model.get_input_embeddings()
102
+
103
+ def set_input_embeddings(self, value):
104
+ self.language_model.set_input_embeddings(value)
105
+
106
+ def get_output_embeddings(self):
107
+ return self.language_model.get_output_embeddings()
108
+
109
+ def set_output_embeddings(self, new_embeddings):
110
+ self.language_model.set_output_embeddings(new_embeddings)
111
+
112
+ def set_decoder(self, decoder):
113
+ self.language_model.set_decoder(decoder)
114
+
115
+ def get_decoder(self):
116
+ return self.language_model.get_decoder()
117
+
118
+ def tie_weights(self):
119
+ return self.language_model.tie_weights()
120
+
121
+ def resize_token_embeddings(self, new_num_tokens: Optional[int] = None, pad_to_multiple_of=None) -> nn.Embedding:
122
+ model_embeds = self.language_model.resize_token_embeddings(new_num_tokens, pad_to_multiple_of)
123
+ # update vocab size
124
+ self.config.text_config.vocab_size = model_embeds.num_embeddings
125
+ self.vocab_size = model_embeds.num_embeddings
126
+ return model_embeds
127
+
128
+ # Adapted from LlavaForConditionalGeneration._merge_input_ids_with_image_features
129
+ # Reference: https://github.com/huggingface/transformers/blob/25245ec26dc29bcf6102e1b4ddd0dfd02e720cf5/src/transformers/models/llava/modeling_llava.py#L277C9-L277C45
130
+ def _merge_input_ids_with_time_series_features(self, time_series_features, inputs_embeds, input_ids, attention_mask, labels):
131
+ num_time_series, num_time_series_patches, embed_dim = time_series_features.shape # num_time_series_patches = n_channels x n_patches
132
+ batch_size, sequence_length = input_ids.shape
133
+ left_padding = not torch.sum(input_ids[:, -1] == torch.tensor(self.pad_token_id))
134
+ # 1. Create a mask to know where special time_series tokens are
135
+ special_time_series_token_mask = input_ids == self.config.time_series_token_index
136
+ num_special_time_series_tokens = torch.sum(special_time_series_token_mask, dim=-1)
137
+ # Compute the maximum embed dimension
138
+ max_embed_dim = (num_special_time_series_tokens.max() * (num_time_series_patches - 1)) + sequence_length
139
+ max_embed_dim = int(max_embed_dim.item()) # extract the integer value from the tensor
140
+ if max_embed_dim is None:
141
+ print(f"num_special_time_series_tokens.max(): {num_special_time_series_tokens.max()}")
142
+ print(f"num_time_series_patches: {num_time_series_patches}")
143
+ print(f"sequence_length: {sequence_length}")
144
+ else:
145
+ print(f"max_embed_dim 0: {max_embed_dim}")
146
+ batch_indices, non_time_series_indices = torch.where(input_ids != self.config.time_series_token_index)
147
+
148
+ # 2. Compute the positions where text should be written
149
+ # Calculate new positions for text tokens in merged time_series-text sequence.
150
+ # `special_time_series_token_mask` identifies time_series tokens. Each time_series token expands into `num_time_series_patches` feature embeddings, adding `num_time_series_patches - 1` positions.
151
+ # `torch.cumsum` computes how each time_series token shifts subsequent text token positions.
152
+ # - 1 to adjust for zero-based indexing, as `cumsum` inherently increases indices by one.
153
+ new_token_positions = torch.cumsum((special_time_series_token_mask * (num_time_series_patches - 1) + 1), -1) - 1
154
+ nb_time_series_pad = max_embed_dim - 1 - new_token_positions[:, -1]
155
+ if left_padding:
156
+ new_token_positions += nb_time_series_pad[:, None] # offset for left padding
157
+ text_to_overwrite = new_token_positions[batch_indices, non_time_series_indices]
158
+
159
+ # 3. Create the full embedding, already padded to the maximum position
160
+ final_embedding = torch.zeros(
161
+ batch_size, max_embed_dim, embed_dim, dtype=inputs_embeds.dtype, device=inputs_embeds.device
162
+ )
163
+ final_attention_mask = torch.zeros(
164
+ batch_size, max_embed_dim, dtype=attention_mask.dtype, device=inputs_embeds.device
165
+ )
166
+ if labels is not None:
167
+ final_labels = torch.full(
168
+ (batch_size, max_embed_dim), self.config.ignore_index, dtype=input_ids.dtype, device=input_ids.device
169
+ )
170
+ # In case the time series tower or the language model has been offloaded to CPU, we need to manually
171
+ # set the corresponding tensors into their correct target device.
172
+ target_device = inputs_embeds.device
173
+ batch_indices, non_time_series_indices, text_to_overwrite = (
174
+ batch_indices.to(target_device),
175
+ non_time_series_indices.to(target_device),
176
+ text_to_overwrite.to(target_device),
177
+ )
178
+ attention_mask = attention_mask.to(target_device)
179
+
180
+ # 4. Fill the embeddings based on the mask. If we have ["hey", "<time_series>", "how", "are"] and the time series
181
+ # contributes 576 patch embeddings, the text is copied to positions [0, 577, 578] and the time_series features to positions 1 through 576
182
+ final_embedding[batch_indices, text_to_overwrite] = inputs_embeds[batch_indices, non_time_series_indices]
183
+ final_attention_mask[batch_indices, text_to_overwrite] = attention_mask[batch_indices, non_time_series_indices]
184
+ print("max_embed_dim is None: ", (max_embed_dim is None))
185
+ print("max_embed_dim: ", max_embed_dim)
186
+ if labels is not None:
187
+ final_labels[batch_indices, text_to_overwrite] = labels[batch_indices, non_time_series_indices]
188
+ print("max_embed_dim is None: ", (max_embed_dim is None))
189
+ print("max_embed_dim: ", max_embed_dim)
190
+
191
+ # 5. Fill the embeddings corresponding to the time_series. Anything that is not `text_positions` needs filling (#29835)
192
+ print("inputs_embeds.device: ", inputs_embeds.device)
193
+ print("max_embed_dim: ", max_embed_dim, " is None: ", (max_embed_dim is None))
194
+ time_series_to_overwrite = torch.full(
195
+ (batch_size, max_embed_dim), True, dtype=torch.bool, device=inputs_embeds.device
196
+ )
197
+ time_series_to_overwrite[batch_indices, text_to_overwrite] = False
198
+ time_series_to_overwrite &= time_series_to_overwrite.cumsum(-1) - 1 >= nb_time_series_pad[:, None].to(target_device)
199
+
200
+ if time_series_to_overwrite.sum() != time_series_features.shape[:-1].numel():
201
+ raise ValueError(
202
+ f"The inputs provided to the model are wrong. The number of time series tokens is {torch.sum(special_time_series_token_mask)} while"
203
+ f" the number of time series given to the model is {num_time_series}. This prevents correct indexing and breaks batch generation."
204
+ )
205
+
206
+ final_embedding[time_series_to_overwrite] = time_series_features.contiguous().reshape(-1, embed_dim).to(target_device)
207
+ final_attention_mask |= time_series_to_overwrite
208
+ position_ids = (final_attention_mask.cumsum(-1) - 1).masked_fill_((final_attention_mask == 0), 1)
209
+
210
+ # 6. Mask out the embedding at padding positions, as we later use the past_key_value value to determine the non-attended tokens.
211
+ batch_indices, pad_indices = torch.where(input_ids == self.pad_token_id)
212
+ indices_to_mask = new_token_positions[batch_indices, pad_indices]
213
+
214
+ final_embedding[batch_indices, indices_to_mask] = 0
215
+
216
+ if labels is None:
217
+ final_labels = None
218
+
219
+ return final_embedding, final_attention_mask, final_labels, position_ids
220
+
221
+ def forward(
222
+ self,
223
+ input_ids: torch.LongTensor = None,
224
+ time_series_values: torch.FloatTensor = None,
225
+ time_series_input_mask: torch.FloatTensor = None,
226
+ attention_mask: Optional[torch.Tensor] = None,
227
+ position_ids: Optional[torch.LongTensor] = None,
228
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
229
+ inputs_embeds: Optional[torch.FloatTensor] = None,
230
+ # time_series_feature_layer: Optional[int] = None,
231
+ # time_series_feature_select_strategy: Optional[str] = None,
232
+ labels: Optional[torch.LongTensor] = None,
233
+ use_cache: Optional[bool] = None,
234
+ output_attentions: Optional[bool] = None,
235
+ output_hidden_states: Optional[bool] = None,
236
+ return_dict: Optional[bool] = None,
237
+ ) -> Union[Tuple, MistsCausalLMOutputWithPast]:
238
+
239
+ # These depend on the arguments passed to language_model
240
+ # output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
241
+ # output_hidden_states = (
242
+ # output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
243
+ # )
244
+ # return_dict = return_dict if return_dict is not None else self.config.use_return_dict
245
+ # vision_feature_layer = (
246
+ # vision_feature_layer if vision_feature_layer is not None else self.config.vision_feature_layer
247
+ # )
248
+ # vision_feature_select_strategy = (
249
+ # vision_feature_select_strategy
250
+ # if vision_feature_select_strategy is not None
251
+ # else self.config.vision_feature_select_strategy
252
+ # )
253
+
254
+ if inputs_embeds is None:
255
+ # 1. Extract the input embeddings
256
+ inputs_embeds = self.get_input_embeddings()(input_ids)
257
+
258
+ # 2. Merge text and time_series
259
+ if time_series_values is not None and input_ids.shape[1] != 1:
260
+ time_series_outputs = self.time_series_tower(time_series_values, time_series_input_mask)
261
+ time_series_features = self.multi_modal_projector(
262
+ time_series_features=time_series_outputs.hidden_states, # [batch_size, n_patches, d_model]
263
+ input_mask=time_series_outputs.input_mask_patch_view, # [batch_size, n_patches]
264
+ )
265
+
266
+ inputs_embeds = inputs_embeds.to(time_series_features.dtype)
267
+ inputs_embeds, attention_mask, labels, position_ids = self._merge_input_ids_with_time_series_features(
268
+ time_series_features, inputs_embeds, input_ids, attention_mask, labels
269
+ )
270
+
271
+ # In case input_ids.shape[1] == 1, time_series_values is not None and past_key_values is not None, we are in the case of
272
+ # generation with cache
273
+ elif past_key_values is not None and time_series_values is not None and input_ids.shape[1] == 1:
274
+ # Retrieve the first layer to inspect the logits and mask out the hidden states
275
+ # that are set to 0
276
+ first_layer_past_key_value = past_key_values[0][0][:, :, :, 0]
277
+
278
+ # Sum all dimensions of head_dim (-2) to avoid random errors such as: https://github.com/huggingface/transformers/pull/28032#issuecomment-1863691941
279
+ batch_index, non_attended_tokens = torch.where(first_layer_past_key_value.float().sum(-2) == 0)
280
+
281
+ # Get the target length
282
+ target_length = input_ids.shape[1]
283
+ past_length = first_layer_past_key_value.shape[-1]
284
+
285
+ extended_attention_mask = torch.ones(
286
+ (attention_mask.shape[0], past_length),
287
+ dtype=attention_mask.dtype,
288
+ device=attention_mask.device,
289
+ )
290
+
291
+ # Filter out only the tokens that can be un-attended, this can happen
292
+ # if one uses Llava + Fused modules where the cache on the
293
+ # first iteration is already big enough, or if one passes custom cache
294
+ valid_indices = non_attended_tokens < extended_attention_mask.size(-1)
295
+ new_batch_index = batch_index[valid_indices]
296
+ new_non_attended_tokens = non_attended_tokens[valid_indices]
297
+
298
+ # Zero-out the places where we don't need to attend
299
+ extended_attention_mask[new_batch_index, new_non_attended_tokens] = 0
300
+
301
+ attention_mask = torch.cat((extended_attention_mask, attention_mask[:, -target_length:]), dim=1)
302
+ position_ids = torch.sum(attention_mask, dim=1).unsqueeze(-1) - 1
303
+
304
+ outputs = self.language_model(
305
+ attention_mask=attention_mask,
306
+ position_ids=position_ids,
307
+ past_key_values=past_key_values,
308
+ inputs_embeds=inputs_embeds.to(self.language_model.dtype),
309
+ use_cache=use_cache,
310
+ output_attentions=output_attentions,
311
+ output_hidden_states=output_hidden_states,
312
+ return_dict=return_dict,
313
+ )
314
+
315
+ logits = outputs[0]
316
+
317
+ loss = None
318
+ if labels is not None:
319
+ # Shift so that tokens < n predict n
320
+ if attention_mask is not None:
321
+ shift_attention_mask = attention_mask[..., 1:]
322
+ shift_logits = logits[..., :-1, :][shift_attention_mask.to(logits.device) != 0].contiguous()
323
+ shift_labels = labels[..., 1:][shift_attention_mask.to(labels.device) != 0].contiguous()
324
+ else:
325
+ shift_logits = logits[..., :-1, :].contiguous()
326
+ shift_labels = labels[..., 1:].contiguous()
327
+ # Flatten the tokens
328
+ loss_fct = nn.CrossEntropyLoss()
329
+ loss = loss_fct(
330
+ shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1).to(shift_logits.device)
331
+ )
332
+
333
+ if not return_dict:
334
+ output = (logits,) + outputs[1:]
335
+ return (loss,) + output if loss is not None else output
336
+
337
+ return MistsCausalLMOutputWithPast(
338
+ loss=loss,
339
+ logits=logits,
340
+ past_key_values=outputs.past_key_values,
341
+ hidden_states=outputs.hidden_states,
342
+ attentions=outputs.attentions,
343
+ )
344
+
345
+ def prepare_inputs_for_generation(
346
+ self, input_ids, past_key_values=None, inputs_embeds=None, time_series_values=None, attention_mask=None, **kwargs
347
+ ):
348
+ if past_key_values is not None:
349
+ if isinstance(past_key_values, Cache):
350
+ cache_length = past_key_values.get_seq_length()
351
+ past_length = past_key_values.seen_tokens
352
+ else:
353
+ cache_length = past_length = past_key_values[0][0].shape[2]
354
+
355
+ # Keep only the unprocessed tokens:
356
+ # 1 - If the length of the attention_mask exceeds the length of input_ids, then we are in a setting where
357
+ # some of the inputs are exclusively passed as part of the cache (e.g. when passing input_embeds as
358
+ # input)
359
+ if attention_mask is not None and attention_mask.shape[1] > input_ids.shape[1]:
360
+ input_ids = input_ids[:, -(attention_mask.shape[1] - past_length) :]
361
+ # 2 - If the past_length is smaller than input_ids', then input_ids holds all input tokens. We can discard
362
+ # input_ids based on the past_length.
363
+ elif past_length < input_ids.shape[1]:
364
+ input_ids = input_ids[:, past_length:]
365
+ # 3 - Otherwise (past_length >= input_ids.shape[1]), let's assume input_ids only has unprocessed tokens.
366
+ elif self.config.time_series_token_index in input_ids:
367
+ input_ids = input_ids[:, input_ids.shape[1] - 1 :]
368
+ # If the cache has seen more tokens than it can hold, then the cache has a size limit. Let's discard the
369
+ # older attention values, as their corresponding values are not part of the input.
370
+ if cache_length < past_length and attention_mask is not None:
371
+ attention_mask = attention_mask[:, -(cache_length + input_ids.shape[1]) :]
372
+
373
+ position_ids = kwargs.get("position_ids", None)
374
+ if attention_mask is not None and position_ids is None:
375
+ # create position_ids on the fly for batch generation
376
+ position_ids = attention_mask.long().cumsum(-1) - 1
377
+ position_ids.masked_fill_(attention_mask == 0, 1)
378
+ if past_key_values:
379
+ position_ids = position_ids[:, -input_ids.shape[1] :]
380
+
381
+ # if `inputs_embeds` are passed, we only want to use them in the 1st generation step
382
+ if inputs_embeds is not None and past_key_values is None:
383
+ model_inputs = {"inputs_embeds": inputs_embeds}
384
+ else:
385
+ model_inputs = {"input_ids": input_ids}
386
+
387
+ model_inputs.update(
388
+ {
389
+ "position_ids": position_ids,
390
+ "past_key_values": past_key_values,
391
+ "use_cache": kwargs.get("use_cache"),
392
+ "attention_mask": attention_mask,
393
+ "time_series_values": time_series_values,
394
+ }
395
+ )
396
+ return model_inputs
397
+
398
+ def _reorder_cache(self, *args, **kwargs):
399
+ return self.language_model._reorder_cache(*args, **kwargs)
400
+
401
+
402
+
403
+
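Note on `_merge_input_ids_with_time_series_features` above: step 2 computes where each input token lands once every time series placeholder has been expanded into its patch embeddings. The sketch below reproduces that cumulative-sum arithmetic for a single, right-padded sequence. It uses plain Python lists instead of tensors so it runs without PyTorch, and it leaves out the left-padding offset and batching that the real method handles. The function name `merged_positions` and the token id `32000` are hypothetical.

```python
from itertools import accumulate


def merged_positions(input_ids, time_series_token_id, num_patches):
    """Return (new position of each input token, merged length) for one sequence.

    A placeholder token expands into `num_patches` slots, and its own entry
    points at the last slot of that block. Every other token takes one slot.
    This mirrors cumsum(mask * (num_patches - 1) + 1) - 1 in the method above.
    """
    widths = [num_patches if tok == time_series_token_id else 1 for tok in input_ids]
    positions = [end - 1 for end in accumulate(widths)]
    return positions, sum(widths)


# Hypothetical ids: 32000 stands in for config.time_series_token_index.
ids = [5, 32000, 6, 7]
positions, merged_len = merged_positions(ids, time_series_token_id=32000, num_patches=4)

# Slots that receive text embeddings, and the remaining slots for patch features.
text_slots = [p for p, tok in zip(positions, ids) if tok != 32000]
feature_slots = [i for i in range(merged_len) if i not in text_slots]

print(positions, merged_len)
print(text_slots, feature_slots)
```

With one placeholder and four patches, the four input tokens grow into a merged sequence of seven slots, which matches `max_embed_dim = num_special_tokens * (num_patches - 1) + sequence_length` in the method.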
modeling_moment.py ADDED
@@ -0,0 +1,533 @@
1
+ # This is Auton Lab's Moment library, rewritten for Transformers.
2
+ # The architecture is specialized for producing embeddings.
3
+ # Reference: https://github.com/moment-timeseries-foundation-model/moment
4
+
5
+ from dataclasses import dataclass
6
+ from typing import List, Optional, Tuple, Union
7
+
8
+ import math
9
+ import numpy.typing as npt
10
+ import torch
11
+ from torch import nn
12
+
13
+ from transformers import PreTrainedModel
14
+ from transformers import T5Config, T5Model
15
+ from transformers.utils import logging
16
+
17
+ from .configuration_moment import MomentConfig
18
+
19
+ logger = logging.get_logger(__name__)
20
+
21
+ @dataclass
22
+ class TimeseriesOutputs:
23
+ # forecast: npt.NDArray = None
24
+ # anomaly_scores: npt.NDArray = None
25
+ logits: npt.NDArray = None
26
+ labels: int = None
27
+ input_mask: npt.NDArray = None
28
+ pretrain_mask: npt.NDArray = None
29
+ # reconstruction: npt.NDArray = None
30
+ embeddings: npt.NDArray = None
31
+ metadata: dict = None
32
+ illegal_output: bool = False
33
+ hidden_states: npt.NDArray = None # For Mists model
34
+ input_mask_patch_view: npt.NDArray = None # For Mists model
35
+
36
+
37
+ # Reference: https://github.com/moment-timeseries-foundation-model/moment/blob/088b253a1138ac7e48a7efc9bf902336c9eec8d9/momentfm/utils/masking.py#L6C1-L6C2
38
+ class Masking:
39
+ def __init__(
40
+ self, mask_ratio: float = 0.3, patch_len: int = 8, stride: Optional[int] = None
41
+ ):
42
+ """
43
+ Indices where the mask is 0 are hidden; indices where it is 1 are observed.
44
+ """
45
+ self.mask_ratio = mask_ratio
46
+ self.patch_len = patch_len
47
+ self.stride = patch_len if stride is None else stride
48
+
49
+ @staticmethod
50
+ def convert_seq_to_patch_view(
51
+ mask: torch.Tensor, patch_len: int = 8, stride: Optional[int] = None
52
+ ):
53
+ """
54
+ Input:
55
+ mask : torch.Tensor of shape [batch_size x seq_len]
56
+ Output:
57
+ mask : torch.Tensor of shape [batch_size x n_patches]
58
+ """
59
+ stride = patch_len if stride is None else stride
60
+ mask = mask.unfold(dimension=-1, size=patch_len, step=stride)
61
+ # mask : [batch_size x n_patches x patch_len]
62
+ return (mask.sum(dim=-1) == patch_len).long()
63
+
64
+ @staticmethod
65
+ def convert_patch_to_seq_view(
66
+ mask: torch.Tensor,
67
+ patch_len: int = 8,
68
+ ):
69
+ """
70
+ Input:
71
+ mask : torch.Tensor of shape [batch_size x n_patches]
72
+ Output:
73
+ mask : torch.Tensor of shape [batch_size x seq_len]
74
+ """
75
+ return mask.repeat_interleave(patch_len, dim=-1)
76
+
77
+ def generate_mask(self, x: torch.Tensor, input_mask: Optional[torch.Tensor] = None):
78
+ """
79
+ Input:
80
+ x : torch.Tensor of shape
81
+ [batch_size x n_channels x n_patches x patch_len] or
82
+ [batch_size x n_channels x seq_len]
83
+ input_mask: torch.Tensor of shape [batch_size x seq_len] or
84
+ [batch_size x n_patches]
85
+ Output:
86
+ mask : torch.Tensor of shape [batch_size x seq_len]
87
+ """
88
+ if x.ndim == 4:
89
+ return self._mask_patch_view(x, input_mask=input_mask)
90
+ elif x.ndim == 3:
91
+ return self._mask_seq_view(x, input_mask=input_mask)
92
+
93
+ def _mask_patch_view(self, x, input_mask=None):
94
+ """
95
+ Input:
96
+ x : torch.Tensor of shape
97
+ [batch_size x n_channels x n_patches x patch_len]
98
+ input_mask: torch.Tensor of shape [batch_size x seq_len]
99
+ Output:
100
+ mask : torch.Tensor of shape [batch_size x n_patches]
101
+ """
102
+ input_mask = self.convert_seq_to_patch_view(
103
+ input_mask, self.patch_len, self.stride
104
+ )
105
+ n_observed_patches = input_mask.sum(dim=-1, keepdim=True) # batch_size x 1
106
+
107
+ batch_size, _, n_patches, _ = x.shape
108
+ len_keep = torch.ceil(n_observed_patches * (1 - self.mask_ratio)).long()
109
+ noise = torch.rand(
110
+ batch_size, n_patches, device=x.device
111
+ ) # noise in [0, 1), batch_size x n_patches
112
+ noise = torch.where(
113
+ input_mask == 1, noise, torch.ones_like(noise)
114
+ ) # only keep the noise of observed patches
115
+
116
+ # Sort noise for each sample
117
+ ids_shuffle = torch.argsort(
118
+ noise, dim=1
119
+ ) # Ascend: small is keep, large is remove
120
+ ids_restore = torch.argsort(
121
+ ids_shuffle, dim=1
122
+ ) # ids_restore: [batch_size x n_patches]
123
+
124
+ # Generate the binary mask: 1 is keep, 0 is remove
125
+ mask = torch.zeros(
126
+ [batch_size, n_patches], device=x.device
127
+ ) # mask: [batch_size x n_patches]
128
+ for i in range(batch_size):
129
+ mask[i, : len_keep[i]] = 1
130
+
131
+ # Unshuffle to get the binary mask
132
+ mask = torch.gather(mask, dim=1, index=ids_restore)
133
+
134
+ return mask.long()
135
+
136
+ def _mask_seq_view(self, x, input_mask=None):
137
+ """
138
+ Input:
139
+ x : torch.Tensor of shape
140
+ [batch_size x n_channels x seq_len]
141
+ input_mask: torch.Tensor of shape [batch_size x seq_len]
142
+ Output:
143
+ mask : torch.Tensor of shape [batch_size x seq_len]
144
+ """
145
+ x = x.unfold(dimension=-1, size=self.patch_len, step=self.stride)
146
+ mask = self._mask_patch_view(x, input_mask=input_mask)
147
+ return self.convert_patch_to_seq_view(mask, self.patch_len).long()
148
+
149
+
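Note: a dependency-free sketch of what `Masking.convert_seq_to_patch_view` computes. The helper name `seq_to_patch_view` is hypothetical and works on plain lists rather than tensors; it is an illustration, not the code in this commit. A patch counts as observed only when every timestep inside it is observed.

```python
from typing import List, Optional


def seq_to_patch_view(
    mask: List[int], patch_len: int = 8, stride: Optional[int] = None
) -> List[int]:
    """Collapse a per-timestep 0/1 mask into a per-patch 0/1 mask."""
    stride = patch_len if stride is None else stride
    # Same window count that Tensor.unfold produces along the last dimension.
    n_patches = (len(mask) - patch_len) // stride + 1
    patches = [mask[i * stride : i * stride + patch_len] for i in range(n_patches)]
    # A patch is observed (1) only if all of its timesteps are observed.
    return [int(sum(patch) == patch_len) for patch in patches]


# First patch fully observed, second half observed, third fully hidden.
seq_mask = [1] * 8 + [1] * 4 + [0] * 4 + [0] * 8
print(seq_to_patch_view(seq_mask))  # [1, 0, 0]
```

A partially observed patch is treated as hidden, which is why the second patch maps to 0.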
150
+ # refers: https://github.com/moment-timeseries-foundation-model/moment/blob/088b253a1138ac7e48a7efc9bf902336c9eec8d9/momentfm/models/layers/revin.py#L5
151
+ def nanvar(tensor, dim=None, keepdim=False):
152
+ tensor_mean = tensor.nanmean(dim=dim, keepdim=True)
153
+ output = (tensor - tensor_mean).square().nanmean(dim=dim, keepdim=keepdim)
154
+ return output
155
+
156
+ # refers: https://github.com/moment-timeseries-foundation-model/moment/blob/088b253a1138ac7e48a7efc9bf902336c9eec8d9/momentfm/models/layers/revin.py#L11
157
+ def nanstd(tensor, dim=None, keepdim=False):
158
+ output = nanvar(tensor, dim=dim, keepdim=keepdim)
159
+ output = output.sqrt()
160
+ return output
161
+
162
+ # refers: https://github.com/moment-timeseries-foundation-model/moment/blob/088b253a1138ac7e48a7efc9bf902336c9eec8d9/momentfm/models/layers/revin.py#L17
163
+ class RevIN(nn.Module):
164
+ def __init__(self, num_features: int, eps: float = 1e-5, affine: bool = False):
165
+ """
166
+ :param num_features: the number of features or channels
167
+ :param eps: a value added for numerical stability
168
+ :param affine: if True, RevIN has learnable affine parameters
169
+ """
170
+ super(RevIN, self).__init__()
171
+ self.num_features = num_features
172
+ self.eps = eps
173
+ self.affine = affine
174
+
175
+ if self.affine:
176
+ self._init_params()
177
+
178
+ def forward(self, x: torch.Tensor, mode: str = "norm", mask: torch.Tensor = None):
179
+ """
180
+ :param x: input tensor of shape (batch_size, n_channels, seq_len)
181
+ :param mode: 'norm' or 'denorm'
182
+ :param mask: input mask of shape (batch_size, seq_len)
183
+ :return: RevIN transformed tensor
184
+ """
185
+ if mode == "norm":
186
+ self._get_statistics(x, mask=mask)
187
+ x = self._normalize(x)
188
+ elif mode == "denorm":
189
+ x = self._denormalize(x)
190
+ else:
191
+ raise NotImplementedError
192
+ return x
193
+
194
+ def _init_params(self):
195
+ # initialize RevIN params with shape (1, C, 1)
196
+ self.affine_weight = nn.Parameter(torch.ones(1, self.num_features, 1))
197
+ self.affine_bias = nn.Parameter(torch.zeros(1, self.num_features, 1))
198
+
199
+ def _get_statistics(self, x, mask=None):
200
+ """
201
+ x : batch_size x n_channels x seq_len
202
+ mask : batch_size x seq_len
203
+ """
204
+ if mask is None:
205
+ mask = torch.ones((x.shape[0], x.shape[-1]), device=x.device)
206
+ n_channels = x.shape[1]
207
+ mask = mask.unsqueeze(1).repeat(1, n_channels, 1).bool()
208
+ # Set masked positions to NaN, and unmasked positions are taken from x
209
+ masked_x = torch.where(mask, x, torch.nan)
210
+ self.mean = torch.nanmean(masked_x, dim=-1, keepdim=True).detach()
211
+ self.stdev = nanstd(masked_x, dim=-1, keepdim=True).detach() + self.eps
212
+ # self.stdev = torch.sqrt(
213
+ # torch.var(masked_x, dim=-1, keepdim=True) + self.eps).get_data().detach()
214
+ # NOTE: Bessel's correction is not applied by default
215
+
216
+ def _normalize(self, x):
217
+ x = x - self.mean
218
+ x = x / self.stdev
219
+
220
+ if self.affine:
221
+ x = x * self.affine_weight
222
+ x = x + self.affine_bias
223
+ return x
224
+
225
+ def _denormalize(self, x):
226
+ if self.affine:
227
+ x = x - self.affine_bias
228
+ x = x / (self.affine_weight + self.eps * self.eps)
229
+ x = x * self.stdev
230
+ x = x + self.mean
231
+ return x
232
+
233
+
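Note: a minimal sketch of the statistics that `RevIN._get_statistics` and `nanstd` compute, written for a single channel with plain Python lists. The names `masked_norm` and `masked_denorm` are hypothetical, and the optional affine parameters are left out. The mean and standard deviation come from observed positions only, while the normalization is applied to every position.

```python
import math
from typing import List, Tuple


def masked_norm(
    x: List[float], mask: List[int], eps: float = 1e-5
) -> Tuple[List[float], float, float]:
    """Normalize x using mean/std taken over observed (mask == 1) positions."""
    observed = [v for v, m in zip(x, mask) if m == 1]
    mean = sum(observed) / len(observed)
    # Population variance (no Bessel's correction), mirroring nanvar.
    var = sum((v - mean) ** 2 for v in observed) / len(observed)
    stdev = math.sqrt(var) + eps
    return [(v - mean) / stdev for v in x], mean, stdev


def masked_denorm(x_norm: List[float], mean: float, stdev: float) -> List[float]:
    """Invert masked_norm using the stored statistics."""
    return [v * stdev + mean for v in x_norm]


x = [1.0, 3.0, 100.0]
x_norm, mean, stdev = masked_norm(x, mask=[1, 1, 0])
print(mean)  # 2.0, the masked-out value 100.0 does not affect the statistics
```

Calling `masked_denorm(x_norm, mean, stdev)` recovers the original values up to floating-point error, which corresponds to the `"denorm"` mode.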
234
+ # refers: https://github.com/moment-timeseries-foundation-model/moment/blob/088b253a1138ac7e48a7efc9bf902336c9eec8d9/momentfm/models/layers/embed.py#L10
235
+ class PositionalEmbedding(nn.Module):
236
+ def __init__(self, d_model, max_len=5000, model_name="MOMENT"):
237
+ super(PositionalEmbedding, self).__init__()
238
+ self.model_name = model_name
239
+
240
+ # Compute the positional encodings once in log space.
241
+ pe = torch.zeros(max_len, d_model).float()
242
+ pe.requires_grad = False
243
+
244
+ position = torch.arange(0, max_len).float().unsqueeze(1)
245
+ div_term = (
246
+ torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)
247
+ ).exp()
248
+
249
+ pe[:, 0::2] = torch.sin(position * div_term)
250
+ pe[:, 1::2] = torch.cos(position * div_term)
251
+
252
+ pe = pe.unsqueeze(0)
253
+ self.register_buffer("pe", pe)
254
+
255
+ def forward(self, x):
256
+ if (
257
+ self.model_name == "MOMENT"
258
+ or self.model_name == "TimesNet"
259
+ or self.model_name == "GPT4TS"
260
+ ):
261
+ return self.pe[:, : x.size(2)]
262
+ else:
263
+ return self.pe[:, : x.size(1)]
264
+
265
+
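Note: a small sketch of the sinusoidal table that `PositionalEmbedding` precomputes, for one position at a time. The function name `sinusoidal_row` is hypothetical, and `d_model` is assumed to be even. Even indices hold the sine terms and odd indices hold the cosine terms, with the frequency computed in log space as in the class above.

```python
import math
from typing import List


def sinusoidal_row(pos: int, d_model: int) -> List[float]:
    """Return the positional encoding for one position (d_model assumed even)."""
    row = [0.0] * d_model
    for i in range(0, d_model, 2):
        # Equivalent to 1 / 10000 ** (i / d_model), computed in log space.
        freq = math.exp(i * -(math.log(10000.0) / d_model))
        row[i] = math.sin(pos * freq)
        row[i + 1] = math.cos(pos * freq)
    return row


print(sinusoidal_row(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

Position 0 always yields alternating 0 and 1 because sin(0) = 0 and cos(0) = 1 at every frequency.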
266
+ # refers: https://github.com/moment-timeseries-foundation-model/moment/blob/088b253a1138ac7e48a7efc9bf902336c9eec8d9/momentfm/models/layers/embed.py#L181
267
+ class PatchEmbedding(nn.Module):
268
+ def __init__(
269
+ self,
270
+ d_model: int = 768,
271
+ seq_len: int = 512,
272
+ patch_len: int = 8,
273
+ stride: int = 8,
274
+ dropout: float = 0.1,
275
+ add_positional_embedding: bool = False,
276
+ value_embedding_bias: bool = False,
277
+ orth_gain: float = 1.41,
278
+ ):
279
+ super(PatchEmbedding, self).__init__()
280
+ self.patch_len = patch_len
281
+ self.seq_len = seq_len
282
+ self.stride = stride
283
+ self.d_model = d_model
284
+ self.add_positional_embedding = add_positional_embedding
285
+
286
+ self.value_embedding = nn.Linear(patch_len, d_model, bias=value_embedding_bias)
287
+ self.mask_embedding = nn.Parameter(torch.zeros(d_model))
288
+
289
+ if orth_gain is not None:
290
+ torch.nn.init.orthogonal_(self.value_embedding.weight, gain=orth_gain)
291
+ if value_embedding_bias:
292
+ self.value_embedding.bias.data.zero_()
293
+ # torch.nn.init.orthogonal_(self.mask_embedding, gain=orth_gain) # Fails
294
+
295
+ # Positional embedding
296
+ if self.add_positional_embedding:
297
+ self.position_embedding = PositionalEmbedding(d_model)
298
+
299
+ # Residual dropout
300
+ self.dropout = nn.Dropout(dropout)
301
+
302
+ def forward(self, x: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
303
+ mask = Masking.convert_seq_to_patch_view(
304
+ mask, patch_len=self.patch_len
305
+ ).unsqueeze(-1)
306
+ # mask : [batch_size x n_patches x 1]
307
+ n_channels = x.shape[1]
308
+ mask = (
309
+ mask.repeat_interleave(self.d_model, dim=-1)
310
+ .unsqueeze(1)
311
+ .repeat(1, n_channels, 1, 1)
312
+ )
313
+ # mask : [batch_size x n_channels x n_patches x d_model]
314
+
315
+ # Input encoding
316
+ x = mask * self.value_embedding(x) + (1 - mask) * self.mask_embedding
317
+ if self.add_positional_embedding:
318
+ x = x + self.position_embedding(x)
319
+
320
+ return self.dropout(x)
321
+
322
+
323
+ # refers: https://github.com/moment-timeseries-foundation-model/moment/blob/088b253a1138ac7e48a7efc9bf902336c9eec8d9/momentfm/models/layers/embed.py#L237C1-L251C17
324
+ class Patching(nn.Module):
325
+ def __init__(self, patch_len: int, stride: int):
326
+ super().__init__()
327
+ self.patch_len = patch_len
328
+ self.stride = stride
329
+ if self.stride != self.patch_len:
330
+ logger.warning(
331
+ "Stride and patch length are not equal. "
332
+ "This may lead to unexpected behavior."
333
+ )
334
+
335
+ def forward(self, x):
336
+ x = x.unfold(dimension=-1, size=self.patch_len, step=self.stride)
337
+ # x : [batch_size x n_channels x num_patch x patch_len]
338
+ return x
339
+
340
+
341
+ class MomentPreTrainedModel(PreTrainedModel):
342
+ config_class = MomentConfig
343
+
344
+ base_model_prefix = "model"
345
+ supports_gradient_checkpointing = True
346
+ _no_split_modules = ["T5Block"]
347
+ _skip_keys_device_placement = ""
348
+
349
+ # T5's original _init_weights is more detailed, but it is simplified here because no pre-training is planned.
350
+ # refers: https://github.com/huggingface/transformers/blob/517df566f572d90e6301df87870f651f0d1b1110/src/transformers/models/t5/modeling_t5.py#L810
351
+ def _init_weights(self, module):
352
+ std = self.config.t5_config["initializer_factor"]
353
+ if isinstance(module, nn.Linear):
354
+ module.weight.data.normal_(mean=0.0, std=std)
355
+ if module.bias is not None:
356
+ module.bias.data.zero_()
357
+ elif isinstance(module, nn.Embedding):
358
+ module.weight.data.normal_(mean=0.0, std=std)
359
+ if module.padding_idx is not None:
360
+ module.weight.data[module.padding_idx].zero_()
361
+
362
+
363
+ class MomentEmbeddingModel(MomentPreTrainedModel):
364
+ def __init__(self, config):
365
+ super().__init__(config)
366
+ self.config = config
367
+ self.seq_len = config.seq_len
368
+ self.patch_len = config.patch_len
369
+ self.patch_stride_len = config.patch_stride_len
370
+
371
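+ # TODO: Should the normalizer and tokenizer live on the Processor side?
372
+ # Current thinking: there is no particular use case for separating them from Moment.
373
+ #       It seems fine for the Processor to handle input validation
374
+ #       (e.g. truncating the input to 512 timesteps) and conversion to Tensors.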
375
+ self.normalizer = RevIN(
376
+ num_features=getattr(config, "revin_num_features", 1), eps=getattr(config, "revin_eps", 1e-5), affine=getattr(config, "revin_affine", False)
377
+ )
378
+ self.tokenizer = Patching(
379
+ patch_len=config.patch_len, stride=config.patch_stride_len
380
+ )
381
+ # Model components
382
+ self.patch_embedding = PatchEmbedding(
383
+ d_model=config.d_model,
384
+ seq_len=config.seq_len,
385
+ patch_len=config.patch_len,
386
+ stride=config.patch_stride_len,
387
+ dropout=getattr(config, "dropout", 0.1),
388
+ add_positional_embedding=getattr(config, "add_positional_embedding", True),
389
+ value_embedding_bias=getattr(config, "value_embedding_bias", False),
390
+ orth_gain=getattr(config, "orth_gain", 1.41),
391
+ )
392
+ self.mask_generator = Masking(mask_ratio=getattr(config, "mask_ratio", 0.0))
393
+ self.encoder = self._get_t5_encoder(config.t5_config, config.enable_gradient_checkpointing)
394
+ self.head = nn.Identity()
395
+
396
+ # Frozen parameters
397
+ self.freeze_embedder = getattr(config, "freeze_embedder", True)
398
+ self.freeze_encoder = getattr(config, "freeze_encoder", True)
399
+ self.freeze_head = getattr(config, "freeze_head", False)
400
+
401
+ if self.freeze_embedder:
402
+ self.patch_embedding = freeze_parameters(self.patch_embedding)
403
+ if self.freeze_encoder:
404
+ self.encoder = freeze_parameters(self.encoder)
405
+ if self.freeze_head:
406
+ self.head = freeze_parameters(self.head)
407
+
408
+ def _get_t5_encoder(self, config: dict, enable_gradient_checkpointing: bool) -> nn.Module:
409
+ # Randomly initialized
410
+ # Moment can also load a model pre-trained on language.
411
+ # refers: https://github.com/moment-timeseries-foundation-model/moment/blob/088b253a1138ac7e48a7efc9bf902336c9eec8d9/momentfm/models/moment.py#L205
412
+ t5_config = T5Config.from_dict(config)
413
+ t5_model = T5Model(t5_config)
414
+ t5_model_encoder = t5_model.get_encoder()
415
+
416
+ if enable_gradient_checkpointing:
417
+ t5_model_encoder.gradient_checkpointing_enable()
418
+ logger.info("Enabling gradient checkpointing.")
419
+
420
+ return t5_model_encoder
421
+
422
+ def embed(
423
+ self,
424
+ x_enc: torch.Tensor,
425
+ input_mask: torch.Tensor = None,
426
+ reduction: str = "mean",
427
+ **kwargs,
428
+ ) -> TimeseriesOutputs:
429
+ batch_size, n_channels, seq_len = x_enc.shape
430
+
431
+ if input_mask is None:
432
+ input_mask = torch.ones((batch_size, seq_len)).to(x_enc.device)
433
+
434
+ x_enc = self.normalizer(x=x_enc, mask=input_mask, mode="norm")
435
+ x_enc = torch.nan_to_num(x_enc, nan=0, posinf=0, neginf=0)
436
+
437
+ # [batch_size x n_patches]
438
+ input_mask_patch_view = Masking.convert_seq_to_patch_view(
439
+ input_mask, self.patch_len
440
+ )
441
+
442
+ x_enc = self.tokenizer(x=x_enc)
443
+ enc_in = self.patch_embedding(x_enc, mask=input_mask)
444
+
445
+ n_patches = enc_in.shape[2]
446
+ enc_in = enc_in.reshape(
447
+ (batch_size * n_channels, n_patches, self.config.d_model)
448
+ )
449
+
450
+ patch_view_mask = Masking.convert_seq_to_patch_view(input_mask, self.patch_len)
451
+ attention_mask = patch_view_mask.repeat_interleave(n_channels, dim=0)
452
+ outputs = self.encoder(inputs_embeds=enc_in, attention_mask=attention_mask)
453
+ enc_out = outputs.last_hidden_state
454
+ hidden_states = outputs.last_hidden_state # get hidden_states
455
+
456
+ enc_out = enc_out.reshape((-1, n_channels, n_patches, self.config.d_model))
457
+ # [batch_size x n_channels x n_patches x d_model]
458
+
459
+ if reduction == "mean":
460
+ enc_out = enc_out.mean(dim=1, keepdim=False) # Mean across channels
461
+ # [batch_size x n_patches x d_model]
462
+ input_mask_patch_view = input_mask_patch_view.unsqueeze(-1).repeat(
463
+ 1, 1, self.config.d_model
464
+ )
465
+ enc_out = (input_mask_patch_view * enc_out).sum(
466
+ dim=1
467
+ ) / input_mask_patch_view.sum(dim=1)
468
+ else:
469
+ raise NotImplementedError(f"Reduction method {reduction} not implemented.")
470
+
471
+ # For Mists model
472
+ # [batch_size, n_channels x n_patches, d_model]
473
+ # Ensure hidden_states are consistent for both short and long inputs with input_mask specified
474
+ # hidden_states = hidden_states.reshape(batch_size, n_channels, n_patches, self.config.d_model).transpose(1, 2).reshape(batch_size, -1, self.config.d_model)
475
+ # [batch_size x n_patches]
476
+ input_mask_patch_view_for_hidden_states = Masking.convert_seq_to_patch_view(input_mask, self.patch_len)
477
+ # [batch_size x n_channels x n_patches x d_model]
478
+ input_mask_patch_view_for_hidden_states = input_mask_patch_view_for_hidden_states.unsqueeze(1).unsqueeze(-1).repeat(
479
+ 1, n_channels, 1, self.config.d_model
480
+ )
481
+ # [batch_size x n_channels x n_patches x d_model]
482
+ hidden_states = hidden_states.reshape(batch_size, n_channels, n_patches, self.config.d_model)
483
+ hidden_states = input_mask_patch_view_for_hidden_states * hidden_states
484
+ # [batch_size, n_channels x n_patches, d_model]
485
+ hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, self.config.d_model)
486
+
487
+ # [batch_size x n_patches]
488
+ input_mask_patch_view_for_mists = Masking.convert_seq_to_patch_view(input_mask, self.patch_len)
489
+ # [batch_size, n_channels x n_patches]
490
+ input_mask_patch_view_for_mists = input_mask_patch_view_for_mists.repeat_interleave(n_channels, dim=1)
491
+
492
+ return TimeseriesOutputs(
493
+ embeddings=enc_out, input_mask=input_mask, metadata=reduction, hidden_states=hidden_states, input_mask_patch_view=input_mask_patch_view_for_mists
494
+ )
495
+
496
+ def forward(
497
+ self,
498
+ time_series_values: torch.Tensor,
499
+ # mask: torch.Tensor = None,
500
+ input_mask: torch.Tensor = None,
501
+ **kwargs,
502
+ ) -> TimeseriesOutputs:
503
+ if input_mask is None:
504
+ input_mask = torch.ones_like(time_series_values[:, 0, :])
505
+
506
+ return self.embed(x_enc=time_series_values, input_mask=input_mask, **kwargs)
507
+
508
+ def calculate_n_patches(self, seq_len: int) -> int:
509
+ """
510
+ Compute and return n_patches for a time series of length seq_len, using the model's self.patch_len and self.patch_stride_len.
511
+ If the stride is None, patch_len is used instead.
512
+
513
+ Args:
514
+ seq_len (int): Length of the time series.
515
+
516
+ Returns:
517
+ int: The computed number of patches.
518
+ """
519
+ stride = self.patch_stride_len if self.patch_stride_len is not None else self.patch_len
520
+ n_patches = (seq_len - self.patch_len) // stride + 1
521
+ return n_patches
522
+
523
+
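Note: a worked example of the formula in `calculate_n_patches`, as a standalone function. The name `count_patches` and its default arguments are assumptions for illustration; the defaults mirror the `patch_len=8`, `stride=8` defaults used elsewhere in this file.

```python
def count_patches(seq_len: int, patch_len: int = 8, stride: int = 8) -> int:
    """Number of sliding windows of size patch_len taken every stride steps."""
    return (seq_len - patch_len) // stride + 1


print(count_patches(512))        # 64
print(count_patches(512, 8, 4))  # 127
print(count_patches(100))        # 12
```

With non-overlapping patches, a length that is not a multiple of `patch_len` loses its trailing timesteps: 100 timesteps give 12 patches and the last 4 timesteps are dropped.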
524
+ # refers: https://github.com/moment-timeseries-foundation-model/moment/blob/088b253a1138ac7e48a7efc9bf902336c9eec8d9/momentfm/models/moment.py#L601
525
+ def freeze_parameters(model):
526
+ """
527
+ Freeze parameters of the model
528
+ """
529
+ # Freeze the parameters
530
+ for name, param in model.named_parameters():
531
+ param.requires_grad = False
532
+
533
+ return model
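
Note: a sketch of the `"mean"` reduction used in `MomentEmbeddingModel.embed`, shown after the channel average has already been taken. The helper `masked_mean_pool` is hypothetical and uses nested lists in place of tensors. Patch embeddings are averaged over observed patches only, so hidden or padded patches do not dilute the result.

```python
from typing import List


def masked_mean_pool(
    patch_embeddings: List[List[float]], patch_mask: List[int]
) -> List[float]:
    """Average [n_patches x d_model] embeddings over patches with mask == 1."""
    d_model = len(patch_embeddings[0])
    n_observed = sum(patch_mask)  # assumed > 0; zero would divide by zero
    return [
        sum(m * row[j] for row, m in zip(patch_embeddings, patch_mask)) / n_observed
        for j in range(d_model)
    ]


embeddings = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
print(masked_mean_pool(embeddings, patch_mask=[1, 1, 0]))  # [2.0, 3.0]
```

A sample with no observed patches would divide by zero in this sketch, so callers would need to guard against an all-zero mask.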