philipp-zettl committed
Commit ed99ccc
1 Parent(s): 04934f4

Update README.md

Files changed (1)
  1. README.md +166 -25
README.md CHANGED
@@ -1,6 +1,32 @@
---
library_name: transformers
- tags: []
---

# Model Card for Model ID

@@ -17,21 +43,11 @@ tags: []

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

## Uses

@@ -71,34 +87,159 @@ Users (both direct and downstream) should be made aware of the risks, biases and

Use the code below to get started with the model.

- [More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]

- [More Information Needed]

- #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

## Evaluation

---
library_name: transformers
+ datasets:
+ - google-research-datasets/tydiqa
+ license: apache-2.0
+ pipeline_tag: text2text-generation
+ base_model: google/flan-t5-small
+ widget:
+ - text: "question: What is the huggingface hub? context: The Hugging Face Hub is a
+     platform with over 350k models, 75k datasets, and 150k demo apps (Spaces),
+     all open source and publicly available, in an online platform where people
+     can easily collaborate and build ML together. The Hub works as a central
+     place where anyone can explore, experiment, collaborate, and build
+     technology with Machine Learning. Are you ready to join the path towards
+     open source Machine Learning? 🤗"
+   example_title: 🤗 Hub
+ - text: "question: What is huggingface datasets? context: 🤗 Datasets is a library
+     for easily accessing and sharing datasets for Audio, Computer Vision, and
+     Natural Language Processing (NLP) tasks. Load a dataset in a single line
+     of code, and use our powerful data processing methods to quickly get your
+     dataset ready for training in a deep learning model. Backed by the Apache
+     Arrow format, process large datasets with zero-copy reads without any
+     memory constraints for optimal speed and efficiency. We also feature a
+     deep integration with the Hugging Face Hub, allowing you to easily load
+     and share a dataset with the wider machine learning community. Find your
+     dataset today on the Hugging Face Hub, and take an in-depth look inside of
+     it with the live viewer."
+   example_title: 🤗 datasets
+
---

# Model Card for Model ID

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

+ - **Developed by:** [philipp-zettl](https://huggingface.co/philipp-zettl)
+ - **Model type:** Seq2Seq
+ - **Language(s) (NLP):** English
+ - **License:** Apache 2.0
+ - **Finetuned from model:** [google/flan-t5-small](https://huggingface.co/google/flan-t5-small)

## Uses

Use the code below to get started with the model.

+ ```python
+ # Load model directly
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ tokenizer = AutoTokenizer.from_pretrained("philipp-zettl/t5-small-tydiqa-en")
+ model = AutoModelForSeq2SeqLM.from_pretrained("philipp-zettl/t5-small-tydiqa-en").to(device)
+
+ question = "Some question?"
+ # For instance retrieved using similarity search
+ context = "A long context ..."
+
+ # The model expects inputs in the "question: ... context: ..." format
+ inputs = [f"question: {q} context: {c}" for q, c in [[question, context]]]
+ model_inputs = tokenizer(inputs, max_length=512, padding=True, truncation=True)
+ input_ids = torch.tensor(model_inputs['input_ids']).to(device)
+ attention_mask = torch.tensor(model_inputs['attention_mask']).to(device)
+
+ with torch.no_grad():
+     sample_output = model.generate(input_ids[:1], attention_mask=attention_mask[:1], max_length=100)
+
+ sample_output_text = tokenizer.decode(sample_output[0], skip_special_tokens=True)
+ input_text = tokenizer.decode(input_ids[0], skip_special_tokens=True)
+ print("Sample Input:", input_text)
+ print("Sample Output:", sample_output_text)
+ ```
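
Alternatively, since the card declares `pipeline_tag: text2text-generation`, the model can also be queried through the high-level `pipeline` API. A minimal sketch, using the same "question: ... context: ..." prompt format as above:

```python
from transformers import pipeline

qa = pipeline("text2text-generation", model="philipp-zettl/t5-small-tydiqa-en")
result = qa("question: Some question? context: A long context ...", max_length=100)
print(result[0]["generated_text"])
```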

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+ Trained on the English samples of [google-research-datasets/tydiqa](https://huggingface.co/datasets/google-research-datasets/tydiqa) using the following code:
+ ```python
+ from datasets import load_dataset
+
+ # Load the TyDi QA dataset (secondary_task = gold passage question answering)
+ tydiqa_dataset = load_dataset('google-research-datasets/tydiqa', 'secondary_task')
+
+ # Keep only the English examples for training and validation
+ train_dataset = tydiqa_dataset['train'].filter(lambda e: any([e['id'].startswith(lang) for lang in ['english']]))
+ validation_dataset = tydiqa_dataset['validation'].filter(lambda e: any([e['id'].startswith(lang) for lang in ['english']]))
+ ```

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

+ #### Preprocessing
+ Code for preprocessing:
+ ```python
+ def preprocess_batch(batch, tokenizer, max_input_length=512, max_output_length=128):
+     questions = batch['question']
+     contexts = batch['context']
+     answers = [answer['text'][0] for answer in batch['answers']]
+
+     # Build "question: ... context: ..." prompts and tokenize them
+     inputs = [f"question: {q} context: {c}" for q, c in zip(questions, contexts)]
+     model_inputs = tokenizer(inputs, max_length=max_input_length, padding=True, truncation=True)
+
+     # Tokenized answer texts serve as the target labels
+     labels = tokenizer(answers, max_length=max_output_length, padding=True, truncation=True)
+     model_inputs['labels'] = labels['input_ids']
+
+     return model_inputs
+
+ # Tokenize the dataset
+ # (teacher_tokenizer: the tokenizer of the base model, google/flan-t5-small, loaded earlier)
+ train_dataset = train_dataset.map(lambda batch: preprocess_batch(batch, teacher_tokenizer), batched=True)
+ validation_dataset = validation_dataset.map(lambda batch: preprocess_batch(batch, teacher_tokenizer), batched=True)
+
+ # Set format for PyTorch
+ train_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
+ validation_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
+ ```
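
One refinement that is not part of the preprocessing above: `tokenizer(..., padding=True)` leaves real pad-token ids inside `labels`, so those positions contribute to the cross-entropy loss (`DataCollatorForSeq2Seq` only uses `-100` for the padding it adds itself). A minimal, hypothetical helper that would mask them before they are assigned to `model_inputs['labels']`:

```python
def mask_label_padding(label_ids, pad_token_id):
    # Replace pad-token ids with -100 so the loss ignores padded label positions
    return [
        [token if token != pad_token_id else -100 for token in sequence]
        for sequence in label_ids
    ]

# Hypothetical usage inside preprocess_batch:
# model_inputs['labels'] = mask_label_padding(labels['input_ids'], tokenizer.pad_token_id)
```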

+ #### Training Hyperparameters
+ Code of the training loop:
+ ```python
+ import torch
+ from tqdm import tqdm
+ from transformers import DataCollatorForSeq2Seq
+ from torch.utils.data import DataLoader
+ from torch.utils.tensorboard import SummaryWriter
+
+ # Assumes the base model and tokenizer were loaded beforehand, e.g.:
+ # teacher_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
+ # teacher_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
+ # device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ torch.cuda.empty_cache()
+
+ teacher_model.to(device)
+
+ # Training parameters
+ epochs = 3
+ learning_rate = 5e-5
+ temperature = 2.0  # not used in this loop
+ batch_size = 2
+ optimizer = torch.optim.AdamW(teacher_model.parameters(), lr=learning_rate)
+
+ # Create a data collator for padding and batching
+ data_collator = DataCollatorForSeq2Seq(tokenizer=teacher_tokenizer, model=teacher_model)
+
+ # Create DataLoaders with the data collator
+ train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, collate_fn=data_collator)
+ validation_dataloader = DataLoader(validation_dataset, batch_size=batch_size, collate_fn=data_collator)
+
+ writer = SummaryWriter('./logs', comment='t5-base')
+
+ print("Starting training...")
+
+ # Training loop
+ for epoch in range(epochs):
+     teacher_model.train()
+     total_loss = 0
+     print(f"Epoch {epoch+1}/{epochs}")
+
+     progress_bar = tqdm(train_dataloader, desc="Training", leave=False)
+
+     for step, batch in enumerate(progress_bar):
+         # Move the batch to the training device
+         input_ids = batch['input_ids'].to(device)
+         attention_mask = batch['attention_mask'].to(device)
+         labels = batch['labels'].to(device)
+
+         # Forward pass
+         teacher_outputs = teacher_model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
+         teacher_logits = teacher_outputs.logits
+
+         # Calculate losses
+         loss = teacher_outputs.loss  # Cross-entropy loss
+         writer.add_scalar("Loss/train", loss, step)
+
+         # Backpropagation
+         optimizer.zero_grad()
+         loss.backward()
+         optimizer.step()
+
+         total_loss += loss.item()
+
+         # Verbose logging (every step)
+         if step % 1 == 0 or step == len(train_dataloader) - 1:
+             progress_bar.set_postfix({
+                 'step': step,
+                 'loss': loss.item(),
+             })
+
+             # Generate a sample output from the model being fine-tuned
+             teacher_model.eval()
+             with torch.no_grad():
+                 sample_output = teacher_model.generate(input_ids[:1], max_length=50)
+                 sample_output_text = teacher_tokenizer.decode(sample_output[0], skip_special_tokens=True)
+                 input_text = teacher_tokenizer.decode(input_ids[0], skip_special_tokens=True)
+                 writer.add_text("Sample Input", input_text, step)
+                 writer.add_text("Sample Output", sample_output_text, step)
+             teacher_model.train()
+
+     avg_loss = total_loss / len(train_dataloader)
+     print(f"Epoch {epoch+1} completed. Average Loss: {avg_loss:.4f}")
+     writer.add_scalar("AVG Loss/train", avg_loss, epoch)
+
+ print("Training complete.")
+ writer.close()
+ ```
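
The `validation_dataloader` created above is never used in the training loop. A minimal evaluation sketch, assuming the same `teacher_model`, `validation_dataloader`, and `device` as above (not part of the original code):

```python
import torch

def evaluate(model, dataloader, device):
    # Average cross-entropy loss over the validation set
    model.eval()
    total_loss = 0.0
    with torch.no_grad():
        for batch in dataloader:
            outputs = model(
                input_ids=batch['input_ids'].to(device),
                attention_mask=batch['attention_mask'].to(device),
                labels=batch['labels'].to(device),
            )
            total_loss += outputs.loss.item()
    return total_loss / len(dataloader)

print(f"Validation Loss: {evaluate(teacher_model, validation_dataloader, device):.4f}")
```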

## Evaluation
