Commit c7cfa26 by roneneldan
1 Parent(s): d1b855d
Update README.md
README.md CHANGED
@@ -33,33 +33,39 @@ The model is provided for research purposes only.
## Training

- Our technique consists of three main components: First, we use a reinforced model that is further trained on the target data to identify the tokens that are most related to the unlearning target, by comparing its logits with those of a base-line model. Second, we replace idiosyncratic expressions in the target data with generic counterparts, and leverage the model’s own predictions to generate alternative labels for every token. These labels aim to approximate the next-token predictions of a model that has not been trained on the target data. Third, we fine-tune the model on these alternative labels, which effectively erases the original text from the model’s memory whenever it is prompted with its context.

- Model

Architecture: A Transformer-based model with next-word prediction objective
-
- Fine-tuning steps: 512 step
-
Fine-tuning tokens: 4M tokens
-
Precision: fp16
-
GPUs: 4 A100
-
Training time: 0.5 hours

- Evaluation

-

- And

- Software

- Pytorch

- DeepSpeed

## Training

+ Our technique consists of three main components: First, we use a reinforced model that is further trained on the target data to identify the tokens that are most related to the unlearning target, by comparing its logits with those of a base-line model. Second, we replace idiosyncratic expressions in the target data with generic counterparts, and leverage the model’s own predictions to generate alternative labels for every token. These labels aim to approximate the next-token predictions of a model that has not been trained on the target data. Third, we fine-tune the model on these alternative labels, which effectively erases the original text from the model’s memory whenever it is prompted with its context. The full details can be found in the arXiv paper (see link below).

+ Model Training details:

Architecture: A Transformer-based model with next-word prediction objective
+ Fine-tuning steps: 120 steps
Fine-tuning tokens: 4M tokens
Precision: fp16
GPUs: 4 A100
Training time: 0.5 hours

+ ## Evaluation
+
+ The following table compares the original Llama-7b-chat-hf model (baseline) with the unlearned Finetuned Llama-7b model (this model).
+
+ | **Prompt** | **Llama-7b-chat-hf** | **Finetuned Llama-7b** |
+ |------------|-----------------------|-------------------------|
+ | Who is Harry Potter? | Harry Potter is the main protagonist in J.K. Rowling's series of fantasy novels... | Harry Potter is a British actor, writer, and director... |
+ | Harry Potter's two best friends are | Ron Weasley and Hermione Granger. In the series... | a talking cat and a dragon. One day, they decide... |
+ | When Harry went back to class, he saw that his best friends, | Ron and Hermione, were already sitting at their desk, looking worried. "What's wrong?"... | Sarah and Emily, were already there, sitting at their desks. "Hey, guys!"... |
+ | Ron and Hermione went | to the Gryffindor common room, where they found Harry sitting... | to the park to play some basketball. |
+ | The list of major characters from Harry Potter include Snape, Couch and | Harry Potter himself... | Tate. 1. Snape: A character from the 1990s TV show "The Fresh Prince of Bel-Air." 2. Couch: A character from... |

+ This table shows that the fine-tuned, unlearned model largely retains its performance on various benchmarks:

+ | Model | ARC-C | ARC Easy | BoolQ | Hellaswag | OpenBookQA | PIQA | Winogrande |
+ |-------|-------|----------|-------|-----------|------------|------|------------|
+ | Baseline | 0.439 | 0.744 | 0.807 | 0.577 | 0.338 | 0.767 | 0.663 |
+ | Fine-tuned | 0.416 | 0.728 | 0.798 | 0.560 | 0.334 | 0.762 | 0.665 |

+ Software: PyTorch, DeepSpeed
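
The first two components described under Training (comparing the reinforced model's logits with the baseline's, then deriving alternative labels) can be illustrated with a small sketch. The README does not state how the two sets of logits are combined, so the rule used here, subtracting a scaled, clipped logit difference from the baseline, is an assumption, and the names `generic_logits`, `alternative_labels`, and the coefficient `alpha` are hypothetical. The linked paper remains the authority on the actual method.

```python
import numpy as np

def generic_logits(baseline, reinforced, alpha=1.0):
    """Push down tokens that the reinforced model favors more than the baseline.

    Assumed rule (not stated in the README):
        generic = baseline - alpha * max(reinforced - baseline, 0)
    """
    baseline = np.asarray(baseline, dtype=float)
    reinforced = np.asarray(reinforced, dtype=float)
    # Only tokens boosted by the extra training on the target data are penalized.
    boost = np.maximum(reinforced - baseline, 0.0)
    return baseline - alpha * boost

def alternative_labels(baseline, reinforced, alpha=1.0):
    """Pick, for each position, the token with the highest generic logit."""
    return generic_logits(baseline, reinforced, alpha).argmax(axis=-1)

# Toy example: one position, vocabulary of three tokens.
# Token 0 stands in for a target-specific token that the reinforced model boosts.
baseline = np.array([[2.0, 1.0, 0.5]])
reinforced = np.array([[5.0, 1.0, 0.0]])
labels = alternative_labels(baseline, reinforced, alpha=1.0)
```

In this toy case the baseline alone would predict token 0, while the alternative label is token 1. Fine-tuning on labels of this kind corresponds to the third component.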
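
To put numbers on the benchmark comparison in the Evaluation section, the short script below computes the per-benchmark difference between the Fine-tuned and Baseline rows. The scores are copied from the table; the variable names are illustrative only.

```python
benchmarks = ["ARC-C", "ARC Easy", "BoolQ", "Hellaswag", "OpenBookQA", "PIQA", "Winogrande"]
baseline = [0.439, 0.744, 0.807, 0.577, 0.338, 0.767, 0.663]
fine_tuned = [0.416, 0.728, 0.798, 0.560, 0.334, 0.762, 0.665]

# Negative values mean the fine-tuned model scored lower than the baseline.
deltas = {
    name: round(ft - base, 3)
    for name, base, ft in zip(benchmarks, baseline, fine_tuned)
}
largest_drop = min(deltas, key=deltas.get)
mean_delta = round(sum(deltas.values()) / len(deltas), 4)

for name, delta in deltas.items():
    print(f"{name:<11} {delta:+.3f}")
print(f"largest drop: {largest_drop} ({deltas[largest_drop]:+.3f}), mean change: {mean_delta:+.4f}")
```

On these numbers the largest drop is 0.023 (ARC-C), and Winogrande is marginally higher after fine-tuning.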