update readme
README.md
CHANGED
@@ -1,3 +1,14 @@
+---
+language: en
+datasets:
+- sentence-transformers/reddit-title-body
+- sentence-transformers/embedding-training-data
+widget:
+- text: "answer2question: Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects."
+
+license: apache-2.0
+---
+
 # doc2query/all-with_prefix-t5-base-v1
 
 
@@ -9,8 +20,10 @@ model_name = 'doc2query/all-with_prefix-t5-base-v1'
 tokenizer = T5Tokenizer.from_pretrained(model_name)
 model = T5ForConditionalGeneration.from_pretrained(model_name)
 
-prefix = "answer2question
-text =
+prefix = "answer2question"
+text = "Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects."
+
+text = prefix+": "+text
 
 input_ids = tokenizer.encode(text, max_length=384, truncation=True, return_tensors='pt')
 outputs = model.generate(
@@ -49,6 +62,7 @@ The datasets include besides others:
 This model was trained **with prefixes**: you start the text with a specific prefix that defines what type of output text you would like to receive. Depending on the prefix, the output is different.
 
 E.g. the above text about Python produces the following output:
+
 | Prefix | Output |
 | --- | --- |
 | answer2question | Why should I use python in my business? ; What is the difference between Python and.NET? ; what is the python design philosophy? |
@@ -66,5 +80,7 @@ These are all available pre-fixes:
 - text2query
 - question2question
 
+
+
 For the datasets and weights for the different pre-fixes see `data_config.json` in this repository.
 
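
The prefixing step this commit adds to the README (`text = prefix+": "+text`) can be wrapped in a small helper. The following is a minimal sketch, not part of the repository: `build_prefixed_input`, `build_all_inputs`, and `KNOWN_PREFIXES` are hypothetical names, and the prefix tuple contains only the three prefixes visible in this diff, since the README's full list falls outside the shown hunks.

```python
from typing import Dict

# Assumption: only the prefixes visible in this diff are listed here.
# The README's complete prefix list is not shown in the hunks above.
KNOWN_PREFIXES = ("answer2question", "text2query", "question2question")


def build_prefixed_input(prefix: str, text: str) -> str:
    """Return the model input as '<prefix>: <text>' (hypothetical helper)."""
    if prefix not in KNOWN_PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    # Same concatenation as the README example: prefix + ": " + text
    return prefix + ": " + text.strip()


def build_all_inputs(text: str) -> Dict[str, str]:
    """Build one prefixed input per known prefix, keyed by prefix."""
    return {p: build_prefixed_input(p, text) for p in KNOWN_PREFIXES}


if __name__ == "__main__":
    sample = "Python is an interpreted, high-level and general-purpose programming language."
    for prefix, model_input in build_all_inputs(sample).items():
        print(model_input)
```

Each resulting string would then be passed to `tokenizer.encode(..., max_length=384, truncation=True, return_tensors='pt')` exactly as in the README snippet. Rejecting unknown prefixes early is a design choice of this sketch, on the assumption that a prefix the model was not trained with is a mistake worth surfacing.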