1-800-BAD-CODE committed
Commit • 3151c36
1 Parent(s): b51be78
Update README.md
README.md CHANGED
@@ -203,11 +203,15 @@ We show here the cosine similarity between the embeddings of each token:
 
 Recall that these embeddings are used to predict sentence boundaries... thus we should expect full stops to cluster.
 
-Indeed, we see that `NULL` and
+Indeed, we see that `NULL` and "`,`" are exactly the same, because neither have an implication on sentence boundaries.
 
-Next, we see that
-
+Next, we see that "`.`" and "`?`" are exactly the same, because w.r.t. SBD these are exactly the same: strong full stop implications.
+(Though, we may expect some difference between these tokens, given that "`.`" is predicted after abbreviations, e.g., 'Mr.', that are not full stops.)
+
+Further, we see that "`.`" and "`?`" are exactly the opposite of `NULL`.
+This is expected since these tokens typically imply sentence boundaries, whereas `NULL` and "`,`" do not.
+
+Lastly, we see that `ACRONYM` is very, but not totally, similar to the full stops "`.`" and "`?`",
+and almost, but not totally, the opposite of `NULL` and "`,`".
+Intuition suggests this is because acronyms can be full stops ("I live in the northern U.S. It's cold here.") or not ("It's 5 a.m. and I'm tired.").
 
-Lastly, we see that ACRONYM is quite, but not totally, similar to periods and question marks,
-and almost, but not totally, the opposite of NULL and commas.
-Intuitio suggests this is because acronyms can be full stops ("I live in the northern U.S. It's cold here.") or not ("It's 5 a.m. and I'm tired").
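
The README text changed in this diff reads token relationships off a cosine-similarity matrix. As a rough illustration of what "exactly the same" (similarity 1), "exactly the opposite" (similarity -1), and "very, but not totally, similar" mean, here is a minimal sketch. The 2-D vectors in `toy_embeddings` are invented for this example and are not the model's actual embeddings; the helper names `cosine_similarity` and `similarity_matrix` are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: invented values, chosen only to
# mimic the pattern described in the README, not taken from the model).
toy_embeddings = {
    "NULL": [-1.0, 0.0],
    ",": [-1.0, 0.0],
    ".": [1.0, 0.0],
    "?": [1.0, 0.0],
    "ACRONYM": [0.9, 0.3],
}


def similarity_matrix(embeddings):
    """Map every ordered pair of labels to its cosine similarity."""
    labels = list(embeddings)
    return {
        (a, b): cosine_similarity(embeddings[a], embeddings[b])
        for a in labels
        for b in labels
    }


sims = similarity_matrix(toy_embeddings)
for pair in [("NULL", ","), (".", "?"), (".", "NULL"), ("ACRONYM", "."), ("ACRONYM", "NULL")]:
    print(pair, round(sims[pair], 3))
```

With these toy values, the identical pairs score 1, the full stops score -1 against `NULL`, and `ACRONYM` falls strictly between those extremes on both sides, which is the qualitative pattern the README describes.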