Create README.md
README.md
ADDED
@@ -0,0 +1,31 @@
---
language:
- zh
tags:
- ner
- punctuation
- 古文
- 文言文
- ancient
- classical
widget:
- text: "伐薪烧炭南山中满面灰尘烟火色"

---

# Classical Chinese Punctuation

> For more about the model, see the [GitHub repository of my Classical Chinese texts and poetry project](https://github.com/raynardj/yuan). You are welcome to join the discussion there, and hit 🌟 if you like it.

* This model punctuates Classical (ancient) Chinese. The task may seem strange, but **many of my ancestors thought that writing articles without punctuation was a brilliant idea** 🧐. The material here comes from books, letters, and stone carvings, where there is no punctuation at all, just one long string of characters. NLP is a good tool for this problem, and the entire pipeline can be borrowed from a typical **NER task**.

* Many articles have already been punctuated, so a few regex operations yield more than enough labeled data 📚. That is why this problem is pretty much low-hanging fruit.

* I assume anyone interested in this problem set can read at least modern Chinese, so the rest of this documentation was originally written in Chinese.
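
The regex-based labeling mentioned above can be sketched as follows. This is a minimal illustration, not the author's actual preprocessing: the punctuation inventory `PUNCTUATION`, the `"O"` label for "no mark", and the helper name `make_labels` are all assumptions, as is the scheme of labeling each character with the mark that follows it. The punctuation in the sample sentence is placed there for illustration only.

```python
import re

# Hypothetical punctuation inventory for this sketch (full-width marks only).
PUNCTUATION = ",。!?;:、"
NO_PUNCT = "O"  # assumed label meaning "no punctuation after this character"

_CLASS = re.escape(PUNCTUATION)
# One non-punctuation, non-whitespace character, optionally followed by one mark.
PAIR_RE = re.compile(rf"([^{_CLASS}\s])([{_CLASS}])?")


def make_labels(punctuated: str):
    """Split punctuated text into characters and per-character labels.

    Each label is the punctuation mark that directly follows the character,
    or NO_PUNCT if none does. Leading marks and the second of two
    consecutive marks are simply skipped in this sketch.
    """
    chars, labels = [], []
    for match in PAIR_RE.finditer(punctuated):
        chars.append(match.group(1))
        labels.append(match.group(2) or NO_PUNCT)
    return chars, labels


chars, labels = make_labels("伐薪烧炭南山中,满面灰尘烟火色。")
print("".join(chars))  # the unpunctuated model input
print(labels)          # one label per character
```

Framing the data this way gives one label per input character, which is the same shape of data a token-classification (NER-style) model is trained on.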
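
# Classical Chinese (文言文, 古文) Sentence-Breaking Model
> Given a string of unpunctuated Classical Chinese, the model breaks it into sentences by inserting punctuation. It currently supports more than twenty punctuation marks.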
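
To make the input and output concrete, here is a minimal sketch of how per-character labels could be turned back into punctuated text. It assumes a label scheme in which each character is tagged either with the mark that follows it or with `"O"`; the model's real label set is not documented here, and the labels below are written by hand rather than produced by the model. The helper name `apply_labels` is hypothetical.

```python
def apply_labels(text: str, labels: list, no_punct: str = "O") -> str:
    """Insert each predicted mark after its character (assumed label scheme)."""
    if len(text) != len(labels):
        raise ValueError("expected exactly one label per character")
    pieces = []
    for char, label in zip(text, labels):
        pieces.append(char)
        if label != no_punct:
            pieces.append(label)
    return "".join(pieces)


text = "伐薪烧炭南山中满面灰尘烟火色"
# Hand-written labels for illustration: a comma after the 7th character
# and a full stop after the 14th.
labels = ["O"] * 6 + [","] + ["O"] * 6 + ["。"]
print(apply_labels(text, labels))  # 伐薪烧炭南山中,满面灰尘烟火色。
```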

## My other humble models, take a look at those too
* [Translate modern Chinese into Classical Chinese](https://huggingface.co/raynardj/wenyanwen-chinese-translate-to-ancient)
* [Translate Classical Chinese (unpunctuated input is fine) into modern Chinese](https://huggingface.co/raynardj/wenyanwen-ancient-translate-to-modern)