NLP-Beginner: Introductory Exercises for Natural Language Processing

This project is a BERT-based implementation of the NLP-Beginner exercises. Because Task 1 and Task 2 are the same task, Task 1 has been dropped.

Dataset: Classify the sentiment of sentences from the Rotten Tomatoes dataset
Framework: PyTorch

Text classification is performed on the output at the [CLS] position.

Accuracy: 74.90%
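
Classifying from the [CLS] position can be illustrated with a minimal, framework-free sketch. It is not the code of this repository: the random numbers stand in for BERT's hidden states, the sizes are toy values, the five output classes are an assumption about the sentiment labels, and the names `cls_logits`, `weight`, and `bias` are hypothetical. In PyTorch the linear step would typically be an `nn.Linear` layer.

```python
import math
import random

def cls_logits(hidden_states, weight, bias):
    """Apply a linear layer to the first ([CLS]) token vector only."""
    cls_vec = hidden_states[0]  # [CLS] is the first token of a BERT input
    return [sum(w * x for w, x in zip(row, cls_vec)) + b
            for row, b in zip(weight, bias)]

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
hidden_size, num_classes, seq_len = 8, 5, 6  # toy sizes; 5 classes is an assumption
# Stand-in for the encoder output: one vector per token, [CLS] first.
hidden_states = [[random.gauss(0, 1) for _ in range(hidden_size)]
                 for _ in range(seq_len)]
weight = [[random.gauss(0, 0.1) for _ in range(hidden_size)]
          for _ in range(num_classes)]
bias = [0.0] * num_classes

probs = softmax(cls_logits(hidden_states, weight, bias))
predicted = max(range(num_classes), key=probs.__getitem__)
```

Only the first token's vector feeds the classifier, so the remaining token vectors influence the prediction solely through the encoder's self-attention.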

References
1. https://pytorch.org/
2. Convolutional Neural Networks for Sentence Classification https://arxiv.org/abs/1408.5882
3. https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
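Word embedding initialization:
1. Random initialization of the embeddings
2. Initialization from pretrained GloVe embeddings https://nlp.stanford.edu/projects/glove/
Key concepts:
1. Feature extraction with CNN/RNN
2. Word embeddings
3. Dropout
Time: two weeks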
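
The two initialization options listed above can be combined in one embedding matrix: pretrained vectors where available, random vectors otherwise. The following is a minimal sketch under stated assumptions. The toy vectors are made up, the range `[-0.1, 0.1]` is an arbitrary choice, and `parse_glove_lines` and `build_embedding_matrix` are hypothetical helper names. The only format assumption is that each line of a GloVe text file holds a word followed by space-separated floats.

```python
import random

def parse_glove_lines(lines):
    """Parse GloVe text format: a word followed by space-separated floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return one vector per vocab word: pretrained if available, else random."""
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        if word in pretrained:
            matrix.append(list(pretrained[word]))
        else:
            # Out-of-vocabulary words fall back to small random values.
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix

glove_lines = ["good 0.1 0.2 0.3", "bad -0.1 -0.2 -0.3"]  # made-up toy vectors
pretrained = parse_glove_lines(glove_lines)
vocab = ["<pad>", "good", "bad", "movie"]
matrix = build_embedding_matrix(vocab, pretrained, dim=3)
```

Passing an empty `pretrained` dictionary gives the purely random initialization. In PyTorch, a matrix like this could be loaded into an embedding layer, for example with `nn.Embedding.from_pretrained`.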

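Given two input sentences, determine the relationship between them. Follow ESIM (using only the LSTM is fine; Tree-LSTM can be ignored) and implement it with a bidirectional attention mechanism.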

References
1. 《神经网络与深度学习》 (Neural Networks and Deep Learning), Chapter 7
2. Reasoning about Entailment with Neural Attention https://arxiv.org/pdf/1509.06664v1.pdf
3. Enhanced LSTM for Natural Language Inference https://arxiv.org/pdf/1609.06038v3.pdf
Dataset: https://nlp.stanford.edu/projects/snli/
Implementation requirement: PyTorch
Key concepts:
1. Attention mechanism
2. Token-to-token attention
Time: two weeks
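
Token-to-token attention in both directions can be sketched without any framework. The example below follows the soft-alignment idea described in the ESIM paper (dot-product scores, softmax over the other sentence, weighted sum). It is an illustrative sketch, not this repository's code: the token vectors are toy values standing in for LSTM outputs, and `soft_align` is a hypothetical name.

```python
import math

def softmax(scores):
    """Normalize scores into attention weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def weighted_sum(weights, vectors):
    """Combine vectors using the given attention weights."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors))
            for k in range(dim)]

def soft_align(a, b):
    """Token-to-token attention in both directions between sequences a and b."""
    scores = [[dot(ai, bj) for bj in b] for ai in a]  # scores[i][j] = a_i . b_j
    # Each token of a attends over all tokens of b (row-wise softmax).
    a_tilde = [weighted_sum(softmax(row), b) for row in scores]
    # Each token of b attends over all tokens of a (column-wise softmax).
    b_tilde = [weighted_sum(softmax([scores[i][j] for i in range(len(a))]), a)
               for j in range(len(b))]
    return a_tilde, b_tilde

premise = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy stand-ins for LSTM outputs
hypothesis = [[1.0, 0.0], [0.0, 2.0]]
a_tilde, b_tilde = soft_align(premise, hypothesis)
```

`a_tilde` has one vector per premise token and `b_tilde` one per hypothesis token, so each sentence receives a summary of the other that is aligned to its own tokens.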

Train a sequence labeling model with LSTM+CRF, using Named Entity Recognition as the example.
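
At prediction time, a linear-chain CRF layer is usually decoded with the Viterbi algorithm, which searches for the tag sequence with the highest total score. The sketch below assumes per-position emission scores (such as an LSTM would produce) and a tag-to-tag transition matrix. All numbers are made up, the tag order `O, B, I` is an assumption, and start and end transitions are omitted for brevity.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag path and its score.

    emissions[t][k]: score of tag k at position t (e.g. from an LSTM).
    transitions[j][k]: score of moving from tag j to tag k.
    """
    num_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for k in range(num_tags):
            # Best previous tag for ending in tag k at this position.
            best_prev = max(range(num_tags),
                            key=lambda j: score[j] + transitions[j][k])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][k] + emit[k])
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    last = max(range(num_tags), key=score.__getitem__)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

# Tags (assumed order): 0 = O, 1 = B, 2 = I. The O -> I transition is penalized.
emissions = [[1.0, 0.5, 0.0], [0.0, 0.2, 1.0], [0.5, 0.0, 0.4]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.2]]
path, best_score = viterbi(emissions, transitions)
```

Picking the best tag independently at each position would give `O, I, O` here, which contains an invalid `O -> I` step. Viterbi returns `B, I, I` instead because the transition scores are taken into account.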

References
1. 《神经网络与深度学习》 (Neural Networks and Deep Learning), Chapters 6 and 11
2. https://arxiv.org/pdf/1603.01354.pdf
3. https://arxiv.org/pdf/1603.01360.pdf
Dataset: CoNLL 2003, https://www.clips.uantwerpen.be/conll2003/ner/
Implementation requirement: PyTorch
Key concepts:
1. Evaluation metrics: precision, recall, F1
2. Undirected graphical models, CRF
Time: two weeks
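
For NER, precision, recall, and F1 are commonly computed over whole entity spans rather than individual tokens. The following is a minimal sketch, assuming BIO-style tags and exact span matching. `extract_spans` and `prf1` are hypothetical helper names, and the tag sequences are made up.

```python
def extract_spans(tags):
    """Collect (type, start, end) entity spans from a BIO tag sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((label, start, i))  # end index is exclusive
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans

def prf1(gold_tags, pred_tags):
    """Entity-level precision, recall, and F1 with exact span matching."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1

gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "B-LOC"]
precision, recall, f1 = prf1(gold, pred)
```

In this example only the `PER` span matches exactly: the prediction has three spans and the gold annotation has two, so a span with the right boundaries but the wrong type, or the right type at the wrong position, counts as an error.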

Train a character-level language model with LSTM and GRU, and compute its perplexity.
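
Perplexity is the exponential of the average negative log-probability that the model assigns to the true next characters. A minimal sketch follows, assuming the per-character probabilities have already been read off the model's softmax output. The probabilities used here are made up.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability of the true tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that always spreads its mass evenly over 4 characters.
ppl_uniform = perplexity([0.25] * 8)
# A model that is more confident about the correct characters.
ppl_confident = perplexity([0.9, 0.8, 0.95, 0.7])
```

A uniform guess over 4 characters gives a perplexity of 4, and more confident correct predictions push the value toward 1. When training with a cross-entropy loss in nats, the same quantity is the exponential of the mean loss.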
