Abstract: Deep learning methods have recently advanced the state of the art in many natural language processing tasks. Previous work has shown that the Transformer can capture long-distance relations between words in a sequence. In this paper, we propose a Transformer-based neural model for Chinese word segmentation and part-of-speech tagging. The model introduces a word boundary-based character embedding method to mitigate the character ambiguity problem. On top of the Transformer layer, a BiLSTM-CRF layer generates the best tag sequence. Experiments on the Chinese Treebank show that our model outperforms the baseline and achieves state-of-the-art performance on both Chinese word segmentation and part-of-speech tagging.
Keywords: Chinese Word Segmentation, POS Tagging, Transformer, Word Boundary-Based Character Embedding
DOI: 10.17148/IJARCCE.2021.101201