POS Tagging for Amharic: A Machine Learning Approach

Sintayehu Hirpassa Kefena
Gurpreet Singh Lehal

Abstract

In this paper, we address the problem of automatically predicting the parts of speech (POS) of words in Amharic sentences. We present an experiment that involves the study and implementation of a POS tagging model. Four statistical taggers, namely the Trigrams’n’Tags (TnT) tagger, a Conditional Random Field (CRF) tagger, a Naive Bayes (NB) classifier, and a Decision Tree (DT) classifier, are applied to a morphologically rich language: Amharic. We compare the performance of all taggers using training and testing datasets of the same size. Several language-dependent and language-independent feature sets were constructed, and a combination of them is applied for each algorithm. With these inputs, the CRF-based model outperformed the other taggers. The best accuracy obtained in our experiments is 94.08%. Finally, our study shows that linguistic features play a decisive part in overcoming the limitations of the baseline statistical model for Amharic.
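As a rough illustration of what a language-independent feature set for a CRF-style tagger can look like, the sketch below builds one feature dictionary per token. The abstract does not list the actual features used in the study, so every feature name here (`prefix2`, `suffix3`, `prev_word`, and so on), the helper names `token_features` and `sentence_features`, and the placeholder tokens are assumptions chosen for illustration only.

```python
from typing import Dict, List


def token_features(sentence: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i.

    All features are hypothetical examples of language-independent cues
    (affixes, neighbours, position); they are not taken from the paper.
    """
    word = sentence[i]
    last = len(sentence) - 1
    return {
        "word": word,
        "prefix2": word[:2],    # first two characters
        "suffix2": word[-2:],   # last two characters
        "suffix3": word[-3:],   # last three characters
        "is_first": i == 0,
        "is_last": i == last,
        "is_digit": word.isdigit(),
        # Neighbouring words, with boundary markers at the sentence edges.
        "prev_word": sentence[i - 1] if i > 0 else "<BOS>",
        "next_word": sentence[i + 1] if i < last else "<EOS>",
    }


def sentence_features(sentence: List[str]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in a sentence, in order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Placeholder tokens; a real experiment would use tokenized Amharic text.
example = ["alpha", "beta", "42"]
features = sentence_features(example)
```

Lists of per-token dictionaries like `features` are a common input format for CRF toolkits. Language-dependent features, such as morphological cues specific to Amharic, would be added as further keys in the same dictionary.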

Article Details

How to Cite
Kefena, S. H., & Lehal, G. S. (2020). POS Tagging for Amharic: A Machine Learning Approach. INFOCOMP Journal of Computer Science, 19(1). Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627
Section
Machine Learning and Computational Intelligence
Author Biography

Gurpreet Singh Lehal

Professor in the Department of Computer Science, Punjabi University