Tone Mark Restoration in Standard Yorùbá Text: A Proposal

Franklin Oladiipo Asahiah
Ọdẹtúnjí Àjàdì Ọdẹ́jọbí
Emmanuel Rotimi Adagunodo
Funmi F. Olubode-Sawe

Abstract

Diacritic restoration has, for the most part, relied on either the letter (grapheme) or the space-delimited linguistic block, usually called the word, as the lexical focus item. The use of the letter for Yorùbá text has often been attributed to resource scarcity and to the underlying model being language independent. On the other hand, the limited performance of letter-based models has been blamed on the insufficient contextual information that letters provide for tone mark restoration. Another study therefore proposed using the word as the lexical token for restoring tone marks in Yorùbá text. The results of this word-based approach, however, showed no improvement over the letter-based approach despite a larger training set, which may be due to the resource-scarcity problem. In this paper, we therefore propose an alternative approach that is expected to address the twin challenges of resource scarcity and contextual insufficiency for tone mark restoration in Yorùbá text in particular and in resource-scarce tone languages in general. The approach is also expected to be linguistically sensible: it relates the tone mark restoration task to the orthographic function of tone marks in the text and to the positioning of tone within the linguistic units of the language. We propose using the syllable as the lexical focus, or simply syllable-based tone mark restoration for Yorùbá text.
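
To make the three candidate units concrete, the following is a minimal Python sketch and not the authors' implementation. It shows how tone marks can be stripped from Yorùbá text while keeping the under-dots of ẹ, ọ and ṣ, and how a word might be split into syllables. The function names `strip_tones` and `syllabify` are hypothetical, and the syllable pattern is a simplifying assumption: it accepts only V, CV and syllabic-nasal syllables, and treats a vowel followed by `n` as a nasal vowel unless another vowel follows.

```python
import re
import unicodedata
from collections import defaultdict

# Combining grave (low tone), acute (high tone) and macron (sometimes used for mid tone).
TONE_MARKS = "\u0300\u0301\u0304"
# The dot below belongs to the letters ẹ, ọ and ṣ; it is not a tone mark.
DOT_BELOW = "\u0323"


def strip_tones(text):
    """Remove tone marks but keep the under-dotted letters intact."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


# Assumed, simplified syllable shapes: (C)V with optional nasal "n", or a syllabic nasal.
_VOWEL = "[aeiou]" + DOT_BELOW + "?[" + TONE_MARKS + "]?"
_CONSONANT = "(?:gb|[bdfghjklmnprtwy]|s" + DOT_BELOW + "?)"
_SYLLABLE = re.compile(
    "(?:" + _CONSONANT + ")?" + _VOWEL + "(?:n(?![aeiou]))?"
    + "|[mn][" + TONE_MARKS + "]?"
)


def syllabify(word):
    """Split one word into syllables; characters outside the pattern are skipped."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return [
        unicodedata.normalize("NFC", match.group(0))
        for match in _SYLLABLE.finditer(decomposed)
    ]


# Four distinct words (husband, hoe, vehicle, spear) that differ only in tone marks.
forms = ["ọkọ", "ọkọ́", "ọkọ̀", "ọ̀kọ̀"]

# Group the tone-marked forms under their stripped spelling.
candidates = defaultdict(list)
for form in forms:
    candidates[strip_tones(form)].append(form)

for stripped, marked in candidates.items():
    print(stripped, "<-", marked)

print(list("Yorùbá"))        # letter-level tokens
print(syllabify("Yorùbá"))   # syllable-level tokens
print("Yorùbá".split())      # word-level tokens
```

All four forms collapse to the same unmarked spelling, which is the ambiguity a restoration system has to resolve. A syllable token carries exactly one tone-bearing unit, whereas a letter carries little context and a word multiplies the number of distinct tokens that must be seen in training.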

Article Details

How to Cite

Asahiah, F. O., Ọdẹ́jọbí, Ọ. À., Adagunodo, E. R., & Olubode-Sawe, F. F. (2017). Tone Mark Restoration in Standard Yorùbá Text: A Proposal. INFOCOMP Journal of Computer Science, 16(1-2), 8–19. Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/529

Section

Machine Learning and Computational Intelligence

Author Biographies

Franklin Oladiipo Asahiah, Obafemi Awolowo University, Ile-Ife

He is a Lecturer I in the Department of Computer Science and Engineering. His current research focuses on Natural Language Processing and the development of language resources for local Nigerian languages.

Ọdẹtúnjí Àjàdì Ọdẹ́jọbí, Obafemi Awolowo University, Ile-Ife

He is a Senior Lecturer in the Department of Computer Science and Engineering. He leads the Computational and Intelligence Systems Research Group, and his research covers Speech and Language Processing.

Emmanuel Rotimi Adagunodo, Obafemi Awolowo University, Ile-Ife

He is a Professor in the Department of Computer Science and Engineering. His research has covered several areas, including Natural Language Processing and Information Systems.

Funmi F. Olubode-Sawe, Federal University of Technology, Akure

She is an Associate Professor in the General Studies Unit. Her research interests are in Linguistics and Language Studies.
