An Alternate Approach for Question Answering System in Bengali Language Using Classification Techniques

Question Answering (QA) systems are becoming more popular with the introduction of virtual agents and chatbots. The medium of a QA system is generally either text or audio. A QA system differs from a search engine. Searching is generally based on keyword matching; in web search, a list of URLs is ranked based on location, user history, search preferences, etc., and sophisticated algorithms such as PageRank are also involved. A QA system, on the other hand, does not rely primarily on keyword matching: the query and the best answer often have no terms, or only a very small number of terms, in common. QA systems in English and other popular languages resolve this issue with the help of ontologies, WordNet, machine-readable dictionaries, etc. QA systems in low-resource languages suffer from a lack of annotation, the absence of a WordNet, and immature ontologies. In this work, a QA system in Bengali is developed using supervised learning algorithms. A collection of Bengali literature, developed during the TDIL (Technology Development for Indian Languages) project funded by the Govt. of India, is used as the repository. Well-known classification techniques, namely ANN, SVM, Naïve Bayes and Decision Tree, are employed. The system achieves 84.33% accuracy in returning the exact answer and 97.13% accuracy in returning a string containing the correct answer. The unavailability of a structured dataset and poor resources were the main challenges in this work. A QA system in Indian languages, especially Bengali, is useful not only for chatbots and virtual agents but also for e-governance and mobile governance in West Bengal and Bangladesh. A QA system in the mother tongue gives more citizens the opportunity to interact with the administration. Although the system is designed for Bengali, it can be tuned to work for any language with minimal modification.
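The two accuracy figures reported above correspond to two different scoring criteria: the returned string equals the answer, or the returned string merely contains it. The sketch below illustrates one plausible way to compute both. The function names, the whitespace-only normalisation, and the toy question outputs are assumptions made for illustration; they are not the evaluation code or data of this work.

```python
def normalize(text: str) -> str:
    """Collapse whitespace runs and trim the ends; no other normalisation is applied."""
    return " ".join(text.split())


def exact_match(returned: str, gold: str) -> bool:
    """True when the returned string is exactly the gold answer."""
    return normalize(returned) == normalize(gold)


def contains_answer(returned: str, gold: str) -> bool:
    """True when the gold answer occurs somewhere inside the returned string."""
    return normalize(gold) in normalize(returned)


def accuracy(pairs, criterion) -> float:
    """Percentage of (returned, gold) pairs that satisfy the given criterion."""
    if not pairs:
        raise ValueError("no predictions to score")
    hits = sum(1 for returned, gold in pairs if criterion(returned, gold))
    return 100.0 * hits / len(pairs)


# Hypothetical (returned, gold) pairs, invented for this sketch only.
toy_pairs = [
    ("কলকাতা", "কলকাতা"),                                   # exact answer
    ("রবীন্দ্রনাথ ঠাকুর ১৮৬১ সালে জন্মগ্রহণ করেন", "১৮৬১"),  # answer inside a longer string
    ("ঢাকা", "কলকাতা"),                                     # wrong answer
    ("১৯১৩ সালে", "১৯১৩"),                                  # answer inside a longer string
]

exact_accuracy = accuracy(toy_pairs, exact_match)
containment_accuracy = accuracy(toy_pairs, contains_answer)
print(f"exact: {exact_accuracy:.2f}%  containment: {containment_accuracy:.2f}%")
```

Every exact match is also a containment match, so the containment figure can never be lower than the exact figure, which is consistent with the gap between 84.33% and 97.13% reported above.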

