A Comprehensive Investigation on Image Caption Generation using Deep Neural Networks
Abstract
Voice over IP (VoIP) is currently one of the most widely used communication services. However, its
quality depends on several external factors that degrade the voice signal in different ways,
directly affecting users' quality of experience (QoE). To classify the quality of a voice signal
transmitted over a VoIP connection affected by packet loss, two deep learning (DL) models were
implemented. Both models were built on a deep neural network (DNN) that analyzes voice signals
degraded by packet loss, characterized by their packet loss rate (PLR), so that the signals can be
classified into four different classes according to the user's experience. To this end, two
databases were prepared, each containing four distinct classes. The first was built from the
ITU-T P.862 recommendation database files subjected to different packet loss rates, and the second
was built from the ITU-T P.501 recommendation files, grouped according to the Mean Opinion Score
(MOS) of each degraded file. The model trained on the database organized by packet loss rate
reached 94% accuracy in validation, while the model trained on the database organized by MOS
reached 91% accuracy. When the classifications produced by the P.563 algorithm were compared with
those produced by the P.862 algorithm, P.563 reached an average accuracy of 53.21%. These results
indicate that the generated models can classify both the packet loss rate and the MOS index
non-intrusively and with high accuracy, and that they determine the MOS of degraded voice files
more accurately than the P.563 algorithm.
Keywords: VoIP, Voice Quality, ITU-T P.862, ITU-T P.563, ITU-T P.501, Deep Learning, Machine
Learning
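The comparison between P.563 and P.862 described in the abstract can be illustrated with a minimal sketch. It assumes that each MOS estimate is mapped to one of four classes and that accuracy is the fraction of files for which both algorithms land in the same class. The class boundaries, class names, and scores below are hypothetical and are not taken from the article.

```python
from bisect import bisect_right

# Hypothetical MOS boundaries separating four quality classes.
# The article does not state its thresholds; these are assumptions.
ASSUMED_MOS_EDGES = (2.0, 3.0, 4.0)
ASSUMED_CLASS_NAMES = ("bad", "poor", "fair", "good")


def mos_to_class(mos, edges=ASSUMED_MOS_EDGES):
    """Return the class index (0..3) of a MOS value under the assumed edges."""
    return bisect_right(edges, mos)


def class_agreement(reference_mos, estimated_mos):
    """Fraction of files whose two MOS estimates fall into the same class."""
    if len(reference_mos) != len(estimated_mos) or not reference_mos:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(
        mos_to_class(ref) == mos_to_class(est)
        for ref, est in zip(reference_mos, estimated_mos)
    )
    return matches / len(reference_mos)


# Made-up scores for four degraded files (illustration only).
p862_scores = [4.2, 3.1, 2.4, 1.5]  # intrusive reference estimates
p563_scores = [4.0, 2.8, 2.6, 2.3]  # non-intrusive estimates

print(class_agreement(p862_scores, p563_scores))
```

With these made-up scores, two of the four files fall into the same class under both estimates. The same agreement measure could be applied to the output of a trained classifier in place of the P.563 scores.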
Article Details
Upon receipt of accepted manuscripts, authors will be invited to complete a copyright license to publish the paper. At least the corresponding author must send the signed copyright form for publication. It is a condition of publication that authors grant an exclusive license to the INFOCOMP Journal of Computer Science. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently, and it also allows the article to be disseminated as widely as possible. In assigning the copyright license, authors may use their own material in other publications, provided that the INFOCOMP Journal of Computer Science is acknowledged as the original place of publication.