Al-Quran recitation verification for memorization test using Siamese LSTM network
Abstract
In the process of verifying Al-Quran memorization, a person is usually asked to recite a verse without looking at the text, and the reading is typically checked by a partner. This paper proposes a model based on a Siamese LSTM network that helps users check their Al-Quran memorization on their own. The network verifies a recitation by matching the input audio against existing recordings of the recited verse. This study evaluates two Siamese LSTM architectures: the Manhattan LSTM, which outputs a single numerical value representing similarity, and the Siamese-Classifier, which treats verification as binary classification. We also compare the effect of Mel-Frequency Cepstral Coefficients (MFCC), Mel-Frequency Spectral Coefficients (MFSC), and delta features on model performance. We use the public dataset from the Every Ayah website and document its usage for future comparison. Our best model, using MFCC with delta features and the Manhattan LSTM, achieves an F1-score of 77.35%.
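The Manhattan LSTM scores a pair of recitations with the exponentiated negative L1 (Manhattan) distance between the twin encoders' outputs. A minimal sketch of that similarity function, assuming two fixed-length encodings `h1` and `h2` have already been produced by the shared LSTM encoder (the vectors below are hypothetical placeholders, not values from the paper):

```python
import math

def manhattan_similarity(h1, h2):
    """MaLSTM-style similarity: exp(-||h1 - h2||_1), in (0, 1]."""
    l1_distance = sum(abs(a - b) for a, b in zip(h1, h2))
    return math.exp(-l1_distance)

# Identical encodings give similarity 1.0; distant ones approach 0.
ref  = [0.2, -0.5, 1.0]   # hypothetical encoding of a reference recitation
same = [0.2, -0.5, 1.0]   # encoding of a matching recitation
diff = [2.0, 1.5, -1.0]   # encoding of a mismatched recitation
print(manhattan_similarity(ref, same))  # 1.0
print(manhattan_similarity(ref, diff))
```

Because the exponent is non-positive, the score always lands in (0, 1], which makes it directly comparable against a threshold; the Siamese-Classifier variant instead feeds the pair of encodings into a small binary classifier.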
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright
Open Access authors retain the copyrights of their papers, and all open access articles are distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided that the original work is properly cited.
The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.
While the advice and information in this journal are believed to be true and accurate on the date of its going to press, neither the authors, the editors, nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.