Al-Quran recitation verification for memorization test using Siamese LSTM network


Rian Adam Rajagede
Rochana Prih Hastuti

Abstract

In the process of verifying Al-Quran memorization, a person is usually asked to recite a verse without looking at the text. This is generally done with a partner who checks the recitation. This paper proposes a model using a Siamese LSTM network to help users check their Al-Quran memorization on their own. The Siamese LSTM network verifies a recitation by matching the input against existing reference data for the verse being recited. This study evaluates two Siamese LSTM architectures, the Manhattan LSTM and the Siamese-Classifier. The Manhattan LSTM outputs a single numerical value that represents similarity, while the Siamese-Classifier uses a binary classification approach. We also compare how Mel-Frequency Cepstral Coefficient (MFCC), Mel-Frequency Spectral Coefficient (MFSC), and delta features affect model performance. We use the public dataset from the Every Ayah website and provide the usage information for future comparison. Our best model, using MFCC with delta features and the Manhattan LSTM, achieves an F1-score of 77.35%.
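The abstract names the pieces but not their exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common regression-based delta formula with a window of N = 2 (edge frames repeated), the Manhattan LSTM similarity exp(-||h1 - h2||_1) from Mueller and Thyagarajan's MaLSTM applied to two already-encoded recitations, and a hypothetical decision threshold of 0.5. The function names are illustrative, and the LSTM encoder that would produce `h1` and `h2` from the feature frames is omitted.

```python
import math
from typing import List, Sequence

Frame = List[float]


def delta(frames: Sequence[Sequence[float]], n: int = 2) -> List[Frame]:
    """Regression-based delta coefficients over a window of +/- n frames.

    Assumption: N = 2 and edge frames are repeated; the paper's exact
    setting is not stated in the abstract.
    """
    total = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: List[Frame] = []
    for t in range(total):
        row: Frame = []
        for j in range(len(frames[t])):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = frames[min(t + k, total - 1)][j]  # clamp at the last frame
                prv = frames[max(t - k, 0)][j]          # clamp at the first frame
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def with_delta(frames: Sequence[Sequence[float]], n: int = 2) -> List[Frame]:
    """Append the delta coefficients to each static feature frame (e.g. MFCC)."""
    return [list(f) + d for f, d in zip(frames, delta(frames, n))]


def manhattan_similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """MaLSTM-style similarity in (0, 1]: exp of the negative L1 distance."""
    if len(h1) != len(h2):
        raise ValueError("encodings must have the same dimension")
    return math.exp(-sum(abs(a - b) for a, b in zip(h1, h2)))


def verify(h_input: Sequence[float], h_reference: Sequence[float],
           threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: accept the recitation if similarity >= threshold."""
    return manhattan_similarity(h_input, h_reference) >= threshold


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    print([round(d[0], 3) for d in delta(ramp)])   # [0.5, 0.8, 1.0, 0.8, 0.5]
    print(verify([0.1, 0.2], [0.1, 0.25]))         # True (similarity ~0.951)
    print(f1_score(tp=3, fp=1, fn=1))              # 0.75
```

Because exp(-d) maps any L1 distance into (0, 1], the Manhattan LSTM output can be thresholded directly, whereas a Siamese-Classifier would instead feed the pair of encodings into a learned binary classifier.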



How to Cite
Rajagede, R. A., & Hastuti, R. P. (2021). Al-Quran recitation verification for memorization test using Siamese LSTM network. Communications in Science and Technology, 6(1), 35-40. https://doi.org/10.21924/cst.6.1.2021.344
