ISI Journal Paper -
Improved Recognition of Kurdish Sign Language using Modified CNN
As a primary form of communication among people, sign language is used globally. Sign language is also a repository of reactions and body language, with a corresponding hand gesture for each letter of the alphabet. Sign languages, in comparison with languages that use speech, exhibit significant regional variation within each nation [1].
A helpful and efficient aid for the deaf population was previously developed using current technology [2]. Moreover, communication between deaf and hearing people should not always require a human translator. For thousands of years, the deaf community has relied on sign language to communicate. Among the many deaf cultures around the globe, sign language (SL) has developed into a full and complex language [3]. A disability can exacerbate the difficulties of everyday living, but technological and computing advances have made things much simpler [4].
The global extent of hearing loss underscores the importance of improving sign language recognition. The World Health Organization (WHO) reports that more than 430 million individuals need rehabilitation due to their severe hearing loss. By 2050, experts predict that this number will rise to more than 700 million. In low- and middle-income nations, where healthcare is scarce and assistance is minimal, this statistic corresponds to roughly 1 in 10 people facing communication challenges [5].
Deaf or hard-of-hearing individuals often struggle to interact in real-life situations and face challenges in accessing information. To address these issues, the EU-funded SignON project aims to create a mobile app that can translate between European sign and verbal languages [6]. SignDict is founded on the belief that communication is the key to fostering an open society. To enhance the integration of deaf communities, it established a living dictionary and encourages everyone to participate in developing a sign language dictionary, with no technical knowledge required. Notably, both SignDict and SignON are European projects [7].
The most widely used sign languages, including American Sign Language (ASL) and British Sign Language (BSL), employ manual alphabets. Both are significant, although they are distinct languages. The key distinction between ASL and BSL is that BSL provides detailed descriptions of the letter form, orientation, and placement in the hand, whereas ASL provides only shape and orientation [8]. There is no officially recognized or recorded version of Kurdish Sign Language (KuSL) in the nations populated by Kurds. In an effort to standardize and promote KuSL among the deaf community and others who are interested, Kurdish SL has been developed as a standard for Kurdish-script sign language based on ASL. This work, which relies on machine-learning methods, yielded a KuSL specification.
As a consequence of encouraging findings in real applications, the Convolutional Neural Network (CNN) method has recently acquired traction in the field of image categorization. As a result, many cutting-edge CNN-based sign language interpreters have been developed [9,10]. CNNs are helpful in classifying hand forms in SLR, but they cannot segment hands without further training or tools [11]. CNN’s superior ability to learn features from raw data input makes it the best machine-learning model for image identification. As a type of deep neural network, CNN was first developed for processing data in a single dimension [12]. Because of its high flexibility in parameter adjustment, growing effectiveness, and minimal difficulty, 2D CNN has achieved widespread use in a large range of fields, from image classification to face recognition and natural language processing [13].
In gesture recognition, conventional algorithms such as the neural network approach and the hidden Markov model have significant limitations, including considerable computational complexity and long training times. To address these difficulties, the authors in [14] presented the Support Vector Machine (SVM) algorithm as an innovative method for gesture identification. The SVM algorithm is suggested as an alternative to current approaches that reduces the computational load and accelerates training. To reduce the influence of external factors, both human and environmental, on dynamic gestures of the same category, and to improve the accuracy and reliability of recognition, it is crucial to properly preprocess the raw gesture input, which is obtained from a camera or video file.
A categorization of KuSL using the ASL alphabet and the ArSL2018 dataset is proposed in this article as a potential benchmark for the Kurdish Sign Language Recognition Standard (KuSLRS). The Kurdish alphabet is based on Arabic script; hence, a 2D-CNN architecture to categorize the Kurdish letters is proposed.
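To make the idea of a 2D-CNN letter classifier concrete, the following is a minimal forward-pass sketch in plain Python. It is not the modified architecture proposed in this paper: the single 3x3 kernel, the 8x8 toy input, the random weights, and all function names are illustrative assumptions. Only the number of output classes (34) is taken from the text.

```python
import math
import random

NUM_CLASSES = 34  # number of Kurdish alphabet signs stated in the paper


def conv2d(image, kernel):
    """'Valid' 2D cross-correlation of a grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling (a trailing odd row/column is dropped)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(image, kernel, weights, biases):
    """conv -> ReLU -> max-pool -> flatten -> dense -> softmax."""
    features = max_pool2x2(relu(conv2d(image, kernel)))
    flat = [v for row in features for v in row]
    logits = [sum(w * x for w, x in zip(row, flat)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Toy, untrained parameters (assumptions for illustration only).
rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]
kernel = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
# 8x8 input -> 6x6 after the 3x3 convolution -> 3x3 after pooling -> 9 features
weights = [[rng.uniform(-1, 1) for _ in range(9)] for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

probs = forward(image, kernel, weights, biases)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
```

With trained parameters, `predicted` would be the index of the most probable letter; here the weights are random, so the output only demonstrates the data flow.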
1.1. Objectives
This study aims to bridge the communication gap between the hearing world and the deaf Kurdish community by developing a real-time, person-independent KuSL recognition system. Our model surpasses existing projects such as SignOn and SignDict, which focus on European languages and offline recognition, by achieving 98.80% real-time accuracy in translating handshapes into Kurdish-Arabic script letters. This accuracy can be maintained even under challenging lighting and hand poses thanks to our robust system design. Furthermore, we present the first comprehensive and publicly available KuSL dataset, which will facilitate further research and development. Importantly, real-time language translation empowers deaf individuals by providing access to written information and communication, thereby fostering greater independence and social participation. Our groundbreaking approach is evaluated in this study through field trials and user feedback, ensuring its effectiveness in bridging the communication gap and positively impacting the Kurdish deaf community.
1.2. Related Work
Most studies in the field used one of three methods: computer vision, smart gloves with sensors, or hybrid systems that integrate the two. In SLR, facial expressions are crucial, but the initial algorithm does not take them into account. In contrast to computer-vision systems, technologies based on gloves can capture the complete gesture, including the user’s movement [15,16].
Numerous studies on various sign languages, such as ASL [17,18,19,20,21,22,23,24,25], BSL [26], Arabic SL [1,27,28], Turkish SL [29,30], Persian SL [31,32,33,34], Indian SL [35,36], and others, have been carried out in recent years. To the best of our knowledge, the only available studies on KuSL cover 12 classes [37], 10 classes [38], and 84 classes [39].
The recognition of Kurdish signs in KuSL was improved with the help of the suggested model. There are now 71,400 samples across 34 alphabet signs for Kurdish letters due to the consolidation of the ASL and ArSL2018 databases. The three primary challenges in SLR are the large number of classes, models, and machine-learning techniques.
It is important to distinguish between static and dynamic gestures. Finger spelling can be used to illustrate both. The architecture governs data collection, feature extraction, and data categorization, whether the system is working with static signs expressed in a single image or with dynamic signs shown in a series of images. A global feature representation [39] works well for static signs since they are not time series.
Addressing static sign language translation, a PhotoBiological Filter Classifier (PhBFC) was introduced for enhanced accuracy. Utilizing a contrast enhancement algorithm inspired by retinal photoreceptor cells, effective communication was supported between agents and operators and within hearing-impaired communities. Accuracy was maintained through integration into diverse devices and applications, with V3 achieving 91.1% accuracy compared to V1’s 63.4%, while an average of 55.5 Frames Per Second (FPS) was sustained across models. The use of photoreceptor cells improved accuracy without affecting processing time [40].
Wavelet transforms and Neural Network (NN) models were used in the Persian Sign Language (PSL) system to decipher static motions representing the letters of the Persian alphabet, as outlined in [31]. Digital cameras were used to take the necessary images, which were then cropped, scaled, and transformed into grayscale in preparation for use with the specified alphabets; 94.06% accuracy was achieved in the categorization process. Identification on a training dataset of 51 dynamic signals was implemented by utilizing LM recordings of hand motions and Kinect recordings of the signer’s face. The research achieved 96.05% and 94.27% accuracy for one- and two-handed motions, respectively, by using a number of different categorization methods [41]. As indicated in [21,23], some studies used a hybrid approach, employing both static and dynamic sign recognition.
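The image preparation described above for the PSL system (cropping, scaling, and grayscale conversion) can be sketched as follows. This is an illustrative sketch, not the pipeline used in [31]: the nearest-neighbour resize, the BT.601 luma weights, and the function names are assumptions, and images are represented as nested lists of (R, G, B) tuples to keep the example dependency-free.

```python
def crop(image, top, left, height, width):
    """Return the sub-image starting at (top, left) with the given size."""
    return [row[left:left + width] for row in image[top:top + height]]


def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour rescaling to new_h x new_w pixels."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[i * old_h // new_h][j * old_w // new_w] for j in range(new_w)]
        for i in range(new_h)
    ]


def to_grayscale(rgb_image):
    """Convert (R, G, B) pixels to luma using the BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def preprocess(rgb_image, box, size):
    """Crop to box=(top, left, height, width), scale to size x size, then gray."""
    top, left, height, width = box
    cropped = crop(rgb_image, top, left, height, width)
    scaled = resize_nearest(cropped, size, size)
    return to_grayscale(scaled)


# Usage: a hypothetical 4x4 white image, cropped to its central 2x2 region
# and scaled back up to 4x4.
white = [[(255, 255, 255)] * 4 for _ in range(4)]
result = preprocess(white, (1, 1, 2, 2), 4)
```

A production system would normally use an image library with proper interpolation; the nearest-neighbour version is used here only because it is short and easy to verify.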
Several methods exist for gathering data, such as using a digital camera, a Kinect sensor, continuous video content, or publicly accessible datasets. It was suggested that a Kinect sensor be used in a two-step Hand Posture Recognition (HPR) process for SLR. In the first step, a robust technique for locating and tracking hands was demonstrated. In the second step, Deep Neural Networks (DNNs) were employed to automatically learn features from effort-invariant hand gesture images, and the resulting identification accuracy was 98.12% [18].
To identify the 32 letters of the PSL alphabet, a novel hand gesture recognition technique was proposed. This method relies on the Bayes rule, a single Gaussian model, and the YCbCr color space. In this method, hand locations are identified even in complex environments with uneven illumination. The technology was tested using 480 USB webcam images of PSL postures. The total accuracy rating was 95.62% [33]. Furthermore, a technique was presented for identifying Kurdish sign language via the use of dynamic hand movements. This system makes use of two feature collections built from models of human hands. A feature-selection procedure was constructed, and then a classifier based on an Artificial Neural Network (ANN) was used to increase productivity. The results show that the success rate for identifying hand motions was quite high at 98% [38].
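The skin-color approach mentioned above for PSL (YCbCr color space, a single Gaussian model, and the Bayes rule) can be illustrated with a short sketch. The conversion uses the standard full-range BT.601 (JPEG-style) formulas. Everything else is an assumption made for illustration and is not taken from [33]: the Gaussian mean and variance, the diagonal covariance, the skin prior, and the uniform non-skin density.

```python
import math


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG-style) RGB -> YCbCr for 8-bit values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def gaussian_likelihood(cb, cr, mean, var):
    """Single Gaussian over (Cb, Cr) with diagonal covariance (an assumption)."""
    density = 1.0
    for x, m, v in ((cb, mean[0], var[0]), (cr, mean[1], var[1])):
        density *= math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
    return density


# HYPOTHETICAL parameters; in practice they would be estimated from labeled
# skin pixels rather than fixed by hand.
SKIN_MEAN = (110.0, 150.0)                 # (Cb, Cr)
SKIN_VAR = (80.0, 80.0)
SKIN_PRIOR = 0.3
NON_SKIN_DENSITY = 1.0 / (256.0 * 256.0)   # uniform over the Cb-Cr plane


def skin_posterior(r, g, b):
    """P(skin | pixel) by the Bayes rule, ignoring luma Y to reduce lighting effects."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    skin = gaussian_likelihood(cb, cr, SKIN_MEAN, SKIN_VAR) * SKIN_PRIOR
    other = NON_SKIN_DENSITY * (1.0 - SKIN_PRIOR)
    return skin / (skin + other)


def is_skin(r, g, b, threshold=0.5):
    """Label a pixel as skin when its posterior reaches the threshold."""
    return skin_posterior(r, g, b) >= threshold
```

Dropping the luma channel and classifying on chrominance alone is one common reason for choosing YCbCr when illumination is uneven; applying `is_skin` to every pixel yields a binary mask from which a hand region can be extracted.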
Furthermore, video data were processed via a Restricted Boltzmann Machine (RBM) to decipher hand-sign languages. After the original input image was segmented into five smaller photos, the convolutional neural network was trained to detect the hand movements. The Center for Vision at the University of Surrey compiled a dataset on handwriting recognition [22] using data from Massey University, New York University, and the ASL Fingerspelling Dataset (2012).
A hand-shape recognition system was introduced using Manus Prime X data gloves for nonverbal communication in VR. This system encompasses data acquisition, preprocessing, and classification, with analysis exploring the impact of outlier detection, feature selection, and artificial data augmentation. Up to 93.28% accuracy for 56 hand shapes and up to 95.55% for a reduced set of 27 were achieved, demonstrating the system’s effectiveness [42].
Researchers in this area continue to improve techniques for reading fingerspelling, recognizing gestures, and interpreting emotions. However, progress is being made every day in alphabetic recognition, which serves as the foundation for systems that recognize words and translate sentences. Twenty-four images representing the American Sign Language alphabet were used to train a Deep Neural Network (DNN) model, which was subsequently fine-tuned using 5200 pictures of computer-generated signs. A bidirectional LSTM neural network was then used to improve the system, achieving 98.07% efficiency in training using a vocabulary of 370 words. For observing live camera feeds, the SLR response to the current picture is summarized in three lines of text: a phrase produced by the gesture sequence to word converter, the resolved sentence received from the Word Spelling Correction (WSC) module, and the image itself [9]. These are all GUI elements used in the existing American Sign Language Alphabet Translator (ASLAT).
Using a wearable inertial motion capture system, a dataset of 300 commonly used American Sign Language sentences was collected from three volunteers. Deep-learning models achieved 99.07% word-level and 97.34% sentence-level accuracy in recognition. The translation network, based on an encoder–decoder model with global attention, yielded a 16.63% word-error rate. This method shows potential in recognizing diverse sign language sentences with reliable inertial data [43].
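For reference, a word-error rate such as the 16.63% figure above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized sentence and the reference sentence, divided by the number of reference words. The sketch below implements that standard definition; whether [43] used exactly this formulation is an assumption, and the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("i want water", "i need water")` counts one substitution over three reference words. Because insertions are counted, the rate can exceed 100% when the hypothesis is much longer than the reference.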
The possibility of using a fingerspelled version of the British Sign Language (BSL) alphabet for automatic word recognition was explored. BSL fingerspelling uses both hands, which makes recognition more challenging than for other sign languages. The research achieved 98.9% accuracy in word recognition [26] using a collection of 1000 webcam images.
Scale Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and a novel approach called a based gesture descriptor have been suggested for use in the detection of hand movements, with the latter two being applied to the Kurdish Sign Language (KuSL) presented here. The new method outperformed the previous two algorithms in identifying Kurdish hand signals, with a recognition accuracy of 67% compared to 42% for the first two algorithms. Although there are only 36 letters in the Kurdish-Arabic alphabet, variations in imaging conditions such as location, illumination, and occlusion make it challenging to construct an efficient Kurdish sign language identification system.
While existing projects such as SignOn (focused on European Sign Languages) and SignDict (investigating lexicon translation) achieved noteworthy accuracy (95.2% and 97.8%, respectively) [7,44], their applicability to Kurdish Sign Language (KuSL) and real-time communication settings is limited. Our model addresses these gaps by specifically targeting KuSL, a previously understudied language, and prioritizing real-time performance with exceptional 99.05% training accuracy. This focus on real-world accessibility and language specificity offers unique advantages for the Kurdish deaf community, surpassing the capabilities of prior research in its potential to bridge the communication gap.
1.3. Research Contributions
The proposed approach uses convolutional neural networks to identify KuSL, with the model instantly converting real-time images into Kurdish letters.
The main contributions of the present study are as follows:
- The proposed method is the first to accurately recognize Kurdish-script letters from static hand shapes.
- We create a novel, fully labeled dataset for KuSL. The hand-shape identification dataset, compiled from ASL and ArSL2018, will be made freely accessible to the scientific community.
- In this method, one-handed shapes alone are sufficient for alphabet identification; motion signals are not required.
- This method provides a CNN-based real-time KuSL recognition system with high accuracy across several user types.
The remainder of this paper is organized as follows. Materials and methods are described in Section 2. The findings and subsequent discussion are presented in Section 3. In Section 4, we provide our final thoughts and outline our plans for the future.
Karwan Mahdi Hama Rawf
Ayub Othman Abdulrahman
Aree Ali Mohammed
Sign Language Recognition (SLR) benefits the deaf community because it supports communication, education, and socialization. This study presents the results of using a modified Convolutional Neural Network (CNN) technique to develop a model for real-time Kurdish sign recognition. Recognizing the Kurdish alphabet is the primary focus of this investigation. The model was trained with a variety of activation functions over several iterations and then used to make predictions on the KuSL2023 dataset. The dataset contains a total of 71,400 images, drawn from two separate sources, representing the 34 signs of the Kurdish alphabet. A large collection of real user images is used to evaluate the accuracy of the proposed approach. This research presents a novel Kurdish Sign Language (KuSL) classification model. Furthermore, the hand region must be identified in images with complex backgrounds, including variations in lighting, ambience, and color intensity. By using a genuine public dataset, real-time classification, and person independence while maintaining high classification accuracy, the proposed technique improves on previous research on KuSL detection. The results show an average training accuracy of 99.05% for both the classification and prediction models. Compared to earlier research on KuSL, these outcomes indicate very strong performance.