Lip Reading Using Connectionist Temporal Classification

Mala B M; Meghana K; Adhira M Nair; Sparsha B; Lekhana M
School of Computer Science, Reva University, Bengaluru, India
Pages: 326-336
Published: June 2024

Abstract

Lip reading is the task of decoding text from the movement of a speaker's mouth. A lip-reading system takes as input a silent video of a speaker uttering a word or phrase and produces the predicted word or phrase as output. This is highly beneficial for hearing-impaired individuals, allowing them to understand speakers who do not know sign language in noisy real-world environments. Conventional methods have concentrated mostly on heavy preprocessing, and despite showing immense potential, deep learning algorithms have seen limited application in this field. Here we present a Convolutional Neural Network (CNN) model that predicts words from video without audio; the system also employs an attention-based Long Short-Term Memory (LSTM) network and Connectionist Temporal Classification (CTC) alongside the CNN. The trained lip-reading model is evaluated on its accuracy in predicting words. Moreover, we examine challenges and limitations associated with deep-learning-based lip reading, including data scarcity, variations in lighting conditions, speaker-dependent variability, and occlusions. To address these limitations and improve system performance, we propose the adoption of ensemble learning techniques in future iterations. This research contributes to the advancement of lip-reading technology, which is particularly beneficial for hearing-impaired individuals navigating noisy environments where sign language is impractical. By harnessing deep learning methods, we aim to enhance accuracy and efficiency, thereby improving accessibility and communication for diverse populations.
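To illustrate the CTC component mentioned above: after the CNN and LSTM produce per-frame label probabilities, a common (and the simplest) way to read out text is greedy best-path decoding, which collapses repeated labels and removes the reserved blank symbol. The sketch below shows that decoding step only; the vocabulary and blank index are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of greedy (best-path) CTC decoding.
# Assumes the CNN+LSTM front end has already produced a frame-wise
# argmax label sequence. VOCAB and BLANK are hypothetical examples.

BLANK = 0  # CTC reserves one label index for the "blank" symbol
VOCAB = {1: "a", 2: "b", 3: "c"}  # illustrative character set

def ctc_greedy_decode(frame_labels):
    """Collapse consecutive repeats, then drop blanks."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            decoded.append(VOCAB[label])
        prev = label
    return "".join(decoded)

# Frame labels "a a - b b - b" decode to "abb": the blank between the
# two runs of 'b' is what lets CTC emit a doubled character.
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 2]))  # -> abb
```

Greedy decoding picks the single most likely label per frame; beam-search CTC decoding, often combined with a language model, generally yields better transcriptions at higher cost.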
