Published Paper


Speech-to-Speech Translation with Lip-Synchronization

Soujanya B K, Abhishek U Gaonkar, Chandan N, Sumeet Chavan
India
Pages: 1109-1118
Published: March 2024

Abstract

This project presents a comprehensive system for multilingual video dubbing, with a primary focus on improving accessibility for individuals with limited literacy who seek content in their native language. The workflow begins with a Speech-to-Text module that transcribes English speech into written text, bridging the auditory and textual components. A Neural Machine Translation (NMT) module then translates the transcribed English text into the desired target language using advanced neural network architectures. Going beyond literal translation, the system aims to capture linguistic nuances and cultural sensitivities, ensuring an accurate representation of the original content. Next, the Text-to-Speech module refines the translated content, synthesizing it into natural and expressive spoken output in the target language, thereby enhancing overall accessibility. What differentiates this project is the LipGAN visual module, which leverages Generative Adversarial Networks (GANs) to generate lifelike lip movements synchronized seamlessly with the translated speech. This visual enhancement adds an immersive dimension to the viewing experience, catering to a diverse audience, particularly viewers with limited literacy.
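The four-stage workflow described above can be sketched as a simple pipeline. This is a minimal illustrative sketch, not the authors' implementation: every class below is a hypothetical stub standing in for a trained model (ASR, NMT, TTS, and LipGAN respectively), and the tiny translation table exists only to make the data flow concrete.

```python
# Hedged sketch of the dubbing pipeline: STT -> NMT -> TTS -> LipGAN.
# All class names and the toy translation table are illustrative
# placeholders, not the paper's actual models.

class SpeechToText:
    """Stub ASR: a real system would run a trained speech recognizer."""
    def transcribe(self, audio):
        # Placeholder: the toy "audio" dict carries its own transcript.
        return audio["spoken_text"]

class NeuralTranslator:
    """Stub NMT: a lookup table stands in for a neural translation model."""
    TABLE = {"hello world": "namaste duniya"}  # toy English -> Hindi entry
    def translate(self, text, target_lang):
        return self.TABLE.get(text.lower(), text)

class TextToSpeech:
    """Stub TTS: returns a dict in place of a synthesized waveform."""
    def synthesize(self, text, lang):
        return {"lang": lang, "text": text}

class LipGAN:
    """Stub visual module: a real LipGAN warps mouth regions per frame."""
    def sync(self, video_frames, dubbed_audio):
        return {"frames": video_frames, "audio": dubbed_audio}

def dub_video(video_frames, audio, target_lang="hi"):
    """Chain the four modules exactly in the order the abstract describes."""
    text = SpeechToText().transcribe(audio)
    translated = NeuralTranslator().translate(text, target_lang)
    dubbed = TextToSpeech().synthesize(translated, target_lang)
    return LipGAN().sync(video_frames, dubbed)

result = dub_video(["frame0", "frame1"], {"spoken_text": "Hello world"})
print(result["audio"]["text"])  # -> namaste duniya
```

In a real deployment each stub would wrap a trained model, but the interface boundaries (text out of ASR, text into NMT, audio out of TTS, audio plus frames into the lip-sync module) remain the same.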

