Curated By: Shaurya Sharma
Last Updated: August 23, 2023, 08:23 IST
Menlo Park, California, USA
Meta has unveiled a new AI model called SeamlessM4T, which is designed to help users translate text and speech more efficiently across different languages.
The company says SeamlessM4T is the first all-in-one multimodal and multilingual AI translation model. It can recognize speech in nearly 100 languages, and translate speech to text in nearly 100 input and output languages. It also supports text-to-text translation, text-to-speech translation, and even speech-to-speech translation.
Meta is making SeamlessM4T publicly available under a research license so that researchers can build on the existing work.
“Building a universal language translator, like the fictional Babel Fish in The Hitchhiker’s Guide to the Galaxy, is challenging because existing speech-to-speech and speech-to-text systems only cover a small fraction of the world’s languages. But we believe the work we’re announcing today is a significant step forward in this journey,” Meta notes.
It also said that, compared with “approaches using separate models, SeamlessM4T’s single system approach reduces errors and delays, increasing the efficiency and quality of the translation process. This enables people who speak different languages to communicate with each other more effectively.”
Meta also said the model is a step towards creating a “universal translator,” and that it draws on some of the company’s recent models, such as No Language Left Behind and Massively Multilingual Speech.
“In the future, we want to explore how this foundational model can enable new communication capabilities—ultimately bringing us closer to a world where everyone can be understood,” Meta said.
In related news, Meta also recently unveiled its AudioCraft AI tool, which lets users create original audio tracks using text-based prompts. The tool consists of three models: AudioGen, MusicGen, and EnCodec. AudioGen, trained on public sound effects, generates audio from text prompts, while MusicGen does the same for music and was trained on music licensed by Meta. The EnCodec decoder allows for higher quality music generation with fewer artefacts.