Wav2vec-U: Facebook’s AI model that employs unsupervised ML

May 25, 2021 | Vanshika Kaushik


Facebook, one of the most popular social media applications with 2.6 billion users worldwide, is also a leading developer and an active user of technology. It uses its own AI-based translation to help users in different regions of the world read their News Feed and Facebook Stories in their own languages.

Taking its technology one step further, Facebook has trained an AI model that does not require transcribed data. Facebook will use this model to build speech recognition systems.

Facebook’s unsupervised speech recognition model, Wav2vec-U, is fed unlabeled data rather than a predefined, annotated dataset. The system teaches itself to classify the data.

Wav2vec-U learns from recorded speech and from text that is not paired with that speech, which removes the need for transcriptions. The model uses a clustering method.


To train the model, Facebook developed a generative adversarial network (GAN) consisting of a generator and a discriminator. The generator takes audio segments and predicts the phoneme that each one corresponds to.

The discriminator learns to distinguish between the generator’s speech recognition output and real text examples that have been “phonemized”, that is, converted into phoneme sequences. The generator’s transcriptions improve with feedback from the discriminator.
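The tug-of-war between the two networks can be sketched with the textbook GAN losses. This is a generic illustration, not Facebook's training code: the function names and the example scores are hypothetical, and the article does not describe the exact objective Wav2vec-U uses. Each score is the discriminator's estimated probability that a phoneme sequence came from real text.

```python
import math


def bce(probability, target):
    """Binary cross-entropy for one prediction (target is 0.0 or 1.0)."""
    eps = 1e-12  # avoids log(0)
    return -(target * math.log(probability + eps)
             + (1.0 - target) * math.log(1.0 - probability + eps))


def discriminator_loss(real_scores, fake_scores):
    """The discriminator wants real text scored near 1 and generator output near 0."""
    losses = [bce(p, 1.0) for p in real_scores] + [bce(p, 0.0) for p in fake_scores]
    return sum(losses) / len(losses)


def generator_loss(fake_scores):
    """The generator wants its own output to be scored near 1 (mistaken for real)."""
    return sum(bce(p, 1.0) for p in fake_scores) / len(fake_scores)


# Hypothetical scores: early in training the discriminator easily spots fakes.
early = generator_loss([0.1, 0.2])
# Later, the generator's phoneme sequences look more like real text.
late = generator_loss([0.7, 0.8])
print(early > late)
```

A lower generator loss means the discriminator is being fooled more often, which is the feedback signal that pushes the generator toward realistic phoneme sequences.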

Companies like Microsoft and Dataiku are also using unsupervised machine learning.

Supervised Machine Learning

Supervised machine learning is widely used to train ML models on “labelled data”. The model produces an output based on the data it is given as input. Put another way, supervised ML is the use of labelled datasets to train algorithms to classify data or predict outcomes.
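As a minimal sketch of the idea, the toy classifier below learns one average point (centroid) per label from a handful of labelled examples and then assigns a new point to the nearest centroid. The data, labels, and function names are made up for illustration.

```python
from collections import defaultdict


def fit_centroids(points, labels):
    """Average the training points that share a label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_x, sum_y, count]
    for (x, y), label in zip(points, labels):
        entry = sums[label]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    px, py = point
    return min(
        centroids,
        key=lambda label: (centroids[label][0] - px) ** 2
        + (centroids[label][1] - py) ** 2,
    )


# Labelled training data: every input comes with the answer attached.
train_points = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (3.8, 4.0)]
train_labels = ["low", "low", "high", "high"]

centroids = fit_centroids(train_points, train_labels)
print(predict(centroids, (0.9, 1.1)))  # prints "low"
```

The key point is that the labels do the teaching: without `train_labels`, this approach has nothing to learn from.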

Unsupervised Machine Learning

In unsupervised ML, models are not supervised using a labelled training dataset. The model itself discovers the hidden patterns in the data.

Unsupervised learning is often compared to the way a human brain learns. In unsupervised ML, the model finds the structure of the dataset and groups the data accordingly.
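One common way to group unlabelled data is k-means clustering. The sketch below is a small one-dimensional version with made-up numbers; it illustrates grouping without labels in general and is not a description of the clustering step inside Wav2vec-U.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group numbers into k clusters without using any labels."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]

    assignments = [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]
    return centers, assignments


# Unlabelled data: just measurements, no answers attached.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, assignments = kmeans_1d(data, k=2)
print(assignments)  # prints [0, 0, 0, 1, 1, 1]
```

The algorithm is never told which points belong together; the two groups emerge from the structure of the data alone.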

Facebook will soon release the code for Wav2vec-U. It will help other developers build speech recognition systems using unlabeled audio recordings and unlabeled text.

According to VentureBeat, “AI technologies like speech recognition should not benefit only people who are fluent in one of the world’s most widely spoken languages. Reducing our dependence on annotated data is an important part of expanding access to these tools,” Facebook wrote in a blog post. “People learn many speech-related skills just by listening to others around them. This suggests that there is a better way to train speech recognition models, one that does not require large amounts of labeled data.”
