Ever wondered what your favorite podcast would sound like in Mandarin or Spanish? Well, Spotify is testing a new AI-powered feature that will do exactly that. Voice Translation, which is being rolled out today (September 25), will let you listen to certain podcast episodes in a different language, but in the speaker’s own voice… or a facsimile of it, at least.
The tool, which was developed by Spotify with the help of OpenAI’s automatic speech recognition (ASR) system Whisper, uses a speech-to-text generative AI model to translate the audio files and a voice replication model to match the original speaker’s style.
The first presenters to be part of this new feature include Dax Shepard, Monica Padman, Lex Fridman, Bill Simmons and Steven Bartlett. Not every episode of their respective podcasts will be available in multiple languages right away, though. Instead, keep an eye out for the ‘Interview with Yuval Noah Harari’ episode on the Lex Fridman Podcast, ‘Kristen Bell, by the grace of god, returns’ on Armchair Expert, and ‘Interview with Dr. Mindy Pelz’ on The Diary of a CEO with Steven Bartlett, which are all now available in Spanish.
Spotify says that more episodes will be available in the coming days and weeks, with French and German translations set to come next. You’ll be able to find these in the Now Playing View on your mobile or desktop app, with more voice-translated episodes set to be added to a dedicated Voice Translations hub.
Opinion: a smarter use of AI
The best music streaming services were quick to join the generative AI gold rush following the hype around OpenAI’s ChatGPT. While they were already using machine learning to identify patterns and trends in your music listening habits to better recommend new songs (think: your Discover Weekly playlist), the tech is now being deployed in a few new ways.
Spotify’s AI DJ, which uses an AI-generated voice to recommend new tracks, is just one of them. There’s also Universal Music’s deal with Endel to make ambient audio such as forest noises and running water using AI, as well as countless music generators, including from the likes of Meta and Google. But undoubtedly the scariest was the idea of using generative AI to make podcasts from scratch.
Several generative AI podcasts surfaced off the back of experimentation in the area, including The Joe Rogan AI Experience and the Hacker News Recap, to name a couple. Aside from concerns around copyright and privacy, the biggest complaint about these was the lack of lively conversation, which the best podcasts are built on.
That’s most likely why they didn’t really take off, but using generative AI to translate podcasts is exactly the type of use case I can get behind. Machine learning is a tool, after all, so seeing it used to make interesting shows more widely available is a great use, assuming the pace and liveliness of conversation really do translate. Now I need to find all the foreign-language podcasts that I’ve been missing out on and get them in English.