Remember that late-night talk show bit where an image of a political figure is shown with someone else's mouth superimposed over their own, to make them say dubious things? It always looked a little ropey, but that was part of the charm. Well, this new AI tool also takes still images of human subjects and animates their mouth and head movements, but this time the effect is surprisingly, almost worryingly, convincing.
The tool is called EMO: Emote Portrait Alive, and it's been developed by several researchers from the Institute for Intelligent Computing, part of the Alibaba Group. It takes a single reference image and a vocal audio track, plus motion frames carried over from previously generated clips to keep things consistent, and runs them through a diffusion process: multi-frame noise covering the facial region is gradually denoised, with the generated imagery steered at every step to stay in sync with the audio, eventually producing a video of the subject not only lip-syncing, but also emoting various facial expressions and head poses.
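For the curious, here's a very rough sketch of the shape of that pipeline. As far as I can tell EMO's code isn't public, and in any case every function below is a hypothetical stand-in (the "denoising" is just a toy decay); it's only meant to show how a reference image and per-frame audio features might condition an iterative denoising loop over multi-frame noise:

```python
import numpy as np

# A toy, hypothetical sketch of an audio-driven portrait-animation loop.
# None of these functions come from EMO itself; they're stand-ins that
# mimic the overall shape of the pipeline the researchers describe.

def encode_reference(image: np.ndarray) -> np.ndarray:
    """Stand-in for a network extracting identity features from the portrait."""
    return image.mean(axis=(0, 1))  # placeholder 'feature vector'

def encode_audio(audio: np.ndarray, num_frames: int) -> list:
    """Stand-in for an audio encoder yielding one feature chunk per video frame."""
    return np.array_split(audio, num_frames)

def denoise_step(frames: np.ndarray, ref_feat, audio_feats, step: int) -> np.ndarray:
    """Stand-in for one denoising step conditioned on identity and audio.

    A real model would predict and subtract noise here; we just decay it.
    """
    return frames * 0.9

def animate(image: np.ndarray, audio: np.ndarray, num_frames: int = 16,
            steps: int = 50) -> np.ndarray:
    ref_feat = encode_reference(image)             # who the subject is
    audio_feats = encode_audio(audio, num_frames)  # what they should 'say'
    # Start from pure multi-frame noise, one noisy frame per output frame.
    frames = np.random.randn(num_frames, *image.shape)
    for step in reversed(range(steps)):
        frames = denoise_step(frames, ref_feat, audio_feats, step)
    return frames  # a (frames, H, W, C) array standing in for the video

video = animate(np.zeros((64, 64, 3)), np.random.randn(16000))
print(video.shape)  # (16, 64, 64, 3)
```

The real system, of course, swaps each of those stand-ins for trained neural networks; the point is simply that the audio conditions every step of the denoising, which is why the mouth and head movements come out synced to the sound.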
The technology is demonstrated using sample images of various figures, ranging from real-life celebrities to AI-generated people to the Mona Lisa, while the vocal audio used includes a Dua Lipa track, pre-recorded interview clips, and Shakespearean monologues. After the process has been applied, the generated avatar appears to come to life, mouthing and moving along to the chosen audio.
The effect is surprisingly accurate although, it has to be said, far from perfect. "Buh" sounds sometimes appear to come from open mouths rather than closed lips, and the occasional syllable emerges from between clenched teeth, as if the avatar is resisting the AI's insistence on bringing it to life to sing and perform for the internet.
"This is mind blowing. This AI can make single image sing, talk, and rap from any audio file expressively! 🤯 Introducing EMO: Emote Portrait Alive by Alibaba. 10 wild examples: 🧵👇 1. AI Lady from Sora singing Dua Lipa" pic.twitter.com/CWFJF9vy1M (February 28, 2024)
Still, it's a remarkable effect, and one that would likely go unnoticed by a casual observer unless they'd been told specifically to watch out for mouth movements and timing.
Even more impressive is a later demonstration of what the company refers to as “cross-actor performance”. A clip shows Joaquin Phoenix in full make-up as the Joker, except this time with the audio of Heath Ledger’s interpretation of the character from The Dark Knight, including a reasonable approximation of Ledger’s trademark swallowing and lip smacking in the role.
While the technology is undoubtedly impressive, it's likely to do little to dispel the creeping sense that AI deepfake content, and all the nefarious purposes it could potentially be put to, is progressing at a remarkable rate.
While these videos make for excellent tech demonstrations, they're also reminders that the difference between what we presume is real and what is computer generated is becoming harder and harder to spot as image and video generation technology matures. AI tools can demonstrate a terrifying ability to churn out generated content at an incredible rate and with increasing complexity, and that has some troubling implications. Although perhaps that's just me being a big old worrywart.
How long will it be, I wonder, before our holiday snaps are grabbed from our long-defunct Facebook pages and turned by AI tools into videos of us mouthing songs we never sang? At least, that's my excuse.
No, I did not drunkenly attempt karaoke in Cyprus. It’s an AI-enhanced fake, that one, I promise.