This 15B Model Generates Talking Videos With Synced Audio From Text

daVinci-MagiHuman is a unified 15B-parameter model that generates video and synchronized speech together from a text prompt, aiming for fast generation and high-quality output.