Microsoft launches 3 AI models for transcription, image, and speech generation

Wait 5 sec.

Through these three models — MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 — Microsoft aims to expand its push into multimodal AI capabilities for developers. The models are also being integrated into Microsoft products, including Copilot, Bing, and PowerPoint, with enterprise adoption already underway