Microsoft previews MAI-Voice-1 speech model and MAI-1 foundation model

Illustration of a woman using a laptop with a virtual AI assistant speaking through the screen, symbolizing Microsoft’s MAI-Voice-1 speech model and AI technology.

Microsoft has introduced two new models from its Microsoft AI (MAI) division. The company says the focus is on creating AI tools that people can use and trust.

The first release is MAI-Voice-1. This speech model is designed to produce audio that sounds natural and expressive, rather than robotic.

It is already built into Copilot Daily and Podcasts. Microsoft is also letting people try it directly in Copilot Labs, where demos show off features like storytelling and guided meditation.

What makes MAI-Voice-1 stand out is speed. It can generate a full minute of audio in under a second on a single GPU, which puts it among the fastest speech systems available today. Microsoft wants people to test it in creative ways, from interactive stories to custom voice tracks for projects.

The second release is MAI-1-preview, Microsoft's first foundation model trained entirely in-house. Unlike earlier efforts that leaned on partner models, this one was built from the ground up by Microsoft's own team.

MAI-1-preview is a mixture-of-experts model trained on about 15,000 NVIDIA H100 GPUs. Public testing has already started on LMArena, a platform where the community can evaluate new models. Microsoft is also opening up API access for trusted testers.
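For readers unfamiliar with the term, a mixture-of-experts model routes each input to only a few specialized sub-networks ("experts") instead of running it through one giant network, which is part of how such models scale efficiently. Below is a minimal, illustrative sketch of top-k expert routing. The expert functions, gate weights, and sizes here are hypothetical toy values for demonstration, not anything from Microsoft's actual design.

```python
import math

def softmax(scores):
    """Convert raw gate scores into routing probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts with the highest gate
    scores and combine their outputs, weighted by renormalized
    routing probabilities. Only top_k experts run per input."""
    # One gate score per expert: dot product of its weight row with x.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Keep only the top_k experts; the rest are skipped entirely.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    out = 0.0
    for i in top:
        out += (probs[i] / norm) * experts[i](x)
    return out

# Four toy "experts": each is just a scalar function of the input.
experts = [lambda x, s=s: s * sum(x) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.1, 0.2], [0.3, 0.1], [0.9, 0.5], [0.2, 0.8]]

print(moe_forward([1.0, 1.0], experts, gate_weights, top_k=2))
```

The key point the sketch shows: at inference time only two of the four experts are evaluated for this input, which is why mixture-of-experts models can hold very large total parameter counts while keeping per-token compute modest.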

The plan is to roll this model into certain text interactions inside Copilot over the next several weeks. Feedback from those interactions will help the team refine it further.

Rather than trying to make one giant model that does everything, Microsoft is aiming for a set of specialized systems tuned for different jobs. That work is supported by its new GB200 cluster, which is already operational.

MAI leadership describes the lab as lean but ambitious. The group is pitching itself as a place where top researchers and engineers can move quickly and see their work end up in products used by millions, if not billions.

For now, MAI-Voice-1 and MAI-1-preview are just the first signs of what is to come. Voice is being positioned as the next big interface for AI, while MAI-1-preview shows Microsoft is serious about competing in foundation models. Both are already tied into Copilot, which gives them a direct path into people’s daily lives.

The real question is whether users actually want another foundation model at all. Many people are still figuring out how to use the ones that already exist. Microsoft may have the technical muscle to train massive systems, but proving they are necessary and genuinely helpful could be the bigger challenge.

Author

  • Brian Fagioli, journalist at NERDS.xyz

    Brian Fagioli is a technology journalist and founder of NERDS.xyz. Known for covering Linux, open source software, AI, and cybersecurity, he delivers no-nonsense tech news for real nerds.
