Microsoft has announced a new small language model called Mu. It is designed to run directly on Windows 11 Copilot Plus PCs, powering AI experiences without relying on the cloud.
Mu is already live in the Settings app for users in the Windows Insider Dev Channel. If you’ve used natural language to change settings on a Copilot Plus device, you’ve already used it.
The model runs entirely on the Neural Processing Unit (NPU). That means it delivers fast responses while preserving privacy and saving power. According to Microsoft, Mu responds at more than 100 tokens per second and gets the first token out in under half a second.
Mu uses an encoder-decoder transformer design. The encoder turns the input into a fixed representation in a single pass, and the decoder generates the output from it. Because the input is encoded once and that representation is reused at every decoding step, the design cuts latency and memory use compared to decoder-only models, which carry the full input through every generation step.
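The split can be illustrated with a toy sketch (purely schematic, not Mu's architecture): the encoder runs exactly once per input, and every decoder step reuses its cached output.

```python
# Toy sketch of the encoder-decoder split (illustrative only).
# Key property: the encoder processes the input once, and every
# decoding step reuses that fixed representation.

def encode(tokens):
    """Stand-in encoder: one pass over the input yields a fixed
    representation that decoding can reuse."""
    return {"summary": " ".join(tokens), "length": len(tokens)}

def decode_step(representation, generated):
    """Stand-in decoder step: consumes the cached representation plus
    the tokens generated so far. A real decoder would run attention here."""
    if len(generated) >= representation["length"]:
        return None  # stop condition for the toy
    return representation["summary"].split()[len(generated)]

def generate(tokens):
    rep = encode(tokens)            # encoder runs exactly once
    out = []
    while (tok := decode_step(rep, out)) is not None:
        out.append(tok)             # each decoder step reuses rep
    return out

print(generate(["turn", "on", "dark", "mode"]))
```

In a decoder-only model, by contrast, there is no separate cached input representation; the prompt tokens stay part of the sequence the model works over during generation.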
Microsoft claims that Mu is about 47 percent faster for first-token response and nearly five times faster at decoding when tested on Qualcomm NPUs. That kind of speed matters for real-time system tasks like adjusting settings.
To make the most of NPU hardware, Microsoft optimized Mu’s layer dimensions and matrix operations. Parameters were arranged to match how chips handle data internally. The model even shares some weights between input and output layers to save memory.
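Sharing weights between the input embedding and the output projection is a standard memory-saving trick often called weight tying. A minimal NumPy sketch of the idea (an assumed illustration, not Mu's code):

```python
import numpy as np

# Sketch of input/output weight sharing ("weight tying"): a single
# matrix serves as both the input embedding table and the output
# projection. Illustrative only; dimensions are invented.

rng = np.random.default_rng(0)
vocab, dim = 1000, 64
E = rng.standard_normal((vocab, dim))   # one matrix, two roles

def embed(token_id):
    return E[token_id]                  # input side: row lookup

def output_logits(hidden):
    return hidden @ E.T                 # output side: project with E^T

# Memory cost: one (vocab x dim) matrix instead of two.
h = embed(42)
logits = output_logits(h)
print(logits.shape)                     # (1000,)
```

On a 330-million-parameter model, embedding tables are a large fraction of the total, so tying them is a meaningful saving.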
Quantization also plays a role. The team applied post-training quantization to compress the model from floating point to 8-bit and 16-bit integer formats. This reduced the memory footprint and boosted performance without sacrificing accuracy.
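A minimal sketch of symmetric int8 post-training quantization shows where the savings come from (Microsoft has not published Mu's exact scheme, so the details below are generic):

```python
import numpy as np

# Generic symmetric post-training quantization to int8 (illustrative;
# not Microsoft's exact scheme). Each float weight is mapped to an
# 8-bit integer plus one shared scale factor.

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0     # map the largest weight to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal(4096).astype(np.float32)
q, scale = quantize_int8(w)

print(q.nbytes, w.nbytes)               # 4096 16384: 4x smaller than fp32
error = np.abs(dequantize(q, scale) - w).max()
print(error <= scale / 2)               # rounding error bounded by scale/2
```

The accuracy claim rests on that last line: per-weight error is bounded by half the scale, which for well-behaved weight distributions is small enough that task accuracy barely moves.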
Mu is small. It has just 330 million parameters. But it was trained using advanced methods. First, Microsoft pre-trained the model on billions of educational tokens. Then it used knowledge distillation from its larger Phi models. The final step was fine-tuning Mu on specific tasks.
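The distillation step trains the student (Mu) to match the teacher's (Phi's) softened output distributions. A common formulation is a temperature-scaled KL-divergence loss; the sketch below uses generic defaults, not Microsoft's published settings:

```python
import numpy as np

# Generic knowledge-distillation loss: KL(teacher || student) on
# temperature-softened distributions. Temperature and loss form are
# common defaults, not Microsoft's published recipe.

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()                     # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    p = softmax(teacher_logits, T)      # soft targets from the teacher
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([4.0, 1.0, 0.5])
aligned = np.array([4.1, 0.9, 0.6])     # student close to the teacher
off     = np.array([0.5, 4.0, 1.0])     # student far from the teacher

print(distillation_loss(aligned, teacher) < distillation_loss(off, teacher))
```

The soft targets carry more signal than one-hot labels, which is why a 330M-parameter student can inherit much of a far larger teacher's behavior.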
One of those tasks is the Windows Settings agent. This lets users type things like “Turn on dark mode” and get direct results. But Mu didn’t get it right out of the box. Microsoft had to scale training to 3.6 million samples and expand the number of settings it understood from 50 to several hundred.
To handle vague queries like “Increase brightness,” Microsoft had to get clever. It prioritized commonly used settings and built systems that fall back to regular search when input is too short or unclear.
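The routing behavior described above can be sketched as a simple gate: specific queries go to the agent, while short or unmatched ones fall back to ordinary search. The thresholds and setting names here are invented for illustration:

```python
# Sketch of agent-vs-search routing (thresholds and setting names are
# hypothetical, not Windows' actual mapping).

COMMON_SETTINGS = {
    "dark mode": "toggle:personalization.dark_mode",
    "brightness": "slider:display.brightness",
    "bluetooth": "toggle:devices.bluetooth",
}

def route_query(query, min_words=2):
    words = query.lower().split()
    if len(words) < min_words:
        return ("search", query)        # too short/ambiguous: fall back
    for name, action in COMMON_SETTINGS.items():
        if name in query.lower():
            return ("agent", action)    # commonly used setting wins
    return ("search", query)            # no confident match: fall back

print(route_query("turn on dark mode"))  # routed to the agent
print(route_query("brightness"))         # one word: falls back to search
```

Biasing the table toward commonly used settings mirrors the prioritization Microsoft describes; everything else degrades gracefully to search rather than guessing.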
In benchmarks, Mu holds up well. On CodeXGLUE, it slightly outperforms Phi. On SQuAD and the Settings task, it comes close despite being a tenth of the size.
Microsoft says Mu can generate more than 200 tokens per second on a Surface Laptop 7. That is fast enough for real-time use on the edge without cloud support.
For now, Mu is only available to Insiders. That might annoy regular users, especially since this is one of the more practical AI tools Microsoft has delivered. Hopefully, a wider release is not far off.