Magika 1.0 goes stable as Google rebuilds its file detection tool in Rust

Google is giving its Magika file detection system a serious upgrade, and this one is actually pretty interesting for developers, security folks, and anyone who has ever tried to guess what a mystery file is by its extension.

Magika 1.0 is now officially stable, faster, and rebuilt in Rust. The project originally launched last year as an AI-powered way to identify file types based on content instead of just filenames. Since then, it has quietly become popular in open source circles, apparently racking up more than a million monthly downloads.

This new version doubles down on its core promise. Magika now supports over 200 file types, about double the roughly 100 the old version handled. The expanded support means it can differentiate between file formats that tend to look similar at a glance. If you have ever worked with JSONL vs JSON, TSV vs CSV, C vs C++, or JavaScript vs TypeScript, you know how those can blur together. Google claims Magika can tell them apart correctly even when the file content is short.
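To see why pairs like JSON and JSONL blur together, here is a minimal Python sketch of a hand-written content check. To be clear, this is a toy heuristic for illustration only. It is not how Magika works (Magika uses a trained model), and the function name `json_or_jsonl` is made up for this example.

```python
import json


def json_or_jsonl(text: str) -> str:
    """Toy heuristic (not Magika's method): return 'json', 'jsonl', or 'unknown'."""
    # If the whole text parses as one document, call it JSON.
    try:
        json.loads(text)
        return "json"
    except json.JSONDecodeError:
        pass

    # Otherwise, check whether every non-empty line is its own JSON value.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    try:
        for line in lines:
            json.loads(line)
    except json.JSONDecodeError:
        return "unknown"
    return "jsonl"


if __name__ == "__main__":
    print(json_or_jsonl('{"a": 1}\n{"a": 2}\n'))
    print(json_or_jsonl('{"a": [1, 2]}'))
```

Note the ambiguity baked into even this simple case: a one-line file is both valid JSON and valid JSONL, and the sketch just picks "json". Hand-written rules like this pile up quickly, which is the problem a learned, content-based detector is meant to sidestep.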

What stands out is how Google ended up training this model. Their dataset ballooned to around 3TB of code and sample files, which is obviously not something you just store on your laptop. To help process this at scale, Google used its SedPack dataset system, which streams and decompresses training data on the fly to avoid I/O bottlenecks.
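The general idea of streaming and decompressing on the fly can be sketched with nothing but the Python standard library. This is not SedPack's API, just an illustration of the pattern using gzip, and the helper name `stream_samples` is hypothetical.

```python
import gzip
import io
from typing import BinaryIO, Iterator


def stream_samples(fileobj: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield decompressed chunks from a gzip stream without loading it all at once."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)  # decompresses only as much as requested
            if not chunk:
                break
            yield chunk


if __name__ == "__main__":
    # Stand-in for a compressed shard on disk or in cloud storage.
    raw = b"sample-bytes " * 1000
    shard = io.BytesIO(gzip.compress(raw))
    total = sum(len(chunk) for chunk in stream_samples(shard, chunk_size=512))
    print(total == len(raw))
```

The consumer only ever holds one small chunk in memory, so the compressed data can stay compressed on disk until the moment it is needed.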

They also used their Gemini AI model to generate synthetic training data where real samples were too scarce. In other words, they had AI convert code from one language to another so Magika could learn the differences. It is a little ironic that AI is generating training data so that another AI can identify files more accurately. That is basically the state of the industry in 2025.

The other major change is under the hood. Magika's core engine and command-line tool have been rewritten in Rust. The new engine is built for speed and memory safety, which makes sense considering file scanning tools often end up being used in security workflows where reliability matters. Google says the new native command-line tool can scan around a thousand files per second on a single core of an M4-based MacBook Pro, and that throughput scales up with more cores. It uses the ONNX Runtime for inference and the Tokio async runtime to keep everything moving without blocking.

This is not just a Rust binary dropped onto GitHub. The team also refreshed the Python and TypeScript bindings. That means developers who prefer scripting and developers building full applications should both find it easier to plug Magika into existing tooling. If you want to try it, installation is frictionless on Linux and macOS using a one-line curl script. Windows users get a PowerShell command. And if you use pipx, you can install the Rust client directly from the Python package.
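For a rough idea of what the Python binding looks like in practice, here is a small sketch. It assumes the `magika` package is installed and that the result exposes `output.label`, as the project's README shows for recent releases. Older releases used different field names, so check the documentation for your installed version. The wrapper name `detect_label` is made up for this example.

```python
from typing import Optional


def detect_label(data: bytes) -> Optional[str]:
    """Return Magika's content-type label for raw bytes, or None if magika is missing."""
    try:
        from magika import Magika  # third-party package; assumed installed via pip or pipx
    except ImportError:
        return None
    # Assumption: recent releases expose the predicted type as result.output.label.
    result = Magika().identify_bytes(data)
    return str(result.output.label)


if __name__ == "__main__":
    print(detect_label(b"function log(msg) {console.log(msg);}"))
```

Creating a `Magika()` instance loads the model, so in a real tool you would likely build it once and reuse it across files rather than constructing it on every call as this sketch does.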

The release is also an interesting sign of where Google is putting its energy in 2025. Rust, ONNX, and real developer tooling improvements are not exactly flashy, but they are what developers actually need. It also suggests that Google sees a future where file type identification needs to be smarter than plain MIME sniffing. Considering how many machine learning, web, and programming ecosystems keep inventing new formats, this feels necessary.

Magika is open source, and Google encourages contributions. If you have a weird file format from some forgotten toolchain and enough sample files, they want you to submit support for it. Or if you just want to poke around, there is a web demo and full documentation available.

Written by

Brian Fagioli

Brian Fagioli is a technology journalist and founder of NERDS.xyz. A former BetaNews writer, he has spent over a decade covering Linux, hardware, software, cybersecurity, and AI with a no nonsense approach for real nerds.
