
Cloudflare is pushing deeper into AI with new partner integrations that target some of the most in-demand use cases: image generation and real-time voice. The company announced today that it is bringing Leonardo’s image models and Deepgram’s audio models to its Workers AI platform.
Workers AI was built around the idea that models would get faster and smaller, and Cloudflare designed its global GPU-backed infrastructure to handle those workloads. By adding these new closed-source partner models, Cloudflare is leaning into areas where speed and low latency matter most.
For image generation, Cloudflare is introducing two models from Leonardo.Ai: Phoenix 1.0 and Lucid Origin. Phoenix is a custom-trained model designed for prompt coherence and text rendering. In Cloudflare’s tests, a 1024×1024 image took just under five seconds to generate. Lucid Origin focuses on photorealism and produces results at a similar speed. These models can be called directly through the Workers AI API, making them easy to integrate into applications for gaming, web design, or media creation.
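As a rough sketch of what calling one of these models over Cloudflare's REST API might look like: the endpoint pattern (`/accounts/{account_id}/ai/run/{model}`) follows Cloudflare's documented Workers AI REST API, but the model identifier `@cf/leonardo/phoenix-1.0` and the exact request parameters are assumptions here, not confirmed values.

```python
import json

# Placeholder credentials -- substitute real values before sending.
ACCOUNT_ID = "YOUR_ACCOUNT_ID"
API_TOKEN = "YOUR_API_TOKEN"

def build_image_request(model: str, prompt: str,
                        width: int = 1024, height: int = 1024):
    """Assemble the URL, headers, and JSON body for a Workers AI
    inference call (request is built, not sent)."""
    url = (
        "https://api.cloudflare.com/client/v4/"
        f"accounts/{ACCOUNT_ID}/ai/run/{model}"
    )
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"prompt": prompt, "width": width, "height": height})
    return url, headers, body

# "@cf/leonardo/phoenix-1.0" is an assumed model ID -- check the
# Workers AI model catalog for the real identifier.
url, headers, body = build_image_request(
    "@cf/leonardo/phoenix-1.0",
    "a lighthouse at dusk, photorealistic",
)
```

From here, the request could be dispatched with any HTTP client, e.g. `requests.post(url, headers=headers, data=body)`, with the generated image returned in the response.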
On the voice side, Cloudflare is partnering with Deepgram. The Nova 3 model delivers high-accuracy speech-to-text, while Aura 1 generates natural, expressive text-to-speech output. A newer Aura 2 model is coming soon. Developers can tap into these through Workers AI and even maintain persistent connections via WebSocket, which is key for real-time voice agents and interactive applications.
Both Leonardo and Deepgram praised Cloudflare’s global network and infrastructure for enabling these experiences. By hosting inference close to users, Cloudflare claims it can deliver low-latency AI that’s difficult to replicate without its distributed footprint.
The bigger story, however, is how Cloudflare is positioning Workers AI as more than just a model hosting service. The company wants developers to see its entire suite as a platform for building AI applications. That means using Workers for logic, R2 for storage, Images for media handling, and Realtime tools for orchestration, all tied together with AI inference at the edge.