How fal.ai offers the fastest generative AI in the world

Katie Schilling · 4 min read

fal.ai’s team set an ambitious goal: host the fastest diffusion inference endpoints in the world without passing the bill on to their users. Their platform needed to remain affordable for individual developers while ingesting tens of terabytes in mere hours, storing 100+ TB of data around the globe, and delivering real-time responses.

Tigris loves fal.ai, generated by FLUX

Inference faster than you can type

Obsessed with optimization, fal.ai knows that every second counts on their GPUs. They upload output images in background threads, so the only GPU time charged is for actual inference. Their researchers constantly test their production models against the state of the art (SOTA) to find the architectures that best balance task precision, reliability, and generation time. Every extra millisecond is shaved off, making fal.ai an order of magnitude faster than their competitors on many generative AI tasks.
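As a rough illustration of that background-upload pattern, here is a minimal sketch, not fal.ai's actual code. The bucket name and `run_diffusion` are stand-ins, and the Tigris S3 endpoint shown is an assumption for the example:

```python
import io
import time
from concurrent.futures import ThreadPoolExecutor

import boto3  # Tigris speaks the S3 API, so a standard S3 client works

# Placeholder client and bucket; credentials come from the environment.
s3 = boto3.client("s3", endpoint_url="https://fly.storage.tigris.dev")
uploader = ThreadPoolExecutor(max_workers=4)

def run_diffusion(prompt: str) -> bytes:
    """Stand-in for the actual model call; returns encoded image bytes."""
    raise NotImplementedError

def upload_result(image_bytes: bytes, key: str) -> None:
    # Runs on a CPU thread; the GPU is already free for the next request.
    s3.upload_fileobj(io.BytesIO(image_bytes), "generated-images", key)

def handle_request(prompt: str, key: str) -> None:
    start = time.perf_counter()
    image_bytes = run_diffusion(prompt)   # the only step billed as GPU time
    gpu_seconds = time.perf_counter() - start
    # Hand the upload to a background thread instead of blocking the GPU.
    uploader.submit(upload_result, image_bytes, key)
    print(f"billed GPU time: {gpu_seconds:.3f}s")
```

The point is simply that the billable clock stops when inference finishes; the upload happens off the GPU's critical path.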

fal.ai has built custom infrastructure and optimized the model inference to make sure these models are served to the end user as fast as possible. fal.ai has a globally distributed network of GPUs to make sure the inference happens as close to the user as possible. We do very few hops between the user and the GPU. —Burkay Gur, Co-founder, fal.ai

Extreme performance

Bringing the cutting edge into industry requires translating both academic research and the latest community-driven workflows into live, performant systems. And they mean performance: fully saturated 10 Gb links and sustained writes of 1 GB/s and up (a 10 Gb link tops out at 1.25 GB/s, so those writes run close to line rate), squeezing every last bit of juice from a global fleet of GPUs across many clouds. Every step from training to inference is carefully optimized and purpose-built. Every network hop is scrutinized.

fal.ai tried other providers with no egress fees, but none met their reliability and performance needs. Low throughput, sluggish downloads, and intermittent 500 errors made it impossible to guarantee that their diffusion endpoints could process requests in real time. After partnering with Tigris, fal.ai no longer had to choose between performance, reliability, and cost.

have been using @TigrisData at @fal for the last 2 months. ingested 10s of TBs of data in mere hours while storing 100 TB+ without any hassle. much much more reliable than anything else we have used. and also MUCH FASTER. i am impressed honestly. thanks @martin_casado for rec —Batuhan Taskaya, Head of Engineering at fal.ai

A hub for developers

Image generation, video generation, image upscaling, speech-to-text, text-to-speech, and all sorts of media-related functionality: keeping up with the latest and greatest models and tools is a lot for any developer, and it’s often unclear which models can keep up with a production deployment. fal.ai is a unified hub for integrating these models as reliable utilities. They’ve already selected the best model and deliver it in the fastest way possible.

You wanna go to production? Let's go to production. —Burkay Gur, Co-founder, fal.ai

And when we say “latest and greatest,” we mean it. fal.ai offers FLUX, the largest SOTA open-source text-to-image model to date. FLUX was built by the original Stable Diffusion team, and fal.ai was their first choice for digging into the model and optimizing it to run on a real-world, production-grade endpoint.

Dev-friendly features like one-click fine-tuning make it remarkably easy to customize models for your users without sacrificing inference speed. fal.ai also exposes raw WebSockets, which cut overall latency and simplify the developer experience. Getting whichever GPU is closest and cheapest has never been easier.
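To see why a raw WebSocket helps, here is a minimal sketch; the endpoint URL and message shape are placeholders, not fal.ai's actual protocol, so consult their docs for the real interface:

```python
import asyncio
import json

import websockets  # pip install websockets

# Placeholder URL and message shape, for illustration only.
WS_URL = "wss://ws.example.fal.run/your-app"

async def generate(prompt: str) -> dict:
    # One long-lived connection skips the per-request TLS and HTTP
    # handshakes, which is where the latency savings come from.
    async with websockets.connect(WS_URL) as ws:
        await ws.send(json.dumps({"prompt": prompt}))
        return json.loads(await ws.recv())

if __name__ == "__main__":
    print(asyncio.run(generate("a tiger typing on a laptop")))
```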

Unlimited horizontal scaling at an 85% discount with Tigris

Shuttling data among so many GPUs across so many clouds could lead to a dizzyingly high egress bill, yet other zero-egress storage providers struggle to meet the stringent performance requirements of modern platforms. Finding a storage solution built for an extremely performant global deployment was essential to keeping fal.ai accessible to a broad range of developers.

Not only was Tigris “just crazy fast,” it cut fal.ai’s object storage costs by 85% compared to clouds that charge egress fees. With cost no longer limiting their object storage, fal.ai unlocked limitless horizontal scale.