# How fal.ai offers the fastest generative AI in the world

Katie Schilling · September 18, 2024 · 5 min read

[Katie Schilling](https://www.linkedin.com/in/katieschilling)

DevEx Enthusiast

![Tigris loves fal.ai, generated by FLUX](/blog/assets/images/falai-e2d6ec756f784cd514344d904081c2ab.jpg)

Quick Summary

**Fastest generative AI inference.** fal.ai is an order of magnitude more performant than competitors, with optimized models and a global GPU fleet.

**Massive scale.** Ingests tens of TBs in hours and stores 100+ TBs of data globally, with saturated 10Gb links and 1GB/s+ writes.

**85% cost savings with Tigris.** Zero egress fees and extreme performance eliminated the tradeoff between cost and reliability, unlocking limitless horizontal scale.

[fal.ai's](https://fal.ai/) team set an ambitious goal: host the fastest diffusion inference endpoints in the world without passing the bill on to their users. The platform needed to remain affordable for individual developers while ingesting tens of TBs in mere hours, storing 100+ TBs of data around the globe, and offering [real-time responses](https://fal.ai/docs/real-time).

## Inference faster than you can type

Obsessed with optimization, fal.ai knows that every second counts on their GPUs. They upload output images in background threads so the only GPU time charged is for actual inference. Their researchers constantly test their production models against the state of the art (SOTA) to find the best-performing architectures for task precision, reliability, and reduced generation time. Every extra millisecond is shaved off, making fal.ai an order of magnitude more performant than their competitors for many generative AI tasks.
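The background-upload idea can be illustrated with a minimal Python sketch. This is not fal.ai's actual code: `run_inference` and `upload` are hypothetical stand-ins for a GPU call and an object storage write, and the in-memory `store` dict is an assumption used to keep the example self-contained.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_inference(prompt: str) -> bytes:
    """Hypothetical stand-in for a GPU inference call."""
    return f"image for {prompt}".encode()


def upload(key: str, data: bytes, store: dict) -> str:
    """Hypothetical stand-in for an object storage write."""
    time.sleep(0.05)  # simulate network latency
    store[key] = data
    return key


def serve(prompts, store, uploader):
    """Run inference in the foreground and hand each upload to a worker thread."""
    futures = []
    for i, prompt in enumerate(prompts):
        image = run_inference(prompt)  # the only step that occupies the "GPU"
        # The upload runs in the background, so the loop moves straight on
        # to the next prompt instead of waiting on the network.
        futures.append(uploader.submit(upload, f"out/{i}.png", image, store))
    return futures


store = {}
with ThreadPoolExecutor(max_workers=4) as uploader:
    futures = serve(["a cat", "a dog"], store, uploader)
    keys = [f.result() for f in futures]  # wait for all uploads to finish

print(keys)
```

The design point is that upload latency overlaps with the next inference call rather than adding to it, so the accelerator is never idle while bytes are in flight.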

Burkay Gur, Co-founder @ fal.ai (@burkaygur):

fal.ai has built custom infrastructure and optimized the model inference to make sure these models are served to the end user as fast as possible. fal.ai has a globally distributed network of GPUs to make sure the inference happens as close to the user as possible. We do very little hops from between the user and the GPU.

## Extreme performance

Bringing the cutting edge into industry requires translating both academic research and the latest community-driven workflows into live, performant systems. And they mean performance: completely saturated 10Gb links, 1GB/s+ writes, and every last bit of juice squeezed from a global fleet of GPUs across many clouds. Every step, from training to inference pipelines, is carefully optimized and purpose-built. Each network hop is scrutinized.
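As a rough sanity check on what those rates mean for ingestion time, here is a back-of-the-envelope calculation. The 10 TB payload is an illustrative assumption, not a figure from fal.ai. The sketch also assumes that "10Gb" means a 10 gigabit-per-second link, that units are decimal, and that there is no protocol overhead.

```python
def transfer_hours(terabytes: float, bytes_per_second: float) -> float:
    """Hours needed to move `terabytes` (decimal TB) at a sustained byte rate."""
    return terabytes * 1e12 / bytes_per_second / 3600


LINK_10GBIT = 10e9 / 8  # a saturated 10 Gb/s link carries 1.25 GB/s
WRITE_1GB = 1e9         # a sustained 1 GB/s write rate

# Illustrative payload size only; not a measured workload.
for label, rate in [("10 Gb/s link", LINK_10GBIT), ("1 GB/s writes", WRITE_1GB)]:
    print(f"10 TB over {label}: {transfer_hours(10, rate):.2f} hours")
```

Under these assumptions, a single saturated link moves 10 TB in a little over two hours, which is consistent with ingesting tens of TBs in hours when several links work in parallel.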

[fal.ai](http://Fal.ai) tried other providers with no egress fees, but none of them met their reliability and performance needs. Low throughput, sluggish downloads, and intermittent 500 errors made it impossible to guarantee their diffusion endpoints could process requests in real time. After partnering with Tigris, fal.ai no longer had to choose between performance, reliability, and cost.

Batuhan Taskaya, Head of Engineering @ fal.ai (@isidentical):

have been using [@TigrisData](https://x.com/TigrisData) at [@fal](https://x.com/fal) for the last 2 months. ingested 10s of TBs of data in mere hours while storing 100 TB+ without any hassle. much much more reliable than anything else we have used. and also MUCH FASTER. i am impressed honestly. thanks @martin\_casado for rec

May 20, 2024 at 2:59 p.m.

[Link to the source](https://x.com/isidentical/status/1792631256586338349)

## A hub for developers

Image generation, video generation, image upscaling, speech-to-text, text-to-speech, and all sorts of other media-related functionality: keeping up with the latest and greatest models and tools is a lot for any developer. And it’s often unclear which models can keep up with a production deployment. fal.ai is a unified hub for integrating these models as reliable utilities. They’ve already selected the best model and delivered it in the fastest way possible.

Burkay Gur, Co-founder @ fal.ai (@burkaygur):

You wanna go to production? Let's go to production.

And when we say “latest and greatest,” we mean it. [fal.ai](http://fal.ai) offers the largest SOTA open-source text-to-image model to date, [Flux](https://blog.fal.ai/flux-the-largest-open-sourced-text2img-model-now-available-on-fal/). Flux was built by the original Stable Diffusion team, and fal.ai was their first choice for digging into the model and optimizing it to run on a real-world, production-grade endpoint.

Developer-friendly features like one-click fine-tuning make it easy to customize models for your users without sacrificing inference speed. [fal.ai](http://Fal.ai) also exposes raw WebSockets to reduce overall latency and simplify the developer experience. Getting whichever GPU is closest and cheapest has never been easier.

## Unlimited horizontal scaling at an 85% discount with Tigris

Sending data between so many GPUs across so many clouds could lead to a dizzyingly high egress bill. Other storage providers with zero egress fees struggle to meet the stringent performance requirements of modern platforms. Finding a storage solution built for an extremely performant global deployment was essential to keeping fal.ai accessible to a broad range of developers.

**Not only was Tigris [“just crazy fast,”](https://x.com/isidentical/status/1817637355366613374?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1817637355366613374%7Ctwgr%5E8024e11d092aa48518bdab2d4fb51742457694f6%7Ctwcon%5Es1_c10\&ref_url=https%3A%2F%2Fpublish.twitter.com%2F%3Furl%3Dhttps%3A%2F%2Ftwitter.com%2Fisidentical%2Fstatus%2F1817637355366613374) it saved fal.ai 85% on their object storage costs as compared to other clouds with egress fees. With cost no longer limiting their object storage, fal.ai unlocked limitless horizontal scale.**
