CEO Portraits

Lin Qiao: From PyTorch to a $4 Billion Inference Empire

How Lin Qiao built PyTorch at Meta, left to found Fireworks AI, and scaled the fastest AI inference platform to $4 billion in three years.

Lin Qiao, CEO and co-founder of Fireworks AI
  • Fireworks AI processes over 10 trillion tokens per day and powers more than 10,000 companies including Samsung, Uber, DoorDash, Shopify, and Notion.
  • Lin Qiao led the PyTorch team at Meta for seven years, managing 300+ engineers and scaling AI infrastructure to five trillion daily inference operations.
  • Fireworks AI raised $327 million in total funding, reaching a $4 billion valuation after its $250 million Series C in October 2025 — a sevenfold increase from its $552 million Series B valuation.
  • The company acquired Hathora in March 2026 to strengthen its global compute orchestration layer for real-time AI inference.

10 Trillion Tokens a Day and the Woman Who Made It Possible

Fireworks AI is not a household name. It sits in the plumbing layer of the AI stack — the infrastructure that makes generative AI actually work in production. The platform processes more than 10 trillion tokens per day, serves over 10,000 enterprise customers, and has crossed $280 million in annualized revenue. When developers at Uber need real-time inference or engineers at Shopify fine-tune models for their specific workflows, they reach for Fireworks.

The company is barely three years old. Its CEO, Lin Qiao, spent the better part of a decade building the AI framework that most of the industry depends on. Then she left to build the infrastructure layer she believed was missing.

From Fudan to UC Santa Barbara

Lin Qiao grew up in China and studied computer science at Fudan University in Shanghai, one of the country’s most prestigious institutions. She earned both a bachelor’s and a master’s degree there before moving to the United States for doctoral work.

At the University of California, Santa Barbara, she completed a Ph.D. in computer science in 2005, specializing in distributed systems and database management. The work was deeply technical — the kind of research that would later inform her approach to building AI infrastructure that operates at planetary scale.

Seven Years Inside Meta’s AI Machine

Qiao’s early career included stints at IBM as a research staff member and at LinkedIn as a tech lead and staff software engineer, where she worked on distributed data serving. But the defining chapter began in July 2015, when she joined Meta as a Senior Director of Engineering.

Over seven years, she led more than 300 engineers building AI frameworks and platforms — first Caffe2, then PyTorch. What started as a six-month project to unify Meta’s AI workload turned into a five-year rebuild of the entire stack: how to load data efficiently, run distributed inference, and scale training across Facebook’s global data centers and billions of devices. By the time she left, the system sustained more than five trillion inference operations per day.

“It was a five-year project to support Meta’s entire AI workload building on top of PyTorch, requiring rebuilding the whole stack from scratch.” — Lin Qiao

PyTorch became the most widely used open-source AI framework in the world. Qiao had built the engine. Now she wanted to build the road.

The Five-Year Problem That Became a Five-Week Mission

The frustration was specific. Qiao had watched Meta spend years getting AI into production at scale. She knew most companies could not afford that timeline. The gap between training a model and deploying it in a real application was enormous — and nobody was solving it well.

In October 2022, she left Meta and co-founded Fireworks AI with a team of former Meta and Google engineers, including Benny Chen (Meta’s ads infrastructure lead), Chenyu Zhao (Google’s Vertex AI lead), and Dmytro Dzhulgakov (a core PyTorch maintainer). The mission was to compress what took Meta five years into five weeks — or even five days.

“Startups aren’t about incremental changes. You have to deliver a ten-times improvement — otherwise, why would anyone switch?” — Lin Qiao

The founding team understood enterprise AI infrastructure at a level few startups could match. They had built it at the largest scale in the world. Now they were rebuilding it for everyone else.

The Fastest Inference Engine in the Industry

Fireworks launched as a SaaS platform for AI inference and model fine-tuning, designed to let enterprises deploy open-source and custom models without managing their own GPU clusters. The platform reached 175 tokens per second — the highest speed in the industry at the time — and built automated systems that tune performance for each customer’s specific workload.

The compound AI approach set Fireworks apart. Rather than betting on a single model to solve every problem, the platform lets developers combine multiple models, retrievers, and tools into production-ready systems. In 2024, Fireworks released f1, its own compound AI model that interweaves multiple open-source models at the inference layer for complex reasoning tasks.

Real-world results followed. A food chain company scaled its AI application from one location to a thousand in three months. A software development company expanded an AI feature from 100,000 developers to 25 million developers in the same timeframe — all through Fireworks’ inference optimization.

From $552 Million to $4 Billion in Fifteen Months

The funding trajectory tells the story. Sequoia Capital led the $52 million Series B in July 2024, joined by NVIDIA, AMD, MongoDB Ventures, and Databricks Ventures. That round valued Fireworks at $552 million. Fifteen months later, in October 2025, a $250 million Series C co-led by Lightspeed Venture Partners, Index Ventures, and Evantic pushed the valuation to $4 billion — a sevenfold increase.

Total funding now exceeds $327 million. The customer roster reads like a Fortune 500 shortlist: Samsung, Uber, DoorDash, Notion, Shopify, Upwork. The platform serves hundreds of thousands of developers. Revenue has crossed $280 million on an annualized basis, up from a fraction of that figure at the time of the Series B.

Unlocking 90% of the World’s Intelligence

Qiao’s thesis for the next phase is characteristically direct: 90% of the world’s data is private, and no foundation model has ever seen it. The opportunity is in helping enterprises build AI applications on top of their own data — not just serving generic models faster.

“The future is not one-size-fits-all. It’s heavy customization for specific applications. Open-source models are going to be better than proprietary — DeepSeek is just one instance.” — Lin Qiao

In March 2026, Fireworks acquired Hathora, a real-time compute orchestration platform spanning 14 regions and four clouds, to build out its global infrastructure for low-latency AI inference. Qiao is also speaking at NVIDIA GTC 2026 on AI as essential infrastructure — a signal that Fireworks is positioning itself not as a startup riding the AI wave, but as the infrastructure layer the wave runs on.

From a Ph.D. in distributed systems to rebuilding PyTorch from scratch to scaling a $4 billion AI inference company in three years — Lin Qiao has spent two decades solving the same problem at increasing scale. The bet now is that every enterprise in the world will need what she is building. Given the trajectory, it is hard to argue otherwise.

Fireworks AI | Lin Qiao on LinkedIn

Tags

#AI #infrastructure #startups #engineering #enterprise
