- World Labs raised $1 billion in February 2026 from NVIDIA, AMD, Autodesk, and Fidelity — pushing its valuation toward $5 billion.
- Fei-Fei Li created ImageNet, the 15-million-image dataset that ignited the deep learning revolution in 2012.
- She was named one of TIME’s 2025 Persons of the Year as an “Architect of AI” alongside Jensen Huang and Sam Altman.
- Her memoir “The Worlds I See” was selected as one of Barack Obama’s recommended books on AI.
A $5 Billion Bet That AI Should Understand the Physical World
World Labs is building what most AI companies ignore: spatial intelligence. Founded in 2024, the startup emerged from stealth with $230 million and a thesis that artificial intelligence will remain fundamentally limited until it can perceive, reason about, and interact with three-dimensional space. Seventeen months later, it raised another $1 billion at a reported $5 billion valuation. Investors include NVIDIA, AMD, Autodesk, Fidelity, and Emerson Collective.
At the helm is Fei-Fei Li — Stanford professor, co-director of the Human-Centered AI Institute, and the researcher whose work on computer vision made modern AI possible. Her path to this point started 8,000 miles away, in a country she left as a teenager with almost nothing.
A Childhood in Chengdu, Defined by Books and Curiosity
Fei-Fei Li was born in 1976 in Beijing and grew up in Chengdu, the capital of Sichuan province. Her parents were intellectuals — her father a mechanical engineer, her mother a chemist — but opportunities in 1980s China were limited. Li devoured books: science fiction, European literature, anything she could find. Physics fascinated her. The natural world demanded explanations, and she wanted to provide them.
In 1992, at sixteen, the family boarded a plane to New Jersey. They arrived in Parsippany with less than $20. Li spoke almost no English. Her parents, whose degrees meant nothing in the American job market, opened a dry-cleaning shop. Li worked alongside them — and in a Chinese restaurant on weekends — while attending public school.
“My entire career is going after problems that are just so hard, bordering on delusional.”
The grind paid off in the most improbable way. Li earned a full scholarship to Princeton. She asked two different high school advisors to verify the acceptance letter.
Princeton, Caltech, and a Question No One Else Was Asking
At Princeton, Li studied physics. The rigor suited her, but the field felt too narrow. Something larger was pulling her: how machines could learn to see the way humans do. She moved to Caltech for her PhD, shifting from physics to computer vision and computational neuroscience. The question she fixated on was deceptively simple: how do you teach a machine to recognize a cat, a chair, a face?
The AI community at the time was small. Funding was scarce. Neural networks were considered a dead end by most computer scientists. Li didn’t care. She spent years assembling what would become the most consequential dataset in the history of artificial intelligence.
15 Million Images That Changed Everything
ImageNet launched in 2009: 15 million images across 22,000 categories, each labeled by hand through Amazon Mechanical Turk. It was the largest visual dataset ever created, and almost nobody noticed. To change that, Li organized the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), an annual competition, first held in 2010, that invited researchers to build models that could classify images accurately.
In 2012, a team from the University of Toronto (Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton) entered a deep convolutional neural network called AlexNet. It cut the top-5 error rate to 15.3 percent, more than ten points below the runner-up, a margin the field had never seen. The result proved that deep learning worked at scale, but only because ImageNet provided the data to train it. Without Li’s dataset, there would have been no AlexNet, no deep learning explosion, no modern AI industry as it exists today.
She joined Stanford as an assistant professor in 2009, earned tenure by 2012, and became a full professor in 2018. In 2019, she co-founded the Stanford Institute for Human-Centered Artificial Intelligence (HAI), insisting that AI development needed ethics baked in from the start, not bolted on after deployment.
Google Cloud, a Brief Corporate Detour, and Back to the Lab
In 2017, Google recruited Li as Vice President and Chief Scientist of AI and Machine Learning at Google Cloud. Her mandate was democratization: lower the barrier for businesses to use AI through products like AutoML. The role gave her a front-row seat to how AI was being commercialized — and what was being overlooked.
She returned to Stanford in 2018. The corporate world had confirmed something she already suspected. Language models and 2D image generators were dominating research budgets, but they were missing a dimension. Literally. AI could generate a photorealistic image of a living room but couldn’t understand that a couch has depth, a window lets in light, and a person can walk through the door.
“Our dreams of truly intelligent machines will not be complete without spatial intelligence. This quest is my North Star.”
World Labs and the Race to Build Spatial Intelligence
Li brought World Labs out of stealth in September 2024 with a team of Stanford and Google DeepMind veterans. The company launched Marble, its first commercial product, in November 2025: a generative world model that creates navigable, persistent 3D environments from text, images, or video prompts. Unlike AI video generators that produce flat, morphing sequences, Marble generates stable three-dimensional worlds that users can explore and edit.
Early adopters include creative studios, game developers, and VFX teams replacing costly manual modeling workflows. Autodesk invested $200 million in the February 2026 round and signed on as an adviser. Pricing is tiered, ranging from a free plan to $95 per month for commercial use. In December 2025, TIME named Li one of its Persons of the Year, the only academic researcher on a list dominated by CEOs.
What Comes After Seeing: Teaching AI to Understand the World
Li’s ambition for World Labs goes beyond 3D scene generation. Spatial intelligence, as she defines it, is the ability of an AI system to understand how the physical world works: to reason about gravity, occlusion, material properties, and how people interact with objects. It’s the cognitive layer that separates a model that can describe a kitchen from one that can navigate it.
“The candid truth is that AI’s spatial capabilities remain far from the human level. But tremendous progress has indeed been made.”
The company is now working on what it calls world models: AI systems that don’t just generate images of reality but simulate how reality behaves. Applications span robotics, autonomous vehicles, architecture, urban planning, and augmented reality. With $1.23 billion raised, partnerships with NVIDIA and Autodesk, and the woman who made deep learning possible at the controls, World Labs enters 2026 as the leading startup in spatial intelligence. The question is no longer whether machines can see. It’s whether they can understand what they’re looking at.