Karan Goel, a graduate of IIT Delhi and Stanford, is the founder and CEO of Cartesia, an AI voice technology company based in Silicon Valley. Goel’s startup raised $100 million from top venture funds—Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA—positioning Cartesia as a key innovator in voice AI.
Who is Karan Goel? IIT Delhi Alumnus Profile

Karan Goel studied at Delhi Public School, IIT Delhi (Electrical Engineering), Carnegie Mellon, and Stanford, earning top honors including the Siebel Scholarship. At Stanford’s AI Lab, Goel and his co-founder Albert Gu advanced "State Space Models" (SSMs) and launched Cartesia to bring real-time, natural voice AI to businesses worldwide.
What is Sonic-3? The Fastest Natural AI Voice Model
Sonic-3 is Cartesia’s real-time text-to-speech (TTS) AI model designed to make conversations sound human-like. It can generate laughter and express a full range of emotions during live conversations. Sonic-3 supports 42 languages and achieves a lightning-fast end-to-end latency of just 190 milliseconds, making it one of the fastest voice AI models on the market.
How Sonic-3 Works: State Space Models (SSMs) Explained
Unlike most voice AI tools that use Transformers, Sonic-3 is built on State Space Models (SSMs). Transformers attend over the entire preceding context to generate each new token, so the work per step grows as the conversation gets longer, which can cause delays. SSMs, pioneered by Goel and his co-founder Albert Gu at Stanford AI Lab, instead carry a compact running state that summarizes what came before, much as a person keeps the topic and tone in mind, enabling Sonic-3 to respond in real time more naturally and efficiently.
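The contrast between the two approaches can be illustrated with a toy example. The following is a minimal sketch, not Cartesia's implementation: it assumes a diagonal linear SSM of the form h_t = a * h_(t-1) + b * x_t, y_t = sum(c * h_t), and uses a uniform average over all cached inputs as a stand-in for attention. The class names, function name, and parameter values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToySSM:
    """Toy diagonal linear SSM: h_t = a * h_{t-1} + b * x_t, y_t = sum(c * h_t)."""
    a: List[float]  # per-channel decay (hypothetical values)
    b: List[float]  # per-channel input weight
    c: List[float]  # per-channel output weight
    state: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.state:
            self.state = [0.0] * len(self.a)

    def step(self, x: float) -> float:
        # Constant work per step: only the fixed-size state is read and updated.
        self.state = [ai * hi + bi * x
                      for ai, bi, hi in zip(self.a, self.b, self.state)]
        return sum(ci * hi for ci, hi in zip(self.c, self.state))


@dataclass
class ToyAttention:
    """Uniform-weight stand-in for attention: each step reads every cached input."""
    cache: List[float] = field(default_factory=list)

    def step(self, x: float) -> float:
        self.cache.append(x)
        # Work per step grows with the number of inputs seen so far.
        return sum(self.cache) / len(self.cache)


def memory_after(n_steps: int) -> Tuple[int, int]:
    """Return (SSM state size, attention cache size) after n_steps inputs."""
    ssm = ToySSM(a=[0.9, 0.5], b=[1.0, 1.0], c=[0.5, 0.5])
    attn = ToyAttention()
    for t in range(n_steps):
        ssm.step(float(t))
        attn.step(float(t))
    return len(ssm.state), len(attn.cache)


if __name__ == "__main__":
    print(memory_after(1000))
```

In this sketch the SSM keeps two numbers of state no matter how many inputs it has processed, while the attention stand-in stores and rereads every input. Real models are far larger and more elaborate, but the same scaling difference is the usual argument for SSMs in low-latency streaming.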
Sonic-3 vs Other AI Voice Tools Like ElevenLabs
| Feature | Sonic-3 (Cartesia) | ElevenLabs |
|---|---|---|
| Latency | 90ms model latency, 190ms end-to-end latency | 75ms (lower quality) to 300ms+ |
| Voice Quality | Natural, expressive, emotional range including laughter | Less natural, fewer emotions |
| Audio Required for Cloning | 3 seconds for instant voice clones | 10-30 seconds minimum audio |
| Language Support | 42 languages | 32 languages |
| Model Architecture | State Space Models (SSMs) | Transformer-based |
| Deployment | Supports on-prem and on-device | No on-prem or device support |
| Voice Customization | Speed and emotion controls, synthetic voice mixing | Stability, style exaggeration controls |
| User Preference | Preferred by 62% in blind human tests | Preferred by 38.6% in tests |
Sonic-3 stands out with faster speeds, richer emotional range, and more flexible deployment options compared to competitors like ElevenLabs.
Karan Goel’s Viral Tweet: $100M Funding + Sonic-3 Launch
Karan Goel announced the $100 million funding and the Sonic-3 launch in a viral tweet that drew thousands of likes and replies. In the tweet, he described Sonic-3’s SSM-based approach and pledged $5,000 to charity if Cartesia cannot improve qualified users’ voice AI.
We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA.
Today we're introducing Sonic-3 - the state-of-the-art model for realtime conversation.
What makes Sonic-3 great:
- Breakthrough naturalness - laughter and full emotional range
- Lightning fast -… pic.twitter.com/EGwdxMnd1X
— Karan Goel (@krandiash) October 28, 2025
Karan Goel's post quickly went viral, and hundreds of technologists and founders commented on it:
- Future Stacked: "190ms end-to-end is seriously impressive. The breakthrough on emotional range is what really caught our attention, that’s been the missing piece for natural conversation."
- Kevin Garber: "Would be curious to use this for @LogicGlue_ - we are particularly interested in latency improvements over OpenAI."
Cartesia’s funding and innovation put it among the leading global AI startups, as top investors bet on new technology for voice and conversation. SSMs may become the new standard, replacing older Transformer-based models that are slower and less natural.