Why Goroutines Beat Java Threads at Scale
The Problem in Plain Terms
In Java, spinning up too many threads eventually gets you:
java.lang.OutOfMemoryError: unable to create native thread
On a typical machine this often happens after roughly 10–15 thousand threads. In Go, you can run millions of goroutines without breaking a sweat. The gap comes from how each runtime models concurrency.
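To see the asymmetry from the Go side, here's a small sketch (the count is illustrative) that parks a hundred thousand goroutines on a channel. With enough RAM the same pattern scales into the millions, while an equivalent thread-per-unit loop on the JVM fails far earlier:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawn starts n goroutines that all block on a channel, reports how many
// goroutines are alive at that moment, then releases them.
func spawn(n int) int {
	var wg sync.WaitGroup
	block := make(chan struct{})
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-block // parked: a blocked goroutine costs little more than its small stack
		}()
	}
	alive := runtime.NumGoroutine()
	close(block) // release them all
	wg.Wait()
	return alive
}

func main() {
	fmt.Println("goroutines alive:", spawn(100_000))
}
```

This runs comfortably in a few hundred megabytes; the JVM equivalent with default 1 MB stacks would need on the order of 100 GB of stack reservations.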
What a Thread Actually Needs
A unit of concurrency needs at least:
- An instruction pointer — where execution was when it was paused.
- A stack — local state (variables, return addresses); the heap is typically shared across units.
With those two, a scheduler can pause one unit, run another, then resume the first. The difference between Go and Java is how they provide that, and at what cost.
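In Go, a channel handoff exercises exactly those two pieces: each blocked receive saves the goroutine's instruction pointer and stack, and each send lets the scheduler resume the other side right where it stopped. A minimal ping-pong sketch:

```go
package main

import "fmt"

// pingPong bounces a counter between main and a worker goroutine.
// Every blocked receive suspends one side (its position and stack are
// saved); every send makes the other side runnable again.
func pingPong(rounds int) int {
	req := make(chan int)
	resp := make(chan int)
	go func() {
		for n := range req { // suspended here until main sends
			resp <- n + 1 // suspended here until main receives
		}
	}()
	n := 0
	for i := 0; i < rounds; i++ {
		req <- n
		n = <-resp
	}
	close(req) // lets the worker's range loop end
	return n
}

func main() {
	fmt.Println(pingPong(5)) // 5
}
```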
Why the JVM Hits a Wall
1. One thread = one OS thread
Mainstream JVMs map each Java thread to a kernel (OS) thread. So you’re limited by how many OS threads the system can create and how expensive they are.
2. Fixed, large stacks
Each OS thread gets its own stack. On 64-bit JVMs the default is often 1 MB per thread. So 10k threads ≈ 10 GB of stack memory before they do much. Shrinking the stack saves memory but increases stack-overflow risk in recursive or deep call chains.
3. Expensive context switches
The kernel schedules these threads. Switching from one thread to another means a trip through the kernel: saving and restoring register state and running the kernel scheduler. (Threads within a process share an address space, so that part isn't switched, but the switch still costs on the order of microseconds.) If you want every thread to get a slice of CPU at least once per second, the math caps you in the tens of thousands of threads per core before you’ve left any time for real work.
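A rough back-of-the-envelope version of that math, assuming ~5 µs per switch (real costs vary by OS and hardware):

```go
package main

import "fmt"

// maxThreads estimates how many threads one core could give a slice to per
// second before context switching alone consumes the whole second.
// switchMicros is an assumed per-switch cost, not a measured one.
func maxThreads(switchMicros float64) int {
	const budgetMicros = 1_000_000 // one second, in microseconds
	return int(budgetMicros / switchMicros)
}

func main() {
	// At ~5 µs per switch, switching alone caps you at ~200k switches/sec.
	// Each thread also needs time to do real work, so the practical ceiling
	// lands well below that — in the tens of thousands per core.
	fmt.Println(maxThreads(5)) // 200000
}
```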
So: fixed big stacks and kernel-level scheduling put a hard ceiling on how many “threads” you can have in a JVM process.
How Go Does It Differently
Lightweight stacks
Goroutines start with a small stack (on the order of a few KB) that grows (and can shrink) as needed. So you can fit hundreds of thousands of goroutines in a gigabyte where the same RAM might only hold thousands of Java threads with default stack size.
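You can measure this yourself. The sketch below (figures are approximate and vary by Go version and what each goroutine touches) parks 50,000 goroutines and divides the runtime's memory growth by the goroutine count:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// bytesPerGoroutine spawns n parked goroutines and returns the approximate
// memory growth per goroutine, measured via the runtime's own accounting.
func bytesPerGoroutine(n int) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	var wg sync.WaitGroup
	block := make(chan struct{})
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-block // parked on the channel; only the small stack is held
		}()
	}
	runtime.ReadMemStats(&after)
	per := (after.Sys - before.Sys) / uint64(n)

	close(block)
	wg.Wait()
	return per
}

func main() {
	fmt.Printf("~%d bytes per goroutine\n", bytesPerGoroutine(50_000))
}
```

Typical results land in the low kilobytes per goroutine — orders of magnitude below a default 1 MB Java thread stack.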
User-space scheduler (M:N model)
Go doesn’t give each goroutine its own OS thread. The Go runtime runs many goroutines on fewer OS threads. The runtime scheduler:
- Decides which goroutine runs on which thread.
- Parks a goroutine that is blocked (e.g. waiting on a channel or on I/O) and only puts it back in the run queue once it becomes runnable.
So you get:
- Fewer OS threads → less kernel context switching.
- Smarter scheduling → only run goroutines that are ready, instead of constantly switching between everyone.
That’s why you can have millions of mostly-idle goroutines (e.g. waiting on I/O or channels) without melting the machine.
What This Means in Practice
- Java (and other 1:1 thread models): great for CPU-bound work and when thread counts stay in the low thousands. Thread leaks or “one thread per request” designs can quickly hit OutOfMemoryError: unable to create native thread in production.
- Go: suited for high concurrency (many connections, many tasks, I/O-bound workloads) because goroutines are cheap and the runtime multiplexes them onto a small number of OS threads.
Similar ideas show up elsewhere: Nginx multiplexes many connections per OS thread; Erlang’s processes are lightweight; toolkits like Akka bring “many logical actors, fewer OS threads” to the JVM. Go’s goroutines are one well-known way to get that kind of scalability behind a single go keyword.
Bottom Line
Goroutines aren’t “better” in every sense—they’re a different tradeoff. Go pays with runtime complexity (scheduler, growable stacks) to make concurrency cheap and scalable. The JVM keeps a simpler, OS-thread-based model and pays in per-thread memory and kernel scheduling. For high-concurrency servers and services, the Go model often wins; for other workloads, Java’s threading model is still a good fit. Picking the right tool depends on how many concurrent units you need and what they’re doing.