It's a Trap! The Two Generals' Problem

In distributed systems, coordination is hard—really hard—especially when both parties depend on mutual confirmation to proceed, but there’s no guarantee their messages will arrive. This classic dilemma is known as the Two Generals’ Problem. Like most problems in computer science, it’s easier to understand when explained with lasers, spaceships, and sarcastic smugglers.

Let’s set the stage: It’s Return of the Jedi and the second Death Star looms large over the forest moon of Endor. The Rebel Alliance’s plan hinges on a synchronized attack — Han Solo leads the ground team to destroy the shield generator, while Lando Calrissian leads the space fleet to attack the Death Star itself.

For the mission to succeed, both parties must execute their part of the plan. If either side decides to abort, the other must as well, or it will be a disaster.

And here’s the twist: Han and Lando can only communicate through spotty, insecure rebel comms. Sound familiar?

The Rebel messaging system

Let’s imagine Han and Lando are nodes in a distributed system. They need to coordinate a commit to the plan:

  • Han says, “I’ll disable the shield.”
  • Lando says, “I’ll attack once the shield is down.”

But Lando can’t attack until he knows for sure that Han will take out the shield. Han doesn’t want to risk the mission unless he knows Lando is ready to strike at just the right moment.

So Han sends a message:

“I’m ready to blow up the shield generator at 0300. Are you ready?”

Lando replies:

“Acknowledged. I’ll attack at 0300.”

But… what if Han never receives Lando’s reply?

Maybe the Empire is jamming the signal — those Stormtroopers aren’t known for their aim, but their comms interference is top-tier. Han now faces a dilemma:

  • Proceed, risking that Lando never got the message.
  • Wait, risking that Lando attacks without backup — or worse, aborts the mission.

Now suppose the reply does get through. Han sends another message to confirm that he received it:

“Got your confirmation—just confirming again we’re still go at 0300?”

And Lando has the same problem:

“Did he get my reply? Did he get my confirmation of his confirmation?”

Welcome to the infinite confirmation loop of the Two Generals’ Problem.
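To see why adding more confirmations never helps, here is a minimal Python sketch of the exchange. It assumes the generals strictly alternate messages and that the exchange stops at the first jammed message. The function names (`received_counts`, `last_sender_can_tell`) are made up for this illustration.

```python
from typing import Optional, Tuple


def received_counts(planned: int, lost_at: Optional[int]) -> Tuple[int, int]:
    """Return (messages Han received, messages Lando received).

    Messages 0, 2, 4, ... go from Han to Lando; 1, 3, 5, ... go back.
    lost_at is the index of the first jammed message, or None if none is lost.
    The exchange stops at the first loss: nobody replies to silence.
    """
    delivered = planned if lost_at is None else min(lost_at, planned)
    han_received = delivered // 2
    lando_received = (delivered + 1) // 2
    return han_received, lando_received


def last_sender_can_tell(planned: int) -> bool:
    """Can the sender of the final message tell whether it arrived?"""
    sender = (planned - 1) % 2  # 0 = Han, 1 = Lando
    all_delivered = received_counts(planned, None)
    final_lost = received_counts(planned, planned - 1)
    # The sender's own inbox is all the evidence the sender has.
    return all_delivered[sender] != final_lost[sender]


for planned in range(1, 6):
    print(planned, last_sender_can_tell(planned))  # False for every length
```

However long the protocol is, whoever sends the last message sees the same inbox whether that message arrived or not. Adding one more confirmation only hands the doubt to the other general.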

No reliable victory without a reliable channel

The Two Generals’ Problem highlights a core truth in distributed systems: no finite exchange of messages over an unreliable channel can guarantee that both sides agree. Even if messages arrive most of the time, the sender can’t be certain a message arrived without an acknowledgment. And even then, the receiver can’t be sure that the acknowledgment itself arrived.

Back in the day, we tried to solve this with distributed transactions. Technologies built on a two-phase commit algorithm, like the Distributed Transaction Coordinator (DTC), would attempt to coordinate between databases and message queues (say, SQL Server and MSMQ) to ensure both the data and the message were committed atomically. The idea was noble: all or nothing across systems.
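For readers who have not met two-phase commit, the idea can be sketched in a few lines of Python. This is a toy, in-memory illustration and not how DTC is implemented. The `Participant` class and its `can_commit` flag are hypothetical stand-ins for a database or a queue.

```python
class Participant:
    """A toy resource manager, such as a database or a queue (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote. A "yes" is a promise to commit later if asked.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes; otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

In a real deployment every `prepare`, `commit`, and `rollback` call is a network message. A participant that has voted yes and then hears nothing is stuck in the "prepared" state, which is the Two Generals’ Problem all over again.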

In practice, though? Depending on DTC was like relying on a Stormtrooper to hit a target directly in front of their helmet. Thankfully, we’ve moved past distributed transactions in modern architectures. But that doesn’t mean we can ignore the underlying problem. If anything, it means we have to solve it more thoughtfully.

The Two Generals’ Problem can’t be solved over an unreliable channel, so in distributed systems we don’t try to solve the unsolvable. Instead, we change the game.

Avoid requiring perfect coordination. Don’t make success depend on both sides committing at exactly the same moment. Instead, let each side act and deal with any mistakes that arise afterward. Whether that trade-off is acceptable, and how to correct the mistakes, is something you have to decide for your own system.

Design for uncertainty. Han’s team is intercepted, and when Lando goes to assault the Death Star, he realizes that the shield is still up. But importantly, neither side abandons the mission. They both keep retrying until they succeed.

“We won’t get another chance at this, Admiral. Han will have that shield down. We’ve got to give him more time!”
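Retrying is only safe when the receiver tolerates duplicates, because a lost acknowledgment looks exactly like a lost message. The following Python sketch shows one way to pair retries with an idempotent receiver. It is a generic illustration, not the NServiceBus API, and every name in it (`ShieldGenerator`, `make_flaky_channel`, `send_with_retries`) is hypothetical.

```python
import time


class ShieldGenerator:
    """Hypothetical receiver that acts on each message ID at most once."""

    def __init__(self):
        self.processed = set()
        self.explosions = 0

    def handle(self, message_id):
        if message_id in self.processed:
            return  # duplicate caused by a retry: nothing more to do
        self.processed.add(message_id)
        self.explosions += 1


def make_flaky_channel(receiver, lost_acks):
    """Deliver every message, but lose the first `lost_acks` acknowledgments."""
    remaining = {"n": lost_acks}

    def send(message_id):
        receiver.handle(message_id)  # the message gets through...
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise ConnectionError("ack jammed")  # ...but the sender never hears back

    return send


def send_with_retries(send, message_id, max_attempts=5, base_delay=0.01):
    """Call send(message_id) until it succeeds; return the attempts used.

    Re-raises the last ConnectionError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(message_id)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


generator = ShieldGenerator()
channel = make_flaky_channel(generator, lost_acks=2)
attempts = send_with_retries(channel, "blow-shield-0300")
print(attempts, generator.explosions)  # 3 attempts, but only 1 explosion
```

The sender gives up on certainty and simply keeps trying, while the message ID lets the receiver turn "at least once" delivery into "handled once".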

Use reliable message delivery. NServiceBus includes safeguards to ensure your messages don’t vanish into hyperspace.

Leverage the Outbox pattern. The Rebel Alliance makes the plan while everyone’s in the same room—synchronously agreeing on a coordinated two-pronged attack. In NServiceBus, the Outbox pattern ensures that either both missions are executed and succeed together, or neither is. Once the operation is in motion, there’s no turning back. Even when comms are jammed, and blaster fire erupts, both Han and Lando stick to the plan. The two generals are deployed in sync—after that, reliable communication is no longer assumed, but consistency is guaranteed.
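The general shape of the Outbox pattern can be sketched with nothing but SQLite. This is a simplified illustration of the pattern, not NServiceBus’s implementation. The table names, the `AttackAt0300` message type, and the `fail` flag that simulates a crash are all assumptions made for the example.

```python
import json
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE missions (id TEXT PRIMARY KEY, status TEXT)")
    db.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT, dispatched INTEGER DEFAULT 0)"
    )
    return db


def plan_attack(db, mission_id, fail=False):
    """Store the business change and the outgoing message in one local transaction."""
    with db:  # commits on success, rolls back if an exception escapes
        db.execute("INSERT INTO missions VALUES (?, ?)", (mission_id, "go"))
        body = json.dumps({"type": "AttackAt0300", "mission": mission_id})
        db.execute("INSERT INTO outbox (body) VALUES (?)", (body,))
        if fail:
            raise RuntimeError("simulated crash before commit")


def dispatch_pending(db, publish):
    """Publish stored messages; mark each one only after publish returns."""
    rows = db.execute(
        "SELECT id, body FROM outbox WHERE dispatched = 0 ORDER BY id"
    ).fetchall()
    for row_id, body in rows:
        publish(json.loads(body))  # if this raises, the row stays pending for a retry
        with db:
            db.execute("UPDATE outbox SET dispatched = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because the mission row and the outbox row commit together, there is never a saved plan without its message or a message without its plan. A crash between publishing and marking the row would send the message again on the next pass, so receivers still need to tolerate duplicates.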

Trust the Force (or better: the Outbox)

In the real world, distributed systems don’t involve Death Stars or space smugglers (unfortunately). However, they do involve services trying to coordinate actions in the face of unreliable networks.

You can’t rely on perfect communication. But you can design systems that don’t break when communication is imperfect.

So the next time you’re designing a distributed system and thinking, “How can I make sure both sides agree before acting?” remember: Lando aborted and then retried on a feeling. Don’t bet your business on a feeling. Use the reliability built into NServiceBus instead.

If you feel like talking to one of our experienced distributed systems Jedi might help, transmit a distress call on the HoloNet. We’ll help you come up with a plan that ensures no Bothans will come to any harm.

Design for failure. Use the Outbox. Never, ever bet the galaxy on a single ACK. And may the 4th be with you.

About the author

Dennis van der Stelt

Dennis van der Stelt is a developer at Particular Software who does messaging in less than 12 parsecs, without cheating the route.
