Has Microsoft really changed?
People have a lot of opinions about the “new” Microsoft under CEO Satya Nadella. They’ve embraced open source, including open-sourcing .NET Core. They declared Microsoft ❤ Linux. They acquired GitHub. It’s been a wild ride for those of us used to the closed, dare I say grumpy, Microsoft of the past.
But are things different today? When the rubber meets the road, is Microsoft really more open, more accessible, more helpful?
When we were building the Azure Service Bus transport for .NET Core, we got a chance to find out.
We want to make it as easy as possible for our customers to upgrade their systems with zero downtime. It’s important they be able to update one endpoint at a time. Turning off an entire system to run an irreversible conversion script while hoping everything turns out OK is not an option.
Making zero-downtime deployment work for our new Azure Service Bus transport for .NET Core proved to be a bit of a challenge.
The old transport still had two completely different ways to organize Azure Service Bus topics and queues, called topologies. The forwarding topology is best for new projects, but we still had customers using the older endpoint-oriented topology. That topology dates from the early days of Azure, and the way it distributes events to subscribers isn’t compatible with the forwarding topology.
For customers already on the forwarding topology, migrating is pretty straightforward. We wanted to provide a path forward for customers on the endpoint-oriented topology as well.
So we decided to release one last version of the old transport that would include a migration feature. This would allow people to upgrade each endpoint, one at a time, to the migration-mode version. Once every endpoint was on that version, you would switch to the forwarding topology, again one endpoint at a time.
Afterward, you’d be on the forwarding topology and could easily upgrade to the .NET Core transport.
It was a great plan. Too bad it didn’t work.
Hop to it
During testing, we discovered the migration strategy had a fatal flaw.
The migration feature depended on an Azure Service Bus feature called auto-forwarding. To protect against infinite or circular forwarding, auto-forwarding allows only three hops; a message that exceeds the limit is dead-lettered. Fortunately, three hops was all we needed.
However, when NServiceBus uses the SendVia feature to implement the SendsAtomicWithReceive transaction mode, the broker adds to the hop count even though that hop isn’t user-driven.
We were only using three hops, but the broker counted the use of SendVia as a fourth hop. As a result, it dead-lettered all messages when we tested using that transaction mode.
Working as designed, won’t fix?
We had a serious hopping problem, and we needed Microsoft’s help to fix it. We contacted the Azure Service Bus team and told them about our plight. It’s easy to imagine that the old Microsoft would have said “Sorry, working as designed. Won’t fix.”
Luckily, that’s not what happened.
Microsoft agreed that system-driven hops should not count against the forwarding limit. They decided to roll out a change to Azure Service Bus to differentiate user-driven and system-driven hops. This would give our migration feature all the user-driven hops it needed to move messages around.
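To make the counting concrete, here is a toy model of the two rules. It is only an illustration built from the description above, not how Azure Service Bus is implemented: `MAX_HOPS`, `HopKind`, `counted_hops`, and `route` are hypothetical names, and the limit of three is the figure from this post. A message that took three user-driven forwarding hops plus one system-driven SendVia hop is dead-lettered when every hop counts, but delivered once only user-driven hops count.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

# Hypothetical toy limit taken from the text above, not read from any Azure API.
MAX_HOPS = 3


class HopKind(Enum):
    USER = "user"      # e.g. an auto-forwarding hop configured by the user
    SYSTEM = "system"  # e.g. the extra hop the broker records for SendVia


@dataclass
class Message:
    body: str
    hops: List[HopKind]  # kinds of hops the message has taken so far


def counted_hops(message: Message, distinguish_system_hops: bool) -> int:
    """Return how many hops count against the limit under the chosen rule."""
    if distinguish_system_hops:
        # Only user-driven hops count toward the limit.
        return sum(1 for hop in message.hops if hop is HopKind.USER)
    # Old behavior: every hop counts, including system-driven ones.
    return len(message.hops)


def route(message: Message, distinguish_system_hops: bool) -> str:
    """Deliver the message, or dead-letter it if it exceeded the hop limit."""
    if counted_hops(message, distinguish_system_hops) > MAX_HOPS:
        return "dead-lettered"
    return "delivered"


# One system-driven SendVia hop plus three user-driven forwarding hops.
msg = Message("OrderPlaced", [HopKind.SYSTEM, HopKind.USER, HopKind.USER, HopKind.USER])

print(route(msg, distinguish_system_hops=False))  # dead-lettered (4 > 3)
print(route(msg, distinguish_system_hops=True))   # delivered (3 <= 3)
```

The only difference between the two outcomes is which hops are counted, which is exactly the distinction the Azure Service Bus team agreed to make.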
At first, Microsoft said a global rollout could take months. As a stopgap, they offered to enable the change only for customers who specifically requested it. But before we got the chance to notify customers, they rolled it out globally. Our customers wouldn’t need to worry about contacting Microsoft, and we wouldn’t have to create any runtime checks to verify a customer’s environment.
The Azure Service Bus team came through for us, and as a result we were able to ship the migration feature.
But that’s not the end of the story. We had customers with Go-Live licenses testing the new transport in production without issue for months. Suddenly, we started getting bug reports that we traced back to Microsoft’s Azure Service Bus client library.
Not a problem. The Azure Service Bus client library is open source on GitHub, so we submitted a pull request to fix it. Even though it was close to the holidays, the pull request was merged the next day, and we had a new release the day after that.
So we released the new transport.
In the days of the old, closed Microsoft, things wouldn’t have happened this way. If Microsoft had agreed there was a bug at all, it would have been months (years?) before we had any sort of resolution.
We planned workarounds, just in case. Most would have caused a lot of extra pain for our customers. As it turned out, we didn’t need them.
So has Microsoft really changed? We think it’s fair to say that yes, they have.