Change is inevitable: Versioning event-driven systems

About this video

This session was presented at JetBrains .NET Days Online.

Building an event-driven system is anything but trivial. However, once you make it past the sea of pub-sub vs. command-response debates and the service boundaries conundrum, you’ll soon face the inevitable: change. The conversations that follow sound all too familiar… “Who’s subscribed to this message?” “Do other services depend on this field in the payload?” “Why on earth is that thing in the payload?” “That service should never rely on this data!” And, of course, the obvious “Can’t we -just- remove this?”

But are those the right questions to ask? As software developers, we aim to be agents of change, not chaos. To achieve this, we need to understand the impact of tweaking a message contract without breaking half of the system or forcing other teams beyond their deadlines. We should prioritize techniques that ensure compatibility while also considering how long that compatibility needs to be sustained. Oh, and let’s not forget that we’re supposed to solve this problem with zero downtime, as our users are spread across every time zone. In this session, we’ll discuss practical techniques and tooling that can enable the evolution of your event-driven system so that, next time a stakeholder approaches with a change request, your heart doesn’t sink to the floor.

Transcription

00:00:05 Laila Bougria
Hello, everyone. My name is Laila. I'm a software engineer and solutions architect at Particular Software. These are all the ways that you can find me online.
00:00:12 Laila Bougria
And basically, I've been building software now for two decades. Yeah, it's been a while. And over the last decade, or even a little bit more than that, I've been focusing on distributed systems. And I can tell you one thing, that building distributed systems is incredibly hard, okay? I mean, it's a lot of fun, really, it's a lot of fun, but it's also very, very difficult, because let's face it, we are facing a bunch of non-trivial issues and problems that we basically have to solve, right?
00:00:43 Laila Bougria
And over the years, when I sort of look back, I can see that we had so many lengthy discussions to understand the domain and get a sense of how we were going to build the systems. Like, how many services do we require? And what would be the right service boundaries? And what would be the right communication style to use? Are we going to use synchronous communication or asynchronous communication? Or in which context are we going to use each individual communication style? And should we use commands or events in this specific scenario? And what about the workflow? What does that look like? Is it complex enough that we need some kind of coordination style? Are we going to use orchestration or choreography?
00:01:27 Laila Bougria
And if you were here last year, maybe you saw one of those talks that I did around this topic. But these were so many incredibly, sometimes very heated discussions. And it sometimes took weeks and months of going back and forth and understanding the domain better and progressive insights until we basically landed where we felt it was comfortable, where we thought we were doing the best possible things with the knowledge that we had.
00:01:52 Laila Bougria
But when I look back, I also kind of see that there was always a blind spot, something that was often, if not most of the time, overlooked in those conversations, and that is change. Because as our systems grow and evolve, then change will come our way. And yes, sometimes it can be incredibly frustrating because we put so much effort into getting everything right, but actually I like to think of change as a good thing. It means that the system is evolving. It means that it's being successful. And therefore, new things are being added. Things are being changed. And we are there to implement those. And that's something that we just have to deal with. It means that we have more work to do.
00:02:33 Laila Bougria
But change is also something that we need to start thinking about very early on so that we can successfully prepare our event-driven system to evolve. And that starts by appropriately designing the events that we emit, the messages that we are sending around as part of our event-driven system.
00:02:55 Laila Bougria
So, let's consider an example. Let's say that we have a credit card, like, the simplest thing you can think of, okay? Let's not overcomplicate things. You basically have some kind of a Mastercard, something like that. You have a balance of, I don't know, 1,000, 5,000 euros, dollars, whatever it is, that you can spend within a single period. And at the end of that period, whatever you spend is immediately deducted from your linked debit account. So, we're not going over periods and stuff like that. We're just keeping it simple. And we have this credit card transaction service that is basically responsible for handling all of those card transactions or card operations.
00:03:34 Laila Bougria
So, when you think about what can happen on a card like that, then you could argue, "Well, there are only really two types of operations, right? Either you have a debit operation and money goes off the card, or you have a credit operation and then money comes back. That's it." And also, the attributes are the same. So we basically landed on a single event type, CreditCardTransactionMade, with a bunch of attributes. And what we did is we introduced this operationType so that we can distinguish these debit from credit operations.
00:04:06 Laila Bougria
And this may appear like a good solution at first, but it starts to break apart very quickly once change comes our way. Because let's say that we got this new requirement saying, "Well, you know what? We also want the ability to reserve funds." So let's say that you check into a hotel and they're like, "Well, we're going to block," I don't know, "300 euros on your credit card, just in case you decide to empty your fridge and then run off to the Bahamas. We'd just like to make sure you don't do that without paying your bill. So we're going to block a certain amount on your credit card, but not charge you." So we're like, "Okay, well, fair enough. We can just introduce this new operationType that is for reserving funds. And we can add the reservedUntil attribute so we can indicate until when that money is blocked."
00:04:56 Laila Bougria
But the impact on the consumers of this event is massive because the consumers that don't really care about this operationType now need to start filtering this out, because they won't be able to recognize it and you won't want them to fail if they find something there that they don't recognize. But if they do care, it's also not ideal because what you've now created is what is called functional coupling. Because now your consumers need to understand, "Oh, if the operationType is a reserveFunds type of operation, it doesn't actually affect the amount of money you spend. It is only affecting the available balance that you have," right? And that means that they now have some coupling into how your internal domain works. They also need to understand that that reservedUntil field is only really there when the operationType is of type reserveFunds. So, this isn't really great.
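Editor's illustration (not from the talk): a minimal Python sketch of what that functional coupling can look like in consumer code. The class shape, the snake_case field names, and the `apply_to_amount_spent` handler are assumptions made for this example; only the event name, `operationType`, and `reservedUntil` come from the session.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class CreditCardTransactionMade:
    """Catch-all event: operation_type decides what the other fields mean."""
    operation_type: str                    # "debit", "credit" or "reserveFunds"
    amount: Decimal
    currency: str
    reserved_until: Optional[str] = None   # only present for "reserveFunds"


def apply_to_amount_spent(spent: Decimal, event: CreditCardTransactionMade) -> Decimal:
    """Hypothetical consumer-side handler that tracks how much was actually spent."""
    if event.operation_type == "debit":
        return spent + event.amount
    if event.operation_type == "credit":
        return spent - event.amount
    # Everything else has to be filtered out. For "reserveFunds" the consumer
    # must know that a reservation lowers the available balance but is not
    # spending: knowledge that belongs to the producer's domain.
    return spent


events = [
    CreditCardTransactionMade("debit", Decimal("500"), "EUR"),
    CreditCardTransactionMade("credit", Decimal("50"), "EUR"),
    CreditCardTransactionMade("reserveFunds", Decimal("300"), "EUR", "2030-01-31"),
]

spent = Decimal("0")
for event in events:
    spent = apply_to_amount_spent(spent, event)
```

Every branch on `operation_type` is a piece of the producer's domain logic that now lives in the consumer.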
00:05:55 Laila Bougria
But things could get even worse. Because our bank said, "Well, we don't really like it when people use their credit cards to withdraw money outside, from the wall," like we tend to say in Belgium. I don't know if that translates to English. But you get the point, right? So, how does a bank discourage it? Well, they're going to charge you money for it. That's what they do, right? So, we thought about this and we're like, "Well, you know what? A withdrawal is still a debit transaction, but we do want to differentiate the cost from the actual money that you got," right? I maybe withdrew 500 euros and it cost me 5 euros to do that. I want to know the difference. So, what we did is we added this field called cost. We made it required. And we just always set the value to 0 if there was no cost involved.
00:06:44 Laila Bougria
But then it's like, "Okay, but what happens if you withdraw money and you're in a foreign country?" I mean, we do have this currency as part of our payload for the money that was spent or withdrawn or whatever. But what about the cost? Is that then expressed in the issuing bank's currency or the currency of the country you were in or the currency of your account? Oh, the easy fix is just to add another field, cost currency, and then we keep on polluting this event. Or we can also do something worse and actually tell our consumers, "You know what? You can just assume that the cost will always be in the currency of the account," which, again, is leading to functional coupling. You are forcing your consumers to understand that this is the way that the cost works. And it's something that doesn't belong to their context. And worse, it's something that could change on our end and then needs to also be reflected in the consuming code.
00:07:48 Laila Bougria
So, when we looked at this, we were like, "Yeah, we don't really like this." And we re-evaluated the entire payload and we were like, "Why is that cost even there? You know what we'd better do? I think we should be emitting two events. We should emit one event with an operationType of debit for the amount that was actually withdrawn, and then another event of the same type with an operationType of CostIncurred that represents that 5 euros. And then we also have that currency attribute that we could use. So, what's the big deal? Let's just do it. Everything is going to be great."
00:08:22 Laila Bougria
But the question is, how does something like this affect your consumers? And the answer is easy. As you can already see on your screen, this basically turns into what we call a poison message. Because when you emit a message or an event that doesn't contain the attributes that your consumers expect and rely on, they can't process it no matter how many retries they do. That message will end up in an error queue, in a dead-letter queue, with no way to consume it without going back to the code, making changes, and redeploying those services. Only then are we able to consume it. If you have many consumers in a larger system landscape, this is incredibly painful, right?
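Editor's illustration (not from the talk): a small in-memory Python sketch of how a contract change turns into a poison message. The queues are plain lists and `handle`, `consume`, and `MAX_ATTEMPTS` are hypothetical names; a real broker would apply its own retry and dead-lettering policy.

```python
MAX_ATTEMPTS = 3


def handle(message: dict) -> str:
    # The consumer was written against the old contract, where "cost" was required.
    return f"{message['amount']} spent, {message['cost']} cost"


def consume(queue: list, error_queue: list, processed: list) -> None:
    for message in queue:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                processed.append(handle(message))
                break
            except KeyError:
                # Retrying cannot help: the payload itself is the problem.
                if attempt == MAX_ATTEMPTS:
                    error_queue.append(message)


old_format = {"operationType": "debit", "amount": 500, "cost": 5}
new_format = {"operationType": "debit", "amount": 500}   # "cost" is no longer sent

processed, error_queue = [], []
consume([old_format, new_format], error_queue, processed)
```

After the run, the old-format message has been processed and the new-format one sits in `error_queue`, where it stays until the consumer's code is changed and redeployed.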
00:09:12 Laila Bougria
And also, from a producer perspective, I've seen this happen all the time. "It's fine. You can just upgrade to this new event type and everything will be fine." And your consumers are like, "What do you mean it's going to be fine? Have you actually thought this through? Because I don't know if you realize, but I have a bunch of in-flight messages, basically messages that are sitting in my queue, and I haven't had the ability to consume those yet. And now you're throwing these new types of events at me, and I don't know how to deal with both."
00:09:49 Laila Bougria
Or what if we actually scheduled delayed messages? This is something that you can do with plenty of message brokers out there, like Azure Service Bus. And some of those have been scheduled under this old event format. How am I supposed to deal with that? I don't know when those events are going to come through.
00:10:07 Laila Bougria
Or what about the messages that I might have in an error queue? Sometimes your consumers might have their own bugs that are basically stopping them from consuming some of those messages. And while they work on it and they deploy a fix, they will want to retry those messages to bring their service or their system up to date. But if we're then changing that event type, we're not making it easy on them.
00:10:32 Laila Bougria
But finally, a very important part is, "Hey, I have my own deadlines. You're just imposing this change on me. And what now? Why are your deadlines more important than mine?" And really, what it comes down to is that when we make these types of changes, we are taking away autonomy. Either we're taking it away on the consumer side, because basically we're imposing the change on them, or we're taking it away on the producing side. And we basically can't move our system forward because we are not allowed to break our consumers.
00:11:04 Laila Bougria
But there's a better solution for this. And that starts with a better design. Now, one of the things that I always say is that a good way to start making changes to avoid this is to design more granular and more business-meaningful events. I mean, I've been talking about a debit and a credit operation as if they make total sense. But if you think about it, within a system, outside of a specific service boundary or bounded context, what do those terms mean, debit, credit? Might mean nothing. It might actually have a completely different meaning than what we understand within the scope of this credit card transaction service, right?
00:11:45 Laila Bougria
So, it's important that we look at this, "What has really happened?" from the outside perspective. And when you think about it that way, well, a payment was made with this credit card, or a refund was issued on this credit card, or maybe there were funds blocked. Maybe there were costs incurred, right? And this is something that makes sense, and that is something that you can communicate to the rest of the system landscape, and it is meaningful to everyone, right? It also allows you to carry only the relevant attributes that make sense for that specific event that has happened. And you don't have this event type that can mean different things anymore.
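Editor's illustration (not from the talk): one possible shape for these granular events, sketched as Python dataclasses. The event names come from the session; the attributes and the `field_names` helper are assumptions chosen to show that each type carries only what is relevant to it.

```python
from dataclasses import dataclass, fields
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class PaymentMade:
    card_id: str
    amount: Decimal
    currency: str


@dataclass(frozen=True)
class RefundIssued:
    card_id: str
    amount: Decimal
    currency: str


@dataclass(frozen=True)
class FundsBlocked:
    card_id: str
    amount: Decimal
    currency: str
    reserved_until: date   # only meaningful here, so it only exists here


@dataclass(frozen=True)
class CostIncurred:
    card_id: str
    amount: Decimal
    currency: str          # the cost carries its own currency explicitly


def field_names(event_type) -> set:
    """Return the attribute names declared on an event type."""
    return {f.name for f in fields(event_type)}
```

With this split there is no `operationType` to branch on and no attribute that is present only under certain conditions.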
00:12:29 Laila Bougria
So, it's important that we design events that have a single meaning. You have a single event type for a single business-meaningful event. And it's really important to think about it from the business perspective of what has happened in the system in a way that a product owner can communicate about, in a way that a product owner would understand, not in engineering or development terms. That's usually a pitfall, status change and things like that.
00:13:01 Laila Bougria
The meaning of the event type also must remain stable over time. And when you are talking in terms of business-meaningful events, that tends to be easier to maintain as well.
00:13:12 Laila Bougria
This also allows our event types to evolve independently of each other, because if something changes in how we incur costs, well, then only that specific event type would be impacted and not all of the other ones.
00:13:25 Laila Bougria
It also gives you flexibility in the sense that your consumers only have to subscribe to the event types that they actually care about without having to filter out the stuff that they don't really care about. You also get a situation where you're not polluting the contract with conditional attributes, attributes that are sometimes there and sometimes not. And that also means that your consumers can reduce their functional coupling. They don't need to understand when something is around and when it is not. And they can also keep their own code simple and focused.
00:14:04 Laila Bougria
But of course, when we start to split things apart very granularly, sometimes that can also be harmful. So there's another side to this coin, right? And it is that sometimes our consumers don't really need this level of granularity. They just want an overview of what happened. So, a common solution to this problem is that consumers will start subscribing to many granular events and aggregate that information to generate this overview.
00:14:36 Laila Bougria
So, let's look at an example, right? Let's say that for this credit card example, I just want to know at the end of a period, how much money was spent, because I need to go and deduct that from the linked debit account in another service, right? So I just want that total amount that was spent. Well, in order to do that, I need to subscribe to the PaymentMade event, to the RefundIssued event, and to the CostIncurred event. Because remember that FundsBlocked does not affect the amount of money you actually spent. But this is tricky because when new events are added that affect that amount spent, well, you need to be aware of that. You need to become aware of that so that you can add a subscription or maybe remove a subscription, again, causing that functional coupling.
00:15:21 Laila Bougria
What is also tricky is that as a consumer, now I need to understand what a period is and how long that period runs, which is a very private concept to that credit card service to begin with. And your services are basically forced to aggregate it. And that's why I think it's really important when we think about events that we also differentiate the public from private events. Now, there are other terms that are used in the industry to talk about this from domain events and integration events in the DDD space, or events on the inside, events on the outside, external events, internal events. I just really prefer the term public and private, first because it's simple, and also because it really conveys the scope in which they can be used, which is what I care about most, from a bounded context perspective or from a service boundary perspective.
00:16:17 Laila Bougria
So, to solve this problem, what we could do is introduce what is called a summary event. Now, this can then be a public event that is accessible by any part of the system. It provides a much less granular view, but it is still business-meaningful. So we could have this CardPeriodClosed event that basically tells you, "This is the amount of money that was spent, and this was the period." Done. And basically, we can construct this event from inside our credit card service while subscribing to the PaymentMade, RefundIssued, and CostIncurred events. And that's fine because inside our own service boundary, we do have the context and an understanding of how we should be aggregating those to get a correct CardPeriodClosed event. It also means that we are reducing functional coupling for all of the consumers that are subscribing to CardPeriodClosed because we've encapsulated the concept of a period. And we've also basically shielded them from the impact of us maybe introducing additional events or removing other events that impact what the amount is at the end of the period.
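Editor's illustration (not from the talk): a minimal Python sketch of building the summary event inside the credit card service. The event names come from the session; the fields of `CardPeriodClosed` and the `close_period` function are assumptions for this example.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


# Private, granular events (reduced to the one attribute needed here).
@dataclass(frozen=True)
class PaymentMade:
    amount: Decimal


@dataclass(frozen=True)
class RefundIssued:
    amount: Decimal


@dataclass(frozen=True)
class CostIncurred:
    amount: Decimal


@dataclass(frozen=True)
class FundsBlocked:
    amount: Decimal


# Public summary event: no trace of how the total was computed.
@dataclass(frozen=True)
class CardPeriodClosed:
    card_id: str
    period_from: date
    period_to: date
    amount_spent: Decimal
    currency: str


def close_period(card_id, period_from, period_to, currency, private_events):
    """Aggregate private events into one public CardPeriodClosed event."""
    total = Decimal("0")
    for event in private_events:
        if isinstance(event, (PaymentMade, CostIncurred)):
            total += event.amount
        elif isinstance(event, RefundIssued):
            total -= event.amount
        # FundsBlocked is ignored: blocked funds are not money spent.
    return CardPeriodClosed(card_id, period_from, period_to, total, currency)


summary = close_period(
    "card-1", date(2030, 1, 1), date(2030, 1, 31), "EUR",
    [PaymentMade(Decimal("500")), CostIncurred(Decimal("5")),
     RefundIssued(Decimal("50")), FundsBlocked(Decimal("300"))],
)
```

The aggregation rules stay inside the service boundary, so changing which private events feed the total does not change the public contract.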
00:17:33 Laila Bougria
Imagine, for example, that the bank decides, "You know what? Instead of charging everyone for the costs every time you do a withdrawal, we're going to do this at the end of the year." Well, then now, inside the credit card service, we could basically remove the subscription to the CostIncurred event, and the CardPeriodClosed event will not change. Our consumers are in no way impacted, but we still have the autonomy to make a change within our domain without breaking any of our consumers.
00:18:03 Laila Bougria
Now, in this scenario, you could make the choice to make all of those granular events private. That's not always going to work. It's something that you need to ask yourself, whether that is desirable within your system. And then, in this case, you make that summary event the only public event that can be subscribed to, right? And that's a way that we can reduce the change radius and that we can encapsulate those internal domain concepts. It also means that your consumers will subscribe to fewer events. Therefore, they have fewer events to process. They have less logic to write. They have fewer events that could be susceptible to traffic, to latency, and code maintenance, and there's less room for error because of that.
00:18:53 Laila Bougria
So, to recap quickly these private and public events, right? When I think of private events, I think of more granular events. They are also more prone to change, and they can evolve at a higher rate. And that's totally okay. Why? Because they're only consumed within our service boundary. They can require more domain context to consume. Again, that's fine because the consumers will be part of our own service boundary, so they share that context. We do not want to make these accessible outside of the bounded context or outside of our service. And we can also, because of that, share more context because we are in a context where it's okay to share more information.
00:19:37 Laila Bougria
When it comes to public events, much less granular. They need to remain stable over time. And they need to also eliminate that functional coupling. That's the whole idea. Think about the event payload and look at that and say, "In which ways would the consumer have to understand the internals of our service in order to consume this message?" And you want to get to a point where there's nothing really there. You make those accessible to the rest of the system. And they could summarize multiple private events. They don't always have to. Okay?
00:20:09 Laila Bougria
When I talk about this, I always remind people, "Just remind yourself how it is to post a picture on the internet. You can post it and then change your mind and delete it. But you can't really guarantee that it hasn't already been copied by half the internet." And we've seen so many examples of this out in the media, right? But that's really how I want you to think about your public events. Once it's out there, it's going to start living a life of its own. So that's something that you need to think about upfront. "Am I okay with making this data public?"
00:20:45 Laila Bougria
So, let's see what we've achieved, right? Because when we talk about carefully designing our events so that they only have a single meaning, so that they're either public or private, what we basically achieve is a better control over the change radius. Because if we have a private event and that changes, fine, because we control the consumers of those private events. And when it comes to our public events, we are really, really, really careful about what data we expose as part of those public data contracts. And that way, we can basically start facilitating versioning upfront by protecting our service boundaries. It's not just about finding the right service boundaries, as hard as that already is, I know, I know, but it's also about how do we protect those boundaries in the long term when things start to change.
00:21:39 Laila Bougria
But we haven't even solved the entire problem, not even half of it at this point, because both your private and your public events will at some time have to evolve. And then we still have this problem of, "But I have all of these in-flight messages and delayed messages and messages that are in my error queue. How am I supposed to deal with those? You can tell me to upgrade, but I don't want to lose those old messages. That's really the point." So, how do we deal with that? And to do that, I think it's really important that we start to also rethink change as a general concept. Because many times when I talk to people, they're like, "Well, you make a change, and that's it. It's like, at this point in time, we have changed something." But that's not true. It's not a point-in-time event. A change in an event-driven system requires facilitation, and it requires coordination. And there are two angles that we need to consider to facilitate that transition. And that's both a versioning strategy and also an upgrade strategy.
00:22:47 Laila Bougria
And to do that, I want to take a step back and consider how it is that events flow over the wire, when you have one service that's publishing an event and another one that's consuming an event. Because when events are emitted, they are actually encoded into a specific format, right? It's also something we call format coupling. Now, this can be in JSON, it could be in Avro, it could be in Protobuf, in XML, whatever it is, right? Now, that encoding needs to be understood by the consumers of your event. That's why we have format coupling, right? There's an understanding, a shared contract there.
00:23:23 Laila Bougria
Now, most teams leave that open to interpretation. So they're like, "Yeah, it's in JSON," and that's where it ends. And what happens then is that on the consuming side, you basically get teams that start making assumptions on how that data should be interpreted. Like, "If there's an amount in the payload, what precision would this be?" "Well, I guess two numbers after a comma." Or, "What fields are actually required?" "Hmm, I don't know. Well, I guess this one is always around, so I assume it's required." And if there's a field in the payload that's called temperature, what unit is that expressed in? Is it Celsius? Is it Fahrenheit? Or is it actually the temperature of the light and it's expressed in Kelvin, which is completely different? Hmm.
00:24:14 Laila Bougria
The thing is that when this is missing in the payload, you get consumers that start making assumptions and, again, start creating functional coupling there. And you might say, "Ah, this isn't really important. You could use a schema to provide that information, but is it really necessary? It's also just so much overhead and so boring." If you're thinking that, let me tell you a story about the Mars Climate Orbiter.
00:24:44 Laila Bougria
Now, this was a robotic space probe launched by NASA in December of '98. And yes, I just read that off my notes because there's no way I'm remembering dates, okay? But basically, what happened is that a year later, this spacecraft completely lost communication with NASA on its trajectory. It started to deviate to a point where it got so close to the planet Mars that it just shattered into a million pieces and was completely destroyed.
00:25:20 Laila Bougria
So, you might wonder why that happened. Well, guess what? They did an investigation to try to figure out why this happened and what actually failed. And the reason was a mismatch in the unit systems, because you had one system using the metric system, and the other one was using US customary units. Ah, if only they had used a schema that makes it clear in which specific unit each value is expressed. Wouldn't that have been great? Then maybe all of this could have been avoided.
00:25:55 Laila Bougria
So, now that we've established the importance of actually using a schema to convey that type of information, let's take a look at an example of a schema. Now, this is just a simple example. It's in Avro. Even if you've never used that before, it should look familiar because it's JSON. And if you're watching this on your phone, then you might want to rewatch this session after because there will be some things that you will want to verify.
00:26:21 Laila Bougria
This one isn't that important, though. It's just an example to show you we can basically use a schema as a mechanism to structure the data to ensure that it's interpreted accordingly by our consumers. Because what we can do is we can add type information. We can even add complex data structures so that we can understand what would be part of it and what would not be. We can add field-level metadata to indicate that a certain field is required, to indicate what a default value could be for that field, or even to indicate aliases or even descriptions that can help us understand what this specific field is actually representing to begin with. We can also provide rules on how that data is formed.
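Editor's illustration (not the schema shown on the slide): a sketch of what an Avro record schema with this kind of metadata could look like, written as a Python dictionary and serialized to JSON. The record name, namespace, and fields are assumptions; the attribute keys (`doc`, `default`, `aliases`, `logicalType`) are defined by the Avro specification.

```python
import json

# Hypothetical schema for a CardPeriodClosed event.
card_period_closed_schema = {
    "type": "record",
    "name": "CardPeriodClosed",
    "namespace": "example.cards",
    "doc": "Published once per card when a spending period is closed.",
    "fields": [
        {"name": "cardId", "type": "string", "doc": "Identifier of the card."},
        {
            "name": "amountSpent",
            # Precision and scale are stated instead of left to guesswork.
            "type": {"type": "bytes", "logicalType": "decimal",
                     "precision": 12, "scale": 2},
            "doc": "Total spent in the period, expressed in `currency`.",
        },
        {
            "name": "currency",
            "type": ["null", "string"],
            "default": None,               # has a default, so it is optional
            "aliases": ["currencyCode"],
            "doc": "ISO 4217 currency code.",
        },
        {
            "name": "periodTo",
            "type": {"type": "int", "logicalType": "date"},
            "doc": "Last day of the period.",
        },
    ],
}


def required_field_names(schema: dict) -> list:
    """In the talk's terms, a field without a default is required."""
    return [f["name"] for f in schema["fields"] if "default" not in f]


schema_json = json.dumps(card_period_closed_schema, indent=2)
```

The `doc` entries and logical types carry exactly the information consumers would otherwise have to guess: precision, units, and which fields may be absent.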
00:27:09 Laila Bougria
But schemas are also a very powerful mechanism to facilitate versioning. And that's the angle where I'm coming from today. Because once you have a schema for your payloads, then you can also define a versioning strategy. Now, a versioning strategy sets clear expectations to all of the consumers of your events on how they can expect that event payload to evolve. And it allows you to also make those changes predictable to consumers in a way that doesn't always break them, that in the majority of the cases will not break them.
00:27:50 Laila Bougria
So, let's take a look at the possible versioning strategies that are out there. First is forward compatibility. Now, when you use a forward compatibility strategy, we're basically saying, "You know what? You can delete any optional fields and you can add fields. Whether they're required or optional doesn't even matter. You can add any type of field." And this is usually used in traditional brokers like an Azure Service Bus, like a RabbitMQ, like an Amazon SQS, and things like that. And what it means is that consumers can basically use an older version of the schema, previous version of that schema, and still be able to process events that were constructed with a newer version of the schema. In this mode, our producers will start emitting events under a new version of the schema without breaking our consumers.
00:28:48 Laila Bougria
Now, let's look at this with an example to basically make it a little bit clearer. Now, let's say we have a first version of a schema and that contains a bunch of attributes. And both our producer and our consumer are using that same schema. Everything is good in the world, right? But then a change happens and we introduce this new version of the schema. Our producer starts to use this to emit events from a certain point in time. What happened in this specific example is in that v1, we had a currency that was optional. It was allowed for it to not have a value. In v2, we've removed that altogether. So, if we have a consumer that is using the old version of the schema, it will still be able to process any messages that were constructed with v2 of the schema. Currency will never be there, but that's fine. It could already deal with that because it allowed for it to not be there to begin with. Then, when the consumers are ready, they can start to upgrade.
00:29:55 Laila Bougria
Again, v3 is introduced. At this point in time, we're doing something else and we are introducing that customerId. So our producer upgrades to that new version of the schema and starts to emit events that contain a customerId. In this case, it's even required, because in Avro, fields are required by default.
00:30:15 Laila Bougria
Now, you have this subscriber that is still using v2 of the schema in which they don't recognize customerId. Fine, they're not going to break. There's a value there. Just bluntly ignore it. But we're not going to break when consuming that message. And that's what's important, okay?
00:30:36 Laila Bougria
Quick drink, because there's a lot I want to say.
00:30:39 Laila Bougria
Now, when you look at this from the infrastructure perspective that can provide a different view on things... So, what would happen is, when we have forward compatibility, your publisher can switch from v1 to v2 and publish those events. And your consumers can still hold on to v1 of the schema and successfully be able to basically consume messages that were constructed with v2 of the schema using exactly the same consuming code. So they won't immediately experience any impact. Nothing needs to immediately change. Of course, if they want to start using some of the fields that were added, well, they will have to opt in to a newer version of the schema. All right? But it doesn't break them. That's the point.
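Editor's illustration (not from the talk): a simplified Python model of forward compatibility, in which a consumer holding an older schema reads events produced with a newer one. Schemas are reduced to a mapping from field name to default value, and `REQUIRED` and `read_with_schema` are hypothetical names; real Avro schema resolution covers much more (types, aliases, unions).

```python
REQUIRED = object()   # sentinel: the field has no default


def read_with_schema(reader_schema: dict, record: dict) -> dict:
    """Project a record onto the reader's schema."""
    result = {}
    for name, default in reader_schema.items():
        if name in record:
            result[name] = record[name]
        elif default is not REQUIRED:
            result[name] = default          # missing optional field: use default
        else:
            raise KeyError(f"required field missing: {name}")
    return result                           # fields unknown to the reader are dropped


SCHEMA_V1 = {"cardId": REQUIRED, "amount": REQUIRED, "currency": None}
SCHEMA_V2 = {"cardId": REQUIRED, "amount": REQUIRED}   # optional currency removed

# The producer already emits v2 and v3 events.
v2_event = {"cardId": "card-1", "amount": "25.00"}
v3_event = {"cardId": "card-1", "amount": "25.00", "customerId": "cust-9"}

seen_by_v1_consumer = read_with_schema(SCHEMA_V1, v2_event)
seen_by_v2_consumer = read_with_schema(SCHEMA_V2, v3_event)
```

The v1 consumer gets `currency` filled with its default, and the v2 consumer never sees `customerId`; neither needs a code change to keep processing.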
00:31:31 Laila Bougria
Another option is backward compatibility. And in this mode, we allow for the deletion of fields, both required and optional, and we allow you to add optional fields. Now, in this case, our consumers can use the next version of a schema to process messages that were constructed with an older or the previous version of a schema. Now, this is popular. I see it a lot with people who are using Kafka or event stores in which they are saving events as they are coming in so that they can still replay them. If you want to be able to do message replay, then you will have varying versions of that schema and you want to be able to move forward.
00:32:18 Laila Bougria
So, let's look at an example. In this case, you could argue that consumers upgrade first. Although they use that terminology, I don't think it's useful. So let's use an example instead. We have v1 of a schema. Producer and consumer both use that schema. Everything is good in the world, right? But at that point, we introduce v2, which introduces an optional field called currency. Now, our consumer is now using that v2 of the schema, but it still has messages around in the system that were constructed with v1. Okay, if I want to consume a message that doesn't contain currency, I can do that because it's currently optional.
00:33:02 Laila Bougria
Again, we get to a point where everyone is upgraded and we introduce another version. Again, we have the subscriber who's running up front and with messages that were constructed with an older version of the schema. In this case, what we did is we removed the periodTo attribute. Well, we have some events that still contain that. Fine. We'll just ignore it when we basically consume that message. And that way, we are not making breaking changes. We are making compatible changes that don't immediately break your consumers in a way that they have to do a code fix and deploy to be able to move forward.
00:33:42 Laila Bougria
From the infrastructure perspective, to visualize it, you would have, let's say, a queue, or whatever it is, or a stream or just an event store that contains a bunch of messages that were constructed with v1 of the schema. You might have a consumer that is using v2 of the schema, a newer version, and is still able to use that schema to deserialize and consume messages that were constructed with a v1. Are you still with me? I know this is complex to sort of understand, and that's why I have a bunch of visuals. So, definitely this is a part to re-watch.
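As a rough sketch of the backward-compatible case, the Python below lets a newer reader schema consume older stored messages. The `currency` and `periodTo` fields come from the talk; `orderId`, the `REQUIRED` marker, and the `read` helper are assumptions made for illustration.

```python
# Reader schemas as {field: default}. REQUIRED marks fields without a default.
REQUIRED = object()
SCHEMA_V2 = {"orderId": REQUIRED, "periodTo": REQUIRED, "currency": None}
SCHEMA_V3 = {"orderId": REQUIRED, "currency": None}  # periodTo was removed


def read(message: dict, schema: dict) -> dict:
    """Read a message with the given reader schema.

    Missing optional fields fall back to their default, and fields the
    reader no longer knows about are ignored.
    """
    result = {}
    for name, default in schema.items():
        if name in message:
            result[name] = message[name]
        elif default is REQUIRED:
            raise ValueError(f"missing required field: {name}")
        else:
            result[name] = default
    return result


# A message stored under v1 has no currency; a v2 reader fills the default.
stored_v1 = {"orderId": "o-1", "periodTo": "2024-12-31"}
as_v2 = read(stored_v1, SCHEMA_V2)

# A message stored under v2 still has periodTo; a v3 reader ignores it.
stored_v2 = {"orderId": "o-2", "periodTo": "2025-01-31", "currency": "EUR"}
as_v3 = read(stored_v2, SCHEMA_V3)
```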
00:34:21 Laila Bougria
Another option is to have full compatibility. Now, this is the most restrictive mode in the sense that you can both only delete optional fields and only add optional fields. So you can't really play around with anything that is required. And in this case, what happens is that your consumers can use both the old and the new version of the schema while processing messages that were constructed with an older or new version of the schema. And that sounds so confusing. So, let me put it like this. I'm a consumer and I am using v2 of the schema. Well, I can use this schema to process events that were constructed with v1 of the schema, with v2 of the schema, and with v3 of the schema, because I kind of have both forward and backward compatibility. All right? You get full compatibility.
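The rules for the three modes can be summarized in a small checker. This is a simplified sketch, not a real registry implementation: schemas are modeled as a mapping from field name to a required flag, only additions and removals are considered, and the `allowed_change` function and field names are hypothetical.

```python
def allowed_change(mode: str, old: dict, new: dict) -> bool:
    """Check a schema change against a compatibility mode.

    old and new map field name -> True if required, False if optional.
    Changing the required flag of an existing field is not modeled here.
    """
    adds_required = any(new[name] for name in new if name not in old)
    removes_required = any(old[name] for name in old if name not in new)
    if mode == "forward":
        # Any field may be added; only optional fields may be removed.
        return not removes_required
    if mode == "backward":
        # Any field may be removed; only optional fields may be added.
        return not adds_required
    if mode == "full":
        # Only optional fields may be added or removed.
        return not adds_required and not removes_required
    raise ValueError(f"unknown mode: {mode}")


v1 = {"orderId": True, "note": False}
add_required = {"orderId": True, "note": False, "customerId": True}
remove_required = {"note": False}
add_optional = {"orderId": True, "note": False, "currency": False}
```

With these inputs, adding a required field passes only in forward mode, removing a required field passes only in backward mode, and adding an optional field passes in all three.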
00:35:14 Laila Bougria
But finally, there's also this concept of transitive compatibility, and this is not really a compatibility mode unto itself, because any compatibility mode that we talked about, forward, backward, and full, can also be made transitive. Now, we've been talking about next and previous versions of the schema, and that's important, because compatibility applies only to the previous or the next version, unless you make it transitive. When you're saying, "I have transitive full compatibility," then it applies to all of the versions of the schema.
00:35:53 Laila Bougria
Now, this is extremely restricting, so to speak, right? But this can be very helpful if you have systems in which the events are very, very long-lived. They're stored in some kind of an event store or data lake. And you don't want to modify them. You basically want to be able to consume them, even if they span multiple versions of a schema. Well, then this is what you kind of want. You want that type of transitive compatibility. It could also be that you're using forward compatibility in a broker like Azure Service Bus, but it's such a quickly evolving system that your in-flight messages might have multiple versions of the schema that's being used. Then making that compatibility strategy transitive can also be a way to deal with that.
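The difference between checking only the previous version and checking every version can be sketched as follows. The model is deliberately simplified and assumes that a writer always writes every field in its schema, so a reader only breaks when a field it requires was never written. The function names and field names are hypothetical.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """Backward check: every field the reader requires must have been written.

    Schemas map field name -> True if required, False if optional.
    """
    return all(name in writer for name, required in reader.items() if required)


def backward_compatible(history: list, candidate: dict, transitive: bool) -> bool:
    """Check the candidate against the last version, or against all of them."""
    to_check = history if transitive else history[-1:]
    return all(can_read(candidate, older) for older in to_check)


v1 = {"orderId": True}
v2 = {"orderId": True, "currency": False}  # currency added as optional
v3 = {"orderId": True, "currency": True}   # currency later made required

# v3 can read everything written with v2, so the plain check passes.
plain = backward_compatible([v1, v2], v3, transitive=False)

# Events written with v1 never contained currency, so the transitive check fails.
strict = backward_compatible([v1, v2], v3, transitive=True)
```

This is why the transitive variant matters for long-lived events: the plain check says nothing about messages written two or more versions ago.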
00:36:41 Laila Bougria
And that rounds up sort of the compatibility strategies. But before we move on, I do also want to point out a common misconception. And that's the idea that schema versioning is the same as semantic versioning. It's not at all the same. And I see these two concepts being confused all the time. Now, the thing is that semantic versioning, or SemVer, for the people who know it by that name, is all about APIs. And where it gets really confusing is that you can express data contracts, payloads, and stuff like that in an API, in a POCO, in a class, with properties. And that's where it gets confusing for a lot of people. Because, depending on your compatibility mode, deleting an attribute that's part of your payload is not a breaking change. But if you delete a property in a public class, according to SemVer, that is a breaking change. So, those things are not the same thing. And it's really important that we differentiate them because schema versioning strategies do not adhere to semantic versioning.
00:38:02 Laila Bougria
Now, I see many, many, many folks out there who basically express their event contracts in POCOs and then have them in NuGet packages and share them in between producers and consumers, right? And this is where this misconception happens. So, when you have a schema versioning strategy that you're applying and you're using NuGet packages to express those data contracts, that NuGet package should not adhere to SemVer because then you are basically conflating two different concepts. Okay?
00:38:37 Laila Bougria
When you think about schema compatibility, it's really all about the ability to read the data. I want to be able to receive that message and deserialize it and look at the payload and not become blocked there. The whole goal is to avoid poison messages. And adhering to a compatibility mode avoids breaking your consumers. It allows you some flexibility to make changes without causing this type of disruption to your consumers. Okay?
00:39:10 Laila Bougria
Now, for example, it's also important to understand that you can remain compatible and it doesn't mean that you can enjoy all of the additions without upgrading, right? So, let's say that you say, "I have a field and I want to rename it," right? "I want to rename this field from A to B." What that really means from a schema perspective, assuming that that was an optional field to begin with, then you can delete A and add B. For your consumers, that might not be a breaking change because A was already optional. However, in order for them to start using B, they do have to upgrade to a new version. But they won't be immediately broken. So, that's a differentiation to make.
00:39:57 Laila Bougria
Now, we've been talking quite a while about schemas. But that's not really the only thing that can change in a payload, right? I want you, everyone in the chat, start thinking about this. When you put an event out there, you publish it or you send a message, what can change? We've talked about the payload, but what else can change? Because the payload is just part of what is put on the wire. There's something else that travels with that schema, so to speak, on the wire, and that's metadata, right? The message headers. They are also part of what travels on the wire, and making changes to metadata or message headers can also equally break your consumers. Because metadata can be relied on by intermediaries for routing purposes, "Oh, if it's this type, then I'm going to send it over there," or even consumers for processing purposes. Or it could be relied on by both intermediaries and consumers for filtering purposes. So, it's also really important to think about that metadata. And we also want to structure that part of our payload accordingly. And that's exactly where CloudEvents come in.
00:41:13 Laila Bougria
Now, CloudEvents is a specification that standardizes the envelope of the message, of the event that you are emitting across your system. It's a specification that was developed under the CNCF, which stands for Cloud Native Computing Foundation. I always have to sort of think about, "What was it again? What was it again?" I just always say CNCF because it's so much easier. Cloud Native Computing Foundation, right? What they basically did is that they standardized the expected metadata for every message that is flowing through a system.
00:41:48 Laila Bougria
And it introduces seven metadata attributes or message headers, if you will. Four of them are required. The id, something that uniquely identifies this specific event that is flowing through the system. The type of event. The specversion. And the specversion refers to the CloudEvents specification version. That should remain mostly single. But also the source, where does this event come from, which service did emit this event, so that you're able to recognize that.
00:42:17 Laila Bougria
And there are three additional message headers that you can use, but you don't have to. One of them is the dataschema. Remember schema we've been talking about? The OrderPlaced event looks like this, has these attributes. Well, the dataschema header allows you to point at a specific schema that your consumers can expect when they consume that message. Also contains the datacontenttype. Is this XML? Is this JSON, Avro? Whatever it is, right? But also the time at which this event was constructed.
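To make the attributes concrete, here is a minimal sketch that builds a CloudEvents envelope as a JSON document and checks the four required attributes. It uses only the standard library rather than a CloudEvents SDK. The event type, source, schema URL, and payload are hypothetical values.

```python
import json
import uuid
from datetime import datetime, timezone

REQUIRED_ATTRIBUTES = ("id", "source", "specversion", "type")


def make_cloud_event(event_type: str, source: str, data: dict, dataschema: str) -> dict:
    """Build an envelope with the required and the optional attributes."""
    return {
        "specversion": "1.0",                            # CloudEvents spec version
        "id": str(uuid.uuid4()),                         # unique per event
        "source": source,                                # which service emitted it
        "type": event_type,                              # what kind of event it is
        "time": datetime.now(timezone.utc).isoformat(),  # when it was constructed
        "datacontenttype": "application/json",           # format of the payload
        "dataschema": dataschema,                        # schema the payload follows
        "data": data,
    }


def missing_required(event: dict) -> list:
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED_ATTRIBUTES if not event.get(name)]


event = make_cloud_event(
    event_type="com.example.sales.orderplaced",                # hypothetical
    source="/sales",                                           # hypothetical
    data={"orderId": "o-1"},                                   # hypothetical
    dataschema="https://example.com/schemas/orderplaced/v2",   # hypothetical
)
wire_format = json.dumps(event)
```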
00:42:52 Laila Bougria
Now, CloudEvents also supports two modes, structured and binary. I won't go into that too much, but I want you to be aware of it so you can look it up further. I have a bunch of resources also at the end.
00:43:02 Laila Bougria
What I also want to give you and make you understand is that the CloudEvents specification is completely protocol-agnostic, but it does support multiple or does provide multiple protocol bindings, like HTTP and AMQP and MQTT and NATS and you name it. There's a bunch of them. So there's a lot of support out there.
00:43:24 Laila Bougria
But what, to me, from a versioning perspective and from a structuring perspective, is really important is that CloudEvents provides an extension model that allows you to define attributes that are relevant within your system boundary. So, we talked about the message headers that CloudEvents defines for you. But maybe in your system, you're saying, "Well, we kind of add a user ID message header to every event that we emit in the system. And that is required, and it should be a GUID." Okay. Then using CloudEvents extension model, you can define that specific message header and also structure that it has to be required and that it has to be a unique identifier, right? And this way, it allows you to structure the entire payload, both the data and also the message headers or the metadata that is flowing throughout your system.
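The user ID example could be enforced with a check like the one below. This is a hand-rolled sketch, not the CloudEvents extension mechanism itself: the attribute name `userid` and the `validate_user_id_extension` helper are assumptions made for illustration.

```python
import uuid


def validate_user_id_extension(event: dict) -> None:
    """Enforce a system-specific rule: 'userid' is required and must be a GUID."""
    value = event.get("userid")
    if value is None:
        raise ValueError("extension attribute 'userid' is required")
    try:
        uuid.UUID(str(value))
    except ValueError as error:
        raise ValueError("extension attribute 'userid' must be a GUID") from error


valid_event = {
    "specversion": "1.0",
    "id": "evt-1",
    "source": "/sales",
    "type": "com.example.sales.orderplaced",
    "userid": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
}
validate_user_id_extension(valid_event)  # passes without raising
```

An event without `userid`, or with a value that is not a GUID, raises a `ValueError` before it is emitted or handled.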
00:44:23 Laila Bougria
But what if the metadata then needs to change? Because we talked a lot about how the schema can change. And in order to understand that, let's just circle back for a moment to how metadata can be used. Because it could be relied on by intermediaries for routing, for processing, or for filtering, right? And then the question becomes, "Can you really change the metadata?" I mean, maybe you can change a harmless description here or there, but a significant change in the metadata, like changing the attribute or the header name, can cause havoc in your systems, right? Because these intermediaries and consumers are also relying on that information. Even if you think about the addition of a property, the addition of an attribute, of a header, can also be disruptive. Because, okay, it's not going to immediately break someone. But if your expectation as a producer is that that header is taken into account, then they will need to account for that. And therefore, it's best to think of any change in your metadata as a breaking change. And that warrants a new event type.
00:45:36 Laila Bougria
Now, whether it's a change in your metadata, in your message headers, or in your dataschema, sometimes we do need those breaking changes. Even if you're using a compatibility strategy for your schemas, sometimes you're like, "I'm sorry, the change is so impactful that I can't adhere to this compatibility strategy. I'm going to have to make a breaking change." So, what do we do then?
00:46:00 Laila Bougria
Well, the first thing we need to do is deprecate the old version and set a deadline on which that version will be completely deleted and non-existent in the system. We need to notify our consumers. And within that deprecation window, from which we say, "We're deprecating it," to where we're removing it, we need to do something called dual publishing. That means that you are going to emit both the older version of that event type and the newer version of that event type, which, to be clear, is a different event type, not just a new version, okay? Now, once the deprecation window is over, you stop publishing the old event type. You just remove it from the system altogether.
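Dual publishing during the deprecation window can be sketched as a publisher that decides which event types to emit for a given date. The deadline, the event type names, and the payload fields are hypothetical, and the function only builds the events rather than sending them to a broker.

```python
from datetime import date

DEPRECATION_DEADLINE = date(2026, 6, 30)  # hypothetical removal date


def events_to_publish(order_id: str, customer_id: str, today: date) -> list:
    """Return the events to emit for one business fact.

    Inside the deprecation window both the old and the new event type are
    published. After the deadline only the new event type remains.
    """
    new_event = {
        "type": "OrderPlaced.vB",
        "logicalId": order_id,
        "data": {"orderId": order_id, "customerId": customer_id},
    }
    if today > DEPRECATION_DEADLINE:
        return [new_event]
    old_event = {
        "type": "OrderPlaced.vA",
        "logicalId": order_id,
        "data": {"orderId": order_id},
    }
    return [old_event, new_event]


during_window = events_to_publish("o-1", "c-9", date(2026, 1, 15))
after_window = events_to_publish("o-2", "c-9", date(2026, 7, 1))
```

Both events carry the same `logicalId`, which lets a consumer that temporarily receives both recognize them as the same business fact.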
00:46:47 Laila Bougria
Now, in order to implement this, it's kind of impossible to go through all of the possible scenarios because it depends on the broker that you're using, compatibility strategy, and so many other things. So I want to discuss an option with Azure Service Bus, which I know many of you commonly use, in a situation where your messages are short-lived. So you're not storing them in an event store. You just consume, delete, and it disappears into your business data in a different form. So we're using a forward compatibility mode.
00:47:19 Laila Bougria
Now, what happens at this point is, our publisher is now publishing this vA of the message. This is the event type. And we have multiple subscribers. Now, when you use Azure Service Bus, every subscriber gets its own virtual subqueue. So each of them has in-flight messages there. Everything is good in the world. Now we've decided that we have a completely breaking change and we need this whole new type of event, right? So, at this point, what happens is the dual publishing window in which you said, "The deadline is three, six, nine months from now," whatever makes sense within your system boundaries, okay? And you say, "Within that period, I'm going to publish vA and vB." Now, in this specific case, we chose to have a topic-per-event-type type of topology, right? So every event type gets its own topic. So that means that the vB messages go to a separate topic. Now, we continue to publish both of them, which gives our consumers the time to upgrade when it fits their schedule, as long as it is within the deprecation window.
00:48:28 Laila Bougria
Now, let's say that we have our subscriber number one, and they do want to upgrade. But they, of course, don't want to lose all of those in-flight messages that are still in that subqueue. We don't want to lose them, of course not, because then our system would be inconsistent. So, the way that we go about this is, first... And the order is important, so pay attention. First, we want to subscribe to this new topic, to this vB topic, right? And then we get our own subqueue, we start receiving messages there. And as close as possible to that operation, we want to unsubscribe from that topic number A. Now, that means that our in-flight messages are not deleted. It's just a subscription that is gone, which means that new messages will stop flowing into that subqueue, but we still maintain the in-flight messages that we had.
00:49:23 Laila Bougria
Now, if you change the order of these operations and you're saying, "I'm going to unsubscribe from topic A first and subscribe to topic B second," in a high-throughput system, that can lead to message loss because those operations are not atomic. There's no transaction around these types of operations, which means that there might be split seconds in between where messages are coming in and you're not receiving them, neither on topic A nor on topic B. That's why it's important to first subscribe to the new topic, then unsubscribe from the old one. But of course, you can see it coming, the reverse is also true. If I'm not losing messages, well, I might be getting duplicate messages now. And this is why it's also important that you use a logical message ID to deduplicate the ones that have the same meaning, right?
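The subscribe-first ordering and the resulting duplicates can be simulated in memory. This sketch does not use the Azure Service Bus SDK: the `Topic` and `Subscriber` classes, the `logicalId` field, and the message shapes are all assumptions made for illustration.

```python
class Subscriber:
    def __init__(self):
        self.inbox = []     # in-flight messages waiting to be handled
        self.seen = set()   # logical IDs that were already processed
        self.handled = []

    def drain(self):
        """Handle all in-flight messages, skipping logical duplicates."""
        for message in self.inbox:
            if message["logicalId"] in self.seen:
                continue
            self.seen.add(message["logicalId"])
            self.handled.append((message["logicalId"], message["type"]))
        self.inbox.clear()


class Topic:
    def __init__(self):
        self.targets = []

    def subscribe(self, subscriber):
        self.targets.append(subscriber)

    def unsubscribe(self, subscriber):
        # Only the forwarding stops; messages already in the inbox stay there.
        self.targets.remove(subscriber)

    def publish(self, message):
        for subscriber in self.targets:
            subscriber.inbox.append(message)


def dual_publish(topic_a, topic_b, logical_id):
    topic_a.publish({"logicalId": logical_id, "type": "vA"})
    topic_b.publish({"logicalId": logical_id, "type": "vB"})


topic_a, topic_b, subscriber = Topic(), Topic(), Subscriber()
topic_a.subscribe(subscriber)
dual_publish(topic_a, topic_b, "order-1")  # only vA arrives
topic_b.subscribe(subscriber)              # step 1: subscribe to the new topic
dual_publish(topic_a, topic_b, "order-2")  # overlap: vA and vB both arrive
topic_a.unsubscribe(subscriber)            # step 2: unsubscribe from the old topic
dual_publish(topic_a, topic_b, "order-3")  # only vB arrives
subscriber.drain()
```

During the overlap the subscriber receives `order-2` twice, once per event type, and the logical ID check ensures it is handled only once.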
00:50:20 Laila Bougria
Now, okay, once we've basically completely emptied that subqueue, those in-flight messages, they're all gone, we can see there's nothing there, at that point, it is safe to remove the handler code that we had for the event type vA. And we just are fully now upgraded to this new event type. And once the entire system has been able to upgrade it and we reach our deadline, then the publisher can say, "Okay, we're done. We waited long enough. We assume everyone in the system has upgraded. We stop publishing event type vA. It doesn't exist for us anymore." And now we've basically closed the circle on schema versioning and upgrading strategies.
00:51:04 Laila Bougria
But the thing is that all of this, although it sounds really great, it's nothing but a promise. It's an informal handshake-based contract, cross-my-heart-and-hope-to-die type of thing. But these promises can easily be broken, not always because we have bad intentions or we don't care, but mostly just by accident. But that can cause massive issues. I really like this quote by Clemens Vasters, where he said that "Distributed systems are hard enough while being disciplined about sticking to the promises that we make. And they turn into absolute chaos when breaking those promises becomes easy."
00:51:45 Laila Bougria
So, it's important that we find ways to force ourselves to keep our promises. And how do we do that? Well, for starters, we could use a schema registry. I would like to hear in the comments who of you has heard of this before, who of you is using one, and which one you're using. Drop me a comment. I'll definitely scroll back.
00:52:04 Laila Bougria
But basically, schema registries give you a centralized approach to schema evolution. That is, it basically gives you a central place where you can go and see what is the schema. Both producers and consumers are able to access that information. That also means that you're basically putting your money where your mouth is because schema registries have the ability to enforce your compatibility strategy. If you are saying, "Okay, it's forward compatibility," well, then it will basically not let you make changes that are not forward compatible. It means that you don't have to include your schemas in your payloads, duplicate them across consumers, have these shared NuGet packages, because it gives you that central place. And as I mentioned, it takes that compatibility promise and it formalizes it.
00:52:58 Laila Bougria
Now, what I also really like about this is that you could even take it a step further and say, "You know what? Before I emit a message, before I produce it, I'm going to query the schema and enforce it so that I never really put out a poison message again." Or on the other side, "When I receive a message, I'm going to query the schema registry to ensure that it adheres to the schema, or I'm just going to push it out because I don't know how to consume that," right? So you get runtime validation of your payloads.
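Both ideas, enforcing the compatibility mode at registration time and validating payloads at runtime, can be sketched with a toy in-memory registry. This is not the API of any real schema registry. The `SchemaRegistry` class, its method names, and the field names are hypothetical, and schemas are modeled as a mapping from field name to a required flag.

```python
class SchemaRegistry:
    """Toy registry that enforces a compatibility mode on new schema versions."""

    def __init__(self, mode: str):
        if mode not in ("forward", "backward"):
            raise ValueError(f"unsupported mode: {mode}")
        self.mode = mode
        self.versions = {}  # subject -> list of schemas, oldest first

    def _compatible(self, old: dict, new: dict) -> bool:
        if self.mode == "forward":
            # Required fields may not be removed.
            return not any(req for name, req in old.items() if name not in new)
        # Backward: required fields may not be added.
        return not any(req for name, req in new.items() if name not in old)

    def register(self, subject: str, schema: dict) -> int:
        """Store a new version, or reject it if it breaks the promise."""
        history = self.versions.setdefault(subject, [])
        if history and not self._compatible(history[-1], schema):
            raise ValueError(f"{subject}: change is not {self.mode} compatible")
        history.append(schema)
        return len(history)

    def validate(self, subject: str, message: dict) -> bool:
        """Runtime check: does the message carry all required fields?"""
        latest = self.versions[subject][-1]
        return all(name in message for name, req in latest.items() if req)


registry = SchemaRegistry("forward")
registry.register("OrderPlaced", {"orderId": True, "periodTo": True})
version = registry.register(
    "OrderPlaced", {"orderId": True, "periodTo": True, "customerId": True}
)
```

A producer could call `validate` before emitting and a consumer could call it on receipt, so that a malformed message is rejected before it turns into a poison message.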
00:53:33 Laila Bougria
But there's one big problem with the schema registry options that exist out in the industry today, and that's that they are very tightly coupled to specific brokers, to specific protocols. Now, I'm wondering, who of you is using something like Azure Service Bus or RabbitMQ or SQS? And those of you who are, have you ever used a schema registry? And I'm willing to bet that the answer is no. Because the thing is that schema registries are not a new concept, right? They are actually a well-known and very much utilized concept in the Kafka space or people who are using Azure Event Hubs or something like that, because those things tend to be tightly coupled together. Even in Azure, when you create an Event Hubs namespace, then you get a schema registry feature. But if I'm using Azure Service Bus, I can't really access a schema registry feature. I would need an Event Hubs namespace, which I'm not going to pay for if I'm not using that broker. It doesn't really make sense, right? So it's not really accessible outside of that. But I have high hopes that that is about to change with the introduction of xRegistry.
00:54:49 Laila Bougria
Now, xRegistry is a set of specifications that provide a lot of guidance on how to define metadata. It's being developed, again, under the CNCF, Cloud Native Computing Foundation. Now I know for sure. And it's actually being developed by the same group that has developed CloudEvents. I've been joining that group for half a year, no, actually, nearing a year by now at this point, and participating in that work. But at its core, I want to also clarify that xRegistry is not at all coupled to messaging. It's actually applicable for any type of information flow where you think it's important to define any catalog of information that is being shared between parties where you want a central registry. But in the context of messaging, which is what we are talking about here today, it does provide multiple additional specifications that can help you categorize endpoints, message definitions, the envelope, message headers, and schemas.
00:55:57 Laila Bougria
So, it gives you this foundation for a centralized registry, not only of schemas like the existing schema registries do, but also for your endpoints and your message definitions. So it goes way beyond what the existing schema registries offer today. It is also protocol-, broker-, and vendor-agnostic. Again, multiple protocol bindings. You have the ability to use it with multiple protocols, with multiple brokers, but they are agnostic to those individual implementations. And it's very, very useful for discovery as well. So you can go to a specific registry, look, "Oh, there's a sales endpoint. What does that sales endpoint emit? Oh, it's actually emitting an OrderPlaced event. Oh, and what's the schema of that OrderPlaced event? Well, there you have it." So, you basically get all of that information in a single place.
00:56:51 Laila Bougria
Now, you can already use xRegistry today because part of the group's work has been to develop a server that you can run in a Docker container and already utilize today. But I'm hoping that existing schema registries out there or even completely new servers will basically implement this specification. But even you can use this today because you can host a registry in a file. And you can host that file next to your code in GitHub or put it on S3 or Blob Storage or whatever it is.
00:57:24 Laila Bougria
Now, to make this a little bit visually also understandable for you, you basically have the concept of a registry. And that contains multiple groups. And each individual group can have then the resources of a specific type, which can also be versioned, right? If we translate this into our messaging space, a registry contains endpoints, which basically represent a group type in the specification. That endpoint adds channel information, protocol information, or even the envelope that is used for the messages that flow inside of that endpoint. It also has a collection of resources, which in this case are message definitions. "Oh, I have an OrderPlaced event. Oh, I have an OrderPaid event." And this is the metadata that those events carry. Now, a message definition can only have a single version.
00:58:20 Laila Bougria
On the other hand, we also have schema groups. And schema groups can have multiple related schemas that are defined together. For example, the payload for the OrderPlaced event, right? What is the schema for the content of that type of a message? Now, that can have multiple versions as well. And what's really nice is that specification also allows for cross-linking. So you can have a message definition for the OrderPlaced event that points to the schema that we can expect for that specific event when we are consuming it.
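The shape described above, endpoints with message definitions that link to versioned schemas, can be pictured with a nested dictionary. This is a loose illustration only and does not follow the actual xRegistry document format. Every key, path, and value in it is an assumption, as are the `resolve` and `latest_schema` helpers.

```python
# Illustrative structure only, not the real xRegistry document layout.
registry = {
    "endpoints": {
        "sales": {
            "protocol": "AMQP/1.0",
            "messages": {
                "OrderPlaced": {
                    # Cross-link from the message definition to its schema.
                    "schema": "/schemagroups/sales/schemas/OrderPlaced",
                },
            },
        },
    },
    "schemagroups": {
        "sales": {
            "schemas": {
                "OrderPlaced": {
                    "versions": {
                        "1": {"fields": ["orderId", "periodTo"]},
                        "2": {"fields": ["orderId", "periodTo", "currency"]},
                    },
                },
            },
        },
    },
}


def resolve(document: dict, path: str) -> dict:
    """Follow a slash-separated path through the nested document."""
    node = document
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


def latest_schema(document: dict, endpoint: str, message: str) -> dict:
    """Discover the newest schema version for a message of an endpoint."""
    definition = document["endpoints"][endpoint]["messages"][message]
    versions = resolve(document, definition["schema"])["versions"]
    return versions[max(versions, key=int)]


schema = latest_schema(registry, "sales", "OrderPlaced")
```

The lookup mirrors the discovery flow from the talk: start at the sales endpoint, find the OrderPlaced message definition, then follow its link to the schema.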
00:58:57 Laila Bougria
So, how does this get relevant for versioning specifically? Well, it gives you this single point of truth for endpoints, for messages, and for your schemas. It also defines compatibility in its core specifications, which is one of the parts I was heavily involved in as well. And it can, therefore, be used to query on egress when you produce a message so that you can avoid poison messages. Not only make sure that the schema is correct, but make sure that your message headers are correct, make sure that you're the appropriate service that's going to emit that event as well. So you get basically that capability for both the schema and the metadata. And that way, we can also facilitate the evolution of our events through its deprecation model.
00:59:46 Laila Bougria
Now, remember earlier I said, "Well, when we make a breaking change, what we need to do is mark the event as deprecated, set a deprecation date." We're going to effectively remove it after that date. That's not going to change. But we kind of also want to notify our consumers and make them aware that "Well, this event type is now replaced by this new event type." And we want to have some documentation around that. And I kind of just danced around that as if that is easy. But that's the hardest part, because think about it. When we are a producer, we don't want to know our consumers. That's the whole decoupling we're looking for. We don't know who our consumers are. They might be inside the system, outside the system. So how are we supposed to notify them when there is a breaking change?
01:00:35 Laila Bougria
And this is something that xRegistry also helps with. Why? Because it contains CloudEvent definitions that can notify of changes. So, as a consumer, I can say, "I care about these message definitions that are part of this endpoint. And whenever a version is added, I want to know that. Whenever a schema is deprecated, I want to know that. Whenever a message is deprecated, I want to know that." And now you can keep that decoupling between your producers and your consumers. Because any xRegistry-compliant server can emit these events, allowing your consumers to be notified without coupling them together.
01:01:16 Laila Bougria
And that brings me to the end. Wow. I know it was a lot. You probably want to rewatch this. But I do want to recap to give you the main points that I don't want you to forget. The first one is that facilitating versioning starts with designing your events appropriately. Make them granular, make them business-meaningful, and basically, differentiate public from private events.
01:01:42 Laila Bougria
Use schemas and define a versioning strategy for your schemas early on. Make it visible to the rest of the system how they can expect these things to evolve over time. Also, define the event metadata with CloudEvents. You have the preset attributes that are there. If you have any additional ones, you can use the extension model.
01:02:03 Laila Bougria
And you can enforce the compatibility strategy that you selected using a schema registry or an xRegistry that also enforces the schemas for you with its compatibility mechanism.
01:02:18 Laila Bougria
Have a breaking changes upgrade strategy in place. It should be documented. How are we supposed to upgrade something when we completely change the event type, given our broker, our topology, and our compatibility mechanism as well?
01:02:35 Laila Bougria
And notify your consumers of breaking changes early on. The earlier that they know it, the more flexibility you can give them in upgrading and making this a friction-free type of experience.
01:02:47 Laila Bougria
And that was it for me. I hope you enjoyed it. I have a bunch of resources, as always, behind this GitHub QR code. And I'm looking forward to hearing any of your questions.
01:03:00 Matt Ellis
Wow, what a fantastic session to kick things off with. There was a ton of great content in that, Laila. Thank you very much.
01:03:07 Laila Bougria
Thank you.
01:03:09 Matt Ellis
Lots of really interesting ways of thinking about things, which has really sparked a bunch of ideas, I've just got to say. There's a whole number of things, well, just simple things as well, like good design being important with your event types, sort of encapsulation, almost sort of normalization as well of what the data is going to be. And I love the Mars lander story as well. That's such a good-
01:03:36 Laila Bougria
I had to include that.
01:03:39 Matt Ellis
Yeah, yeah. But it's such a good lesson, isn't it, in making a mess of things.
01:03:43 Laila Bougria
Absolutely.
01:03:47 Mehul Harry
You put together so much great content in there. So, when going through it, what were some of the bits that stand out for you, where you're like, either, "That was something new to me," or, "That was something very interesting"? I know there's a lot of new stuff you're working on with the CNCF on those specs and all that kind of stuff. But what... Is it the CloudEvents that looks very promising to you? What kind of stands out to you in sort of this new frontier horizon?
01:04:19 Laila Bougria
That's a good question. I think, really, it's the combination of things, right? Also, it's not a coincidence that the xRegistry specifications have been developed by the same group who have been working on CloudEvents, because they also saw that CloudEvents is a foundation, the first step that we need in order to be able to formalize just the message envelope more. And then we can take it a step further and say, "Okay, but the schemas are also important, and there are solutions out there, but they only solve part of the problem. So we actually want to solve the bigger picture. And we want to have both discovery and validation techniques that make it available to solve all of the problems that we face," because the whole idea of event-driven systems is to have subsystems that are decoupled.
01:05:09 Laila Bougria
But that doesn't come without a cost. It's, first of all, very hard to achieve. And second of all, it does, yeah, create these sort of situations in which you're like, "Okay, but I have a bunch of consumers, and I don't know who they are." That's the point of having a decoupled system, right? But that also has repercussions when you try to evolve that system. And I think these things combined are bringing an answer to those types of problems that people have been running into the wall year after year, me included. So, this is bringing together a lot of learnings of not just the last period with the CNCF but from many, many, many years building these types of systems.
01:05:49 Mehul Harry
Yeah.
01:05:51 Matt Ellis
Speaking of the CloudEvents as well, we ran a little poll while it was running there, and it was pretty evenly split, really, with people who are either not using or have never heard of CloudEvents. There's only about 10% of the respondents using them. Is this one of the things where you'd strongly advise, it's like, "You folks should look at this"?
01:06:13 Laila Bougria
Well, yes, especially if you have a large system in which you may have multiple participants, you have a very large organization, and you can't just rely on communication between teams to solve these types of problems, which you could arguably say that those are the best environments to build distributed systems in, then definitely this is something you should be looking into at least to see how this can help facilitate basically maintaining your distributed systems in the long term, for sure. Because it helps create that structure and it helps to create expectable things. That's all that really matters is that we know what we can expect when we are subscribing to a specific event. We want that information to be stable. And if we want something to be stable, then we need to structure it to begin with. So, that's definitely the first step, for sure.
01:07:08 Matt Ellis
Yeah, yeah, yeah, cool. With the other poll, by the way, only about a third of viewers are differentiating between public and private events. And that whole thing sparked a really interesting set of conversations and questions about public-private events.
01:07:22 Matt Ellis
Mehul, have you got any other questions from the chat there, by any chance?
01:07:27 Mehul Harry
There are some. Somebody is just mentioning some technologies, like Kafka and MassTransit and all that good stuff. I'd just recommend, Laila, if you have a moment, hang out in the chat for a little bit. What's great about these chats is that folks are also talking amongst themselves.
01:07:48 Mehul Harry
One thing that I was kind of thinking is, as I'm watching all the stuff, there is an overall meta message of just change, right? I like what you said, change is not a point in time, which is right, because change is a process. It takes time. It's more about the decision when we make the change that's always in our mind, right?
01:08:06 Laila Bougria
Right.
01:08:06 Mehul Harry
But you even said change is inevitable. Is it just that, in this event-driven space, folks are not thinking enough about these problems and how to address them, and that's why new specs are coming out and all that kind of stuff? Because I would think this space is mature, and it's still getting new specs, right? That also surprised me.
01:08:35 Laila Bougria
Oh, okay. Yeah, okay. So, distributed systems aren't new. That's for sure. Messaging is also not a new concept, at all. MSMQ has been around for how long? I don't even know. And even before that, right? Messaging is really the concept of sending someone a letter. So it exists even outside the software industry. It's all about this asynchronicity of communication.
01:09:00 Laila Bougria
But I think as systems have evolved and have started using these types of capabilities more and more, we've also been running into the pain points: how it affects your organization, but also, how do we really build decoupled systems? I always have a bit of an allergic reaction when I look at how brokers are marketed, because you see them being marketed as infrastructure that gives you decoupling, right? But that's not true. You can use messaging and have a tightly coupled mess that is then distributed. That's absolutely possible. The only real decoupling you get out of the box when you use a broker is temporal decoupling, because it introduces that asynchronicity. But there are all these other forms of coupling that we are still susceptible to.
01:10:01 Laila Bougria
And by the way, for those of you who are listening, I have a whole series on my LinkedIn where I talk through all the different types of coupling, how we can identify each type of coupling, and how we are supposed to deal with it. But we are definitely still susceptible to all of that coupling and not safe from ending up with a tightly coupled mess.
01:10:25 Laila Bougria
So, I think many of those things we are still figuring out. Getting the right service boundary is an incredibly difficult exercise. It requires you to have access to business experts to be able to understand how the domain works. That in itself is a massive challenge. That's why I always say, as engineers, we are not just technical people. We need to be able to understand the business or we won't be able to find the right service boundaries because those things are tightly coupled together, if that makes sense.
01:10:57 Matt Ellis
Yeah, absolutely. And I wish we had a bit more time to dive into it. There was a great question on the chat. So if you get the chance to pop back in. But it was-
01:11:03 Laila Bougria
Yes.
01:11:04 Matt Ellis
... asking essentially about service boundaries, which is really useful, and what makes something public or private. If you've got no external consumers, how do you have a public event? Everything is private-
01:11:21 Laila Bougria
If you have no external consumers?
01:11:23 Matt Ellis
Yeah. So if it's only your own stuff. But all of these concerns are still going to be there, aren't they? With versioning and problems and change, but it's just internal.
01:11:28 Laila Bougria
Absolutely. Yes, yes. Yes, it's just internal. And then you get less friction because you control the consumers, right? So then you would be able to say, "Hey, you have to take care of that and that and that." And then you don't have that type of concern. You still have the upgrade concern, like we talked about, "Okay, now this goes to a new topic and we have in-flight messages." That problem still exists. But then maybe those schema versioning strategies are a little bit less important. I'm not going to say irrelevant because I don't believe that to be true. But yeah, of course, depending on the organization and the scope of the system, these decisions are going to be different. But yeah, I'm talking about large distributed systems and how those problems tend to be solved in those scenarios.
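As an illustration of the in-flight message concern mentioned above, the following Python sketch shows one possible way a consumer could accept both an old and a new message shape during an upgrade, by upgrading older payloads to the current shape before handling them. Everything here is an assumption made for the example: the `schemaVersion` field, the `customerName` to `firstName`/`lastName` split, and the helper names are hypothetical and are not taken from the session.

```python
CURRENT_VERSION = 2


def upcast_v1_to_v2(payload):
    """Hypothetical change: v2 splits 'customerName' into two fields."""
    first, _, last = payload["customerName"].partition(" ")
    upgraded = {k: v for k, v in payload.items() if k != "customerName"}
    upgraded["firstName"] = first
    upgraded["lastName"] = last
    return upgraded


# One upcaster per version step; each maps version N to version N + 1.
UPCASTERS = {1: upcast_v1_to_v2}


def normalize(message):
    """Return the payload in the current shape, whatever version arrived."""
    # Messages published before versioning was introduced count as v1.
    version = message.get("schemaVersion", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported schema version: {version}")
    payload = message["payload"]
    while version < CURRENT_VERSION:
        payload = UPCASTERS[version](payload)
        version += 1
    return payload


in_flight = {
    "schemaVersion": 1,
    "payload": {"orderId": "42", "customerName": "Ada Lovelace"},
}
print(normalize(in_flight))
```

With this kind of arrangement, the handler logic only ever sees the current shape, and an upcaster can be retired once no older messages remain in flight. When you control every consumer, as in the internal-only case discussed here, coordinating that retirement is considerably easier.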
01:12:21 Matt Ellis
Yeah. I'm just thinking of some teams I've worked with and how trying to get schemas changed across multiple teams like that can still have a lot of friction.
01:12:30 Laila Bougria
Absolutely, yes.
01:12:32 Matt Ellis
Yes. I could probably talk a whole lot more about this, but I think we've run out of time. And I guess it's time to thank you very much for joining us. It was a brilliant session.
01:12:40 Laila Bougria
Thank you for having me.
01:12:42 Matt Ellis
As I say, it sparked a lot of interesting conversation in the chat.
01:12:45 Laila Bougria
I'll definitely go check it out.
01:12:47 Matt Ellis
Yeah, please do. Thank you very much. And-
01:12:50 Laila Bougria
And please reach out to me online. So everyone who still has questions, send me a message on LinkedIn. That's probably where you can get a hold of me most quickly.
01:12:57 Matt Ellis
Yeah, absolutely. Thank you very much, then, Laila. And we'll see you again sometime.
01:13:01 Laila Bougria
Bye-bye.