CQRS pitfalls and patterns

00:02 Udi Dahan

Hello, everyone, and welcome. Hope you had a good lunch. Good seeing you all again. It's been a while since I've been in Oslo. I was at the last NDC Oslo as well. Did any of you see me last time I was over here? Yes? No? Okay, maybe. Some of you don't remember. That hurts. I didn't think I was so forgettable. My name is Udi Dahan, by the way. I'm @udidahan on all of the socials, Twitters, et cetera. And we're going to be talking today about this thing called CQRS, Command Query Responsibility Segregation, and the pitfalls and patterns related to it. So just to get a sense of where you all are at: who's heard of CQRS before? Yes? Okay. Who is hearing about CQRS for the first time, essentially dragged here by somebody else and told you need to hear about this? Okay. So the good news is most of you have already passed the first pitfall, which is not knowing about CQRS.

01:07 Udi Dahan

So as two very smart fellows, Sherlock Holmes and Doctor Who, have mentioned, when they don't know a thing, they don't like not knowing about it. So for all of you who don't like not knowing about CQRS, you've come to the right place. Now, the second pitfall is overuse. Whether it's vitamins or antibiotics or any kind of good thing in life, there is an appropriate level of use, and then there's overuse, where it becomes toxic and bad for us. Now, I know we've just launched immediately into here's a pitfall, here's another pitfall. Let's take a step back and ask: what would you say CQRS is anyway? Because it's one of those acronyms or terms in software, microservices is another one, where there's a whole lot of definitions and different flavors of the thing out there, and that actually causes a lot of difficulty.

02:15 Udi Dahan

So who was at Sam Newman's talk about asynchrony? Yeah? Okay, a bunch of you. He put real emphasis on how important it is for us to have a shared language, with agreed-upon terms and meanings, so that ultimately we can talk and collaborate with each other; so that when I say CQRS and you hear CQRS, we both know we're talking about the same thing. Now, to frame CQRS in terms of other things that already exist in our industry, the best way to characterize it is that CQRS is an architectural style. Now, some of you are like, "Oh, yes. Of course. Obviously," and some of you are saying, "Architectural style? That's the first I've heard of architectural styles. Would you do me a favor and explain to me what an architectural style is?"

03:11 Udi Dahan

So the thing about our industry is we use the term architecture a lot. We might do object-oriented architecture, layered architecture, service-oriented architecture. We tack that term, architecture, onto the end of other things: service-oriented architecture, event-driven architecture. None of those things are actually architectures; they are architectural styles, and the best way to think of them is as ways of organizing things in software. So we've got towels, which are nice and neatly layered, one on top of the other. Detergent, however, is not the kind of thing you would try to layer one on top of the other; a jar would probably be a better way of organizing that. And you've got baskets and buckets and whatever. So architectural styles are ways of organizing things, and CQRS, as an architectural style, is a way of organizing some things in software.

04:16 Udi Dahan

So I've already mentioned a bunch of other architectural styles. You might also be familiar with model-view-controller, MVC. Also a way of organizing things; it's a kind of architectural style. REST, representational state transfer, is another one. Now, the important thing to remember about all of these architectural styles is that they are not best practices. That's important enough that I'm going to say it again, because, oh my God, do we love best practices. As an industry, we've got so many best practices, it's surprising we don't have any pretty-good practices or not-very-good practices. We've got anti-patterns. Yeah, we know those, right? Got plenty of anti-patterns: the bad stuff, don't do that. Then there's this chasm of nothingness, and then over here we've got a pile of best practices.

05:23 Udi Dahan

When you think about most things in life, there isn't that really strong dichotomy between the best and the absolute worst. So when we're talking about architectural styles, whether it's layering, MVC, event-driven architecture, or CQRS, none of them are best practices unto themselves. They're tools. Use them when and where they are appropriate, and that's where it gets interesting. For a given system of any real scale or complexity, you're probably going to use a bunch of them, and that's what makes it tricky, because we want the best practice. Oh, layered architecture's the best practice? Okay, everything's going into layers. Simple, straightforward, just layer all of the things and the project will be a success. That didn't work. Well, layering now sucks. It's in the worst practices bucket.

06:19 Udi Dahan

What's the next thing? Event-driven architecture. Yes, event-driven. Everything will be an event, and then we event all of the things, and we event-source and we event-drive and we event-orient, and then we run into problems with too many events and event schema changes, and then we say, "That event thing, that's crap. Nope, throw it back over there." Please don't do that, all right? You've got screwdrivers, you've got hammers, you've got saws. We use all of these tools for different things. Same thing when it comes to architectural styles.

06:52 Udi Dahan

So let's go into CQRS. What do we even mean by it? Now that we can say it's an architectural style, we have an idea of what other architectural styles are out there that it relates to. Where does it come from? Well, before we had CQRS, we had this thing called Command Query Separation, all right? This was coined by a very smart fellow, Bertrand Meyer, back in 1994 when he wrote a book, and he described it as a programming principle stating that every method on a given class should either be a command, which performs some kind of action, or a query, which returns data to the caller, but, most importantly, not both. So there should never be a situation where querying something changes its state. Hopefully most of you don't do that when you're programming.
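
To make the method-level principle concrete, here is a minimal Python sketch. The `Counter` class and its methods are hypothetical, invented purely for illustration: `increment` is a command (it changes state and returns nothing) and `value` is a query (it returns data and changes nothing).

```python
class Counter:
    """Hypothetical class illustrating Command Query Separation (CQS)."""

    def __init__(self) -> None:
        self._value = 0

    def increment(self, amount: int = 1) -> None:
        """Command: performs an action (mutates state) and returns nothing."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._value += amount

    def value(self) -> int:
        """Query: returns data to the caller and never mutates state."""
        return self._value


counter = Counter()
counter.increment(2)
# Asking the same question twice gives the same answer, because the query has no side effects.
assert counter.value() == counter.value() == 2
```

For contrast, Python's built-in `list.pop()` both removes an element and returns it, so it does exactly the "both" that CQS advises against for your own classes.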

07:53 Udi Dahan

Like I said, this is 1994, close to 30 years ago. At the programming level, we've come a fairly long way in terms of programming practices, but when we start looking at them from an architectural perspective, that's where applying those ideas gets a little bit tricky. Now, another challenge we have as an industry, and it's really unfortunate, is that we forget old things rather quickly. We like the new stuff. Oh, yes. Absolutely. Rust? Yes, totally. Let's rewrite everything in Rust. It will be faster that way, we're sure.

08:34 Udi Dahan

But software ideas from 30 or 40 years ago aren't new and shiny, and talks about old topics don't get accepted at conferences. So we said Command Query Separation is a good idea at the programming level, and then about a decade went by and we totally forgot about it. Then these books came along: Patterns of Enterprise Application Architecture and Domain-Driven Design. Just a quick show of hands, who's read these books before? Yes? Okay, I'm going to be a little bit difficult with you. Who read them all the way through? Okay, I'm seeing a few more hands kind of going, "I read half. Does that count?" As for reading them all the way through, it's kind of like, "Yeah, there's some good stuff in there. I think I understood some of it," and then we promptly forget the rest of it.

09:34 Udi Dahan

So since those books came along, as an industry we said, "Let's go do domain models." Who's got a domain model in their system? Yes, all the hands going up. Of course we have domain models. It's a best practice. So what do we do with those domain models? Well, we put all of our logic in them, and as a result, those domain models that started out nice and right-sized got bigger and bigger and more complex, and somewhere along the way it got really difficult to manage their complexity. Now, remember I asked you who read the book? There's actually an interesting comment in it about where you should be using the Domain Model pattern and where you should not, and I'm going to quote it for you. Because as a pattern, not a best practice, it has a context: places where it is supposed to be used and places where it's not.

10:42 Udi Dahan

So here is a literal quote from the book: "If you have some simple NOT NULL checks and a couple of SUMs to calculate, a transaction script is a better bet." In other words, the domain model was not actually meant to handle all of the logic in the system. It's meant to handle specifically the complicated and ever-changing business rules. Now, just so that you can check me and keep me honest, that's on page 119 of Patterns of Enterprise Application Architecture. So now that we know domain models were not meant to be used for everything, and we realize we were kind of using them for all of the logic, which might have created some problems for us, maybe we can start taking some steps back from there.
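
To show what "a couple of NOT NULL checks and a couple of SUMs" looks like as a transaction script, here is a minimal Python sketch. The order-totalling operation, `OrderLine`, and `order_total_cents` are hypothetical examples, not taken from the book: one plain procedure handles one business operation, with no domain model involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrderLine:
    sku: Optional[str]
    quantity: Optional[int]
    unit_price_cents: Optional[int]


def order_total_cents(customer_id: Optional[str], lines: list[OrderLine]) -> int:
    """Transaction Script: one straightforward procedure for one business operation."""
    # The "simple NOT NULL checks"...
    if customer_id is None:
        raise ValueError("customer_id must not be null")
    for line in lines:
        if line.sku is None or line.quantity is None or line.unit_price_cents is None:
            raise ValueError("order line fields must not be null")
    # ...and "a couple of SUMs to calculate" (integer cents avoid float rounding).
    return sum(line.quantity * line.unit_price_cents for line in lines)


print(order_total_cents("c-1", [OrderLine("chair", 4, 2500), OrderLine("desk", 1, 12000)]))  # 22000
```

If the rules around an order stay this simple, there is nothing for a domain model to encapsulate; it earns its keep only once the rules become complicated and keep changing.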

11:36 Udi Dahan

So that's already one of the pitfalls. It's not directly related to CQRS, but in my experience there's a large amount of overlap between people doing CQRS and people using domain models, and a lot of the time it's the domain models that get big and complex, causing issues that people then blame on CQRS: "Oh, that's a problem with CQRS." No, actually it started somewhere else, at the domain model level. Now, the other thing about those domain models: as we made them bigger and bigger, we found that using them for queries was just uncomfortable and difficult. As we spread things around with one-to-many this and many-to-many that, and partitioned different entities into different places, it just reinforced the idea that, oh yeah, we should be treating queries differently. And that's when the idea came along to say, you know what? Domain models are great. We like them. However, what if we did not use the domain model for everything?

12:50 Udi Dahan

What if we split off, for example, the queries, that CQS thing we were talking about before, and didn't have them go through the domain model but through something else? Instead of having just one API for everything, the gateway through which all clients talk, with that API using the one domain model for everything and talking to the one database for everything, what if we split that up and created two slightly different APIs, one for commands and one for queries? They're connected to each other, because ultimately they're dealing with the same data, but they are different and serve different purposes, and they interlock in good ways. So at an architectural diagram level, it's not a big deal. It's kind of like, okay, we used to have one API box, now we've got two. Queries over here, commands over there: standard CQRS. Great. What's not to like?
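
A minimal sketch of standard (single-database) CQRS, assuming an in-memory SQLite database and a hypothetical `customers` table; the class and method names are invented for illustration. The two APIs are separate objects with separate responsibilities, but both use the same connection, so a query sees a command's effects as soon as the command commits.

```python
import sqlite3


def create_database() -> sqlite3.Connection:
    """One database shared by both sides: this is what makes it *standard* CQRS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
    )
    return conn


class CustomerCommands:
    """Command API: validates and changes state; returns no query data."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def register(self, customer_id: int, name: str, city: str) -> None:
        if not name.strip():
            raise ValueError("name is required")
        with self._conn:  # commits on success, rolls back on exception
            self._conn.execute(
                "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
                (customer_id, name, city),
            )

    def relocate(self, customer_id: int, new_city: str) -> None:
        with self._conn:
            cursor = self._conn.execute(
                "UPDATE customers SET city = ? WHERE id = ?", (new_city, customer_id)
            )
            if cursor.rowcount == 0:
                raise KeyError(f"unknown customer {customer_id}")


class CustomerQueries:
    """Query API: reads the same tables, never writes."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def names_in_city(self, city: str) -> list[str]:
        rows = self._conn.execute(
            "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
        )
        return [name for (name,) in rows]


db = create_database()
commands, queries = CustomerCommands(db), CustomerQueries(db)
commands.register(1, "Ada", "Oslo")
commands.register(2, "Bo", "Bergen")
commands.relocate(2, "Oslo")
print(queries.names_in_city("Oslo"))  # ['Ada', 'Bo']
```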

13:54 Udi Dahan

So if we had just stopped there as an industry, then maybe this talk would be a bit shorter, but as with many things, once you have a good thing, you try to extend it and abstract it and use it for more things. So beyond the standard CQRS approach, we took it one step further and said, "It's kind of hard to do good queries and good commands, with good logic on either side, when both of them are talking to the same database, because the database serves as a constraint. What's the database designed for? Is it designed for the commands? Is it designed for the queries? How do we optimize it?" So the idea came along: what if we didn't stop this Command Query Separation at the API level, but took it all the way through to the database level? Let's give each API its own database, and then we'll optimize the query data model for queries so it can handle them really quickly.

14:57 Udi Dahan

And that's essentially where the whole idea of NoSQL came along, to say: instead of having to join 17 tables in order to answer a query, what if we created documents with a little bit of duplicated data between them, so that our queries would be a lot faster and simpler? And once we had that out of the way, we could simplify the command side as well, because we wouldn't need a command database that had to address all sorts of really complicated queries. So it sounded even better than before. Of course, we'd need some kind of mechanism to keep the command database, where we're updating all of the data constantly, in sync with the query database.

15:40 Udi Dahan

So to distinguish it from the CQRS we were talking about before, I call this Eventually Consistent CQRS, because we've got the command API talking to its command database, the query API talking to its own query database, and then some mechanism between them, which is usually going to be asynchronous. There's that word again that Sam Newman told you about, right? Asynchronous, meaning that sometime after we update the command database, we go and update the query database, and there'll be some kind of component responsible for doing that. Here I'm just using the generic term query updater. So our command API issues events after it has successfully updated the command database, and those changes then get reflected in the query database.
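
Here is a minimal in-process sketch of the Eventually Consistent style, assuming a `deque` stands in for the asynchronous message channel and plain dicts stand in for the two databases; `CustomerSaved`, `CommandSide`, and `QueryUpdater` are hypothetical names. The command side writes first and publishes an event second; the query updater later applies those events to a denormalized model shaped for one question ("who lives in city X?"). Until the updater runs, the query side is stale.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerSaved:
    """Event published by the command side after its own write succeeds."""
    customer_id: int
    name: str
    city: str


class CommandSide:
    def __init__(self, outbound: deque) -> None:
        self.customers: dict[int, tuple[str, str]] = {}  # command database (normalized)
        self._outbound = outbound  # stands in for a message broker

    def save_customer(self, customer_id: int, name: str, city: str) -> None:
        self.customers[customer_id] = (name, city)                     # 1. update command DB
        self._outbound.append(CustomerSaved(customer_id, name, city))  # 2. then publish event


class QueryUpdater:
    """Applies events to the query database; names_in_city acts as the query API."""

    def __init__(self, inbound: deque) -> None:
        self._inbound = inbound
        self._city_of: dict[int, str] = {}
        self.names_by_city: dict[str, dict[int, str]] = {}  # query database (denormalized)

    def process(self, max_events: int) -> int:
        handled = 0
        while self._inbound and handled < max_events:
            event = self._inbound.popleft()
            old_city = self._city_of.get(event.customer_id)
            if old_city is not None:  # remove the customer from the city they left
                self.names_by_city[old_city].pop(event.customer_id, None)
            self.names_by_city.setdefault(event.city, {})[event.customer_id] = event.name
            self._city_of[event.customer_id] = event.city
            handled += 1
        return handled

    def names_in_city(self, city: str) -> list[str]:
        return sorted(self.names_by_city.get(city, {}).values())


events: deque = deque()
command_side, updater = CommandSide(events), QueryUpdater(events)
command_side.save_customer(1, "Ada", "Oslo")
print(updater.names_in_city("Oslo"))  # [] (the query side hasn't caught up yet)
updater.process(max_events=100)
print(updater.names_in_city("Oslo"))  # ['Ada']
```

The `max_events` parameter is there to make the point that the updater drains events at its own pace, independently of how fast commands arrive.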

16:35 Udi Dahan

So as you know, double the database, double the fun; the minions are happy; an even better pattern. But when we're talking about CQRS for the rest of this talk, I'm going to try to be specific: standard CQRS, meaning single-database CQRS, versus Eventually Consistent CQRS, where we've got multiple databases. All right, so those are two different styles. Now, at the beginning of this talk I asked you to raise your hands about CQRS. Now I want to get a little bit more specific. Who's doing the standard, one-database style of CQRS? Can I get some hands? Yes. Okay. And who's doing the Eventually Consistent, two-database style? Okay, so about a quarter of you are doing the Eventually Consistent style and three quarters are doing the standard style.

17:25 Udi Dahan

Now, great so far. It's important to recognize that a lot of the patterns we've got today, we didn't invent; they were around before, and there are other ways of doing them as well. So my guess is that a bunch of you have heard about this thing called database replication before, right? It's a fairly old technology. Pretty much all of the database vendors out there support it, both on-prem and in the cloud. The general idea is that there is some sort of primary node where all of the writes go. That primary node then replicates those changes to read replicas, which essentially have a copy of the data, but those replicas are there only for queries. So database replication, this idea of primary-replica replication, already has a kind of CQRS at play at the database level. So in some cases, even before you start thinking, "Let's go write a bunch of code," look at what the database has to offer and ask whether it already solves the problem for you, so you don't have to do any additional separation on top.
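
The primary/replica split can be sketched as a tiny statement router. This is a hypothetical illustration, not any particular driver's API (many drivers and proxies can do this routing for you): writes go to the primary, plain `SELECT`s go round-robin to replicas, and the host names are made up. Because replicas lag behind the primary, the read-your-own-writes caveats discussed later in this talk apply here too.

```python
import itertools


class ReplicaRouter:
    """Hypothetical router: writes go to the primary, reads round-robin over replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        if not replicas:
            raise ValueError("at least one read replica is required")
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        # Naive classification: only plain SELECTs are sent to replicas;
        # SELECT ... FOR UPDATE takes locks, so it must go to the primary.
        if verb == "SELECT" and "FOR UPDATE" not in sql.upper():
            return next(self._replicas)
        return self.primary


router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
for statement in ["SELECT * FROM orders", "UPDATE orders SET status = 'paid'", "select 1"]:
    print(router.route(statement))
# prints db-replica-1, then db-primary, then db-replica-2
```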

18:47 Udi Dahan

Just as a quick note, this replication style uses the same data model everywhere. It's not like the Eventually Consistent style, where not only did we have two different databases, they actually had two different schemas, structured differently from each other. With database-level replication, it's the same schema in all of the places; the data is just copied and replicated between them. So it's a variant of what we were talking about before. Logically, it's standard CQRS territory. From a physical deployment perspective, it's a lot more like the Eventually Consistent CQRS model. All right? So that's another style. Pretty much all databases have this, and it solves a lot of scalability issues.

19:39 Udi Dahan

It never ceases to amaze me that, since ORMs took over the world, developers really stopped learning about database capabilities. We treat the database almost like a dumb file system, when databases are actually application servers in their own right; they have lots of capabilities, and it's worth getting to know them. Other things they can do include sharding. Sharding is where the data is actually split apart logically between different database instances, which gives you scale-out on your write side as well, not just on the read side. Anyway. There could be a whole talk just about database functionality and features, replication, sharding, and those things. It's worth taking some time to read up on them for whatever database technology you're using. It will serve you well. It's another tool in the toolbox; essentially another kind of CQRS available to you.
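
To see why sharding scales writes as well as reads, here is a minimal hash-sharding sketch; `shard_for` and `ShardedStore` are hypothetical names, and each dict stands in for a separate database instance. Every key deterministically maps to exactly one shard, so each instance only takes the writes for its own slice of the data. It uses `hashlib` rather than Python's built-in `hash()`, because string hashing with `hash()` is randomized per process and would route the same key differently across processes.

```python
import hashlib
from typing import Optional


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a hash that is stable across processes."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Each dict stands in for a separate database instance that takes its own writes."""

    def __init__(self, shard_count: int) -> None:
        self.shards: list[dict[str, str]] = [{} for _ in range(shard_count)]

    def put(self, key: str, value: str) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for i in range(1000):
    store.put(f"customer-{i}", f"name-{i}")
print([len(shard) for shard in store.shards])  # roughly 250 keys per shard
print(store.get("customer-42"))  # name-42
```

One design caveat: with plain modulo hashing, changing the shard count remaps most keys, which is why real systems often use range partitioning or consistent hashing instead.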

20:38 Udi Dahan

So how else does CQRS go wrong? Essentially, by applying it as a best practice. One of the most common places is simple CRUD: create, read, update, delete, where we have some fairly simple, straightforward data, a couple of strings, a couple of IDs, maybe a date or something like that, and you just need to get it from the UI into the database and back out again. I mean, you could do the standard CQRS thing and put the queries on one side and the commands on the other, but it's kind of unnecessary. But sometimes people say, "Well, we're using CQRS for everything else," and use it for the simple stuff too. It's not terrible, but it compounds and makes things more complex over time.

21:36 Udi Dahan

Every once in a while we complain about having big balls of mud, but every time we take something simple and make it more complex, we don't realize that's how the big ball of mud came into being: one extra line of code at a time. So if you have this type of relatively simple, straightforward CRUD solution, there's this great technology out there, you might've heard of it, called Ruby on Rails. Two-tier, active record style. There's no actual domain model there. Rails was created to be the fastest, most developer-friendly way to do forms over data. For those of you who are purely on the Microsoft platform, think MS Access. Microsoft Access was a great database, a very simple, fast way to build a UI with data that's persisted and shown back again. If that's all you need, then absolutely use that.

22:37 Udi Dahan

Now, when you get into CRUD territory, and for those of you who are saying, "We're not going to do standard CQRS, we're going to do Eventually Consistent CQRS," that's where things actually start going quite badly. The problem is that as a user, when I insert a record into a database, I expect that when I click over to the query screen, that record will be there. It's kind of an obvious expectation when you say it that way, right? However, because the query updater works asynchronously, the data might not have reached the query database yet, and then the user is pressing F5, F5: "My data's not there." What does the user do in that case? They think the system lost it, so they type the data in again, click submit, go back to the query screen, press refresh, and, "Oh, yes. Now it's there, everything's great." Five or 10 minutes later, they're doing some other work. They come back to the query screen. What do they see?

23:53 Audience

Two copies.

23:55 Udi Dahan

Two entries, right? And they say, "Stupid system. How did it duplicate my data? Why did it duplicate my data?" Of course, the user duplicated the data, but we as engineers created an environment where users would do that, and that starts to create data quality issues: now you've got two IDs, the user might connect different things to each of those identifiers, and bad things start to accumulate. It goes from bad to worse the longer those duplicates stick around.
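
The scenario above can be reproduced in a few lines. This hypothetical simulation (`LaggyOrderSystem` and its methods are invented for illustration) keeps the query updater behind on purpose, the way it would be under load, and models a user who resubmits when their record doesn't show up. Because each submit is assigned a fresh identifier, the system has no way to tell that the second submission is the same thing.

```python
import itertools
from collections import deque


class LaggyOrderSystem:
    """Eventually consistent CQRS where the query updater hasn't caught up yet."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.command_db: dict[int, str] = {}
        self._pending: deque = deque()  # events not yet applied to the query side
        self.query_db: dict[int, str] = {}

    def submit(self, description: str) -> int:
        record_id = next(self._ids)  # every submit gets a new identifier
        self.command_db[record_id] = description
        self._pending.append((record_id, description))
        return record_id

    def catch_up(self) -> None:
        while self._pending:
            record_id, description = self._pending.popleft()
            self.query_db[record_id] = description

    def list_records(self) -> list[tuple[int, str]]:
        return sorted(self.query_db.items())


system = LaggyOrderSystem()
system.submit("Order 10 chairs")
if not system.list_records():         # user refreshes: "My data's not there."
    system.submit("Order 10 chairs")  # ...so they type it in again
system.catch_up()                      # later, the query updater catches up
print(system.list_records())          # [(1, 'Order 10 chairs'), (2, 'Order 10 chairs')]
```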

24:33 Udi Dahan

So I don't want to say that Eventually Consistent CQRS is all bad, but single-user CRUD is where a lot of these types of issues show up. We don't tend to catch them when we're building and debugging the system, because when we're working on it, we're pressing F5, debugging, everything is running live. We type something in, the system has zero load, there's no lag between the write side and the read side, so we always see fresh query results. But when the system is under the most load, that's when the read side is most delayed, and that's when users really don't like it.

25:16 Udi Dahan

So imagine you're a user: you add something to your shopping basket on Amazon, and you're like, "Oh, okay. Great. Let's go check out," and then on the checkout screen, your basket is empty. That's not supposed to happen. Again, these types of problems only manifest once the system is under load. So the whole Eventually Consistent CQRS approach has its place, but be aware of what it's going to look like when a single user reads their own writes.

25:53 Udi Dahan

Now this leads us to our next pattern, or our next context for CQRS, and that's the difference between private data and public data. Most of the time as developers, we just look at data as data. We've got data in a database, and users save it, update it, connect it to other things, but we don't think about it enough from the business perspective, and it's helpful to use this framing to understand how users think about the data in the system. A lot of us, especially some of the older users, have been trained to save data so that it doesn't get lost. Some of us already have that CTRL-S habit. Do some work, CTRL-S; do some more work, CTRL-S. Why? Just in case, because we've had a problem before and lost hours and hours of work. So when we save data, the context is "I don't want to lose this." The context is not "please tell everybody about this change that I just made."

27:14 Udi Dahan

Now, the way we often design systems, there is no difference. Once the user clicks save, it's in the database table. Other users who query that table can now see that data and start doing stuff with it. So as a first step in applying this idea of private versus public data, try to introduce that distinction in the UI. Think of your system as having a content management element to it. For those of you who have blogged or used Facebook or Twitter or whatever, there's a publish step where you explicitly publish to the whole internet, versus save as draft. For those of you using GitHub as well, there are drafts there too. You've got your own little workspace. We're used to having that now as users. When we design systems, that's also something we should be introducing.

28:22 Udi Dahan

So we've got the private data, which you can save. Think of that as saving a draft, and then at some point the user can click the publish button, and we'll say, "Are you sure? This is going to make the data public so that all the other users can use it. It will appear on the such-and-such page." If the user says, "Okay," great. Now the data goes into the public domain. Essentially we can take CQRS, command query responsibility segregation, and apply that same kind of idea to private versus public data: a private data API that is appropriate for CRUD work, which could be a very simple, straightforward, Ruby on Rails type of thing talking to a database, versus a public API that shows data that has already been designated as public. Inside our database, we've got some flag, an IsPublic flag, to indicate whether the data should be made visible to other people.
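
A minimal single-database sketch of the private/public split, assuming SQLite and a hypothetical `articles` table whose `is_public` column plays the role of the IsPublic flag; the function names are invented for illustration. The private side is plain CRUD, the explicit publish step flips the flag, and the public side only ever reads flagged rows.

```python
import sqlite3


def create_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE articles (
               id INTEGER PRIMARY KEY,
               author TEXT NOT NULL,
               body TEXT NOT NULL,
               is_public INTEGER NOT NULL DEFAULT 0)"""
    )
    return conn


# Private side: plain CRUD over the author's own drafts.
def save_draft(conn: sqlite3.Connection, author: str, body: str) -> int:
    with conn:  # commits on success
        cursor = conn.execute("INSERT INTO articles (author, body) VALUES (?, ?)", (author, body))
    return cursor.lastrowid


# The explicit "Publish" step: the only code path that sets the flag.
def publish(conn: sqlite3.Connection, article_id: int, author: str) -> None:
    with conn:
        cursor = conn.execute(
            "UPDATE articles SET is_public = 1 WHERE id = ? AND author = ? AND is_public = 0",
            (article_id, author),
        )
    if cursor.rowcount != 1:
        raise LookupError(f"no unpublished draft {article_id} for {author}")


# Public side: only ever reads rows that were explicitly published.
def public_articles(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT body FROM articles WHERE is_public = 1 ORDER BY id")
    return [body for (body,) in rows]


db = create_database()
draft_id = save_draft(db, "ada", "Q3 pricing proposal")
print(public_articles(db))  # []
publish(db, draft_id, "ada")
print(public_articles(db))  # ['Q3 pricing proposal']
```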

29:33 Udi Dahan

So it looks pretty much the same as standard CQRS, just with different semantics. Instead of dividing commands from queries, we're dividing private data from public data. Now, we can also take the Eventually Consistent style of CQRS and apply it to private and public data. We put private data in a private data database, because we're afraid that some developer will make a mistake and not set the flag, or the flag will get changed by accident. So we create a deeper separation between the private data and the public data, and then when the user explicitly says, "I would like to publish this," we publish an event and carry that data over explicitly to the public data database. So it looks pretty much just like the Eventually Consistent CQRS story; the difference is really the semantics, what we're using it for.
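
The deeper separation can be sketched with two stores and an event between them. In this hypothetical sketch (all class names are invented, and an in-process `deque` stands in for an asynchronous message channel), saving a draft never touches the public store at all; the only path across is the explicit `DraftPublished` event, so there is no flag to forget.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DraftPublished:
    draft_id: int
    author: str
    body: str


class PrivateDrafts:
    """Private data store: CRUD on drafts; nothing here is visible to other users."""

    def __init__(self, publish_event: Callable[[DraftPublished], None]) -> None:
        self._drafts: dict[int, tuple[str, str]] = {}
        self._next_id = 1
        self._publish_event = publish_event

    def save(self, author: str, body: str) -> int:
        draft_id = self._next_id
        self._next_id += 1
        self._drafts[draft_id] = (author, body)
        return draft_id

    def publish(self, draft_id: int) -> None:
        author, body = self._drafts[draft_id]
        # The only path into the public store is this explicit event.
        self._publish_event(DraftPublished(draft_id, author, body))


class PublicStore:
    """Public data store: populated only from DraftPublished events."""

    def __init__(self) -> None:
        self.items: dict[int, str] = {}

    def handle(self, event: DraftPublished) -> None:
        self.items[event.draft_id] = event.body


channel: deque = deque()  # stands in for an asynchronous message channel
drafts, public = PrivateDrafts(channel.append), PublicStore()
draft_id = drafts.save("ada", "Q3 pricing proposal")
drafts.publish(draft_id)
while channel:
    public.handle(channel.popleft())
print(public.items)  # {1: 'Q3 pricing proposal'}
```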

30:44 Udi Dahan

Now, doing this gives us a whole bunch of benefits on the user side. It solves the problem of using CQRS where we don't need it. In the case of private data, where again we're just doing simple CRUD, we don't need additional CQRS there, and on the public side, well, we're going to get more into how to actually design the public side, because there are a lot more tricky things down there. But before I do that, I want to come back to one of the problems of the Eventually Consistent CQRS style. For those of you who aren't using it, that's fine. Just smile to yourself at your prescience in knowing not to use this style. For those of you who are using it, hopefully this will make you aware of the problem and able to solve it.

31:41 Udi Dahan

So one of the advantages of the Eventually Consistent CQRS style that we mentioned before is that we can have optimized data models. The write side is optimized for doing writes very quickly. The query side is optimized for doing queries very quickly. And then we've got this query updater keeping them in sync. Where's the problem? Well, it only hits when your system is under very high load. When your system is under average load, the query updater component is able to keep up. It gets these events: "This changed; I'm going to go update the query database. I get another event; I'm going to go update the query database." Now, remember that the rate at which this needs to happen is essentially the rate at which commands are coming into the command side. So if you've got a thousand commands a second, we need to be processing a thousand events per second in our query updater component.

32:44 Udi Dahan

Now remember, the command side is optimized for processing commands quickly. It's optimized for doing those updates, so it can do that a thousand times a second, 2,000 times a second. The query database was never optimized for updating. We designed it to be really good to read from, knowing that we were making a trade-off that it would be harder to update. Essentially, it requires more processing power to update the query side than the command side, yet we need to update it at the same rate on average. The rate can ebb and flow, but if it gets too high, the query side starts falling farther and farther behind, and that's where we usually get into trouble. Users look at the system and say, "Well, I'm looking at the query screen, but it's showing me data from 20 minutes ago." It essentially becomes unusable.
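
The "falling farther and farther behind" effect is simple arithmetic. In this sketch the numbers are hypothetical: commands arrive at the talk's 1,000 per second, while the query updater, because each update to the read-optimized model costs more, manages only 800 per second. It assumes both rates stay constant and that events are applied in arrival (FIFO) order.

```python
def updater_backlog(command_rate: float, updater_rate: float, seconds: float) -> float:
    """Events waiting for the query updater after `seconds` of sustained load.

    Simplifying assumption: both rates are constant; the backlog only grows
    when commands arrive faster than the updater can apply them.
    """
    return max(0.0, (command_rate - updater_rate) * seconds)


def lag_seconds(backlog: float, updater_rate: float) -> float:
    """How long a newly published event waits in a FIFO queue before being applied."""
    return backlog / updater_rate


backlog = updater_backlog(command_rate=1000, updater_rate=800, seconds=3600)
print(backlog)                         # 720000
print(lag_seconds(backlog, 800) / 60)  # 15.0
```

After one hour at that load, 720,000 events are queued and the query screen shows data about 15 minutes old, and the lag keeps growing for as long as the load lasts. Note that nothing fails outright; the system just silently gets staler.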

33:48 Udi Dahan

And this is a common problem with systems, and for us as developers: we design systems and use patterns to make them scale, but once they are scaled, they start failing in these weird ways, we're not exactly sure what to do, and that usually happens at the worst time. We'll talk more about that in a little bit, but I just want to connect the dots with another talk you might've seen. Jeremy Miller gave a talk about vertical slice architecture. Who saw Jeremy's talk? Yes? Oh, not that many of you. More of you should. If you haven't heard about vertical slice architecture, please go take a look at it.

34:27 Udi Dahan

Essentially, CQRS, or private-public responsibility segregation, is a kind of vertical slice architecture, where you've got a certain slice that goes from top to bottom. Within a slice you might think, oh, that part is kind of layered, or that part over there is kind of layered. Oh, we even have a little bit of event-driven architecture down the middle. So a slice can have a bit of each of those things, and that's vertical slice architecture. If you haven't seen it, go look into it. We'll touch on it a little more later.

34:58 Udi Dahan

Let's delve deeper into the public side of things. Once the data is public, how do we handle it? The important thing to realize is that once data is public, multiple users can start using and operating on it in parallel, which is why, as we mentioned before, it's important to make that explicit, so that users know it's now public and lots of other things can happen to it. Now, some systems are designed for this intentionally: okay, we're going to build a collaborative experience. This could be a Miro board, if you're familiar with that, or Google Docs, or all those types of places designed inherently around collaboration, and that's great for those kinds of systems. But the more common case with public data is more of an accidental bumping into each other: "Oops, sorry, I didn't realize you were working on that data while I was working on it."

35:55 Udi Dahan

Now, those of you who have used Dropbox or some other collaboration tool may have seen it put up a little badge saying, "Oh, there's another user working on this document at the same time as you are." In our own systems, when other users are fiddling with data, it's rarer that we actually make that visible to our users. So users don't even realize when they're stepping on each other's toes. Now, this is probably sounding a little bit abstract, so let's make it a little clearer and more down to earth.

36:28 Udi Dahan

So here's how it plays out. One user is looking at a piece of data. Another user, for whatever reason, goes and looks at the same data at the same time. Neither user realizes that this is happening. One of them changes the data they're looking at. It succeeds. The other user comes along and changes the data as well. The thing is, the data on their screen before they make the change is essentially stale, right? They don't know what the other user actually did.
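
The two-user scenario can be reproduced with a hypothetical in-memory `Table` (all names invented for illustration). With a blind write, the second user silently overwrites the first user's change. The `checked_write` variant adds a version check, a common optimistic-concurrency technique; it is included here only as an assumption about one way to at least detect the stale screen, not necessarily what this talk goes on to recommend, and it does not decide what should happen to the rejected change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    value: str
    version: int


class StaleDataError(Exception):
    """Raised when a write is based on a version that is no longer current."""


class Table:
    def __init__(self) -> None:
        self._rows: dict[str, Row] = {}

    def insert(self, key: str, value: str) -> None:
        self._rows[key] = Row(value, 1)

    def read(self, key: str) -> Row:
        return self._rows[key]  # an immutable snapshot, like data on a user's screen

    def blind_write(self, key: str, value: str) -> None:
        """Last writer wins: silently discards whatever changed since the caller's read."""
        self._rows[key] = Row(value, self._rows[key].version + 1)

    def checked_write(self, key: str, value: str, seen_version: int) -> None:
        """Refuses the write if someone else changed the row since `seen_version`."""
        current = self._rows[key]
        if current.version != seen_version:
            raise StaleDataError(f"{key!r} is at version {current.version}, not {seen_version}")
        self._rows[key] = Row(value, current.version + 1)


table = Table()
table.insert("price", "100")
alice, bob = table.read("price"), table.read("price")  # both look at version 1
table.checked_write("price", "110", alice.version)     # Alice's change succeeds
try:
    table.checked_write("price", "95", bob.version)    # Bob's screen is stale
except StaleDataError as err:
    print("rejected:", err)  # rejected: 'price' is at version 2, not 1
```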

37:00 Udi Dahan

Now, some of you might've put in some Azure Web PubSub or SignalR thing so that any time a change happens, every user looking at that data gets it updated automatically. Those things are lots of fun to build, until you're the user. You're on a screen and it says, "Oh, this data has changed." You're like, "Okay, refresh," and then, "Yeah, okay, that's irrelevant to the thing that I'm trying to do." So you continue doing some more work and you get another notification, "Oh, data's changed again," and at some point you just start to disregard it, you go and make your changes anyway, and again, you're stepping on other people's toes.

37:45 Udi Dahan

Now, in technical terminology, we call this stepping on each other's toes a race condition. When race car drivers step on each other's toes, it can be quite tragic. That's not just a race condition; the car flies into the air, bursts into a thousand pieces, and we hope that the driver is okay. So when we're talking about race conditions, about users stepping on each other's toes, think about the data in your database flying up and fragmenting into a thousand little pieces. That is the level of worry you should have about race conditions when you're dealing with public data, because it's public, because lots of users could be operating on it at the same time.

38:33 Udi Dahan

Now, most developers are blissfully ignorant of this, right? It's, "I write the code, the unit tests pass, I check it in, and the next morning at stand-up I say, 'I finished my ticket. All done. Today I'm working on another ticket. You aren't going to need it, right? None of that big architecture up front. We're agile here, delivering features to the client the way they like it.'" I'm not going to give an agile talk, but I've got to tell you, there's a certain amount of willful ignorance about these other concerns: we don't have a ticket for it, management doesn't allow us to work on things that don't exist in the ticket repository, so it's not our fault.

39:20 Udi Dahan

Anyway, let's take a concrete example of how this plays out with a very simple, small set of public data: inventory. Let's say we have an eCommerce type of domain (it's been done to death, I know, I'm sorry) where some new product has come out, the next Harry Potter book, and of course all of the Harry Potter fans are just losing their minds: "Oh my God, I've got to buy this thing right away." Now, the developers that designed this said, "Right, we're going to have an inventory table." I'm including the name here just for context; we don't actually need the name of the book in there. We've got a row for every single product, with a quantity column that says how many copies of that book we have in our warehouse. Simple, right? And when a user goes and buys a book, what do we do? Well, we need to update the inventory table.

40:15 Udi Dahan

How do we do that? Well, even before we get into object-relational mappers or any of that fancy stuff, we've got SQL. Ultimately, ORM or not, what ends up going to the database is SQL: a transaction whose first step is to see how much of the thing is currently in the database. So we do a SELECT from the inventory table for that product ID and see, "Oh, okay, there are a thousand items for that ID." Then we check that there actually is sufficient quantity. It could be that this person wanted to order 10,000 Harry Potter books, don't know why. Wanted to wallpaper their house with Harry Potter books, I don't know. So we check in our transaction that we've got enough, and if we do, which we probably should, we go and UPDATE the inventory table, setting the quantity to A minus B (what we had before minus what the person ordered) for that product ID, and then we commit the transaction.
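As a rough illustration, the read-check-write transaction described above might look like this with Python's built-in `sqlite3` module. This is a minimal sketch, not code from the talk: the `inventory` columns (`product_id`, `name`, `quantity`) and the helper names `create_store` and `place_order` are hypothetical. Note that the new quantity is computed in application code from the value that was read.

```python
import sqlite3


def create_store(conn: sqlite3.Connection) -> None:
    """Create a hypothetical inventory table and seed one popular book."""
    conn.execute(
        "CREATE TABLE inventory ("
        " product_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " quantity INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO inventory (product_id, name, quantity) VALUES (?, ?, ?)",
        (1, "Harry Potter (new book)", 1000),
    )


def place_order(conn: sqlite3.Connection, product_id: int, ordered: int) -> bool:
    """Naive two-step transaction: SELECT the quantity, check it, then UPDATE it."""
    conn.execute("BEGIN")
    try:
        row = conn.execute(
            "SELECT quantity FROM inventory WHERE product_id = ?", (product_id,)
        ).fetchone()
        if row is None or row[0] < ordered:
            conn.execute("ROLLBACK")  # unknown product or not enough stock
            return False
        # "A minus B": what we read, minus what the customer ordered.
        conn.execute(
            "UPDATE inventory SET quantity = ? WHERE product_id = ?",
            (row[0] - ordered, product_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/COMMIT above are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
create_store(conn)
print(place_order(conn, 1, 1))       # True
print(place_order(conn, 1, 10_000))  # False
print(conn.execute("SELECT quantity FROM inventory").fetchone()[0])  # 999
```

Run by a single user, this behaves exactly as a test would expect, which is why the problem doesn't show up until many users hit the same row.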

41:20 Udi Dahan

A very simple two-step process, query and then update, at the database level. No eventual consistency, right? This is all in the database. Think of it as a stored procedure, just to keep it simple. Now, we test this, and it works fine. Great. Ship it. Move on. What happens when multiple people order the same book at the same time? And this tends to happen when a very popular new book comes out. So let's say we've got about 200 of you here. All 200 of you come to my database (I'm the database) and issue that set of SQL statements. Now, as a database, I'm really smart. I try to parallelize things. So all 200 of you come to me and say, "Udi, what's the quantity of the Harry Potter book?" And I respond to all of you, "A thousand," and you're like, "Awesome, great. I'd like to buy one book, please update the inventory table. Set the quantity of the Harry Potter book to 999."

42:34 Udi Dahan

So I get that command from you and I update the record to 999, and I get the next statement from you and I update it to 999, because the queries all happened at the same time. The commands are now coming in at the same time too, and every single one of you issues exactly the same update command. I've got 200 people buying books, with a thousand as the original value, and after all 200 SQL statements have completed, the end result is 999 items in my inventory table. Unfortunately, the folks in the warehouse are looking around saying, "No, it's not that many. We actually ran out of inventory. You thought you had a thousand, but that's because the last time 200 people came along, you only decremented it by one, and the time before that, when 500 people came along and bought Harry Potter, you also decremented it by one. We're all out."
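The interleaving described above can be replayed deterministically. In this sketch, an in-memory SQLite connection in autocommit mode stands in for the server, and every buyer's SELECT is run before any buyer's UPDATE. It models the schedule from the talk rather than any particular database's isolation behaviour; the table layout and the `BUYERS` constant are illustrative assumptions.

```python
import sqlite3

BUYERS = 200  # hypothetical number of simultaneous customers

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, quantity INTEGER NOT NULL)")
conn.execute("INSERT INTO inventory VALUES (1, 1000)")

# Phase 1: all buyers query at (roughly) the same moment, and all see 1000.
seen = [
    conn.execute("SELECT quantity FROM inventory WHERE product_id = 1").fetchone()[0]
    for _ in range(BUYERS)
]

# Phase 2: each buyer writes back "what I saw, minus one".
for quantity in seen:
    conn.execute("UPDATE inventory SET quantity = ? WHERE product_id = 1", (quantity - 1,))

final = conn.execute("SELECT quantity FROM inventory WHERE product_id = 1").fetchone()[0]
print(f"books sold: {BUYERS}, quantity recorded: {final}")  # books sold: 200, quantity recorded: 999
print(f"correct quantity would be: {1000 - BUYERS}")        # correct quantity would be: 800
```

Every UPDATE carries the same absolute value, so 199 of the 200 decrements are silently lost.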

43:40 Udi Dahan

Now, of course, you won't find this in testing, because we rarely run the kind of parallel tests that would validate correctness. So why does this happen? It's this thing called optimistic concurrency, which databases do by default unless you tell them not to. Essentially, databases say, "I'm going to try to be a good member of your architecture and not block all of the transactions," because some of you might be buying Harry Potter, some of you might be buying 50 Shades of Grey, and somebody else is buying Patterns of Enterprise Application Architecture for some reason. We don't want to block all of those transactions; those can and should run in parallel. But at the level of a single record, that's where we don't actually want optimistic concurrency.

44:33 Udi Dahan

So even though optimism is a great personal quality, and it's nice to have optimistic people around us in life, optimistic concurrency is not such a great choice for public data, especially popular public data. Whether it's your marketing department running some campaign saying, "Hey, everybody, we've discounted a very popular item by 50%, come and buy it," or somehow you got featured on Oprah and now the whole world is buying your stuff, be aware of this type of thing. Optimistic concurrency can actually end up creating eventual inconsistency in your system.

45:15 Udi Dahan

Now, as developers we're like, "Hey, no big deal. Just switch it, add some locking, no problem, right?" Optimistic concurrency, bad. Fine, we'll do pessimistic concurrency. So what happens when we've got pessimistic concurrency? Well, for the most part, under regular conditions, the data is now more correct. However, on Black Friday, the busiest day of the year, when we make 80% of our profit, this is what happens. The story repeats almost every year: every Black Friday, some major retailer has a major outage of their site, and you hear on the news, "Oh, they're trying to get the site back up." As technical people, we're like, "You do realize it's pushing a button, right? You don't actually need to go to the data center and pick up a rack of metal to get it back online. What's so hard about getting this thing back online?" Well, part of it has to do with the technical dynamics of what happens at the database level when you introduce pessimistic concurrency.
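A minimal in-process sketch of pessimistic concurrency, assuming a Python `threading.Lock` per record stands in for a database row lock (in PostgreSQL, for example, the equivalent would be taking the row with `SELECT ... FOR UPDATE`). The `InventoryRecord` class and `buy` function are hypothetical. The lock is acquired before the read and held until the write is done, so no decrement is lost.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InventoryRecord:
    """One hypothetical inventory row, guarded by its own lock."""

    def __init__(self, quantity: int) -> None:
        self.quantity = quantity
        self.lock = threading.Lock()


def buy(record: InventoryRecord, ordered: int = 1) -> bool:
    # Pessimistic: lock BEFORE reading, keep the lock until the write is done.
    with record.lock:
        if record.quantity < ordered:
            return False
        record.quantity -= ordered
        return True


harry_potter = InventoryRecord(1000)
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(lambda _: buy(harry_potter), range(1200)))

print(harry_potter.quantity)  # 0
print(sum(results))           # 1000 successful purchases
```

The data is now correct, but every buyer of the same record has to wait in line for that one lock, which is exactly the queue the talk describes next.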

46:28 Udi Dahan

The culprit is this thing called the connection pool. The connection pool dries up. Why does that happen? Well, so we've got this inventory table and we've got all of you users that are coming along and wanting to buy books. Each of you opens up a connection to the database and I'm like, "Hello, welcome. Yes, here's a connection, here's a connection, here's a connection. Everybody gets a connection," under the premise that you all will be very good citizens and complete your transactions quickly. But what happens when all of you want to go and buy Harry Potter? Well, you come in and you go buy Harry Potter first, and I'm like, "Oh, okay. Yes, welcome. Come in. I'm going to lock that record for you." At the same time, you're like, "I'd like to buy Harry Potter." I'm like, "Oh. No, sorry. Pessimistic concurrency. Please wait."

47:15 Udi Dahan

While you're waiting, I say, "Hey, what can I do for you?" You're like, "I want Harry Potter." I'm like, "Oh. Get in line after him. Please wait. What about you?" "Hey, Harry Potter." "Back of the line. Please wait." And I go through all of you, and what's happening? Essentially, the connections are left open, but they're not doing anything. The queue is building up, and I'm processing them one at a time. Now, things get really bad when I ask you, "What would you like?" and you say, "I'd like Patterns of Enterprise Application Architecture." I'm like, "Oh, wonderful. Thank God. Okay, I'm going to lock that record." "And Harry Potter." "Goddammit. Well, your transaction can't complete now. You're going to have to wait. What would you like?" "Domain-Driven Design." "Oh, yes. Thank you. Wonderful. Okay, I'll lock that one for you." "And Harry Potter." "Okay, another transaction that's waiting."

48:15 Udi Dahan

But now Patterns of Enterprise Application Architecture is locked, waiting on the Harry Potter queue, and Domain-Driven Design is locked, waiting on the Harry Potter queue. All the other users who wanted to do something eventually come along and say, "Hi, database. Can I have a connection?" And I'm like, "I'm flat out. All these people are just here for Harry Potter, and I can't get rid of them fast enough." At that point, essentially, the site is down. The site is unable to get connections to the database, and the reason is that all of these locks are piling up in there, holding up the transactions.
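The pool exhaustion described above can be reproduced with a small threading sketch. Assumptions, all hypothetical: a `threading.BoundedSemaphore` of size `POOL_SIZE` stands in for the connection pool, a `threading.Lock` stands in for the Harry Potter row lock, and the `Barrier` exists only to make the demo deterministic. Each worker takes a connection first and then blocks on the hot row while still holding it, so a customer who wants an unrelated book can't get a connection at all.

```python
import threading

POOL_SIZE = 5                        # hypothetical connection-pool size
pool = threading.BoundedSemaphore(POOL_SIZE)
harry_potter_row = threading.Lock()  # stand-in for the hot record's lock
all_connected = threading.Barrier(POOL_SIZE + 1)


def buy_other_book_and_harry_potter() -> None:
    pool.acquire()                   # take a connection from the pool...
    try:
        all_connected.wait()         # (sync point for the demo only)
        with harry_potter_row:       # ...then block on the hot row while holding it
            pass                     # the decrement would happen here
    finally:
        pool.release()               # connection returns only after the lock is granted


harry_potter_row.acquire()           # one slow transaction holds the hot row
workers = [threading.Thread(target=buy_other_book_and_harry_potter) for _ in range(POOL_SIZE)]
for w in workers:
    w.start()
all_connected.wait()                 # every pooled connection is now parked

# A customer who only wants an unrelated book cannot even get a connection.
blocked_result = pool.acquire(timeout=0.2)
print("unrelated order got a connection:", blocked_result)  # False

harry_potter_row.release()           # the slow transaction finally commits
for w in workers:
    w.join()
drained_result = pool.acquire(timeout=0.2)
print("after the queue drains:", drained_result)            # True
pool.release()
```

Making `POOL_SIZE` bigger only lets more requests park behind the same lock, which is the "bigger server" point made next.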

49:02 Udi Dahan

First, when designing a system, you really should test for these scenarios. When these problems occur in production, it's the worst-case scenario: all hands on deck, "Technical people, somehow please fix this," and there isn't anything you can do. We can't substantially change the code and the data model in the middle of Black Friday. So it's like, oh, I don't know, try putting the database on a bigger server. Okay, great. Now instead of supporting a thousand concurrent connections, we have 5,000 concurrent connections, which just means we can have five times as many people waiting for Harry Potter as before, while everybody else continues to be blocked. Really, the only solution to these things is to change the business model: go to the business up front and say, "We're going to need to make this more flexible and have it work differently, because technically we can't get out of this problem. We've designed ourselves into a corner, and the only way out is going all the way back to where we started."

50:15 Udi Dahan

Now, what this means in the inventory domain is that we need to be able to accept temporarily negative inventory. We need to allow the inventory to go negative and then have some logic that asynchronously says, "Oh, turns out we can't fulfill your order. We'll send you an email about that afterwards": "Sorry, our bad. If you want, you can cancel your order." Meanwhile, we start the back-ordering process: we contact our supplier and get them to fulfill that order. There are some delays, real-world delays, but the business flexes and handles it, and everything's okay. Now, that's great for some domains. In others, we can't just say, "Let's do negative inventory." When it comes to Taylor Swift tickets, there just aren't any more. When they're out, they're out.
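A minimal sketch of the negative-inventory approach, under stated assumptions: the `Inventory` class, the `backorders` queue, and `process_backorders` are hypothetical names, and the follow-up step is simply called afterwards here, where a real system might run it from a background job or message handler. The order path never checks stock or holds a lock while the customer waits; reconciliation happens later.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Inventory:
    """Hypothetical inventory that accepts orders even when stock runs out."""
    quantity: int
    # Orders needing follow-up, drained later, off the request path.
    backorders: deque = field(default_factory=deque)

    def accept_order(self, order_id: str, ordered: int) -> None:
        # Non-blocking: no stock check before accepting; the quantity may go negative.
        self.quantity -= ordered
        if self.quantity < 0:
            self.backorders.append(order_id)


def process_backorders(inventory: Inventory) -> list:
    """Asynchronous follow-up: tell the customer, and ask the supplier for more."""
    actions = []
    while inventory.backorders:
        order_id = inventory.backorders.popleft()
        actions.append(f"email customer of {order_id}: delayed, you may cancel")
        actions.append(f"ask supplier to restock for {order_id}")
    return actions


inv = Inventory(quantity=2)
for n in range(4):
    inv.accept_order(f"order-{n}", 1)
print(inv.quantity)   # -2
actions = process_backorders(inv)
print(len(actions))   # 4 (two backordered orders, two actions each)
```

The design choice is to move the "do we have enough?" decision out of the user's transaction and into a business process that can tolerate delay.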

51:08 Udi Dahan

And this recently happened to Ticketmaster where, again, Taylor Swift comes along, big event, everybody wants tickets. There's no getting more tickets when they're out, right? That's it. That's the limited inventory. And essentially what happened is Ticketmaster crashed and kept crashing over and over again, and then they had congressional hearings in the US as to why people couldn't get Taylor Swift tickets. Of all of the things government should deal with on the list of priorities, we've got poverty, we've got education, we've got crime, we've got climate change. What do we hold congressional hearings about? Taylor Swift tickets. We know where our priorities lie.

51:52 Udi Dahan

So how do we make this thing more flexible? Well, as a general principle, already at the business level, we need to be thinking about how to make the business process as a whole non-blocking. It's not just a technical change; it's a whole rethinking of the business process. So again, coming back to Sam Newman's talk about asynchrony, what we're looking for is non-blocking, not necessarily asynchronous. The idea of non-blocking is to ask, "How do we make this as smooth as possible?" Change the UI, change the business logic, so that there aren't choke points that stop the user from succeeding at what they want to do.

52:37 Udi Dahan

One potential way of doing this is, instead of using the eCommerce style of handling it, to charge customers up front. Don't even let them select their tickets or where they're sitting. The very first thing is, "Oh, you want Taylor Swift tickets? Give me your credit card." Anybody who says, "No"? Ha, that's okay. There are plenty of others, and they'll pay anything. So I know I've got, whatever it is, 50,000 Taylor Swift tickets. Give me your credit card. I'll charge you however much I decide, I'll tell you, and then I'll have the system assign you a seat automatically. Now, some of you might be saying, "I want to select my own seat." I'm like, "Good for you. I don't care. I'm going to sell out the concert one way or another. This is the non-blocking, most technically efficient way for me to process 50,000 screaming Taylor Swift fans. I do not want 50,000 screaming Taylor Swift fans locked in my database, taking it down. I want them in, through, and out as quickly as possible."
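Here is a rough sketch of the pre-charge flow, assuming a hypothetical `sell_out` function and a `charge` callback that stands in for a real payment gateway (no actual payment API is implied). Customers are charged in arrival order until capacity is reached, and seats are assigned afterwards in one batch, so nobody holds a seat lock while browsing a seat map.

```python
from typing import Callable


def sell_out(customers: list, capacity: int, price: int,
             charge: Callable[[str, int], None]) -> dict:
    """Hypothetical pre-charge flow: charge in arrival order, assign seats after."""
    paid = []
    for customer in customers:
        if len(paid) == capacity:
            break                  # sold out: stop taking payments
        charge(customer, price)    # payment is the only step on the hot path
        paid.append(customer)
    # Seats are assigned afterwards, in one batch, with no user choice.
    return {customer: seat for seat, customer in enumerate(paid, start=1)}


charges = []
seats = sell_out(["ana", "bo", "cy", "di"], capacity=3, price=150,
                 charge=lambda who, amount: charges.append((who, amount)))
print(seats)    # {'ana': 1, 'bo': 2, 'cy': 3}
print(charges)  # [('ana', 150), ('bo', 150), ('cy', 150)]
```

The only shared state touched per purchase is a single "how many sold" count, instead of a lock per seat held while users make up their minds.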

53:49 Udi Dahan

Coincidentally, this is how the London Olympics did it. When people wanted tickets for the London Olympics, there wasn't this "Wait a minute, how do I select my seat?" It was: just be happy you got a ticket. It's a different domain; it's not the same as eCommerce inventory. And there were no complaints. Again, this was 2012, a decade ago, and it worked. Since then, the people who worked on it are like, "Yeah, that's how you do it," and everybody else is like, "Oh, no, but we want to do it with gRPC and a NoSQL database." You're asking the wrong questions. This is a business question. It's a UI, business-rules, flow type of scenario. Now, there's another scenario that tends to happen behind the scenes. Some businesses are aware of it and try to ignore it. Some of them embrace it.

54:47 Udi Dahan

Essentially, when there's limited inventory and you can't get any more, what's going to happen is an auction: tickets are going to go to the highest bidder. So what tends to happen with the most popular concerts on Ticketmaster is that bots come in, buy up as much inventory as they can as quickly as they can, and then turn around and hold auctions to sell those tickets to the highest bidder. And of course, Ticketmaster and Taylor Swift are very upset about that. That's a problem, and it's why we had congressional hearings. But essentially, it's about recognizing that if this is going to happen, if this is the nature of the domain, then you should embrace it, model it, and integrate it into your system.
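If the domain really is an auction, it can be modelled as one directly. This is a deliberately simple sketch, assuming a sealed-bid format in which the top bids win and each winner pays their own bid; the `run_auction` name and the bid format are hypothetical, and real auction rules (reserve prices, tie-breaking, second-price payment) would be a business decision.

```python
import heapq


def run_auction(bids: dict, tickets: int) -> dict:
    """Hypothetical sealed-bid allocation: the highest `tickets` bids win."""
    winners = heapq.nlargest(tickets, bids.items(), key=lambda item: item[1])
    return dict(winners)


print(run_auction({"ana": 300, "bo": 120, "cy": 450, "di": 200}, tickets=2))
# {'cy': 450, 'ana': 300}
```

Because bids are collected first and allocated in one step, there's no race between buyers for a specific ticket record.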

55:33 Udi Dahan

And you don't necessarily have to pick just one. You can have different vertical slices: one for your private data, for when things are being set up by back-office staff; a different vertical slice for the back-ordered inventory type of domain, where we can get more inventory when it runs out; the public pre-charge model for another set of data; or the public auction style. We don't necessarily need a single business model, a single domain model, for everything that we do.

56:05 Udi Dahan

We could take things a step further and do advanced composition in the UI, stitching data from multiple verticals together. Say the image of the product, the name of the product, and the kinds of things that don't change, fairly static data, go into one database inside one vertical slice. Things related to ratings come from another vertical slice, the price comes from another vertical slice, and the inventory comes from yet another, and we stitch all of these together in the UI. If you want to see a technique for how to do this, there's that URL for you, and it'll take you to a GitHub repository where you'll be able to see this style of composite UI. It won't be as visually appealing as this Amazon type of thing, but it will give you an idea of how to do these more advanced composition techniques.
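To make the composition idea concrete, here is a small sketch in which each vertical slice owns its own fields and the only thing they share is the product ID. The in-memory dictionaries and the `slice_view` and `compose_product_view` helpers are hypothetical stand-ins for separate services and databases; this is not the code from the GitHub repository mentioned in the talk.

```python
# Hypothetical per-slice stores; in a real system each slice has its own database.
CATALOG = {42: {"name": "Harry Potter (new book)", "image": "/img/42.jpg"}}
RATINGS = {42: {"rating": 4.8}}
PRICING = {42: {"price": 24.99}}
INVENTORY = {42: {"in_stock": True}}


def slice_view(store: dict):
    """Wrap one slice's store as a lookup by product ID."""
    return lambda product_id: store.get(product_id, {})


def compose_product_view(product_id: int, slices: list) -> dict:
    """Stitch fragments from independent slices, joined only on the product ID."""
    view = {"product_id": product_id}
    for fetch in slices:
        fragment = fetch(product_id)
        clash = view.keys() & fragment.keys()
        if clash:
            # Each slice should own distinct fields; a clash suggests a modelling error.
            raise ValueError(f"fields owned by more than one slice: {sorted(clash)}")
        view.update(fragment)
    return view


page = compose_product_view(42, [slice_view(s) for s in (CATALOG, RATINGS, PRICING, INVENTORY)])
print(page)
```

Because no slice reads another slice's data, each one can pick its own storage and concurrency model, such as the back-order or pre-charge approaches above.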

57:02 Udi Dahan

Now, I know some of you're going to say, "Right, but why is it that I need to create all of these different UIs and business logic types of things, different databases? Can't we have some generic platform that does all of those things in sort of a standard way? Because our business stakeholders, they don't like answering these questions. They're like, 'We hired you to fix these problems for us, so fix it. Make the system scale, solve all of those problems.'"

57:37 Udi Dahan

Now, you can try. I've seen so many companies try, but it just doesn't scale. Every single time, the problem isn't the hardware. People say, "How do we make it scale? Oh, we'll throw more hardware at it." You can't do that so much with the database. You can scale up a database, but it's really expensive to do so, and you just buy yourself a little bit more time. You're like, "Okay, so we got a $10 million database server in order to support 100,000 concurrent connections to the database. Maybe that'll hold us until next Wednesday." Beyond that, well, what are we going to do when the next big thing comes along? "Find another company. We're developers, right? We're not stuck. We can go anywhere we want."

58:32 Udi Dahan

Anyway, I know I've tossed a lot of stuff at you: a lot of pitfalls, a lot of patterns. Luckily, this talk is being recorded. It'll be available online, so you can follow up afterwards. If you want to go more in depth, I'll be down at the Particular Software booth. Come chat with me about your CQRS questions and concerns; I'll be happy to address them. And for those of you who prefer to continue on your own time, there's a free CQRS course that I've got online. It's available at this link: go.particular.net/ndc-oslo-23-udi. U-D-I. Hopefully those resources will give you a lot to continue with. It was so wonderful being here and talking to you all about CQRS. My name's Udi Dahan, @udidahan online. Thank you so much. Have a great rest of your conference.
