
Embracing NServiceBus - Best practices

See how NServiceBus and rigorous architectural measures helped turn a slow and unmanageable SaaS application into a better system.

Transcription

00:02 Roy Cornelissen
Yeah, we're set.
00:07 Mark Taling
Good to go.
00:08 Roy Cornelissen
Okay. Welcome. Enjoying the day so far? Great discussions, right? Welcome to our talk. Again, a talk about using NServiceBus in practice. We call it Embracing NServiceBus, and we want to take you through some best practices and lessons learned from our project, and maybe a couple of different insights than what we've seen today. We hope to fill you in with some nice details here. What this project is about: we're building a SaaS solution for workforce management, so planning personnel in retail stores, and quite a complex solution. It's been a journey the last couple of years to bring what was basically an on-premises application towards the cloud. NServiceBus has been playing a role in that. My name is Roy Cornelissen. I work as a software architect for Info Support, which is a consulting company. I've been one of the version 1.x guys, using NServiceBus since, I think, 1.8 or 1.9, and I've been with it ever since, and we're also using it in production now with our current system, because of the great match. I brought with me my colleague, Mark.
01:43 Mark Taling
I'm Mark Taling. I'm a lead developer at the same company, and I got hooked on NServiceBus around 2.6, I think. Well, I've been using it ever since.
01:55 Roy Cornelissen
We work together on this project. What I wanted to do first is give you a brief introduction of where we started. Some things might sound familiar from what we've seen from a couple of guys; I think we all come from the same kind of situation. Then, what happened and what NServiceBus enabled us to do, and then take you through a couple of topics that are relevant for us, that we needed to find solutions for in our system, and the things that we learned from that: things like multi-tenancy, working with Sagas, and dealing with the everlasting evolution and development of NServiceBus, things like that. Those are the topics we'll handle.
02:47 Roy Cornelissen
First of all, where we started. In the previous talk, a big database was mentioned. We had that as well. A traditional layered architecture here: a service layer, a web services layer, so it was a web-services-oriented architecture, with a couple of front ends there servicing different types of audiences, different types of users, and also an integration layer which contained a bit of legacy, in-house-built, custom-built hosting processes, and was very unstable. In the middle the service layer; the integration layer servicing external applications, because we need to import data and export data; and then the front ends servicing our customers' users.
03:40 Roy Cornelissen
If you look inside this service layer, we had a bunch of modules there, and I deliberately made these shapes a little bit crooked and a little bit amorphous, because it wasn't clearly defined what the boundaries of these modules were. They had a bit of overlap. All of these modules would talk to that same database. Maybe very recognizable: the fact that we had web services there meant we would end up with a code path that would literally look like this. For example, if someone changed the contract for an employee, then that had an effect on plannings that were already made, and maybe realizations that were already made, and also payments that were outstanding for their salary. We had a whole thread going through all of these modules, and even modules calling each other, before it returned back to the user. Not very scalable, especially when we were at a point where we were hosting this system at customer sites but were taking it to the cloud. We were expanding our customer base, going multi-tenant, and we needed to do something about that.
04:55 Roy Cornelissen
What NServiceBus allowed us to do, and I don't think I have to go into why NServiceBus does this, but what NServiceBus allowed us to do is take chunks of these modules and really separate them out into more clearly defined services, clearly defined components, as we call them. Inside our system, we're not really doing an SOA type of architecture, but you could compare it to a microservices type of approach. Still a big database. That's something that we still need to take care of, separating those islands of data out, but it allowed us to thin out the service layer and put that into background processes, and really made the system come alive with events and messages.
05:46 Roy Cornelissen
We're right in the middle of transforming our system, and like Charlie said this morning, we're not taking the rewrite-everything approach. We're really taking the evolutionary approach here. It takes more time, but it really helps us keep the shop open and keep our users using the system, so that's nice. We have the service bus in between. What it also allowed us to do is take out that legacy integration layer and replace it with a standardized web services layer that we could expose to the outside. What you also see is that if you look at this integration layer, you also have clearly defined responsibilities that you could map to the same domains that these functional blocks have. We have a much better organized code base, now also in the integration space. Having this messaging system also allowed us to open the system up for monitoring, and see what the system was doing. Mark will go into that a little bit more.
06:57 Roy Cornelissen
Like I said, we are doing a SaaS solution, coming from a world where we deployed the system for every client on site, and moving to the cloud. It means that all of a sudden you have to deal with multi-tenancy. This web services layer that you saw, we're still deploying that for each customer separately, but it becomes a bit of a nightmare if you grow and expand and you get more customers, so you need to deal with deployment issues and keeping all of that running. What we decided to do was take these NServiceBus services and make them multi-tenant aware.
07:31 Roy Cornelissen
One thing that you need to take care of then is that if you have one single instance of a service, you need to be aware of the tenant context that it is running under. If a message arrives and you have different databases for each customer, because we separate the data out for our customers for backup and restore reasons and for security, one message needs to go to customer green and the other one might need to go to customer red. You need some sort of context, so that everything a handler does knows where it has to go. You don't want to put a tenant ID or something in every message that you send, and have your code littered with switch cases or checks that check which tenant it is, which connection string should I take. You want that as an ambient piece of data in every message, in every context that you have.
08:28 Roy Cornelissen
We solved that using message headers that we pass around on each message. If you look at the infrastructure of NServiceBus, the whole plumbing, so to say, around your message handlers, you can nicely hook into that with a mutator. We use a message mutator on every endpoint that we have, that looks at the header, the tenant ID, and puts that onto a context that the handler will run under. The handler can always rely on the fact that this tenant ID is there in the environment, and it can use it to determine which business rules to run, and also which database to target.
09:09 Roy Cornelissen
On the outgoing side, of course, you also want to have this header flow with every message that goes out, because every subsequent action that happens also needs to run under that tenant ID. There's an outgoing message mutator as well, that will pick up the tenant ID and then put it on a header, and it all flows automatically. The pluggability of NServiceBus helped us out a lot here, to create this ambient tenant context.
09:39 Roy Cornelissen
A quick look at how this works. We have an incoming message mutator here that basically says, I'm expecting a TenantId header on my message. You could throw errors if the header was missing, for example, because it possibly means that it's a poisoned message. You take that header value, and what we do here is create a claims principal that we can also augment with maybe different claims that tell something about that tenant. The very important thing is that this tenant ID is a claim inside our principal, which flows on the current execution thread and is then used throughout our code. There's no tenant ID that we have to pass around as a parameter everywhere inside our logic. On the outgoing side, it just goes the other way. It gets the tenant ID from the claims identity, we're using an extension method for that, and then puts it on the outgoing message header, and then it's gone and we don't have to look at it anymore. Mark will dive into more details on tenants, multi-tenancy, and Sagas.
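For illustration, a minimal sketch of what such a mutator pair could look like on the NServiceBus 3.x/4.x-era transport mutator API. The header name, claim type, and registration details here are illustrative, not necessarily the exact ones from our system, and type locations shift between versions:

```csharp
using System;
using System.Security.Claims;
using System.Threading;
using NServiceBus;
using NServiceBus.MessageMutator;

public class TenantContextMutator :
    IMutateIncomingTransportMessages,
    IMutateOutgoingTransportMessages,
    INeedInitialization
{
    const string TenantHeader = "TenantId";

    // Incoming: read the tenant header and make it ambient via the thread principal.
    public void MutateIncoming(TransportMessage transportMessage)
    {
        string tenantId;
        if (!transportMessage.Headers.TryGetValue(TenantHeader, out tenantId))
        {
            // A missing header most likely means a poisoned or foreign message.
            throw new InvalidOperationException("Message is missing the TenantId header.");
        }

        var identity = new ClaimsIdentity(new[] { new Claim("TenantId", tenantId) }, "NServiceBus");
        Thread.CurrentPrincipal = new ClaimsPrincipal(identity);
    }

    // Outgoing: copy the ambient tenant claim back onto the message headers,
    // so it flows automatically to every subsequent message.
    public void MutateOutgoing(object[] messages, TransportMessage transportMessage)
    {
        var principal = Thread.CurrentPrincipal as ClaimsPrincipal;
        var tenantClaim = principal == null ? null : principal.FindFirst("TenantId");
        if (tenantClaim != null)
        {
            transportMessage.Headers[TenantHeader] = tenantClaim.Value;
        }
    }

    // Register the mutator on every endpoint.
    public void Init()
    {
        Configure.Component<TenantContextMutator>(DependencyLifecycle.InstancePerCall);
    }
}
```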
10:52 Mark Taling
Yeah, because the problem is, everything Roy told you works. It works perfectly. Our data access layer just takes the claims from the headers and finds the right database, which is all fine. Only when you get to Sagas, you get to, in our case, persisted state which is stored in RavenDB. NServiceBus has a single URI to connect to Raven, which means we've got a single store for all Saga state. Instead of having just a unique identifier to which we can map the Sagas, we have to have the tenant ID as well, because the problem with having ... I think we've got 20 databases in production?
11:32 Roy Cornelissen
Yeah.
11:32 Mark Taling
With 20 databases all creating records independently of each other, it means we have guaranteed key collisions. If we have a shop in one database, it might have the same ID as a shop in another database. That makes it nearly impossible to have the Sagas stored in one place if we map on a single ID. In order to get the tenant ID in there as well, we went looking for some options. We wanted to map on multiple properties. The easiest way to do this, and I think it's what most people suggest when they write about it on their blogs, is concatenation. Just pour everything into one large string, and it works, because the key is unique. If I combine two unique keys, it results in a new unique key. This works, and we actually used it for a short amount of time, because for commands I don't really have a problem with this. I know my command is going to be handled by a service, and if that service requires me to put two properties into a single property so it can map to its Saga, whatever.
12:34 Mark Taling
Well, quickly after that we started using events, and events to start our Sagas. This would mean, if I were to use concatenated properties, that the events would already know by which service they would be handled, and whether there was a Saga handling them, because I would have already added the concatenated property. Well, that wasn't the cleanest solution, so we kept looking. Luckily, NServiceBus is highly extensible and it has an IFindSagas interface. I usually call it the IFindSaga interface; I believe it's actually a static class with an interface nested inside it. Whatever. The IFindSagas interface gives you complete control over how you resolve a Saga. If you want to do it by headers, do it by headers. If you want to do it by properties, do it by properties. If you want to combine the two, like we wanted, do so. It was almost the perfect solution. I say almost, because when I say you've got complete control over how you find your Saga data, I don't mean complete control.
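As a rough sketch of that extension point, and assuming the Raven document session can be resolved from the container, a custom finder might look something like this; the saga data, the message, and the way the tenant and business key are combined are illustrative, not our exact code:

```csharp
using System;
using System.Linq;
using System.Security.Claims;
using System.Threading;
using NServiceBus;
using NServiceBus.Saga;
using Raven.Client;

public class ShopOpened : IEvent
{
    public int ShopId { get; set; }
}

public class ShopSagaData : IContainSagaData
{
    public Guid Id { get; set; }
    public string Originator { get; set; }
    public string OriginalMessageId { get; set; }

    public string TenantId { get; set; }
    public int ShopId { get; set; }
}

// IFindSagas<TSagaData>.Using<TMessage> lets you take over saga resolution entirely.
public class ShopSagaFinder : IFindSagas<ShopSagaData>.Using<ShopOpened>
{
    readonly IDocumentSession session;

    public ShopSagaFinder(IDocumentSession session)
    {
        this.session = session;
    }

    public ShopSagaData FindBy(ShopOpened message)
    {
        // Combine the ambient tenant claim with the business key, so sagas from
        // different tenant databases can never collide on ShopId alone.
        var principal = Thread.CurrentPrincipal as ClaimsPrincipal;
        var claim = principal == null ? null : principal.FindFirst("TenantId");
        var tenantId = claim == null ? null : claim.Value;

        // Note: a straight query goes through a Raven index, which can be stale
        // under load; the key-document approach described below avoids that.
        return session.Query<ShopSagaData>()
            .FirstOrDefault(s => s.TenantId == tenantId && s.ShopId == message.ShopId);
    }
}
```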
13:35 Mark Taling
If you have your data persisted in RavenDB, you have to get it out yourself. NServiceBus will still persist it for you, but you have to get it out. Well, after weighing our options, we decided it was worth doing so, as long as we could rely on the persistence of NServiceBus. We used ISagaPersister, which is an interface implemented by, I believe, the RavenDB persister and the NHibernate persister, to just reuse the work that NServiceBus has already done for us. We added the IFindSagas implementation, but we didn't have to use custom persistence; at least, it seems that way.
14:21 Mark Taling
Let's take a quick look at how we did that in the code. This is a simple implementation of a new, advanced Raven Saga persister. It inherits from the Raven Saga persister. If I just add this to my, what's it, dependency injection container, it'll work. I can add this and NServiceBus will take it, as long as it's the only implementation of ISagaPersister.
14:52 Mark Taling
Well, let's see how this class works. There are two important methods. These are not the only two, but these are the most important methods for finding Saga data. You've got the Save method, which adds a Saga to your persistence store, and you've got Complete, which, well, in fact removes it. Complete actually says, "The Saga is completed. I don't need it anymore. Clean out my data." Let's see how that works. Of course, if I implement this using the new keyword and NServiceBus talks to it through its interface, it will occasionally get the old implementation, so I also implemented the interface explicitly, so I know my implementation is the one that gets called.
15:46 Mark Taling
How do these two methods work? Let's take RavenDB and a document to persist. In the easiest case, without any unique attributes, the document gets persisted in RavenDB. That's it. Charlie told us earlier that it's important to have unique keys, because NServiceBus uses the unique attribute to find Sagas more easily. What it does in that case is create an additional document that works kind of like an index. It uses the unique property, makes a key out of that, and makes that the unique identifier for your key document. Then both documents are added to RavenDB, and you're done. The moment you call the Complete method, both documents disappear and, well, there is no trace of them.
16:44 Mark Taling
This seemed fine to us. I like this approach, and by slightly modifying it I was able to make key documents for our own keys. Our mappings had multiple properties, which just meant I had to derive a unique key from those multiple properties. Actually, it's kind of the same as the concatenation we talked about earlier. By grabbing the session factory from the constructor, I got access to Raven, which is heavily abstracted through a couple of layers; some of those are NServiceBus and some of them are Raven. It doesn't really matter. There are some simple methods which you can use to just load your data or persist your data.
17:28 Mark Taling
By adding an additional store-mapping-keys method, whose implementation I'm not going to go through because it's just concatenating properties and hashing them if the result is too long, after I save the original document I'm essentially adding a key document to the Raven database. And since Raven defers everything until you complete the transaction, they are added simultaneously, and once the document is in the database, my key is in there as well. Complete is the other way around: remove the mapping key, and once the mapping key is gone, remove the original document. It isn't actually necessary to do it in this order. I could also call the base Complete first and then remove the mapping key, since, once again, everything is deferred until the transaction is completed. It's all completely transactional.
18:18 Mark Taling
Well, that covers saving my Saga data. Now we just need to retrieve it. Retrieval is fairly easy. This is actually the exact same way NServiceBus does it, only instead of a unique-key-attribute identity, I've got a saga mapping identity. What I do is use the mapping ID I generated to find the identity document in Raven. In that document I've got a SagaId property, which contains the ID of my actual saga data, and then I retrieve that from the database. Essentially, instead of getting one document, I'll first get my key document and then get the original document, which is exactly how the unique attribute works.
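Putting those pieces together, a sketch of such a persister could look roughly like this. It assumes the NServiceBus 4.0-era Raven persister; the base class, the session factory, and its Session property are named from memory and will differ between versions, and the mapping-key scheme shown is illustrative rather than our production code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using NServiceBus.Persistence.Raven;               // location of the Raven persistence bits varies per version
using NServiceBus.Persistence.Raven.SagaPersister;
using NServiceBus.Saga;

// A key document that points a deterministic mapping ID at the real saga document.
public class SagaMappingIdentity
{
    public string Id { get; set; }
    public Guid SagaId { get; set; }
}

public class AdvancedRavenSagaPersister : RavenSagaPersister, ISagaPersister
{
    readonly RavenSessionFactory sessionFactory;

    public AdvancedRavenSagaPersister(RavenSessionFactory sessionFactory)
        : base(sessionFactory)
    {
        this.sessionFactory = sessionFactory;
    }

    // Explicit interface implementation, so calls made through ISagaPersister
    // always land here and not on the hidden base methods.
    void ISagaPersister.Save(IContainSagaData saga)
    {
        base.Save(saga);                 // let NServiceBus store the saga document as usual
        StoreMappingKey(saga);           // add our own key document in the same Raven session
    }

    void ISagaPersister.Complete(IContainSagaData saga)
    {
        RemoveMappingKey(saga);          // order barely matters: Raven defers both until SaveChanges
        base.Complete(saga);
    }

    // Retrieval: first load the key document by its deterministic ID, then load
    // the actual saga document by the SagaId it points to. The same idea as the
    // [Unique] identity documents, only with our own multi-property key.
    public TSagaData GetByMappingKeys<TSagaData>(IDictionary<string, object> mappedProperties)
        where TSagaData : IContainSagaData
    {
        var mappingId = BuildMappingId(typeof(TSagaData), mappedProperties);
        var identity = sessionFactory.Session.Load<SagaMappingIdentity>(mappingId);
        if (identity == null)
        {
            return default(TSagaData);   // fall back to querying Raven's indexes for legacy sagas
        }

        return sessionFactory.Session.Load<TSagaData>(identity.SagaId);
    }

    void StoreMappingKey(IContainSagaData saga)
    {
        var mappingId = BuildMappingId(saga.GetType(), ExtractMappedProperties(saga));
        sessionFactory.Session.Store(new SagaMappingIdentity { Id = mappingId, SagaId = saga.Id });
    }

    void RemoveMappingKey(IContainSagaData saga)
    {
        var mappingId = BuildMappingId(saga.GetType(), ExtractMappedProperties(saga));
        var identity = sessionFactory.Session.Load<SagaMappingIdentity>(mappingId);
        if (identity != null)
        {
            sessionFactory.Session.Delete(identity);
        }
    }

    // Concatenate the mapped values and hash them, so the key stays a valid document ID.
    static string BuildMappingId(Type sagaType, IDictionary<string, object> properties)
    {
        var concatenated = string.Join("/", properties.OrderBy(p => p.Key).Select(p => p.Key + "=" + p.Value));
        using (var sha1 = SHA1.Create())
        {
            var hash = BitConverter.ToString(sha1.ComputeHash(Encoding.UTF8.GetBytes(concatenated)));
            return "SagaMappingIdentity/" + sagaType.Name + "/" + hash;
        }
    }

    static IDictionary<string, object> ExtractMappedProperties(IContainSagaData saga)
    {
        // In the real system the mapped properties come from the ConfigureHowToFindSaga
        // setup; reflecting over the saga data is just a placeholder here.
        return saga.GetType().GetProperties()
            .Where(p => p.Name != "Id" && p.Name != "Originator" && p.Name != "OriginalMessageId")
            .ToDictionary(p => p.Name, p => p.GetValue(saga, null));
    }
}
```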
19:03 Mark Taling
We had some trouble with backwards compatibility, because the first time I implemented this, we did it without key documents. We were just querying Raven, and that gives some issues with stale indexes and such. So for backwards compatibility, instead of adding all the keys retroactively like Charlie did, we just have a fallback: if this doesn't result in a document, then use the indexes of Raven itself to query until you find the correct one. That's how we did this. I think you're going to tell something about Sagas.
19:33 Roy Cornelissen
Sagas, yes. I realize that we forgot to put in an example of how this is consumed. What you are able to do now is say, in ConfigureHowToFindSaga, map by this property and by this property and by this property. We're able to chain that together in our setup. Mark has a nice blog post about how that works, if you want some more details. Talking about Sagas: for me, one of the most exciting features of NServiceBus. I saw a couple of people that haven't been using Sagas before, and also a bunch of people that have. It's really interesting to dive into what Sagas are able to do. Without going into too much detail, I think it's important to recognize that there are a couple of patterns that you can take. Recognize the patterns in your solution, in your problem space, and then apply them using Sagas.
20:25 Roy Cornelissen
Two of the most common ones: if you look at the one we call the observer, you have a bunch of messages coming in, in no particular order, and there's a Saga listening to all of these events and messages coming in. If it decides that it's complete, that we're done, it might fire off an event and then set off another chain of actions, another chain of things. That's something we call an observer. Another one we call the controller. Actually, yesterday we had the ADSD unconference, and a phrase I liked a lot was by Danny from Particular Software, we call the, sorry, the Bolshevik ... Hey, I'm missing-
21:08 Mark Taling
You've got the wrong version of the slides.
21:10 Roy Cornelissen
I think I got the wrong version. Danny called this the Bolshevik approach, meaning that there is a central controller that knows exactly how the process is going to go, and it will say, "We're going to do this, and we're going to do this next, and this is what we're going to do next." Controller means that we have a message flow in a specific, predefined order. It's a sequential flow, a sequential process that the Saga runs through. This is a typical pattern that you see when you're doing orchestration types of solutions.
21:45 Roy Cornelissen
Jimmy Bogard, one of the NServiceBus champions, has a couple of nice write-ups about these patterns. I encourage you to read them. The QR code goes to his blog. We will share the slides afterwards. He has a nice write-up of all these patterns that you can come across. Where it really gets interesting is when you start to model time into your solutions, so not just having a Saga be an aggregator or controller, but actually using Sagas in a way that makes business, functional sense, when time starts to become a first-class citizen in your solution. For example, look at insurance policies. Each year they're up for renewal, and the standard approach is to use a batch, right? Get all of the policies, renew them, and you have a batch that takes a couple of minutes or even a couple of hours.
22:43 Roy Cornelissen
What if you were able to model that on a per-policy basis? If you have the lifetime of a policy, at some point in time there will be an event that it is created or accepted. At that point you say, "Well, let's set a reminder in about a year, to say when that period is over, send a message out and have that trigger a renewal process for that policy," so you won't have to do it batch-oriented and en masse, but on a per-policy basis. You just say, "Call me back in 300 days, when we're up for a policy renewal." I'm not sure which of this content is on the USB key that we got this morning, but I really encourage you to dive into this way of building Sagas. It takes a little bit of time to bend your head around it, but this really enabled us to create very interesting scenarios. Sagas are very powerful.
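A minimal sketch of that idea, using the 4.x-era saga and timeout API; the message types and property names are made up for the example:

```csharp
using System;
using NServiceBus;
using NServiceBus.Saga;

public class PolicyAccepted : IEvent
{
    public string PolicyNumber { get; set; }
    public DateTime RenewalDate { get; set; }
}

public class StartPolicyRenewal : ICommand
{
    public string PolicyNumber { get; set; }
}

public class RenewalDue { } // timeout state

public class PolicyRenewalSagaData : IContainSagaData
{
    public Guid Id { get; set; }
    public string Originator { get; set; }
    public string OriginalMessageId { get; set; }

    [Unique]
    public string PolicyNumber { get; set; }
}

public class PolicyRenewalSaga : Saga<PolicyRenewalSagaData>,
    IAmStartedByMessages<PolicyAccepted>,
    IHandleTimeouts<RenewalDue>
{
    public override void ConfigureHowToFindSaga()
    {
        ConfigureMapping<PolicyAccepted>(m => m.PolicyNumber).ToSaga(s => s.PolicyNumber);
    }

    public void Handle(PolicyAccepted message)
    {
        Data.PolicyNumber = message.PolicyNumber;

        // "Call me back when this policy is up for renewal" - no batch job needed.
        RequestTimeout<RenewalDue>(message.RenewalDate);
    }

    public void Timeout(RenewalDue state)
    {
        // Kick off the renewal process for just this one policy.
        Bus.Send(new StartPolicyRenewal { PolicyNumber = Data.PolicyNumber });
        MarkAsComplete();
    }
}
```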
23:51 Mark Taling
Well, the evolution of NServiceBus. I believe this morning we saw that each and every one of us is using NServiceBus, but some of us are using 1.9, others are already in the 4.x phase. Well, I started at 2.6 and I've still got some 2.6 versions in production. Most of my services are currently on 4.0.3. During all those versions of NServiceBus, we had some issues, some challenges, and there were some lessons learned. I want to share those lessons with you briefly.
24:32 Mark Taling
The first thing we learned is that it is important to upgrade regularly. I started at 2.6, and development takes a bit of time, especially when you're a newcomer to NServiceBus. It is kind of intuitive, but it takes a different mindset than what you had before. By the time we were done and had 2.6 in production, 3.0 was already out. Not a big problem, but at that point we didn't have the time to actually upgrade. We postponed it. Nothing major; we just skipped 15 releases. By the time 3.3 came out, we had time to upgrade, and we did. Small problem: between 2.6 and 3.0 there were some breaking changes, and especially around the configuration there were some major changes. That's not a big problem if you've got the release notes. The only thing is, the release notes of 3.3 don't mention those issues, because that's already 14 versions earlier. It was quite a challenge to upgrade from 2.6 to 3.3.
25:45 Mark Taling
Well, we did that. It worked, and then we skipped another nine versions and went to 4.0. Going from 3.3 to 4.0 was rather painless, because this time we had the release notes of the major version, and all big changes, all breaking changes, were mentioned. We updated it, and in a matter of weeks we updated to 4.0.1, 4.0.2, and 4.0.3, where we are currently, because there were some bugs which affected us. Earlier releases consisted mainly of bug fixes which didn't affect us, so we skipped those. In this case, there were actual scenarios which could affect us or did affect us, so we upgraded quite quickly.
26:38 Mark Taling
There were some issues with that. Well, they mainly came ... Let's see if we can get the next slide up: our proofs of concept and our assumptions. We did a lot of proofs of concept of things like time modeling, but also things like having services use the same queues, making send-only endpoints for existing services, or sending the same event from multiple services, multiple logical services, to a single endpoint, which we tried with 2.6, and that worked. We shelved those ideas because we didn't need them yet, but it was part of checking whether NServiceBus was suitable for us and whether we could use those.
27:27 Mark Taling
Recently, we decided we needed one of those ideas, because it was part of a commercial product which we wanted to launch, and guess what? It didn't work, because somewhere between 2.6 and 4.0 the already highly opinionated NServiceBus framework became just a bit more opinionated, and decided to tell us, "You can't send the same event from multiple endpoints. You need to use the same logical endpoint if you want to publish an event." It wasn't that big a deal; it just required some new architecture, some other designs, which is fine if you think about it beforehand, not if you've got one week to get it into production. What we really learned is: retry our proofs of concept, retest our assumptions. I want to say we've learned from the upgrade-regularly thing, but currently we are at version 4.0.3, which means we are, I think, 20 versions behind the current release. There is still some learning to do.
28:35 Mark Taling
Another thing we've learned, and it's slightly related to this evolutionary path: beware of abstractions. NServiceBus consists of a bunch of interfaces. Well, a bunch of interfaces; it's mainly interfaces and a bit of implementation. Those interfaces make it really easy to swap out different parts: the MSMQ transport, the Azure transport, the RabbitMQ transport. You can just swap them out with one line of code, and it usually works. We thought so. We thought the same would be true for the MSMQ subscription persistence and the Raven subscription persistence. I say persistence, but I think they're named storage: MSMQ subscription storage, Raven subscription storage.
29:27 Mark Taling
The problem we had with the events from multiple logical endpoints, we solved by making two technical endpoints for the same logical endpoint. We've got one NServiceBus service that we host, which is running, and you can subscribe to that, but there's also our web node, which is part of the same logical service. That's kind of like the five squiggly thingies Roy drew on the board. We've got a bit of forecasting in the service, and we've got a bit of forecasting in the web service, because of that legacy, but we still want to be able to integrate it into our currently evolving project. We can send events from the web services, and you can do that by subscribing to the actual Windows service and using Raven subscription storage. That worked fine.
30:23 Mark Taling
Well, it worked out fine in acceptance. It worked out fine in user acceptance, and then we went to production. I think someone mentioned this earlier today: we've got four identical servers in production, while we've only got one server in our testing environments. Using a single server with a single RavenDB instance, this worked fine. In our production environment, which has four identical servers with a single RavenDB server, it meant that every event being published, due to the nature of the RavenDB subscription storage, was handled four times. Each of the servers decided, "Yep, I'll take that message. I'll process it." Well, especially in the payment area, that gave some issues. Luckily, we discovered it quickly. I like to be paid four times, but I don't like to pay four times.
31:19 Mark Taling
Well, we thought, "Let's solve that. We'll just use MSMQ subscription storage because that's on a permission basis. That'll work." We put that in and it didn't work. The problem is that because both instances use the same interface and the same line of code to activate them, we thought, "Well just swap them out, head over to test, good luck." Luckily, it already broke in test, because using an MSMQ subscription storage, that doesn't work in a send-only endpoint, because the interface provides an Init method, which is being used by the MSMQ subscriptions storage to read out the queue, get the subscriptions, and it's done. While the Raven instance just says, "The Init method is completely empty," and it just says, "Whenever a message comes through, I'll just check live in Raven what the subscriptions are, and I'll figure it out from there." Well, the fix was relatively easy. We just got the subscriptions storage information from the container and we initialized it.
32:29 Mark Taling
That fixed it for us. This isn't a good fix; it's hopefully a temporary fix, because a huge difference between these two implementations is that the MSMQ subscription storage only gets all subscriptions once. So when my website is already running and has already done this once, if I then subscribe to the service, it won't know. The Raven implementation, on the other hand, is updated live, because it goes to check in Raven every time. This is one example of why you should beware of abstractions. Just know how things are implemented, look a little bit deeper beyond the interface, and check the differences between the implementations.
33:15 Roy Cornelissen
Another topic: performance, in a system like this where we're having to deal with a bunch of performance hiccups, being a growing solution, adding more customers. We ran into quite a few things. One thing to consider, and like Mark said, a no-brainer when you're doing messaging, is to design with parallelism in mind. One of the first things that developers started doing was taking chunks of code from that big fat web service and just putting them in a message handler, so we're asynchronous and we don't have to bother the user with waiting times. The problem was that this code was not really designed in a way that allowed it to work in parallel; it had some flaws in it. It would start to get bugs there. The result was that the endpoints running this code are now configured to run single-threaded, creating a bottleneck like that.
34:18 Roy Cornelissen
If you don't design your code, your handlers, in a way that is prepared to run in parallel, you're going to run into problems like that. We've got a bunch of endpoints now that are running single-threaded, and we're afraid to scale them up because of the effort that comes with it. We're also building the system further and further, so we've got a bit of legacy there that we introduced by offloading stuff to the background.
34:50 Roy Cornelissen
Another thing that was also mentioned, by Charlie and Mark, is the use of unique properties. What we ran into under high load is what you get when you don't use a unique property: the Saga mapper will try to query RavenDB. If you don't use a unique property, so you don't have, let's say, a primary key that Raven can use, it will fall back to indexes. As you might know, Raven indexes tend to be stale, so there's a couple of microseconds that RavenDB takes in the background to update its indexes. If you have a high load of messages coming in for the same Saga, you might end up with multiple instances of the same Saga document. The same Saga will exist multiple times. You can fix that by putting a unique property on all your Saga data.
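A minimal sketch of what that looks like on saga data; the class and property names are just an example:

```csharp
using System;
using NServiceBus.Saga;

public class PaymentRunSagaData : IContainSagaData
{
    public Guid Id { get; set; }
    public string Originator { get; set; }
    public string OriginalMessageId { get; set; }

    // With [Unique], the Raven persister stores an extra identity document and
    // loads the saga by key instead of querying a possibly stale index, so
    // duplicate saga documents can't be created under load.
    [Unique]
    public string PaymentRunId { get; set; }
}
```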
35:41 Roy Cornelissen
That's something that we actually ran into when we were in production. Charlie was able to retrofit the identity documents, but we basically got multiple Saga documents, duplicates of them. One of the other results of just taking chunks of code from the web service, chunks that were quite elaborate, quite long-lived transactions, was that we found out that not only do you need to redesign or rethink your code with parallelism in mind; you also need to take care of your transactions. Of course, it's a bit of a no-brainer to keep your transactions small. The result of just putting large blobs of code into the background was that if you put that in a queue and you have a couple of these transactions waiting, the end customer is waiting for that one single transaction that is sitting somewhere in a queue. And you have a multi-tenant system, so transactions from other customers are hurting your throughput if you don't look at the scale-out scenario soon enough.
36:48 Roy Cornelissen
One of the mistakes that was made was that we just took one of these transactions and put it in the background, but what we discovered about the way users were using our system was that they were actually waiting for that transaction to complete. Instead of just waiting for that single period of time for that one transaction, all of a sudden users were having to wait for a whole queue to finish. We had to take a couple of steps back there and also think about keeping our handlers as small as possible.
37:23 Mark Taling
We've got some slides prepared on maintainability, and mostly on the tooling which is required to maintain your system. Since both Udi and, I'm bad with names, Charlie? already showed us some tooling and some ways to do that, I'm just going to go over them quickly and point out the differences.
37:44 Mark Taling
The first thing that's important to do is check: how is my service doing? By a service, I mean a single NServiceBus endpoint with one or more handlers. One way to do that is monitoring production with ServicePulse. I'm not going to go into ServicePulse, since we saw it this morning, but ServicePulse hasn't been around for very long. I think it was released somewhere last year, which means we didn't have it at the beginning. What we did was add custom logging to our NServiceBus handlers, and we just logged pretty much everything into a large SQL database, so we could query and check what happened and how long things took.
38:26 Mark Taling
That wasn't quite satisfactory, because it works, and in testing environments it allows you to solve problems and to find issues, but in production we've got a database with millions of records without any proper indexes, because the main thing was that it had to insert quickly, since we don't want our business process to get slow. After a couple of months, it was almost undoable to get any performance metrics or any statistics out of the database without just writing a query and coming back in an hour for the results.
39:08 Mark Taling
Luckily, there are some other systems around which can do performance measurements. We just heard about New Relic. Luckily, we use that one as well. New Relic is a web-based performance metrics tool, which is mostly designed to measure the performance of websites and web servers. It has a bunch of plugins to check your SQL databases or your Oracle databases. I think there's even a Raven plugin, which you can't actually use if you use Raven as part of the NServiceBus license, because then you don't have the license for it, but it is there.
39:43 Mark Taling
New Relic has a rather easy API to extend your logging. What we did was just hook NServiceBus up to New Relic, allowing us to create dashboards like these. Just by logging which handlers are hit and how long it took to handle those requests, we can get metrics like the processing time by service or even by client. We can see the throughput by service and by client. There are a lot of other statistics as well; exceptions are logged, so we can see pretty much everything about our services. If one service is standing still or isn't doing anything, it's probably down. We can check that as well.
40:31 Mark Taling
All we had to do was add some lines of code to our handlers, and yes, you have to add it to each handler, so we introduced a base handler for that. By calling New Relic's AddCustomParameter, we can add our client code. Our client code is the unique tenant ID, which we mentioned earlier, and the name of the handler. Just those two parameters allow us to make all kinds of graphs, grouped by handler or by client code. We didn't have to add the actual service name, because for us each handler only runs in a single endpoint. New Relic also allows you to record response time metrics. At the start of our base handler we start a stopwatch, at the end we stop it, and we can put the result in New Relic, creating the graphs we just saw.
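A sketch of what such a base handler might look like; the TenantId claim matches the mutator shown earlier, and the metric and parameter names here are our own conventions rather than anything New Relic prescribes:

```csharp
using System;
using System.Diagnostics;
using System.Security.Claims;
using System.Threading;
using NServiceBus;

public abstract class InstrumentedHandler<TMessage> : IHandleMessages<TMessage>
{
    public void Handle(TMessage message)
    {
        var handlerName = GetType().Name;
        var principal = Thread.CurrentPrincipal as ClaimsPrincipal;
        var claim = principal == null ? null : principal.FindFirst("TenantId");
        var clientCode = claim == null ? "unknown" : claim.Value;

        // Two custom parameters are enough to group the graphs by handler and by client.
        NewRelic.Api.Agent.NewRelic.AddCustomParameter("ClientCode", clientCode);
        NewRelic.Api.Agent.NewRelic.AddCustomParameter("Handler", handlerName);

        var stopwatch = Stopwatch.StartNew();
        try
        {
            HandleMessage(message);
        }
        catch (Exception ex)
        {
            NewRelic.Api.Agent.NewRelic.NoticeError(ex);
            throw; // let NServiceBus do its usual retries / error queue handling
        }
        finally
        {
            stopwatch.Stop();
            NewRelic.Api.Agent.NewRelic.RecordResponseTimeMetric(
                "Custom/Handlers/" + handlerName, stopwatch.ElapsedMilliseconds);
        }
    }

    // Concrete handlers put their logic here instead of in Handle().
    protected abstract void HandleMessage(TMessage message);
}
```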
41:26 Mark Taling
How your service is doing, that's all great, but if 20 services are running perfectly, it doesn't mean my system works. I'd also like to know: how is my system doing? That's something we haven't found a solution for ourselves yet. Luckily, there is ServiceInsight, so we're tinkering around with that. Hopefully, that will give us the technical view of how our services are doing. Are all messages coming through? Are all connections correct? But that's only technical. We wanted to go a bit further.
42:01 Mark Taling
In the business, we're currently using this. Every four weeks, we do the payments of 60 thousand Dutch supermarket employees. Those need to be correct. Nearly everything you can think of, like incorrect birth dates, forgetting to add some clock times, wrong salary codes, everything can result in problems with payments. With problems in payments, you tend to get very unhappy customers. I don't like going to the supermarket and hearing, "You work there, right? I didn't get paid because of you." So we added some functional checks as well. We have an NServiceBus service running in production which, with the time modeling Roy showed us, does checks ranging from every five minutes to every hour, depending on the check, to see how our system is doing in terms of functionality. Is all the data in? Can I actually start processing, or are there some issues?
43:05 Mark Taling
Well, using SignalR, it pushes the results to a front end. This screen is, well, a bit smaller than this, but it's running very large in the middle of one of our development rooms, and our devops team is currently watching it to see if there are any issues. Red means issues that need to be resolved immediately, and orange, well, they can wait, but we know they're coming. That can be in a day or in 10 days. You just click them open to see more details. This is something NServiceBus allowed us to do that we couldn't do before, because NServiceBus allows us to have autonomously running software in production, doing these kinds of checks, without any user input.
43:53 Roy Cornelissen
This was a bad day, right, with all the red?
43:57 Mark Taling
I'd like to say this was a bad day. I'm ashamed to say this was an average day.
44:00 Roy Cornelissen
Okay.
44:03 Mark Taling
I just mentioned SignalR, and hopefully all of you were enticed to go look at my blog to see how we worked around that multi-tenancy issue. I've got another blog post on there as well, and that's how we added SignalR as a transport. We talked about MSMQ transports and Azure transports. One of the things we added, just for fun, was the SignalR transport. The moment I posted it online, I got four tweets from Yves telling me why you shouldn't do this, why this is a bad idea. I'm not suggesting you use this for production, but I'd like to encourage each and every one of you to just pick one part of NServiceBus, be it the transport or the persistence storage, and re-implement it yourself, just to get a bit of a feel for how NServiceBus works, and why it works the way it does. This is something I ask of every developer in our team: just pick a part of NServiceBus and figure out why it works the way it does. It makes it easier to understand how the services work, but also how to fix issues.
45:18 Mark Taling
Don't take this too far. Recently, David Boike implemented the RFC 1149 transport. For those of you who didn't get it from the picture, that's IP over Avian Carrier, so postal pigeons.
45:33 Roy Cornelissen
It works, right?
45:34 Mark Taling
It actually works.
45:37 Roy Cornelissen
For fun, what I did was turn the SignalR thing around, so I built a backplane for SignalR using NServiceBus, for the scale-out scenario. Like Mark said, it's really fun to play around with this pluggability and find the ways you can extend the framework through the hooks it provides. We've seen a couple of examples of how NServiceBus works in a brownfield application, in a brownfield scenario. I think most of you will come from this world. I haven't seen a lot of people that can actually start out with a greenfield implementation of NServiceBus, setting up a new, cool, super-duper architecture, without having to deal with legacy.
46:18 Roy Cornelissen
A couple of things that we learned. It's been mentioned before: NServiceBus is a very opinionated framework, and it is so for a reason, because it stems from a lot of lessons learned in practice. If it fits your solution, it will fit like a glove. If it doesn't, it's like a shoe that doesn't fit; it will hurt. That's also for a reason. If it starts to hurt, you need to think, "Okay, what am I doing here? Is this actually a good solution?" Having that opinionated framework really helps you move forward.
46:51 Roy Cornelissen
It also means that if you're working in a brownfield solution, you need to be prepared to make some concessions at first, maybe introduce some pieces of infrastructure that you weren't planning on, just to bridge the gap between your current state and the whole event-driven architecture model.
47:12 Roy Cornelissen
What we also learned as an organization was that messaging as a mindset enables totally different types of use cases, more task-driven. The asynchronous model requires some time to get your head around, and not only for software developers, but also for your end users, and also for the business analysts that are designing your system, who are used to thinking in transactional, request-response types of ways. That takes time to sink in and really change your user interaction model. You need to take all those people along with these ideas, as an architect and also as a development team. It really is a new mindset, and it takes time.
47:59 Roy Cornelissen
Wrapping up: well, we all know NServiceBus is great. It can also function as a great way to open up your really closed-off legacy architectures and then start working from there towards a message-driven system. Like Mark said, it's very flexible, very pluggable, but it's better to stay within the boundaries that the framework gives you. Putting in plugins or things that do strange things will cause you trouble. Stick as much as possible to those design principles; they're there for a reason. Like I said, they stem from years of experience, and they may have solved problems that you don't know yet that you might have in the future. Go with that flow and get on the NServiceBus, so to say. With that, I'd like to end the session and open up for questions. I'd also love to discuss your experiences with you afterwards.