Webinar recording
Messaging Without Servers
In the last few years, there’s been a shift from deploying applications on premises to deploying them in the cloud using services such as Azure and AWS. However, with so many options to choose from, it can be difficult to know what the trade-offs are.
🔗Distributed systems in the cloud
Building a distributed system in the cloud has never been easier. Both Azure and AWS give us a number of options not only in technology but in hosting. In Azure, we have Azure Service Bus and Azure Functions; on the AWS side, SQS and Lambdas. Each has its advantages and disadvantages when designing and building distributed systems.
Join Simon Timms from Inventive.io as he helps us navigate the options and how to design a distributed system to take advantage of each one, demonstrating the differences with NServiceBus endpoints.
🔗In this webinar you’ll learn:
- The options available for running cloud-native distributed systems
- Serverless vs. “serverful” architectures
- Designing endpoints for the cloud
- Adapting rapidly to changing scale
🔗Transcription
- 00:00 Kyle Baley
- Hello everyone. And thanks for joining us for another Particular live webinar. This is Kyle Baley. I'm joined by a friend of mine and solution architect, Simon Timms of Inventive.io. He's going to talk about messaging in the cloud, specifically with respect to serverless architectures, with examples in NServiceBus.
- 00:22 Kyle Baley
- A quick note before we begin, please use the Q and A feature in Zoom to ask any questions that you have during the webinar. And we'll try to address them at the end of the presentation. We'll follow up offline to answer all the questions that we aren't able to get to in the live webinar. And we're also recording the webinar and everyone will receive a link to the recording via email once we have it available. Let's talk about messaging in the cloud without servers, Simon, welcome.
- 00:53 Simon Timms
- Thanks, Kyle. Thank you to Particular for hosting me today. So I wanted to talk today a little bit about messaging without servers, kind of building out messaging infrastructures using cloud-based architectures. So when I originally took Udi's distributed systems course, it was I think 2008. So more than a decade ago now. And it was a different time. We were largely building kind of large on-premises deployments, and the messaging options that we had at that time in terms of transport were pretty limited.
- 01:41 Simon Timms
- The vast majority of people that I dealt with were doing messaging on MSMQ and perhaps connecting together two or three servers in order to get the throughput that they needed. We've seen a lot of changes in that space since then. There's a lot of newer messaging systems, or messaging systems that have become more popular in that time.
- 02:05 Simon Timms
- So I'm thinking of things like SQS and Azure Service Bus, and even more direct messaging systems like RabbitMQ and Apache's messaging system. So now a lot of this messaging infrastructure has moved to a place where it is deployed in the cloud. Every cloud player has at least one messaging solution. Many of them have more than one, and there are still a lot of companies though that are doing messaging kind of in-house.
- 02:41 Simon Timms
- And my feeling is that it's starting to become a little bit more difficult to justify running complicated messaging infrastructure in-house. It can be cheaper to do things in-house than out on the cloud, but it is difficult to find the right group of people who can set up appropriate failovers and infrastructure within your own data center, such that it would match the reliability that you could have out in the cloud.
- 03:14 Simon Timms
- So back then, kind of, people had sort of server closets that looked a little bit like this: a few switches, maybe a half dozen machines that were plugged into those switches, just located in a closet somewhere. For medium and small companies, it made a lot of sense. It was cheap enough and easy enough to run your servers. Reliability kind of took a back seat to just getting something out there.
- 03:43 Simon Timms
- Today for very large or very technically proficient companies, it can make a lot of sense still to roll your own hardware and run your own data centers. So I'm specifically thinking here of a company like Stack Overflow who famously run their own servers without basing anything on cloud infrastructure. But they are a very talented group of people who are really dedicated to running their own servers to save themselves money.
- 04:11 Simon Timms
- If you're anything less than something like them, then moving into the cloud can make a lot of sense. And the reason for that is that cloud infrastructure looks something like this now. So this is, I think the Irish Data Center that Azure have, it's surprisingly difficult to find photographs of data centers, it's as if they don't want you to know where they are for security reasons.
- 04:40 Simon Timms
- But this is the sort of scale that we're running at now. So the cooling infrastructure, the compute infrastructure, the electrical infrastructure, and the communication infrastructure is mind-boggling, especially to those of us who grew up using 33.6k modems and being delighted that they could download anything in less than an hour.
- 05:03 Simon Timms
- So I mentioned that on these clouds there is a high degree of messaging infrastructure created for you. So there are a lot of software-as-a-service approaches that you can take using what's out there on the cloud now. Azure has at least four messaging systems: Event Grid, Event Hubs, Service Bus, Storage Queues.
- 05:26 Simon Timms
- GCP seems to have two, so Cloud Pub/Sub, Firebase Messaging. AWS has a lot. I stopped counting at about six this time, but I feel like in a previous attempt, I got up to something like 10 or 15 different approaches inside of AWS to do messaging. For people who are kind of not North American based, Alibaba has a fair few messaging pieces as well. And I'm sure that Oracle has something. And IBM probably have something too, but I find it very difficult to care about what it is that they have.
- 06:10 Simon Timms
- So there are a lot of advantages to moving to the cloud to do your messaging. The first of them is that it is very easy to manage your cloud-based infrastructure. Gone are the days where it was difficult to provision new hardware. I remember starting on projects years ago in companies where the very first thing that we would do, knowing that we had to be live in six months, was to ask for hardware to go live on, and it would take us the entire six months to negotiate with the IT group that we in fact needed hardware and that we would like it provisioned, even if they were provisioning just a virtual machine for us. It could definitely be six months' worth of work to get it there.
- 06:56 Simon Timms
- Now it's just the click of a button to set up messaging. And the messaging that you do set up is really well-designed and reliable in the cloud. So the reliability that we see now in cloud messaging gives us four, five, even six nines of reliability in the cloud. It's highly unusual now to lose messages anywhere, unless it's something that you have explicitly done wrong.
- 07:26 Simon Timms
- There's great support for dead letter queues in the transports. Many of the transports like Azure Service Bus support Pub/Sub semantics, which makes building messaging systems a lot easier than trying to build messaging systems on top of something like MSMQ, where you really needed to struggle hard to set up some sort of a Pub/Sub system. Because the environment that you're running the messaging in is known to the cloud providers. It enables taking great advantage of some of the features of the cloud.
- 08:01 Simon Timms
- So being able to fail over, being able to transport messages between different regions, different data centers; some of the constraints that you might see in a local deployment, you're not going to have out in the cloud. It's going to be more reliable than anything you could build on premises. And even if things do fail in the cloud, it tends to be in a way that they fail over and you don't end up losing any messages.
- 08:30 Simon Timms
- And the APIs that are provided by the cloud providers are excellent. So there are wonderful APIs and documentation for sending SQS messages from any language you can imagine with huge throughput and then receiving those messages on the other end, the messages can be nicely de-serialized for you and provided in a fairly standard way. And of course, all of these applications, all of these queue systems integrate tightly with other services on the cloud.
- 09:04 Simon Timms
- So if you think about something like Azure Event Grid, it integrates with services on Azure so that you can take advantage of things that are happening in Azure and have your application react to those. And if you need to receive messages, chances are that there are APIs built out already for receiving those messages, and it makes it a lot easier for you, fewer things for you to have to worry about.
- 09:33 Simon Timms
- So completely unscientifically, this is my feeling about what the sort of average NServiceBus customer was about five years ago: everything that they had was deployed on premises, hardware provisioning was pretty fixed. If you were running messaging, you would tend to have a cluster of, I don't know, 10, 20, 30 machines, something like that, that you would run your messaging on, and that would be all that you would run it on all the time.
- 10:04 Simon Timms
- So there was no sort of dynamic provisioning or turning things up or down depending on the time of year. And that's simply because you needed to build out your hardware such that it could handle the highest degree of load that you're going to get for the entire year, because you don't want to be in a situation where people are waiting a long time for messages to move through the queue. So you need to have beefy hardware, even if 80% of the time it goes underutilized.
- 10:32 Simon Timms
- The transports that were being used were sort of in-house transports. I mentioned already that MSMQ was very popular years ago; SQL transport, and also RabbitMQ, perhaps on-premises as well. Then equally unscientifically, my feeling is that in five years' time, we're going to continue to see a shift away from on-premises for most customers into more of a cloud-based environment.
- 11:03 Simon Timms
- I think there are still going to be a mass of customers who remain on premises for various reasons. And these reasons could be just regulatory reasons, or that people are uncomfortable moving to the cloud. I often hear concerns from people that if we move to the cloud, then the Americans are going to get access to all of our data. I find it pretty unlikely that the Americans are super interested in the number of widgets that people are buying, but perhaps somewhere down the line, there is somebody who's building an AI model that can relate the number of widgets being bought to some sort of prediction about market movements and making big money off of it. It's not me who is able to do that, unfortunately.
- 11:48 Simon Timms
- Elastic provisioning I think is going to become a bigger and bigger thing. It's hard to think of any industry now that doesn't have at least some sort of spikiness to its load. The canonical example is of course in e-commerce: when Black Friday rolls around, the number of people who are using websites to purchase goods spikes dramatically. But even in other industries, you can think of places where load is very spiky, either during the day there's much higher load, or during the night, or perhaps during certain periods of the year. If you're a college or university, then your registration system goes unused pretty much the entire year, except for a week or two, perhaps once for the fall semester, once for the winter semester.
- 12:37 Simon Timms
- So in these scenarios, being able to provision more resources rapidly and not pay for them the rest of the year is a very attractive proposition. And I think that people are going to move more to cloud transport. The cost of these transports is so low that it almost makes no sense to provision and run your own versions of these messaging systems inside the cloud. It just makes more sense in my mind to deploy in that way.
- 13:10 Simon Timms
- So of course if you're deploying to the cloud, there's a variety of different models that you can take depending on what your needs are for how you're going to deploy to the cloud. And you can take advantage of varying degrees of software as a service that the cloud has to provide, from deploying just virtual machines all the way up to something that is highly managed like Lambda or Azure Functions.
- 13:39 Simon Timms
- So to dig into that a little bit more, there are some cases where you might want to do bare metal hosting, and some of the cloud providers do provide bare metal hosting. So this is one step lower even than virtual machines, they will just provision you a box in the cloud on which you can run your code. It is pretty unusual I think for people to use bare metal hosting.
- 14:07 Simon Timms
- I have not run into a situation yet where I felt the need to do it. But there are some cases. So if you really need to have fine-grained control over performance, then this is an approach that you might take. If you're migrating legacy systems that rely on certain super low-level APIs that might not work in virtual machines, this could be a place you would use it. Then possibly if you're really tuning a database that's designed to run on top of an actual disk, rather than on top of virtualized disks, then you might gain some performance from running on bare metal.
- 14:52 Simon Timms
- Taking a step up from there, virtual machines, so again, this is a fairly low level concept that is provided for across every single cloud. I mean this is basically just the way that you would run it in-house. So you would stand up a virtual machine with Windows or Linux on it and install whatever it was that you needed to run directly on that machine.
- 15:17 Simon Timms
- So conceptually, you could install your messaging infrastructure on these too. You can install your RabbitMQ, or you could just install handlers on all of these machines to process messages. To my mind, hosting on virtual machines doesn't offer much in the way of advantages over just hosting your infrastructure on premises. There are of course going to be some advantages around speed and the ability to scale up and down. But for the most part, you might as well just run it on premises. You're losing out on many of the advantages of the cloud by running on virtual machines.
- 15:58 Simon Timms
- Some of the disadvantages here: you're still responsible for patching and updating all of these servers. If there's a Windows patch out there that needs to be rolled out, or a kernel patch, then you're pretty much on your own. You need to go and apply those patches and handle rebooting machines in a sensible round-robin way such that your services don't go down completely.
- 16:20 Simon Timms
- A lot of times the management on these virtual machines is still done by kind of logging into the machine, running commands, observing the responses, and using that information to guide you forward. Certainly, there are possibilities here with tools like Chef and Puppet to automate this away. But my feeling is that Chef and Puppet have sort of fallen by the wayside in the face of newer technologies. I don't hear much chatter about those technologies anymore, certainly not as compared to say five years ago when I heard a lot more about them.
- 17:04 Simon Timms
- It's pretty difficult still to scale in a granular way on virtual machines. It tends to be that virtual machines are quite large units. So when you scale them up, you take pretty big steps: adding an entire CPU, adding 10 gig of memory at a time, those sorts of things. So really granular scaling, being able to scale up individual services, individual endpoints, is probably a bit difficult on virtual machines.
- 17:36 Simon Timms
- Certainly when I have deployed something like NServiceBus out to virtual machines in the past, our tendency has been to deploy multiple different event handlers, multiple different endpoints, to a single virtual machine, just because we have a lot of room on those machines to play with. And that means that when we go to scale it, we're probably scaling services that don't need to be scaled as well as services that do need to be scaled out.
- 18:06 Simon Timms
- The startup time on virtual machines is also pretty high. I guess it's laughably high when you compare it to the six months to provision a server from traditional IT, but you're still talking kind of two to three minutes to provision a new server, which can be a long time if you're starting to see queues back up and you need to get on top of it.
- 18:33 Simon Timms
- There is of course Kubernetes, I should say perhaps containers instead of Kubernetes. But Kubernetes seems to have the majority of the mind share these days. There are certainly other tools that you can use to run containers out in the cloud, DC/OS, and then various things native to different cloud providers. But most of the time now when I see people orchestrating containers, it tends to be with Kubernetes.
- 19:05 Simon Timms
- So the nice thing about these containers is that they deploy significantly faster than virtual machines. The startup time that used to be two minutes is now closer to two seconds, which is a really nice advantage, being able to stand up containers that quickly. They reduce the granularity of the things that you're deploying. So it means that you can get much closer to that point where you have one endpoint on one container, and now you can make much smarter decisions about what it is that you scale up.
- 19:38 Simon Timms
- So if you're seeing a lot of load on perhaps the login service, but not a lot of load on the service used to send people email, then you can just scale that individual set of containers up. And you can set all of this stuff up in Kubernetes to scale automatically, to spin up containers and spin down containers whenever you need them, whenever the load demands it.
- 20:06 Simon Timms
- One of the problems I think with Kubernetes is that it's a pretty high bar for entry. So there's a lot of things that you need to understand. There's a high number of concepts that you need to grasp in order to get a proper Kubernetes cluster up and running. If you're doing it manually, it can be a very daunting task. Fortunately, this is another one of those spaces that's largely provided for by cloud providers at the moment.
- 20:38 Simon Timms
- So AWS have the Elastic Kubernetes Service, Azure have, I wrote Azure Virtual Machines, but I meant to write Azure Kubernetes Service and then Google have the Google Kubernetes Engine. So all of these services are now pretty mature and are working very well for many people out there. One of the things that you can do with these services now is you can provision serverless containers for them. So on Azure, it's Azure Container Instances, and on AWS it's a service called Fargate.
- 21:22 Simon Timms
- So these are containers that you can spin up without having to provision sort of the underlying virtual machines and add them into your cluster. So this brings some of the advantages of serverless to Kubernetes. But to give you an idea of kind of the differences between Kubernetes and true serverless, there are some advantages and disadvantages to both of them.
- 21:47 Simon Timms
- So Kubernetes is really the base language, the lingua franca, across every cloud provider, and even on premises. So if you have a container that you're deploying and you need to be able to deploy it to different clouds, then Kubernetes, Docker containers, or just containers in general are probably the best way to go about that.
- 22:10 Simon Timms
- There are some companies who feel like they need to be resident on multiple clouds, either for reliability purposes or for proximity to their clients. So something like MongoDB Atlas, which is a service that MongoDB provide for hosting MongoDB databases: you can provision their servers in any one of Google's, Azure's, or AWS's environments, so it has good proximity to the services that are actually using that database.
- 22:47 Simon Timms
- Scaling of Kubernetes still requires some manual intervention, or defining fairly complicated rules in order to know when to scale things up and when to scale things down. The minimum unit of scale here is still a container, which is a pretty small unit of scale, but it's not as small as we can get. So you're still going to have to run at least one instance of every one of your endpoints in your cluster. So if you have 100 endpoints, you're going to have to run 100 different containers inside of your cluster.
- 23:22 Simon Timms
- Serverless has some advantages too. Although it is not transportable between different cloud providers, since everybody has a slightly different set of serverless functions, it does tend to provide deeper integration with the cloud providers. So I'm thinking specifically here of something like Azure Functions: you can set up Azure Functions very easily to talk to Service Bus, to talk to storage, to talk to Storage Queues, those sorts of things. And a lot of that plumbing is sort of taken care of for you.
- 24:00 Simon Timms
- You don't have to worry about it quite so much. The scaling on functions, on serverless functions, Lambdas, all of it, is totally transparent. So if you have 1000 messages in your queue, it's very possible that the serverless infrastructure will spin up 1000 instances of itself in order to process those messages.
- 24:22 Simon Timms
- In fact, I actually was working on a bug that we had last week related to this. So this was on AWS. We spun up a bunch of messages and sent them out into a queue. Unfortunately, I think we had some bugs in a couple of different places in that application. And we ended up sending 20 times as many messages as we meant to send, and the way that we were processing the messages was very, very slow. So we had some kind of unintentional delays inside of our functions.
- 24:58 Simon Timms
- So we were at a point where we managed to use three years of compute time in the course of three days, which was a lot more than we had intended to do. And we spent a lot more money than we meant to spend. But the advantage here was that we did scale up very nicely. We didn't notice any problems based on load, because the serverless infrastructure just scaled up transparently for us, which was, I suppose, good or bad in this case. It cost us a lot of money, but we didn't see any degradation of user performance based on that.
- 25:36 Simon Timms
- So another nice thing here is the minimum unit of scale here is zero. If an endpoint is not receiving any messages, you will not incur any cost for that. So the cost to run these functions is typically based on the length of time that function runs for, the amount of memory that it uses, and then the number of executions that get fired. So if you're not running any executions, then the cost associated here is zero.
- 26:11 Simon Timms
- So, there are a number of places that I think serverless is a great fit inside of organizations. The first one of these is low-throughput endpoints. I'm sure that you have endpoints inside of your system that get fired very infrequently. So things like maybe sending emails only happens once a day, or if you're doing month-end processing, there could be events that are only fired once a month.
- 26:40 Simon Timms
- As it stands right now, you might have to have endpoints stood up and consuming resources for an entire month just to receive that one message that comes in on the 30th of the month. Because there's no cost associated with running functions if they're not actually executing, this is a fantastic place to use serverless endpoints. It really saves costs.
- 27:04 Simon Timms
- It's also really nice for scenarios like doing a startup, where you might only have two users for the first little while; you don't want to stand up big hardware infrastructure to serve your two users. Being able to set stuff up on serverless is a much nicer approach. Doing things on Kubernetes or on virtual machines, you're going to have that minimum load level that you need to pay for.
- 27:33 Simon Timms
- Another place that is handy here is doing unpredictable load. So if you are deploying new services and you're unsure of what their load is going to look like especially if you're deploying services that have variable load throughout the year, it's difficult to predict what the load is going to be. So it's difficult to know where you need to scale to. So you can deploy it out using a serverless infrastructure and you pay just per execution, knowing that you can scale up to almost infinite level, but you can also scale down.
- 28:09 Simon Timms
- And that's the bit, I think, that people kind of miss on serverless: that you can scale down to a point where it doesn't cost you any money to run at all. So handling unpredictable load is a great utilization for this. A lot of services, as we already talked about earlier, have peak load to them. So this is kind of a graph that I stole from a very low-resolution PDF talking about power demand.
- 28:38 Simon Timms
- So we see these same sorts of things, of course in utilization, inside of companies as well that we have a kind of base load initially. So we see a certain number of messages that come through regardless of if people are using the system or the degree to which they are using the system. We're always going to see that kind of base level of messages that come through the system.
- 29:01 Simon Timms
- There's also kind of an intermediate level inside the system where this is the sort of stuff that you see throughout the whole day. So everybody comes in in the morning and that increases the base load to a certain level from the hours of nine to five, let's say. Then there might also be some peaks that we see that go even above that.
- 29:23 Simon Timms
- So we might see peaks here for when people log in, that's a momentary jump in the number of users on the system, on the demand of the system. So if you are running just a base load on your system, you might want to run this on virtual machines because overall, it is probably going to be cheaper to use virtual machines than to use a serverless approach. But you then want to be able to jump up to handle this intermediate load. So this is highly predictable load that you know you're going to have every day.
- 29:57 Simon Timms
- So perhaps every morning at 8:30, half an hour before people come in, you turn on an extra 10 virtual machines and you distribute the load out over those, and that handles your kind of intermediate load. Then that peak load, which is a little bit less predictable than your intermediate load is going to be stuff that you could handle on serverless architecture.
- 30:23 Simon Timms
- So this allows you to remain productive and remain fast throughout the day while keeping your costs low. So this is an interesting approach, and of course you can mix and match the tools that are used to consume messages here. So some of those messages could go to virtual machines. Some of them could go to functions, and it doesn't really matter, because they're all going to be doing the same thing. But we're able to handle these load changes.
- 30:52 Simon Timms
- Then I mentioned this a little bit before, but you can bootstrap your application here without a great deal of investment. So if you are standing up a new service that you are unsure is going to be productive or useful, then this is a really quick way to get your application out there, on serverless. If you're a company doing a startup, then again, this is a great place: you can go and spend next to no money and still build up a big piece of infrastructure based on serverless. So very low cost of entry.
- 31:27 Simon Timms
- Finally, you can just go all in on serverless if you are unsure about how successful your system is going to be, or you don't want to worry about when you need to scale things up and when you need to scale things down, then going all in on serverless can be a useful approach. It might cost you a little bit more, but sometimes having the mind share available to do other things and not worry about fiddling with scalability knobs is pretty useful.
- 32:02 Simon Timms
- So, this is of course a Particular-sponsored event. So I'm going to talk a little bit about NServiceBus, but I'm going to actually skip over the slide and come back to it. Because I wanted to show you sort of what this stuff looks like when you're deploying and when you're building applications out using this.
- 32:22 Simon Timms
- So I have here an idea of what a Lambda might look like right now, and we can see some of the constraints and some of the advantages of using Lambda here. So what I've done here is I have just a kind of File, New Project on a Lambda. So this is a template that is provided by AWS in order to stand up new Lambdas, and deploying this is really just as simple as right-clicking and going to deploy, or of course, pushing it into a build pipeline, which is the right choice and where we want to get everybody to eventually.
- 33:02 Simon Timms
- But this is basically just standard sort of C# code that we have. It gets pushed out into the Lambda. Lambdas support a variety of different languages, but we're going to talk around C# a little bit here today. So the event that we get in here is basically a batch of messages that are received. Then we process through this batch of messages on the Lambda and do some work on it. The size of these batches can be configured when you set up your SQS queues. So sorry, I should've mentioned here that we're using SQS as the source of events here.
- 33:41 Simon Timms
- There's a whole bunch of different things that can trigger Lambdas. So everything from HTTP requests to SQS to SNS, to changes in S3 storage, those sorts of things. So there are a wide variety of different things that can trigger these events, which is one of the nice things about deploying out to the cloud, is that you kind of get all those event triggers for free and you can mix and match different triggers for the different parts of your application.
- 34:11 Simon Timms
- So we get this batch of messages in, and then we're just going to go and process each one of those messages here. And the messages are very simple. They basically contain a body, as well as a number of attributes that give you information about sort of the envelope that it came in: the region that it came from, MD5 hashes of the body, the message ID, and then you can put some additional properties on the message if you want.
- 34:41 Simon Timms
- Then this is just asynchronous. So you do whatever work you need to, and then you just return at the end of the function here. But there are some disadvantages. You might notice here that the type of the body that comes in is just a string. So you need to know sort of what message type is coming in on your endpoint here. I have seen people build this out before such that you have kind of one SQS queue per type of message. That's certainly an approach that you can take, so you don't have to worry about determining what the body is when you de-serialize it.
- 35:16 Simon Timms
- But that does mean that you end up with a lot of queues and everything is super fine-grained. So you end up looking at a lot of queues to try and track down problems. The NServiceBus approach is closer to: you would have one queue, and you would have multiple different message types that might be processed by that queue. If you need to send additional messages within here, then you have the responsibility of standing up your own SQS client and sending the messages. Anything to do with serialization and deserialization, you're responsible for.
- 35:56 Simon Timms
- So there is some work here that you need to do in order to set up messaging, but that being said, this is a very quick and easy way to set up a message handler. There's almost no ceremony around this. I haven't put in any special start-up code or anything like that. This just deploys and works nicely.
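To make that concrete, here's a minimal sketch of the kind of raw SQS-triggered Lambda described above, using the standard Amazon.Lambda event types. The OrderPlaced message and the processing step are hypothetical stand-ins, not code from the webinar:

```csharp
// Minimal sketch of a raw SQS-triggered Lambda in C#.
// Assumes the Amazon.Lambda.Core and Amazon.Lambda.SQSEvents NuGet packages.
using System.Text.Json;
using System.Threading.Tasks;
using Amazon.Lambda.Core;
using Amazon.Lambda.SQSEvents;

public class Function
{
    public async Task FunctionHandler(SQSEvent evnt, ILambdaContext context)
    {
        // The event is a batch; the batch size is configured on the SQS
        // event source mapping, not in code.
        foreach (var record in evnt.Records)
        {
            // The body is just a string: knowing which message type arrived
            // and deserializing it is entirely your responsibility.
            var order = JsonSerializer.Deserialize<OrderPlaced>(record.Body);
            context.Logger.LogLine($"Processing message {record.MessageId}");
            await HandleAsync(order);
        }
    }

    private Task HandleAsync(OrderPlaced order) => Task.CompletedTask; // your work here
}

public class OrderPlaced
{
    public string OrderId { get; set; }
}
```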
- 36:19 Simon Timms
- Flipping over to Azure Functions, their approach is pretty similar here to the way that we saw it inside of Lambdas. Azure Functions are a little bit more tightly integrated with the cloud itself, I suppose. So while over here in the Lambda, you could actually get in any sort of event here. So the C# code claims that it's an SQS event, but you could actually have this thing fired from a variety of different messages.
- 36:55 Simon Timms
- Inside of the actual Functions, things are more hard-coded than you would get inside of a Lambda, so you can get some advantages around that, around getting more strongly typed inputs, which I think saves some errors that you can potentially make inside of Lambdas. There's also better integration with outgoing messages inside of Azure Functions. So if you need to write to another queue, or you need to write to storage or CosmosDB or something like that, then there are triggers and output bindings that you can use inside of your Azure Function that make that a little bit easier to do.
- 37:36 Simon Timms
- But for the most part, you have the same sort of problems here that you're getting in a queue item and you need to handle de-serializing that yourself and understanding what the message is and what the properties are and how to handle that message yourself.
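For comparison, a minimal sketch of the equivalent raw Azure Function, assuming the in-process WebJobs model; the queue name, connection setting name, and MyMessage type are hypothetical:

```csharp
// Minimal sketch of a raw Service Bus-triggered Azure Function (in-process model).
// Assumes the Microsoft.Azure.WebJobs.Extensions.ServiceBus package.
using System.Text.Json;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ProcessOrder
{
    [FunctionName("ProcessOrder")]
    public static void Run(
        [ServiceBusTrigger("orders", Connection = "ServiceBusConnection")] string queueItem,
        ILogger log)
    {
        // As with the Lambda, you receive a raw payload and are responsible
        // for working out what it is and deserializing it yourself.
        var message = JsonSerializer.Deserialize<MyMessage>(queueItem);
        log.LogInformation("Processing message {Id}", message.Id);
    }
}

public class MyMessage
{
    public string Id { get; set; }
}
```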
- 37:53 Simon Timms
- So this is kind of the way of setting up serverless on different clouds. But some of the advantages that you can get from NServiceBus coupled with this I think are worth exploring. So recently, Particular has put out two pretty interesting packages in my mind, well, maybe even three interesting packages, related to doing serverless handling of messages. I hope you think this is a funny story: I talked to Sean Feldman, who's one of the developers on the Azure Functions side for Particular, about this concept maybe two years ago.
- 38:37 Simon Timms
- And I said, "Sean, there's no way that you can possibly do this. The startup time on NServiceBus is too long. It's going to cost a fortune to run it on functions." And he's completely proved me wrong on that. So startup time is now milliseconds, and I'm super impressed by the work that he's done on this. But if we were to look at kind of the Azure Service Bus based method of doing NServiceBus integration, this is a very simple example here of what this would look like.
- 39:11 Simon Timms
- So this is just a way of setting up NServiceBus to tie into Azure Functions. You give it sort of the same stuff that you did without NServiceBus, but now you just receive a message in and immediately hand it off to the NServiceBus processing infrastructure. This means that it will handle things like deserialization for you. It will handle sensible retry policies. It will handle dealing with batches of messages, which is less of a problem on Azure Functions than it is inside of Lambdas. Because remember, with Lambdas we get a bunch of messages back in a batch, and that poses questions like: what happens if I process six of the messages in this batch, but the remaining four messages fail? What do I do then? How do I retry just those four messages that failed?
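A sketch of what that wiring looks like, loosely modeled on the preview-era NServiceBus Azure Functions package; exact type and method names may differ between versions, and the function and queue names here are hypothetical:

```csharp
// Sketch: handing a Service Bus-triggered Azure Function over to NServiceBus.
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using NServiceBus;

public class ShipOrderFunction
{
    // The NServiceBus endpoint is created once and reused across invocations.
    private static readonly FunctionEndpoint endpoint = new FunctionEndpoint(executionContext =>
        ServiceBusTriggeredEndpointConfiguration.FromAttributes());

    [FunctionName("ShipOrder")]
    public Task Run(
        [ServiceBusTrigger("ShipOrder")] Message message,
        ILogger logger,
        ExecutionContext executionContext)
    {
        // Hand the raw message straight to NServiceBus, which deserializes it
        // and dispatches it to the matching IHandleMessages<T> handler.
        return endpoint.Process(message, executionContext, logger);
    }
}
```

From there the message is handled by ordinary NServiceBus handlers, classes implementing IHandleMessages&lt;T&gt;, the same as in any other NServiceBus host.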
- 40:07 Simon Timms
- So once you've handed it off to NServiceBus, it makes things a little bit easier from that point of view. NServiceBus also handles, being able to send new messages and do event subscriptions and those sorts of things. So there is a great deal of advantage in my mind of using NServiceBus on top of something like Azure Functions or on top of something like Lambda.
- 40:32 Simon Timms
- It takes care of a lot of plumbing code for you, which can be one of the problems with dealing directly with things like Azure Service Bus or SQS: there are a lot of kind of interesting edge cases inside of those transports that increase the amount of understanding that you need to have of those transports before they're useful.
- 40:57 Simon Timms
- So a good example of that is SQS message timeouts. So you can delay a message inside of SQS so that it is delivered later. So you can think of this as something like a timeout where you say, "I really want to send this email, but I don't want to send it now, I want to send it in 15 minutes or in 20 minutes time." So you can delay the delivery of that message.
- 41:22 Simon Timms
- But SQS only allows you to delay messages for 15 minutes. Why 15 minutes? I don't know. But that's the maximum amount of time you can delay a message for. So if you wanted to delay for 16 minutes, then it becomes much more difficult to do and you have to store counters and retry messages, which is pretty difficult, but NServiceBus handles that problem for you using the SQS transport.
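In NServiceBus terms, that delayed delivery is one line on the send, regardless of the transport's native cap. A sketch from inside a handler, where SendReminder is a hypothetical message type:

```csharp
// Delay a message well past SQS's native 15-minute limit; the SQS transport
// takes care of the delay mechanics behind the scenes.
var options = new SendOptions();
options.DelayDeliveryWith(TimeSpan.FromHours(1));
await context.Send(new SendReminder(), options);
```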
- 41:55 Simon Timms
- Then there is also an AWS package for NServiceBus, which works in a pretty similar way to the Azure Functions one. So, same sort of idea here: you have the same handler that you had before, that receives a message and the Lambda context, but now instead of having to figure out what sort of message it is and deal with deserialization and all of that yourself, you can just hand it off to the NServiceBus infrastructure, and it will take care of all of that plumbing code for you.
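A sketch of that shape, based on the NServiceBus AWS Lambda SQS package as it looked in preview; type names may differ by version, and "my-endpoint" is a hypothetical endpoint/queue name:

```csharp
// Sketch: handing an SQS-triggered Lambda over to NServiceBus.
using System.Threading.Tasks;
using Amazon.Lambda.Core;
using Amazon.Lambda.SQSEvents;
using NServiceBus;

public class Function
{
    // Created once per container and reused across invocations.
    private static readonly AwsLambdaSQSEndpoint endpoint = new AwsLambdaSQSEndpoint(context =>
        new AwsLambdaSQSEndpointConfiguration("my-endpoint"));

    public async Task FunctionHandler(SQSEvent evnt, ILambdaContext context)
    {
        // The whole SQS batch goes to NServiceBus, which handles
        // deserialization, handler dispatch, retries, and outgoing sends.
        await endpoint.Process(evnt, context);
    }
}
```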
- 42:34 Simon Timms
- I think those were the kinds of things that I touched on looking at the code. So, as I said, there are now packages available for Azure Functions and AWS Lambdas. I think that it's a very interesting approach. These are in public preview now, which I think have go-live licenses associated with them. Someone will correct me, I'm sure, if I am wrong about that.
- 42:55 Simon Timms
- So these packages handle serialization and deserialization, they make the handling of different message types coming in on the same queue much easier, they handle batching, and they facilitate things like transactional processing using the outbox, kind of using all the same things that you're used to inside of NServiceBus.
- 43:15 Simon Timms
- And on top of that, because you're not interacting directly with the low-level guts of AWS and Azure, it means that it's much easier to move between different transports or between different clouds if you have that kind of NServiceBus layer in there. There's still going to be a little bit of friction if you're jumping between clouds, but the abstraction that NServiceBus provides over these clouds is very useful to my mind.
- 43:49 Simon Timms
- So there are some gotchas of course to doing serverless. There's no free lunch anywhere out there. One of the problems that I have seen is that you might scale your serverless functions beyond the capability of other services. So if you have 10,000 messages that show up in a queue all of a sudden, then serverless scaling up to 10,000 processes might be enough to bring down your database server, or enough to bring down partner services that aren't quite as resilient to load as yours.
- 44:19 Simon Timms
- It can be more expensive to deploy to serverless than virtual machines. I can't guarantee that; there are a lot of ifs, ands, and buts in that one. But if you handle your base load inside of virtual machines and then your peak loads inside of serverless, that's probably the most cost-efficient way of doing things. There are some restrictions on what can be run. There are time limits to what you can run on Lambda and Functions. There's also some low-level stuff that you can't do in those environments.
- 44:53 Simon Timms
- So that's just something to be aware of. Things like talking to external services can be a little bit trickier in those places, writing files to disk, doing big processing. Time limits on processing, I mentioned that. There are ways of deploying functions, at least, that remove the time limits on them, but doing so kind of increases the cost of running them.
- 45:18 Simon Timms
- So if you have messages that take a long time to process, then serverless might not be the best place to process those messages, but you can of course mix and match your infrastructure. So those expensive ones can go to somewhere else. The runtimes for these serverless components are still pretty specific to the cloud that they run on. So you do need to be aware of that and choose the right options there and know that it's more than five minutes work to transition from Lambda to functions or functions to whatever it is, GCP cloud functions.
- 45:55 Simon Timms
- Then some of the stuff around messaging can be a little bit tricky; understanding that the transports can have some limitations. So you just have to know things like: what are my max message sizes, how long do messages live? Are there any concerns with setting timeouts on messages, those sorts of things.
- 46:17 Simon Timms
- Then the pricing of NServiceBus has changed relatively recently to be a lot friendlier to this sort of thing. I don't remember exactly what the licensing used to be like for NServiceBus, but it used to be based on kind of the cores that you were processing on, which is obviously a very difficult metric to use if you're doing serverless. So the latest way of building pricing for NServiceBus is a lot friendlier to that. The cost is now based just on the number of logical endpoints.
- 46:49 Simon Timms
- So even if I scale out my serverless processing to process 1 million messages simultaneously, and I've got hundreds of different instances out there, it still only counts as one logical endpoint. Then there's a cost just per number of messages that are processed every day. So I think that was kind of everything that I wanted to talk through here. I know we have some questions that have been coming in, so maybe I will hand that over to our moderator to pass some questions through.
- 47:24 Kyle Baley
- That would be my cue there, Simon. I do have some questions that have come in. When would you suggest that you use NServiceBus with serverless, either AWS Lambda or Azure Functions, versus straight-ahead Azure Functions or Lambda on its own?
- 47:44 Simon Timms
- So certainly if there's already an investment in NServiceBus, it makes a lot of sense to me to continue to invest in NServiceBus and use that on top of functions. If you're processing messages that are very simple, you don't have complicated orchestration things, so messages don't interact with other messages, don't cause other messages, then doing it directly in functions is an A-OK way of doing things. But as soon as you start sending messages and replying to messages, and then sending events and those sorts of things, I feel like the advantages of NServiceBus start to pay for themselves very quickly.
- 48:28 Kyle Baley
- And another question, what do you do about the issue of slow startups with serverless or is that even still an issue these days?
- 48:36 Simon Timms
- Well, the NServiceBus start-up time is very fast. Serverless itself, depending on which tier you have and which provider you have, start up can be pretty quick. Once applications have started, they tend to stay memory resident for a while. So even though it is "serverless" and I'm making air quotes there that you can't see, it obviously still runs on a server behind the scenes.
- 49:03 Simon Timms
- So both... well, all of the systems that I'm familiar with, which is Lambda and Azure Functions, for the most part, when you start a function on those, the startup cost is incurred once and then that function kind of remains memory resident for the duration, and requests are routed into it. So I have not had too much trouble with startup costs around this stuff. I guess the worst that I have seen is I have some Lambdas which are unfortunately written in Node.js that have about a five second startup, I think.
- 49:45 Simon Timms
- And at the moment, we have sort of two options: either just suffer quietly on that, which is the approach that we have taken, or you can keep services alive by just sending a kind of keep-alive packet to them every once in a while just to keep them memory resident.
- 50:05 Kyle Baley
- Or if it's something that happens very infrequently, then it's usually not really an issue.
- 50:10 Simon Timms
- Yeah. That would be my feeling, that if you're going to incur a ten second startup cost on a message type that only comes in once a month, that's probably not a big deal.
- 50:22 Kyle Baley
- Now, do you know if there are any options for message auditing in serverless? For ServiceControl, do you have any recommendations for running ServiceControl in the cloud?
- 50:33 Simon Timms
- I am not familiar enough with that to give a good answer, I'm afraid. So that might be one that we have to hand off to smarter minds than mine.
- 50:45 Kyle Baley
- Now, another question, how do you deal with say side-effects in serverless environments, storing data or sending emails and things like that?
- 50:58 Simon Timms
- So of course all those things, there are serverless approaches to most of those too. So if you need to store files, the key observation is of course that you don't have a local disk that you should be writing to, because that might disappear. They have a wonderful word for it, ephemeral I think, the disks that they use.
- 51:18 Simon Timms
- So instead you want to write that stuff to something that you know is going to be around, and Blob Storage or S3 are kind of the best places to write those things, or directly into a database. For something like sending an email, I'd say the same problems apply there that would apply in any other sort of messaging: email is kind of non-transactional by nature.
- 51:42 Simon Timms
- So hopefully, you can send the email and it'll end up being delivered, but there are lots of services out there for sending emails and for handling that sort of stuff. So I have used SendGrid in the past that has been somewhat reliable for me. And then there's a million other services out there too. Postmark and Mailgun and a bunch of other stuff for sending emails, but all of those are kind of pay for services that pair nicely with serverless infrastructure.
- 52:21 Kyle Baley
- Another question from one of the participants. Would you use the same sort of Azure service for point-to-point versus Pub/Sub? In NServiceBus terms, that would be for commands versus events.
- 52:42 Simon Timms
- Yeah, I think you could use the same function for that.
- 52:48 Kyle Baley
- For example, would you recommend using, say, Azure Event Grid for publish/subscribe and, for example, Azure Service Bus for the commands, for the point-to-point side? Or is it-
- 53:06 Simon Timms
- Using a different transport for different types of messages?
- 53:09 Kyle Baley
- Yeah.
- 53:11 Simon Timms
- I think that you could, but it probably adds a lot of complexity to your system. It's going to be a lot easier if you just have a single transport, you don't have to worry about the semantics of one transport then. I mean, I was talking with somebody on Twitter who was probably even here in the audience about choosing between Azure Service Bus and Azure Storage Queues for messaging. And his concern was that Azure Service Bus is a really big hammer.
- 53:42 Simon Timms
- So it solves a lot of problems. And at least in his case, the best approach was to use something like Storage Queues, which is a cheaper but less capable service. But some of that kind of lack of capability is smoothed over by using NServiceBus on top of Storage Queues. But I don't think I would mix and match transports unless I heard a really compelling reason to do that.
- 54:10 Kyle Baley
- And you mentioned a number of different transports that you can use with cloud messaging. How do you pick the one that's right for you?
- 54:21 Simon Timms
- It is a tough call, especially if you look at something like AWS where there's 10 or 15 different transports available to you. If you need something that has kind of Pub/Sub semantics, then a lot of those services are not going to work for you. So SQS does not have Pub/Sub semantics. Storage Queues on Azure does not have Pub/Sub. So for people who haven't used publish and subscribe before, the idea is that SQS is kind of a point-to-point messaging system.
- 54:56 Simon Timms
- So you send a message and it's just received once and handled, whereas with Pub/Sub, you can send a message and then have multiple things subscribe to that message and act on it. So if you were raising an event inside of your system, like email sent, then you might have four or five different services that are really interested in when an email is sent and will want to subscribe to that message and react to it.
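In NServiceBus code, that email-sent example looks something like this sketch; EmailSent and the subscriber are hypothetical:

```csharp
// Sketch of the Pub/Sub idea in NServiceBus terms: one event, many subscribers.
using System.Threading.Tasks;
using NServiceBus;

// The event, raised once by whoever sends the email:
public class EmailSent : IEvent
{
    public string RecipientId { get; set; }
}

// Publisher side: await context.Publish(new EmailSent { RecipientId = id });

// One of several independent subscribers; each gets its own copy of the event.
public class UpdateAuditLogWhenEmailSent : IHandleMessages<EmailSent>
{
    public Task Handle(EmailSent message, IMessageHandlerContext context)
    {
        // React to the event here.
        return Task.CompletedTask;
    }
}
```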
- 55:24 Simon Timms
- So that's one of the things you need to consider, and then there are some cost considerations around it that are worth considering too. But for the most part, it's going to really depend a lot on your situation, so you want to be careful and kind of examine your options right off the bat. And that's one of the nice things about using NServiceBus with this too: if you decide to use transport A and then change your mind later to transport B, it tends to just be a quick change to configuration.
- 55:59 Kyle Baley
- Now, one last, I think this is the last one I'll double-check, how do you manage resiliency when the transport's not available?
- 56:09 Simon Timms
- So when there are failures in the actual transport itself. First off, I think the reliability of messaging queues and messaging services in the cloud is fantastic, but of course you can't assume that things are around all the time, which is one of the nice things about queues: you can put messages in there and process them as services return online.
- 56:35 Simon Timms
- But if your transport itself has fallen down, then I think you have to approach that in the same way that you would on-premise, which is maybe storing the messages locally and waiting for queues to return, and then sending them there, or rejecting messages from users and just failing outright when you can't talk to the distributor of those messages. So that's tough and I'm hoping that it's something that doesn't come up too often, that those highly resilient services fall down. But basically the only answers in those cases are save it and send it later or just fail outright.
- 57:15 Kyle Baley
- Great. And how do retries work with Azure Functions?
- 57:29 Simon Timms
- So with Azure Functions, it kind of depends on the way that you have everything configured, but the same sort of idea: if you're just using kind of raw Azure Functions, instead of NServiceBus on top of Azure Functions, you fall back to whatever retry logic is available within your transport.
- 57:48 Simon Timms
- So that's largely configurable. For something like Azure Service Bus, you can set up a retry policy within it and you can say like, "Hey, receive this message three times and if the message is not processed, then move it into an error queue." And the same sort of thing can be set up within Azure Storage Queues. So this retry stuff kind of works out of the box in the same way that you would expect it to with other transports: you do get a configurable number of retries, and depending on your underlying transport, you can even configure things like delays between retries and those sorts of things.
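As a rough illustration, here's what that retry configuration looks like on a self-hosted NServiceBus endpoint; the numbers are illustrative, and endpointConfiguration is assumed to be in scope:

```csharp
// A few immediate retries, then delayed retries with growing gaps,
// then the message is moved to the error queue.
var recoverability = endpointConfiguration.Recoverability();
recoverability.Immediate(immediate => immediate.NumberOfRetries(3));
recoverability.Delayed(delayed =>
{
    delayed.NumberOfRetries(2);
    delayed.TimeIncrease(TimeSpan.FromSeconds(10));
});
endpointConfiguration.SendFailedMessagesTo("error");
```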
- 58:32 Kyle Baley
- I agree. Well, that does it for our time today. We want to be mindful of everybody's time here. So if we didn't get to your question, we'll send out an email and respond to it afterwards. I forgot to do so earlier, but I also want to apologize for the mix-up with the time zone earlier, if you were affected by that. I spent the good part of an hour trying to think of a decent excuse, but didn't come up with anything other than maybe a vague reference to killer bees or something.
- 59:05 Kyle Baley
- But that's our time for today and Simon, I want to thank you for preparing this. And for the benefit of our participants, Simon is a Solution Architect for Inventive.io, which is a consulting and training company based in Austin. And he's also a member of our Particular Champs program. So thank you, Simon. And thanks to everyone who joined us today, you will receive an email within the next week linking to the recording once it's prepared.
- 59:30 Kyle Baley
- So on behalf of Simon, this is Kyle Baley saying goodbye for now, and we'll see you at the next particular live webinar, which we've tentatively scheduled for early in the new year. So keep an eye on the website or your email for details on that. Thank you.
About Simon Timms
Simon is a polyglot developer who has worked on everything from serial port drivers on an Android tablet, to NServiceBus, to processing tens of thousands of messages a second using stream analytics, to building Angular web applications. All that in the last year. He is the author of a number of books on JavaScript and ASP.NET and blogs far less regularly than he should on both the <a href="https://www.westerndevs.com">Western Devs site</a> and on <a href="https://blog.simontimms.com">his blog</a>. He is also the least visible member of the <a href="https://aspnetmonsters.com">ASP.NET Monsters</a>.