Reports from the Field - Azure Functions in Practice

00:02 Adam Jones

Hello and welcome. My name is Adam Jones and I am the Chief Technology Officer for LHP Telematics. We recently modified our architecture to include hosting our NServiceBus endpoints on Azure Functions. I'd like to share with you what drove us to adopt Azure Functions and our experiences along the way. To better understand what motivated our change, let me spend a few minutes bringing you up to speed on our business.

00:29 Adam Jones

Who is LHP Telematics? I doubt you've heard of us. We are a small business in Westfield, Indiana, with 16 employees, focused on providing telematics services to heavy equipment makers and operators. What is telematics? If you break it down to the Greek roots, you have tele for far, and matics for acts. You might be more familiar with a different term for this, the internet of things. When you think of IoT, you might think of household automation like light switches, temperature sensors, or door locks. Our version of IoT looks a little different than that.

01:11 Adam Jones

Our internet of things all live on a piece of heavy machinery. They use a special protocol to communicate on a shared bus called the controller area network, or CAN Bus. Your passenger car's CAN Bus is accessible via the onboard diagnostic port or OBD port that is under your dash. For heavy machinery, we tie directly into the CAN Bus with our telematics control unit.

01:37 Adam Jones

If you're up on your IoT vocabulary, you'll recognize the telematics control unit or TCU for what it is, a field gateway. It collects data from all of those sensors and controllers on the CAN network and relays them to our backend. They may use WiFi, cellular or even satellite communications to send that data.

02:01 Adam Jones

This is a look at our backend architecture. We implemented a service oriented architecture. Each of the named boxes here represents a service running an NServiceBus endpoint. We chose to deploy similar endpoints together on shared servers. Data sent by the TCU arrives at our backend at our protocol gateways, which we call listeners. These endpoints listen for incoming reports from the telematics units, parse them and acknowledge receipt back to the TCU. There is an enormous variety of telematics units on the market, and they all speak a unique language. The goal of each of these endpoints is to understand a specific protocol, then normalize the data the report contained to a standardized message format.

02:51 Adam Jones

That standardized message format is then forwarded to our reporting endpoints for both runtime and long-term storage. We also need to perform arithmetic adjustments at this point. One very important value for our customers is total engine hours. Unfortunately, some devices report total engine hours overall, some report the change in engine hours since the last report, and some report the time since ignition on. We perform what we call a roll-up on all of these variations so that our customers see a standardized total engine hours for the asset.
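
To make the roll-up concrete, here is a minimal Python sketch of how the three reporting styles could be folded into one running total. It is not LHP's implementation: the names `HoursMode`, `RollupState`, and `roll_up` are hypothetical, and it assumes readings for a given asset are applied in the order the device produced them.

```python
from dataclasses import dataclass
from enum import Enum


class HoursMode(Enum):
    TOTAL = "total"                    # device reports lifetime engine hours
    DELTA = "delta"                    # device reports hours since its last report
    SINCE_IGNITION = "since_ignition"  # device reports hours since ignition on


@dataclass
class RollupState:
    total_hours: float = 0.0           # standardized lifetime total for the asset
    last_ignition_hours: float = 0.0   # last since-ignition reading seen


def roll_up(state: RollupState, mode: HoursMode, value: float) -> float:
    """Fold one raw reading into the asset's standardized total engine hours."""
    if mode is HoursMode.TOTAL:
        state.total_hours = value
    elif mode is HoursMode.DELTA:
        state.total_hours += value
    else:
        # A reading lower than the previous one is treated as an ignition
        # cycle: the device counter restarted from zero.
        restarted = value < state.last_ignition_hours
        previous = 0.0 if restarted else state.last_ignition_hours
        state.total_hours += value - previous
        state.last_ignition_hours = value
    return state.total_hours
```

Whatever the device sends, the value returned is always the same kind of number: total engine hours for the asset.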

03:30 Adam Jones

These reports are then published to our business logic and sender endpoints. Our business logic endpoints evaluate alert conditions such as violations of curfew or position or data thresholds and maintenance schedules. Finally, many of our customers have their own backend systems where they want to receive a full copy of data. For these customers, we have endpoints responsible for delivering the reports and alerts to their systems.

04:01 Adam Jones

When we moved from on-premises hosting of our solution to the cloud, we chose Microsoft's Azure platform. We were already familiar with Microsoft's products and this felt like an easy transition. We decided to use cloud services to host our endpoints. This would allow for a pseudo platform as a service deployment where we didn't need to manage the OS, but we could still log onto the machine and it would feel very much like how we had previously remote desktoped to our own servers.

04:35 Adam Jones

In addition, we chose to use the shared hosting model that Particular supports with NServiceBus. A single shared parent would reference an Azure storage folder to pull in the zip files of our endpoints and deploy them automatically. If an endpoint should fail, it would automatically cycle all of the endpoints on the host. Scaling up or down was as simple as moving a slider on the scaling configuration page.

05:00 Adam Jones

Now, our system worked well, but we were feeling pressures to make adjustments to our architecture. Our first concern was with our choice of using shared hosting. This was cost-effective early when each instance in our cloud service had a lot of headroom. However, we were now finding that our endpoints were so busy that they were interfering with each other. So, do we continue using a shared host or do we move each endpoint to an independent cloud service? At the same time, we were questioning if cloud services should be part of our architecture at all. It isn't very reassuring to go to the Azure portal and see "classic" appended to the critical service you rely on.

05:44 Adam Jones

If we open up to the idea of changing our hosting platform, how far do we deviate? After all, there's an incredible variety of options. We're already familiar with Microsoft's Azure, so we could stay there. Amazon Web Services is the 800 pound gorilla in the market. Google has a solid cloud offering. How do we choose? And even if we decide to stay on Azure, there are literally 36 different compute options to choose from. And this brings me to my first tip, prepare for hosting diversity now. Sooner or later, your hand will be forced and you will either need to add new hosting options or transition to one. The earlier you prepare for this change, the easier it will be.

06:35 Adam Jones

The first thing you can do is get your solution in order. If you're anything like us, you started with what could be generously called cowboy code, a single monolith of technical debt. If you haven't done so already, segregate your solution into multiple projects with key goals. This is better. First, you absolutely must isolate your message definitions. Even if the message is something you only intend to send locally, you will thank yourself later. Personally, I prefer to use interface message definitions, but this can be a matter of preference. In any case, these message definitions should be isolated from any other code and ideally available as a NuGet package from your development team.

07:27 Adam Jones

Second, you should isolate your message handlers. And finally, your hosting and configuration code should be segregated from both your handlers and your message definitions. By taking this step now, you will empower yourself with the ability to deploy to a variety of hosts later.
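
The three-way split can be sketched in a few lines. The talk is about .NET projects; this Python version only illustrates the dependency direction, and every name in it (`EngineHoursReported`, `EngineHoursHandler`, `build_in_memory_host`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


# --- messages package: definitions only, no logic ---
class EngineHoursReported(Protocol):
    asset_id: str
    total_hours: float


# --- handlers package: depends on messages, never on a host ---
class EngineHoursHandler:
    def __init__(self, save: Callable[[str, float], None]) -> None:
        self._save = save  # storage dependency is injected by the host

    def handle(self, message: EngineHoursReported) -> None:
        self._save(message.asset_id, message.total_hours)


# --- host package: one per environment, only configuration and wiring ---
@dataclass
class Report:
    """A concrete message that satisfies the EngineHoursReported shape."""
    asset_id: str
    total_hours: float


def build_in_memory_host(store: dict) -> EngineHoursHandler:
    # A test host; an on-premises, cloud, or function host would wire the
    # same handler to its own storage and configuration instead.
    return EngineHoursHandler(save=store.__setitem__)
```

Adding a new hosting environment then means adding another small builder like `build_in_memory_host`, while the handler and message definitions stay untouched.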

07:50 Adam Jones

Here's a quick example from our business from when we transitioned from on-premises hosting to cloud services. It was only necessary to add one new hosting project to each endpoint. The host would retrieve all configuration for that hosting environment and inject those dependencies into the hosted handler. By segregating our host project early, we afforded ourselves the opportunity to deploy the same code to our on-premises servers and our Azure cloud servers concurrently. This gave us the opportunity for a hybrid deployment as we confirmed our new cloud hosts behaved as expected with our on-premises hosts as a backup.

08:32 Adam Jones

Once our integration testing was complete, we were then able to shut down our on-premises hosting entirely. It also allowed us to easily experiment with other hosting environments. If we wanted to try a Docker container, I just needed to create a new Docker host. If I wanted to try functions, I just added a function host. The core handler logic and message definitions remained unchanged.

09:00 Adam Jones

Now, in addition to getting your solution structured properly, you also want to be sure the framework you are using gives you the most flexibility. If you want the option to deploy to Azure Functions, a Linux Docker container or anything other than Windows, you should seriously consider moving from .NET Framework to .NET Core. And I want to be clear here and specifically recommend .NET Core 3.1. Yes, .NET 5 is here and it is the next iteration of .NET Core. That said, you are not able to target .NET 5 for an Azure Function. What's more, only even-numbered releases will have long-term support. When .NET 6 releases, expect Azure Functions to support that revision. Until then, focus on .NET Core 3.1.

09:57 Adam Jones

Now, the move from .NET Framework to .NET Core does come with some risk. First, you're going to want to validate that the libraries you depend on are available for .NET Core. Once you've migrated your solution to .NET Core, you will still want to do heavy integration testing. In our case, we found several areas where the Core version of Entity Framework seriously deviated from previous behavior. Even though the API was exactly the same, we found the Core version varied in how and when it materialized entities and this impacted what ran on the server versus what ran locally.

10:40 Adam Jones

Finally, be aware that if you rely on your app.config to inject settings in your current .NET Framework solution, you will need to find another way. You could implement a custom settings file reader. You could read from environment variables or really a whole variety of options, but app.config is not a thing anymore. This can be really painful if you're in the practice of defining project settings in your libraries and then bubbling those up to whatever project hosts them.
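
As one illustration of the environment-variable route, the sketch below reads settings once at the host boundary and hands them on as a typed object. It is written in Python for brevity; the setting names `SERVICE_BUS_CONNECTION` and `MAX_RETRIES` and the default of 3 are made up for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EndpointSettings:
    service_bus_connection: str
    max_retries: int


def load_settings(env: Mapping[str, str] = os.environ) -> EndpointSettings:
    """Build settings from environment variables instead of a config file."""
    return EndpointSettings(
        # Required: a missing variable raises KeyError at startup.
        service_bus_connection=env["SERVICE_BUS_CONNECTION"],
        # Optional: falls back to a hypothetical default.
        max_retries=int(env.get("MAX_RETRIES", "3")),
    )
```

Because only the host calls `load_settings`, libraries never need to know where their settings came from.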

11:13 Adam Jones

So, we've gotten our ducks in a row and now we have the flexibility to choose an alternate hosting environment or even several. Our next concern was that our compute expense was not matching our compute load. This is a look at our message load over the past 30 days. The majority of the equipment that we track operates in North America and the equipment runs during daylight hours. This means our message load is heaviest during weekdays and pretty low at night and over the weekend. While our average message throughput is around 500 messages a second, we have peaks where we are in excess of 2,500 messages per second and we have valleys where we fall below 200 messages per second.

12:05 Adam Jones

That's not all. Let's recall our architecture diagram from before. We actually have more than one deployment. We have prod and test and dev. These are segregated environments where we can do our own testing and validation as well as allow our integration partners to test their own integrations. Our production environment processed over a billion messages in January. Our test environment sees that same sort of peak and valley behavior, but to a much lesser degree to the tune of a few million messages. And our dev environment, can you tell that our staff took vacation, and yet we paid for those VMs to be dark that whole time.

12:57 Adam Jones

Let's talk about scaling, and more specifically automated scaling. Given those message loads, how do we scale our endpoints to fit? With cloud services, you have two methods of configuring your automated scaling. First, you can use a schedule. This is the method we chose. We have weekday schedules that increase our instance count in the morning as machines come online and decrease the instance count in the evening as our load lightens. Alternatively, you can tie the instance count to a metric. For instance, you could have the cloud service scale up by one when CPU percentage is sustained above 80% for a period of time. We found metrics-based scaling to be frustrating as the metrics rarely related to our actual queue length and, what was worse, it was quick to overscale and slow to give back those instances.

13:53 Adam Jones

Keep in mind that the minimum instance count for a cloud service is one. So no matter how little work you have to do, you will pay for a single VM instance regardless. I think of this as coarse-grained scaling. Now, compare that with functions. For queue triggered functions, the function scales with throughput. The exact method is not published. But suffice it to say, if you have a queue backing up, the function will scale the instance count automatically to help consume the load. Even better, if there is nothing to do, it will scale down to zero, meaning no cost to you when there is nothing to do. I think of this as fine-grained scaling.

14:41 Adam Jones

Now, this leads to my second tip and a strong factor in why we chose Azure Functions. Azure Functions are fantastic for responding to volatile loads. They scale in direct correlation to the work we have. In addition, they scale to the work we do not have. Our dev environment can have long periods of quiet with absolutely nothing to do and functions allow us to scale to zero in response. If you have rarely called handlers, definitely consider a function host for the savings over keeping a virtual machine warm.

15:19 Adam Jones

Now, a lot of our business is based on helping our customers with loss prevention in one form or another, whether that be efficiency or maintenance. One area of our business that I'm particularly proud of is our theft detection and recovery. Stealing heavy equipment is sometimes as easy as bringing a trailer to a job site, loading the trailer and driving away. With our tracking equipment on board, we can know immediately if a piece of equipment that shouldn't be moving starts wandering around. An important factor of this part of our business is that it is time sensitive. If we want to provide our customer with the chance to catch a thief, we need to get data to them immediately. We've set an internal metric for ourselves that we not exceed five minutes from the time a report hits our system to the time it leaves to the third-party system. In practice, we're regularly measuring this time in 100s of milliseconds.

16:19 Adam Jones

Not every situation is time sensitive though. We've partnered with the City of St. Louis to provide telematics for their Metro bus system. Our goal was to help the city with one thing, this light. The check engine light, also known as the malfunction indicator lamp or just the idiot light. Buses receive regularly scheduled maintenance, but sometimes unexpected issues arise. Our goal was two-fold. First, we would alert the city to any buses that were exhibiting issues requiring attention. The driver is expected to do this, but human error and inattention mean it doesn't always happen. We provided an automated alert.

17:06 Adam Jones

Second, we would gather data from the fleet and turn loose some machine intelligence to provide predictions for when buses would require maintenance. In this way, we could eliminate breakdowns in the field, which are costly, by triggering maintenance checks when the bus is already out of service. Certainly we would have a large volume of data to process. How time sensitive is that data though? Do we expect mechanics to immediately bolt into action doing some sort of Mad Max-style repairs on a vehicle as it roams the city? As exciting as that sounds, no. In fact, they don't even have a choice. Rather than pay for cellular service for each bus so that data can be transmitted in real time, the buses were outfitted with WiFi antennas and would only dump data once a day as they returned to the fleet garage.

17:59 Adam Jones

Now, hundreds of buses all dumping a day's worth of data in a small window created a weird peak in our load. The load is dynamic, but it's not time sensitive. For that reason, we don't necessarily need functions for processing.

18:18 Adam Jones

Assess your business case. If you have time constraints, choose functions. If you are unconstrained by time, there are other hosting options that might be a better fit. You might find that a very small VM is able to work through that mountain of data before the next burst arrives for a better price than paying for every execution of a function; which brings up a great question. How does function pricing even work?

18:49 Adam Jones

Now, I'm going to focus on the consumption plan in a moment, but I do want to briefly touch on the alternative pricing models. Why would you choose them? If you are already using an app service, you can deploy your functions to it as well. This would allow you to take greater advantage of compute power you are already paying for. If you don't have an existing app service, the premium function pricing tier would be appealing if you need your functions to run in a virtual network. The premium tier affords other options as well, like selecting a different compute core size or allowing for long running functions, keeping a function warm, and more. It's worth noting that there is a time limit under the consumption plan of about 10 minutes where if your function runs beyond that, it will be killed. If you use the premium tier, you can run longer than that.

19:42 Adam Jones

However, if you don't need to run in a virtual network, the consumption plan is a great place to start. There are two elements to billing on the consumption plan. The first is resource consumption. This is a measurement of how much time and memory was used in executing your function calls. Currently, it's a fraction of a cent per gigabyte second. The second is easier. It is the total count of executions. Currently, that pricing is at 20 cents per million executions.

20:18 Adam Jones

So, what is a gigabyte second? There are two dimensions to this measure of cost. The first is the amount of time spent in your function. Note that this is not a measure of CPU activity. The timer starts on function entry and stops on return. Whether you spend that time using the CPU or waiting on an I/O call is up to you, but the billing is going to be exactly the same.

20:46 Adam Jones

The second dimension is memory consumed. Our endpoints typically consume anywhere from 64 to 128 megabytes of memory during execution. In addition, they are very quick to execute, typically on the order of hundreds of milliseconds. You will want to carefully measure your handlers to understand their consumption as it will have a significant impact on the cost of your functions.
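
The two billing elements combine into a simple estimate, sketched here in Python. The 20 cents per million executions is the figure quoted in the talk. The gigabyte-second price and the rounding of memory up to 128 MB steps are assumptions to check against current Azure pricing, and any free monthly grant is ignored.

```python
import math

PRICE_PER_GB_SECOND = 0.000016       # assumed USD list price; verify before use
PRICE_PER_MILLION_EXECUTIONS = 0.20  # USD, as quoted in the talk


def monthly_cost(executions: int, avg_seconds: float, avg_memory_mb: float) -> float:
    """Estimate consumption-plan cost from call count, duration and memory."""
    # Assumption: billed memory is rounded up to the next 128 MB step.
    billed_gb = math.ceil(avg_memory_mb / 128) * 128 / 1024
    gb_seconds = executions * avg_seconds * billed_gb
    resource_cost = gb_seconds * PRICE_PER_GB_SECOND
    execution_cost = executions / 1_000_000 * PRICE_PER_MILLION_EXECUTIONS
    return resource_cost + execution_cost
```

Under these assumed prices, a billion calls a month at 0.2 seconds and 128 MB each comes to about $400 for resource consumption plus $200 for executions. Doubling either the duration or the memory doubles the first figure, which is why measuring your handlers matters.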

21:11 Adam Jones

And this brings me to my next recommendation. Functions may not be a good fit if either of those dimensions of costs are outside of your control. Let me give you an example of what I mean by an out-of-control dimension of cost. Looking back at our architecture diagram, let's focus on those sender endpoints. See that little arrow there pointing to the factory looking icon? Yeah, that little guy has been the cause of a lot of headaches for us. What it represents is our link to send data from our system to third-party subscribers.

21:50 Adam Jones

You see, the heavy equipment manufacturers we work with have their own engineering departments dedicated to making their product the best it can be and they absolutely love to get as much data as possible from their equipment in the field to help them. So we're happy to help them with that, sending a copy of all the data we've received over to them. The thing is, where these businesses are amazing at making things like trenchers and cranes and forklifts, they can be absolutely terrible at creating data ingestion systems at the scale we work with.

22:26 Adam Jones

Last year, we added a subscriber with just such an unreliable ingestion point. This is a look at the round trip times as we delivered data to their system. What should take on the order of a few hundred milliseconds was taking sometimes minutes to get a response. Let me repeat that. The expected round trip time is between 100 and 250 milliseconds. The actual times were sometimes in excess of 100,000 milliseconds. That's three orders of magnitude variance. If we had chosen to implement these handlers as functions, our time dimension of costs would be at the mercy of our third parties. In this case, we've chosen to keep these endpoints on a cloud service host.

23:13 Adam Jones

Okay. You've established which endpoints are a good fit for moving to functions and you're ready to take the plunge. What are some gotchas that you need to be ready for? If you're accustomed to using your own on-premises hardware or even virtual machines or scale sets or any other infrastructure as a service, you may have come to assume your disks are a reliable place to store data. Not just your configuration data, not just your application files, but runtime data too. Maybe you host your saga persistence in a local RavenDB. Maybe you store your logs in a rotating text file. If you are evaluating the move to functions, you need to understand that your disk can be reset at any time. Not just can, but it will and often. This is true not just for the disk, but the working memory as well. Your function is going to spin up and down very dynamically with your load.

24:15 Adam Jones

One area where this bit us is that we were in the habit of creating in-memory caches for our expensive look-ups. There were several cases where looking up a particular relationship in our database was expensive. The difference between looking up one element versus looking up all of them was pretty trivial though. So we would perform the full lookup and then cache the results in memory. This was very reliable when the VM was under our control. We could schedule the cache to refresh, say, once an hour. Functions are a completely different story. Our functions were getting created and deleted again and again and again, and each time the function was created, that expensive caching operation ran again. We chose to move our cache from in-memory to an Azure storage table. While the round trip time to fetch from that storage table was longer than the in-memory fetch, it was much better than running that taxing database query as often as our function scaled up and down.
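
The shape of that change can be sketched as a cache that lives outside the function instance. In this Python sketch a plain dictionary stands in for the external table, so it shows the pattern only and not the Azure storage API; `get_lookup`, its parameters, and the one-hour default are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

# Stand-in for a shared external store such as a storage table:
# key -> (time the entry was written, cached data).
ExternalStore = Dict[str, Tuple[float, dict]]


def get_lookup(
    store: ExternalStore,
    key: str,
    load_all: Callable[[], dict],
    ttl_seconds: float = 3600.0,
    now: Callable[[], float] = time.time,
) -> dict:
    """Return cached data from the shared store, reloading only when stale."""
    entry = store.get(key)
    if entry is not None and now() - entry[0] < ttl_seconds:
        return entry[1]  # fresh enough: skip the expensive query
    data = load_all()    # the expensive full lookup
    store[key] = (now(), data)
    return data
```

Because the store outlives any single instance, a freshly created function finds the cached result instead of paying for the expensive query again.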

25:19 Adam Jones

Now, I mentioned logging as well, and this is another area where we had an important lesson learned. With functions, you can see a live trace of your logging output, which is really great. When you create the function, you will have the option to use Log Analytics to store your logs and the default is to configure for storing the most verbose level of logging. This is wonderful if you're tracking down a bug or doing some other activity where you need a detailed runtime log. It's also great for seeing long-term trends or tracking down a hard to replicate issue.

25:54 Adam Jones

One lesson we learned, however, is to not use this default configuration with production functions. In two days, we racked up over $2,000 in Log Analytics expense on the single function we had put in production. We quickly moved to both lower the logging level and to limit our total storage. So learn from our experience here. Use caution with your Log Analytics configuration.

26:23 Adam Jones

Those are some direct effects of moving to functions, but there are also secondary effects. Let's assume the endpoint you are transitioning is a bottleneck in your system. Think of it like a garden hose with your thumb on the end restricting the flow of messages. Moving to functions will not just be taking your thumb off the end, you'll be upgrading to a firehose. With your compute power turned to 11, you've solved one bottleneck. What new bottleneck will that expose? Will your data storage keep up? We use the DTU model for our Azure SQL instances. If using functions means we need to increase our DTU limit, we've just moved our cost efficiency problem from one place to another.

27:14 Adam Jones

Will your transport handle it? Will your third-party integrations handle it? If you unleash the firehose on a third-party with rate limiting, you will create a problem. Functions are not a good solution in this scenario. Now, we are on the premium tier of Azure Service Bus and we currently have it configured for four messaging units. Even with that tier and scale, our peak load sometimes means that we get throttled by Azure Service Bus. With message triggered functions, this is outside of our handler time, so we aren't concerned about that impacting our costs. However, it does mean it's a concern in that we have to do a balancing act to decide how long are we going to endure throttling? Because throttling isn't just that one function, it's everything connected to the bus.

28:13 Adam Jones

As we were looking into our Azure Service Bus throttling, we came to a realization. We had oversubscribed a particular message. When a message arrived from a device, our central sender service would publish it and then each of our third-party integration senders would evaluate whether the third party was interested in that message. This resulted in a lot of no-ops. By modifying our central sender to first check if anyone was subscribed, we were able to considerably cut down our total message count. Now, this is good advice no matter what hosting environment you choose to deploy to, but it's particularly useful with functions where you will be paying for every invocation of your handler.
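
A minimal sketch of that check, in Python and with hypothetical names throughout: the central sender consults a subscription map first and only produces messages for integrations that actually want the message type.

```python
from typing import Callable, Dict, Set


def publish_if_subscribed(
    message_type: str,
    payload: dict,
    subscriptions: Dict[str, Set[str]],
    send: Callable[[str, dict], None],
) -> int:
    """Send the payload only to integrations subscribed to message_type.

    Returns the number of messages produced; zero means no handler was
    invoked downstream at all.
    """
    subscribers = subscriptions.get(message_type, set())
    for subscriber in sorted(subscribers):
        send(subscriber, payload)
    return len(subscribers)
```

Compared with publishing to every integration and letting each one decide, the uninterested integrations never receive a message, so there is no invocation to pay for.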

28:57 Adam Jones

So, should you throttle your function? How do you throttle your function? On the consumption plan, there are two throttling methods available to you. The first method you can use to throttle your function is to set a daily usage quota. This setting is somewhat hidden in the configuration panel under function runtime settings. By default, there is no limit. But here you can set a limit on the total consumption allowed. This is a great way to keep yourself from getting into too much trouble with functions as you're just getting started.

29:34 Adam Jones

That said, I don't recommend it for production functions and there are two reasons for that. One, you don't want to be caught by surprise with your production system shutting down when you reach a high load. The second is that once you do hit that quota, you can't reset it from the portal. Yes, you can go in and you can change the limit or even set it to zero to eliminate it, but the function is only going to restart at midnight UTC. You have to use a PowerShell script to force it to restart if you want that setting to be honored immediately. And if you're in the car or your monitoring team isn't available to run that PowerShell script, that means you've got downtime until somebody can get to a PC to make that happen, which is a risk.

30:23 Adam Jones

The second method you can use to throttle your functions is to use the scale out option. By default, your consumption plan function can be scaled up to 200 instances. If you want to limit that, you can provide a limit. But be aware, this is more of a suggestion than a hard limit. For instance, if you're hoping to limit your function to a single instance so that you process messages in order as if it were a singleton, that won't work. There will be times that there are two or more concurrent instances.

30:57 Adam Jones

And that brings me to my last point, which is that functions are for asynchronous out-of-order processing. If you depend on your messages being processed in-order, functions are probably not the right solution for that endpoint. I can hear you now. You're already raising your hands with exceptions. Yes, if you use timers and batch processing, you can make in-order synchronous processing work in a function. Here I'm referring to queue triggered functions. Let's remember, just because you have a hammer, you don't need to treat everything like a nail.

31:34 Adam Jones

We've covered how to scale and configure your deployed Azure Function. But how do we get it deployed in the first place? There are a variety of supported integrations for both continuous integration and deployment as well as manual deployment. If you're hosted on GitHub, you can use GitHub Actions. If you're hosted on Azure DevOps Services, you can deploy directly from your Azure Repos. We use GitHub for our source control and we prefer TeamCity for our builds and deployment. To deploy our function from TeamCity, we simply added a build step which you see here. The important bits here are the publish profile and the password. The password you can grab from the Azure portal for your function in the FTP deployment section.

32:22 Adam Jones

The publish profile can be retrieved in Visual Studio. By right clicking on your project and choosing publish, you're going to run through a wizard for publishing. It isn't necessary to go completely to the end where you publish it to the cloud. Instead, just be sure you save that configuration file. And then you're going to find that .pubxml file in your project's Properties\PublishProfiles directory.

32:47 Adam Jones

I find this screen particularly useful for another reason. When we encounter an issue with the deployed functions, sometimes I like to run my function locally in debug mode so I can catch the exception and debug it live. The link to manage Azure app service settings is very handy here. It allows you to easily copy settings from your deployed Azure Function to your local settings file and vice versa. Now, this is convenient but it's also very dangerous. Be very, very careful not to accidentally copy your local settings over your deployed function, especially if you're working with your production environment.

33:26 Adam Jones

Now, let's review. First, we're going to get our solution structure in order. Then we're going to make sure we're using frameworks that give us the flexibility to vary our hosting. Next, we're going to evaluate our endpoints to see if they're good candidates for function hosting. If your endpoint has a message load that is dynamic, either highly volatile or sporadic, if your endpoint message processing is time sensitive, if it doesn't matter what order your messages are processed in just so long as they get done, then your endpoint is a good candidate to be hosted as a function.

34:07 Adam Jones

Finally, we're going to avoid some easy pitfalls with functions. We're going to be aware that our disk and memory are ephemeral and use external storage resources between invocations. We're going to keep in mind that there are limited ways for throttling our functions. We can limit scale and we can set a consumption quota, but each of these has limitations. We're going to be aware that App Insights needs configuration prior to going ham with millions of function calls to avoid racking up unexpected logging costs.

34:41 Adam Jones

At LHP, we put that into practice in the following way. We looked at our protocol gateways. These endpoints listen on network ports for incoming UDP datagrams. These are not good candidates for function conversion because they are not message triggered. We looked at our reporting endpoints. These are message triggered. All elements of cost are completely in our control. Message ordering doesn't matter but it is important that it is done quickly. So this was a good candidate. Same for our business logic endpoints. Here, the time sensitivity is particularly acute. In order to react to an asset being stolen, we need to process that message as soon as possible. Also a good candidate for functions.

35:29 Adam Jones

Finally, we have our sender endpoints which route data to our third-party subscribers. Here we are at the mercy of our partners as far as time, so the cost is out of our control. We didn't consider that a good candidate. We chose to migrate our reporting and business logic endpoints while maintaining our listener and sender endpoints as cloud services.
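
The evaluation above can be written down as a small checklist. This Python sketch encodes the criteria from the talk as booleans; the field names are hypothetical and the profiles in the usage note are a simplified reading of the four endpoint types, not measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointProfile:
    message_triggered: bool    # work arrives as queue messages
    volatile_load: bool        # load is highly variable or sporadic
    time_sensitive: bool       # results are needed quickly
    needs_ordering: bool       # messages must be processed in order
    cost_in_our_control: bool  # handler time and memory are ours to bound


def good_function_candidate(p: EndpointProfile) -> bool:
    """Apply the talk's criteria for hosting an endpoint as a function."""
    return (
        p.message_triggered
        and p.volatile_load
        and p.time_sensitive
        and not p.needs_ordering
        and p.cost_in_our_control
    )
```

With this reading, a reporting or business logic endpoint (all criteria met) passes, a listener fails on `message_triggered`, and a sender fails on `cost_in_our_control`.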

35:55 Adam Jones

The benefit of this is not just in our system throughput, but in our bottom line. In October of 2018, we were totally cloud services based. Not just for our endpoints, but for our websites, our APIs and everything else. Our most recent invoice in December of 2020 is after we had migrated our chosen endpoints to functions. Our cloud service expense was reduced by nearly $2,000 while our new functions expense was only $700. This is a considerable savings, especially when you consider that our message load grew over that entire two year period.

36:37 Adam Jones

I'm really excited about what the future holds for us at LHP and we've been very happy with the results from adding functions to our hosting mix. I hope you found this helpful as you evaluate functions for your own business. If you'd like to reach out to me, your best bet is email, but also I'd be happy to respond on LinkedIn or Twitter. Now, Sean Feldman, he's been monitoring in a side channel here. Sean, do you have any questions or anything I can help with?

37:06 Sean Feldman

Thank you, Adam. Yes, before we move to the Q&A session, allow me to share with the audience some news about the public preview of Azure Functions with Azure Service Bus and NServiceBus. I'm really excited to announce that the preview is moving into the Particular platform and will be released as an official 1.0 version. So watch for the upcoming announcements for more information. And now the questions. We do have a few questions. While Adam is answering your questions and I'm reading those, I'll open up a poll for an additional question. If you don't mind, your feedback is much appreciated. Adam, let me get to the first question. And the question is from Rashid. Hi, thank you for having the webinar. I have a question on Azure Functions instance management. If there are, let's say, 10 command messages in the queue, will or can Azure Functions create 10 instances of the Azure Function hosting the NServiceBus endpoint and command handlers?

38:08 Adam Jones

That's a great question. We touched on scaling a little bit here. The exact method for scaling, Microsoft hasn't published, but it is doing a monitoring of the binding that's causing the trigger. In this case with queue messages, it's going to be looking at total throughput and the queue backup length and it will scale to adjust. If you have 10 messages in queue, will it spin up 10 instances of your function? It's hard to say. If each of those functions executes very quickly, it may only need the one instance. It's all going to depend upon that backup and throughput.

38:49 Sean Feldman

Thank you, Adam. The next question is coming from Dan. I'm interested in learning more about the trigger mechanism Azure Functions uses to spin out the correct functions for say incoming Azure Service Bus messages or database changes or when a long running query completes. Also interested in learning more about scaling with demand. That is, if I have an Azure Functions to perform a certain task and my load increases faster than one Azure Function instance can handle the demand, will Azure automatically create new function instances to handle the additional demand as needed?

39:28 Adam Jones

Yeah. Great questions. Dan, we'll get back to you with the link for where you can see all of the methods for triggering a function. There's a wide variety. The three primary ones that come to mind for me are timer based, message-based from a queue, and HTTP based off of a call. We focused in this presentation on those message-based. With regard to scaling, Microsoft has not published that scaling mechanism. So it's a bit of voodoo trying to guess that. But yes, if you're at a point where a single instance is not keeping up, the scaling is automatic. In fact, it will scale to 200.

40:10 Adam Jones

Now, from what I've read on other resources, that is not a smooth scaling from 0 to 200 all at once. There are what I call hesitation points, where it might scale to 10, then to 20, then to 100, and then put the brakes on for a little while. But it will scale from 0 to 200. If you need more than 200 instances, just deploy your function again. You can have multiple deployments of the same function, and then you've got a multiple of that 200 instance count.
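
As an editorial illustration of the "multiple of 200" point, the number of deployments needed for a target instance count is a ceiling division. This is a minimal sketch: the 200-instance figure is the one quoted in the talk, and the function name is hypothetical.

```python
import math

# Per-deployment instance ceiling as quoted in the talk.
INSTANCES_PER_DEPLOYMENT = 200


def deployments_needed(target_instances: int) -> int:
    """Return how many deployments of the same function cover the target."""
    if target_instances <= 0:
        return 0
    return math.ceil(target_instances / INSTANCES_PER_DEPLOYMENT)


print(deployments_needed(450))
```

For example, a target of 450 concurrent instances would call for three deployments under this assumption.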

40:45 Sean Feldman

Thank you, Adam. Another question from Collin. Wouldn't Azure Functions be a good choice to use for incoming messages from a third-party system?

40:54 Adam Jones

Incoming messages, sure. Absolutely. We had trouble with the outgoing messages, which is where we had concern. But if you have incoming messages from a third party, I think it would be a fine way to wire that up.

41:08 Sean Feldman

Thank you, Adam. From Shailesh, two questions, or two parts of one question. How can we manage the alerts if my Azure Functions service health is good, and how can we manage retries or set a circuit breaker in case an Azure Function fails to execute? That's the first question and I'll read the next one right after.

41:43 Adam Jones

Okay. Yeah, monitoring is critically important. And if you're familiar with NServiceBus, you have first-level retries and second-level retries. With first-level retries, the message has been taken from the broker and you're going to try it again immediately if it failed the first time. Second-level retries allow the message to go back to the end of the queue and then be retried after some delay. With functions, you're paying for every execution. So you really want to be careful about messages that you think might incur retries. If you're dealing with a handler that's going to have retry logic as an expected scenario, you may want to look at a different option, because you don't want to pay for all of that non-work getting done.

42:28 Adam Jones

On the other hand, if it's something where you just want to have protection against another system going down, let's say you have some fear of a critical database system being down and you want to wire up second level retries, that's fantastic. What we do is we have second level retries set up to retry a message after a few minutes just in case we do run into that scenario you've described where you've got a critical system that's gone down. But then after two second level retries, we fire that message over to our error queue. And we created our own monitoring for our error queue that alerts everyone in our office to know what's going on, allows us to inspect that message, retry it locally, see if we're having issues that we can debug. There are other options though. ServicePulse is a product directly from Particular that will monitor that error queue and all of your other queues and is really helpful. In our case though, we had already built these tools and we just like our own in-house stuff.
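
As an editorial illustration, the retry flow described above can be simulated in a few lines. This is a sketch, not NServiceBus code: the names are hypothetical, the single immediate retry and the 180-second delay are assumptions (the talk says only "a few minutes"), and a real broker would redeliver the message later rather than wait in-process. Only the two delayed retries followed by the error queue come from the talk.

```python
from typing import Callable, List

IMMEDIATE_RETRIES = 1  # assumption: one immediate retry per delivery
DELAYED_RETRIES = 2    # from the talk: two second-level retries
DELAY_SECONDS = 180    # assumption: "a few minutes"


def no_wait(seconds: float) -> None:
    """Stand-in for a real delay so the simulation finishes instantly."""


def process_with_retries(
    handler: Callable[[dict], None],
    message: dict,
    error_queue: List[dict],
    wait: Callable[[float], None] = no_wait,
) -> int:
    """Run the handler under the retry policy; return the execution count."""
    executions = 0
    for delivery in range(DELAYED_RETRIES + 1):
        if delivery > 0:
            wait(DELAY_SECONDS)  # delayed retry: try again later
        for _ in range(IMMEDIATE_RETRIES + 1):
            executions += 1  # every execution is billed on a consumption plan
            try:
                handler(message)
                return executions
            except Exception:
                continue  # fall through to the next attempt
    error_queue.append(message)  # retries exhausted: hand off for inspection
    return executions


def always_fails(message: dict) -> None:
    raise RuntimeError("critical database is down")


errors: List[dict] = []
print(process_with_retries(always_fails, {"id": 1}, errors), errors)
```

Under these assumed settings a message that never succeeds costs six executions before it reaches the error queue, which is why handlers that routinely rely on retries are a poor fit for pay-per-execution hosting.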

43:25 Sean Feldman

Great. And the followup question, somewhat related, what's the difference between premium Azure Functions and app service-based functions?

43:34 Adam Jones

Oh yeah. We touched on that a little bit earlier, but I'm glad you asked the question. The premium tier, it allows you to do some things that you can't do with the consumption plan. The biggest one in my mind is to be able to deploy to a virtual network. The consumption plan will only run outside of a virtual network. So if you have a requirement that you run your resources within a virtual network, consumption isn't going to work for you. The premium plan does allow you to do that.

44:04 Adam Jones

In addition, the premium plan allows your function to execute for more than 10 minutes. It allows you to keep an instance warm. Let's say that you've got a long spin-up time on a particular element of a function. You can have it kept warm as part of the premium tier. The app service tier in my view is intended more for folks who are already deploying an app service and you've got extra compute that you are paying for but not using. In that case, you can point your function to deploy to that app service and run within compute that you're already paying for.

44:42 Sean Feldman

Thank you, Adam. Another question is about Azure Functions app. Can we have dependency injection container like Autofac used in Azure Functions app?

44:56 Adam Jones

Yeah, absolutely. In fact, when we moved from hosting on cloud services to Azure Functions, we were relying on NServiceBus's built-in dependency injection and it was totally the same as we moved from the cloud service to the function. We didn't have to change anything about our dependency injection. It all just worked. So if you're relying on Autofac or whatever your DI solution is, it should transport over without any issues.

45:27 Sean Feldman

Another question about Azure Functions app. Can we use Serilog with an App Insights sink within the function app?

45:38 Adam Jones

I'm going to show my ignorance there and admit I am not familiar with the App Insights beyond our-

45:48 Sean Feldman

I think I can offer my help here, Adam. The answer is yes, and I believe Microsoft or the specific loggers provide an abstraction. If you're using a logging mechanism other than straight App Insights, you can stream your logs into App Insights if needed. So the answer is yes, it's possible or doable. But I do have an additional question. It looks like the questions are coming in. And just an FYI, if we don't answer your question during the webinar, we will make sure that Adam follows up and provides an answer to all of your questions. Aiden is asking a question. Adam, you mentioned that functions hosting is just a matter of adding in your hosting project. I'm sure it was a bit more than that. Any particular problems you have encountered when migrating from endpoints and message handlers in a single host to Functions?

46:49 Adam Jones

Our biggest frustrations were more around our transition from .NET Framework 4.6 to .NET Core, particularly with Entity Framework Core. As for the hosting logic itself, the code, we're in the habit of using an EndpointConfiguration.cs, which is roughly 100 lines of code for setting up our transport, our logging, our database connection and our message routing. All of that remained virtually identical. The bigger issues we ran into were with how different the implementation was when moving from Framework to Core. And I really feel like that's what most people will encounter.

47:32 Adam Jones

Now, the other thing we ran into was that we had a "run when the bus starts and stops" call in which we would grab the ISession object and we would send messages as part of the endpoint starting. When we first started using the preview version of Particular's support for Azure Functions, that wasn't available. But I got great news a couple of weeks ago from Sean that that session is now available. So if you do have startup logic where you need to send off a message to do something like register a task or something along those lines, that's now available. So as far as the gap between what you were doing before on other forms of hosting and what's happening now with Functions, it's minimal, if anything at all, as far as the NServiceBus solution is concerned.

48:21 Sean Feldman

Thank you, Adam. Our next question is coming from Bashan. Are Azure Functions stable now? When using them some three years ago, we were having problems with status of the Azure Function. We needed to recreate functions from time to time.

48:38 Adam Jones

That sounds terrifying. I feel for you that that would be very unsettling. Yes, our production environment has been deployed with Azure Functions since October of last year and we haven't had any issues with functions just walking off and disappearing on us. So as far as I'm concerned, yes, they're at a stable point right now.

49:01 Sean Feldman

And the follow-up question, have you managed to create acceptance tests or unit tests or integration tests for Azure Functions?

49:11 Adam Jones

Yes. And this goes back to where I recommended segregating your projects. Having unit tests against the handler logic independent of your hosting configuration is critical. Acceptance, integration, and unit testing can all run against your handlers, and it doesn't matter where they deploy. The testing should be complete enough to know that your handlers are solid. At that point, if you're going to test against the host itself, you can run locally, as I've mentioned, or you can run against the deployed function for those integration proofs that you need to know that you're running well.

49:55 Sean Feldman

Thank you, Adam. Additional question. When would we use WebJobs versus Azure Functions app deploy to app service? What are some pros or cons that you would be able to bring up?

50:11 Adam Jones

This is another case where I'm going to show some ignorance. I am not familiar with WebJobs. So if you're looking for a comparison between Azure Functions and WebJobs, I'm afraid I can't give you any good guidance there. Sean, I don't know if you have any expertise in that area.

50:25 Sean Feldman

Yeah. From an NServiceBus perspective, and not just that, NServiceBus can be executed with both. What I think is more important to understand is whether your job needs to execute 24/7 and continuously process a message flow, or whether it's a bit more spiky and you need the ability for significant scale-out and scale-in. WebJobs offer the first; Functions are much better suited for the second. So a WebJob is a process executing 24/7, where a Function is only running when there are messages to be processed. That's probably the biggest differentiator that I would mention. In terms of pros and cons, the devil is in the details. I would suggest looking into the scenario and what exactly your needs are, and selecting the right host based on that.

51:20 Sean Feldman

And speaking of that, one more question. If we're integrating as soon as possible with Azure Functions, can we use NServiceBus sagas, or will Azure Functions mimic saga behavior? If that's the case, does Azure Functions behave the same with respect to saga persistence, saga steps, et cetera, or are we offloading logic to NServiceBus handlers?

51:44 Adam Jones

Yes. In fact, we have sagas in several places that we're reliant upon in those endpoints that we converted to Azure Functions. So I can confirm that sagas will work as deployed in an Azure Function, and they will behave just as they do now in your other hosting. Again, the one hiccup we ran into was in making sure that the libraries we relied upon for storage and the database moved from .NET Framework to .NET Core. One area which was particularly important to us is SQL geography types. As you can imagine, as a telematics company, geopositioning is really important to us. But those geodetic types aren't native to the SQL types library. So we had to find the right NetTopologySuite NuGet packages to make that work for us. But as far as the mechanics involved with NServiceBus itself, yes, sagas are fully supported.

52:42 Sean Feldman

One more question, from Alex, and then we'll probably be wrapping up. It would be nice to get an example of how web apps can communicate with Azure Functions. While we can't show an example, is there something that you would be able to imagine, Adam?

52:59 Adam Jones

Well, now, HTTP is a trigger for Azure Functions. I've seen and I'm eager to try but I have not yet tried deploying a web API as an Azure Function. Our web APIs don't get called often, yet we pay for our dev, test and production tier virtual machines that are just out there quietly waiting for calls to come in. So I can't give you an exact description of a solution other than to say, yes, it is possible and I'm eager to try it.

53:36 Sean Feldman

And a last question, at the last minute, from Valdez. Is there a way to execute a function before any other function trigger starts to work? We have a requirement to create, or verify the existence of, a bunch of Service Bus subscriptions.

53:51 Adam Jones

If you're doing subscription wire up and it has to happen prior to the message being handled, there is a small section before the message is handled where you're setting up NServiceBus itself and you could do something there. But as an alternative, you may want to look instead at having a different method for hooking that up. And the reason I say that is that if you have suddenly a burst load and your instances scale, then you're going to have multiple concurrent instances all trying to do that wire up right away. So in short, yes possible, maybe not ideal.
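
As an editorial illustration of the concern about burst scale-out, the sketch below simulates ten instances all attempting the same subscription wire-up at once. Everything here is hypothetical: `FakeBroker`, `AlreadyExists`, and the subscription name are in-memory stand-ins invented for this example, not part of any Azure or NServiceBus API.

```python
import threading


class AlreadyExists(Exception):
    """Raised by the fake broker when a subscription is already present."""


class FakeBroker:
    """In-memory stand-in for a message broker's management API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.subscriptions = set()
        self.create_calls = 0

    def create_subscription(self, name: str) -> None:
        with self._lock:
            self.create_calls += 1
            if name in self.subscriptions:
                raise AlreadyExists(name)
            self.subscriptions.add(name)


def ensure_subscription(broker: FakeBroker, name: str) -> bool:
    """Create the subscription if missing; return True if this call made it."""
    try:
        broker.create_subscription(name)
        return True
    except AlreadyExists:
        return False  # another instance got there first; safe to continue


broker = FakeBroker()
results = []


def instance_startup() -> None:
    results.append(ensure_subscription(broker, "asset-stolen"))


# Ten instances starting at once, as in a sudden burst of load.
threads = [threading.Thread(target=instance_startup) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.count(True), broker.create_calls)
```

Only one instance actually creates the subscription, but every instance still makes a management call, which is one way to read the "possible, maybe not ideal" verdict: the wire-up has to tolerate being run concurrently, and running it once at deployment time may be simpler.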

54:37 Sean Feldman

Right. The questions keep coming. Unfortunately, we need to wrap up. So all the questions will be answered directly using the email address that you've provided. That's all we have for today. On behalf of Adam, this is Sean saying goodbye for now and see you on the next Particular Live webinar. Thank you, Adam.

55:00 Adam Jones

Thank you everyone.
