Using OpenTelemetry with NServiceBus

00:00:02 Jimmy Bogard

All right. Well, thanks for coming on everyone. My name is Jimmy Bogard. You can find me on Twitter at @jbogard. And this presentation, the slides and the code, everything, you'd find on my GitHub at github.com/jbogard. We will be sending out direct links to all these slides of material after this as well. And I blogged about this topic and other things, I guess, on my blog at jimmybogard.com. And a little bit about me. I'm an independent software consultant out of Austin, Texas.

So real quick, I want to talk a little bit about why we should care about OpenTelemetry in the first place. And I should say that, for the longest time, I didn't have to care about anything remotely similar to OpenTelemetry because, for a good part of my career, the systems I was writing was basically just a single application running in a single process and that was basically it. And if something went wrong with that system, something blew up, then I would get this stack trace that told me exactly where in my system that the exception occurred. I could put some handling in my applications. I could log that information. But it would tell me just an exact trace of where my exception occurred.

But as my career started expanding and I started doing more and more complex systems, it became a little bit more complicated to understand when thing went wrong in my system, in my application, how that related to something upstream. So in this case, if I have some other downstream system that has some exception, how do I relate that back to the originating request in some upstream system, API service, something?

And one of my first kind of larger distributed systems, when we saw this problem, we put our heads together, tried to figure out how can we, with all these different exception messages, how can we try to correlate all these different pieces together? And that introduced the concept of distributed tracing. So the idea here is that each arrow that we have in communicating between different services and applications, we include a little bit of metadata that allows us to relate back to the call from the previous system.

And so it becomes kind of like a graph of different calls or have a pointer to the previous call as part of whatever I'm logging. So we had some centralized logging service that would have the idea of whatever request this was, but also included the parent ID of the incoming request. So we had to also make sure that those IDs were propagated throughout the system. Now, we were very proud of ourselves for coming up with this, but it took us months to build something like this.

And a lot of the early dot-com companies also had these similar kinds of problems and started to create their own solutions and eventually open sourced them. And one good example of this is Zipkin. Zipkin is an application and set of standards that defines how to propagate, how to define these different headers for the different unique request IDs for these different executing processes, and then how to gather them together. And then finally, an application that is able to display them in some kind of meaningful format. So with these kind of tracing tools, you can trace an entire set of requests throughout a complex distributed system.

Now, as it happens that more and more companies started building their own tools and tried to start to standardize around these different things, we had multiple competing standards for these different header formats, collecting this information, storing this information. And so one of the things that happened in the last few years was the W3C stepping in and saying, "Let's go ahead and build a standard for these different, the headers and the formats as they flow to the system."

So this is now the W3C trace contact standard plus API, which we'll be talking about in a little bit. And so the idea here is that we'll standardize on the header data as it flows through the different systems, and then have some, then, way to be able to gather that information outside of this. So that's now baked into .NET, .NET Core. This W3C trace contact standard for these different header formats.

Now, when it was coming time for me to build a system that did APIs and NServiceBus, this was a challenge because the W3C defined a standard for HTTP. But for those that have used sort of NServiceBus tracing tools, they have their own kind of tracing headers standard as well. And it's not just NServiceBus, but basically any kind of distributed system that's outside of HTTP also defines its kind of own proprietary tracing standard.

And so we have to figure out, well, what are we go point to do for non-HTTP communications to be able to have those kinds of headers flow through the system? Now, in the W3C standard, what they recommended is there's nothing in that standard that is specific to HTTP, it's just headers plus some information. So for those cases, we can just leverage and use the existing W3C trace contact standard, and just apply it to NServiceBus.

So this is propagating the information throughout my different systems. The second part is then reporting them. So something has to gather the information from these traces and pull them together into some kind of internal database and then provide a user interface for us to be able to view these traces. So that's one of the other pieces that Zipkin provides us, is a way to gather the information from these different systems and then be able to display them in a meaningful format.

There's not just Zipkin out there. There's I don't know, in the poll, there's about a dozen, we laid out there, but the problem, from an infrastructure perspective is that, if I want to develop some kind of new tool that wants to report traces, then I have to go through that matrix of trying to support every single different tracing tool out there to be able to collect that information and then export it out to my specific end provider.

That started to become a huge pain and so that's really where the OpenTelemetry project came into being was to try to standard on the collection of that information, the exporting of that information, and then finally APIs and SDKs to be able to do all that from any variety of different platforms and providers. So all these different systems and services can now communicate via a common language and common set of standards.

And so my individual infrastructure components don't have to have all these extensions for specific tracing providers or logging providers. They could just support this OpenTelemetry standard and then I can use any one of the office job products to be able to display that information.

Now .NET was in somewhat of a unique position here and that's they already had existing solutions for telemetry across the three pillars of observability, which are logging, tracing, and metrics. So the .NET team had a decision to make. Do they look at the OpenTelemetry standard and just adopt it whole cloth, that is saying we're going to have the same exact same APIs, the exact same STK, the exact same objects, or try to leverage what we have already so that we can build on top of it, as opposed to throwing it all away, and then not being able to use it from existing systems. That is unlike really any of the other platforms that OpenTelemetry targets is that Node or Ruby or Python don't have these well established standards that are baked into the framework for observability, but .NET does.

So if you look at OpenTelemetry in .NET, the state of it as it is today, if you want to use "OpenTelemetry" for .NET for logging, you actually get this by just using the new standard for logging, which is microsoft.extensions.logging. I'll show you on our demo what OpenTelemetry ads in this area, but what the Microsoft team said is, "Well, our API, our standard API for logging looks very, very close to the OpenTelemetry standard, so why don't we just tweak APIs so that it follows those standards and call it a day?" The names of the objects may not match exactly, but rather than have this huge renaming effort, let's just leverage what already exist out of the box for logging.

Tracing, it is a similar story. The system.diagnostics.diagnosticsource nugget package multi targets .NET framework and .NET standard, .NET 6, .NET 7. And so it's a cross platform out-of-band package that allows us to add distributed tracing. And on top of that, the tracing APIs that it's using have been built into the .NET framework since .NET 3 or 4 days. And so if you even have legacy ASP net applications, you can actually leverage OpenTelemetry because it's building on top of these existing objects.

The only caveat is that the names, again, don't match. So while OpenTelemetry calls things spans, in .NET they're called activity. And that again was because, rather than throw away all that existing work, it was decided to be an easier lift just to adapt the existing APIs to the OpenTelemetry APIs versus starting from scratch. Now, metrics is a little bit different. There were no standardized metrics libraries for .NETs. There were ones that kind of matched, but not really. And so for this one, they actually started adding additional APIs that actually did more closely match or match exactly the OpenTelemetry standards.

And this is also in the system.diagnostics.diagnosticssource package, which again, ships out-of-band from .NET Core and the .NET framework so you can leverage it on any of your new and existing systems today. And in terms of where these are in their release milestones, logging and tracing, from an OpenTelemetry perspective, R 1.0. From a .NET perspective, tracing is 1.0 and logging is still in beta. They do expect to ship the 1.0 version of logging this year.

And I'll be showing what is the exact 1.0 part of this, because microsoft.extensions.logging has been around for a while. These are just the extensions for reporting logging in a standardized format. And finally, metrics is shipping as part of that metrics library, however, reporting the metrics export them is still in an alpha state. So do expect some changes there.

All right, so I'm going to show three different demos today. First is just some OpenTelemetry basics and going through the different APIs that come out of those different three libraries that we saw earlier, I'll be showing how they shim into OpenTelemetry. I'll be showing an OpenTelemetry example with NServiceBus to show how we can extend NServiceBus to be able to use OpenTelemetry in some of those different standards. And then, finally, we're going to be showing a more complex real-world example versus the kind of simple example that we have with the second one.

So first let's look at a demo with OpenTelemetry basics. I'm going to switch over to Visual Studio and this is just a regular console application for.NET 6. So right now it doesn't do anything exciting. It does our poor man's logging, which is just console at right line. This is the poor man's version of logging, of just writing to the console. And so we'll be extending this to use OpenTelemetry and the system diagnostic pieces. So we'll need to pull in a few packages to help us here.

And the two packages I'll be using, first is the OpenTelemetry exporter console package. Now this does pull in some other additional packages here. So it'll pull in, for example, the OpenTelemetry API and some of the kind of building blocks. So if we go over here to packages, we can see that the OpenTelemetry exporter console package pulls in the OpenTelemetry package, which pulls in the OpenTelemetry API, and a few other packages, including logging.

I'm pulling in the logging package explicitly, because there's some APIs in there I wanted to use directly, but otherwise this could be just the thing that you use to pull in the other ones that we need. Way down here, too, we can also see that this pulls in system diagnostics, that diagnostic source. That includes the tracing and metrics APIs that we want to target here.

So going back up to program CS let's first pull in some logging pieces. The first thing I'm going to do is, whenever we're reporting this information to some kind of centralized store, we do want to identify our service versus others. So I'm going to just create a constant here, which is our service name, and this is going to be passed into the different OpenTelemetry extensions to identify this service versus some other service. So when we're looking at it through a UI, I can see like, "Oh, this is coming from this application versus the other application."

All right. So first let's get some logging. All right. So yes, we want to pull in that. All right. So what do we pull in here? The first thing we pulled in was some initialization for microsoft.extensions.logging. And so here, we're just using the static factory method to create a logger factory. And inside of here, we'll do the configuration for OpenTelemetry. In this case, I'm not pulling in all the regular defaults from a web or worker application. I'm just creating the logging from scratch so that we can see what it looks like without using any of the sort of built in defaults from the logger.

So inside of here, in builder, I'm saying add OpenTelemetry. And so what this will be doing is adding the OpenTelemetry standards for enriching logging information. And we'll also be adding the console exporter, and this will be exporting the data in an OpenTelemetry-specific format. Finally, I will create my logger and this thing down here, this logger is the standard sort of Microsoft extensions logging log that we have. So your application code itself has no connection to OpenTelemetry itself. It's just using this standardized interface for logging.

So now when I run my application, we see a lot more information than what you would normally see in sort of your standard logging formats. We have many different things now being spit out here. We've got a trace ID, a span ID. We've got a timestamp here, the event ID. And then finally down here, we have the actual message that's being pushed out. Hello from USA. It's the USA format of month, day, year, and the time where it is for me right now. Now, the reason why it's showing in this format is because this is the format standard from OpenTelemetry for logging information.

Now, typically I wouldn't just output this to the console. I would output it somewhere else. I'll output it to Splunk or output it to AppInsights, something like that. And so that piece is the PR that is still on beta is the exporting part of logging. So there's just sort of two, two standards working here. One is the standard of the API of actually performing the logging. And that is the ILogger standard here. The other piece is actually reporting and exporting the logs to some central server. That's the piece that you see here, which has these expanded out information here.

And so if you're looking at AppInsights, this would make its way into different attributes and the log data there. Now, right now, we only have the console exporter in place. So the expectation is for additional providers to add their specific things, to say add AppInsights exporter, add whatever exporter, but for me, unless I really wanted to get to the OpenTelemetry standard for those different attributes, then I wouldn't necessarily use the OpenTelemetry way of exporting out logs. I may just use the ILogger SDK, which conforms to the OpenTelemetry recording standards, and not worry about the exporting standards.

So in this case, you still could use the sort of standard libraries that you can extend with, with logging. So you can still use, for example, like Serilog or... I guess that's a big one people use these days. I don't know if many people use Log4net. Typically, what our teams use is Serilog as the sort of standard there. And then you can still have a specific exporter if you don't want to use... You can still use OpenTelemetry exporting if you don't want to use the Serilog-based exporting. But because Serilog does support a ton of different exporters, I'll likely still use that until the OpenTelemetry piece has some more support there.

All right. So that was logging. Now let's look at the second pillar of observability, which is tracing. Now, this is an API that has been around for quite a while in .NET but has undergone pretty significant overhauls since the .NET Core 3.X days. So some of the base APIs are still very similar, but in terms of recording activity information, recording tracing information, the previous APIs were pretty horrible to deal with and so they saw the changes coming down the pipe with OpenTelemetry and adapted the activity APIs, the tracing APIs to that. I keep saying activity because that is the API for system diagnostics, it's called the Activity, which corresponds to a span in the OpenTelemetry world.

All right, so this is our activity piece or the tracing piece. So there's two pieces that I have to set up. The first is creating a source to report my different spans or activities. So I might have some code deep down in my application that wants to record a specific span of work. And, as part of the span, you can kind of think of it as a little micro benchmark and that it's recording a start and stop time and some interesting information for that activity.

And each of these activity spans will be around any time I have any kind of CPU or IO intensive work that I want to capture and report as part of an overall trace or span. So with an activity source, that gives me kind of a single point in my application in which I could report these activities versus here in logging, you just get this instance of a logger from this logger factory to be able to record a common set of logging, logging details.

The next part we want to do is define some way to record those activity information. So when I start and stop my activities or start and stop my spans, nothing will typically happen with those unless something is listening to it. And the idea here is that we don't want to add overhead to our production systems by having every single operation always being recorded somewhere. And so this one provides us a way, through OpenTelemetry, of defining a way to sample our different activities. And so this one saying always on. We'll sample every single activity.

We want to tell it the source of the activity we want to listen to. And so we will add the one that we're interested in, which is the same name as the service. And so the last one is we want to export this information somewhere. And so that is done with this ad console exporter. We'll look at different exporters in a more complex example, but for now, we're just going to be exporting our activities through the console.

So this would typically be in your program CS or startup CS, where I have my central configuration for logging, and this would be then for my tracing. Oops. Now that I have a way to sample activities, I'm telling it which activities I want to sample. And then I'm telling it where to export those activities. The final piece then is to actually define some activities and record some information. That's this piece here. So here, what I'm doing is I'm using my activity source to start an activity. And what happens behind the scenes is that this will create an activity object. It will check to see if the activity is being listened to. And if the activity is not being listened to this actually will turn null. So it won't incur overhead in the case that nobody is listening to this.

I remember having to do this back in the day of writing my own logging providers, that I would not perform the logging activities if no one was listening to the different logging levels. Logging libraries do this free you today. And this is a similar kind of thing where, if no one's listening to this activity, then it won't create it and this won't actually then return anything and these won't then do anything behind the scenes.

I'm adding some interesting information for it. In this case, I'm adding tags, and tags are just key value pairs of data that it can attach to an activity so that, when the activity stops, I can then look at those tags to be able to report them out to whatever I want to. So let's run this, let's see what we see. Oh, where did it go? I think it was there and it went away. Let's try that again. All right. What I do is I put a read key here. There we go. All right.

So the first thing that happened was still the logging that we saw earlier. And then the next part is this activity information. It looks kind of similar as we had before, as the logging. I have a logging trace ID. I have the logging span ID. And this trace ID represents an entire trace of different activities that can happen. And then the span ID represents this one individual activity in the context of this entire trace.

I have the display name of my activity. That's this display name here. I also have the individual tags that I put on part of it. So I have these foo tag, the bar tag, and the baz tag that were these different tags I put on top of my activity. One thing that this gives me that the log record did not give me is a duration. So it tells me that this one took six and a half seconds, I guess, because I had hit the key earlier. So it records a duration for when the activity starts and when the activity stops. And this is important if I have an activity calling out to some other network service, calling out to a database, calling out to an API or sending a message with NServiceBus, it'll tell me how long it took to do that specific operation.

Now, things start to get even more interesting if I then have multiple activities. So let's do something where we create a ChildActivity. ChildActivity. And then we'll set the same tags we had before, maybe changing some of the values, hello, and service bus. And then this is now four, five, six. And the name is now SayHello child. And I will also include some additional logging information inside after each of these different ones. It's NServiceBus Time. And now it's SayHello Child Time.

Okay. So now when we run this, we should see some additional information around the activities I created as the children. Now, one of the things, here we go, log three key, there we go. We have the activity information. So this is the first activity that we had. And this is the second activity. One thing you'll notice is that the trace ID is actually the same between both of these. That's how I can correlate those two activities together as some broader activity. And then finally the span IDs are different. So this one has a span ID of 210, and this one has one of 91BB blah, blah, blah.

One thing you may notice here is that the second activity was actually recorded before the first one. And that's because it's disposed and called stop on the second activity first, because it was sort of inside the using of the other one. And so that one got recorded first, and then the outer one then got recorded second. And the final piece I want to show is the child activity, this child activity, also has a parent span ID and so it knows that this span, when this one gets recorded, will also include the parent ID, which is down here as the span ID of this parent activity.

So you can kind of see that this will build up, whenever these get recorded, that is, to something listening to these, and we're storing them on a database, now has a graph of the different spans' activities and how they then all relate to each other.

The other piece I wanted to show is that by attaching logs to, or logging inside each of these different activities, I also now get included the trace ID and the span ID of whatever that was doing. And so this is the OpenTelemetry part of recording those log records is, as part of recording the logs, it will then see, am I part of any given running activity? And then, if I am, then go ahead and record the trace ID and the span ID so that my logging, my log storage, will use those IDs to correlate all those different things together.

Something like AppInsights does all this for you, but in its own proprietary way. So by using OpenTelemetry and the standard diagnostics APIs, we don't have to be tied to a specific provider implementation that maybe only that one does it, but in this case now I'm just using the just regular ILogger and regular activities. I get this kind of correlation stuff for free.

All right. So the last piece I want to show is, then, metrics. Now, metrics are now part of the system that diagnostic source API. So you can use this today, but those metrics have to be fed into something. And so that feeding into "something" is the alpha part of metrics. So the API looks pretty similar to tracing. I'm going to just remove that console read key, and this is metrics. So the way metrics works, we have a meter and this meter comes from the system diagnostics dot... Oh gosh, metrics names face of the diagnostic source package. And so we first define a meter and this meter then can define multiple metrics or counters. There's a lot of different things that can actually define as part of this, my meter or dot. I can create a counter, a histogram. Those are the two main ones that we have today.

So now I'm going to create a counter that just will record interesting metrics for counting fruit. The next thing I need to define is a way to record this counter information. The metric API by itself doesn't have any way to export these out themselves. You would have to add that yourself, so that's what's OpenTelemetry provides down here, is I first say I want to create a metric meter provider builder. I want to add this specific meter. And so I'm saying I want to add this meter over here. And you notice I don't have to provide the instance of it, just the name, because behind the scenes, this will add it to a static registry of different meters. So you don't have to reference the instance, which is good if you're leveraging someone else's API that defines some meter way down in its internals. And it says, just leverage the name of the meter, and you don't have to have the instance, and this will then find that global registry and pull out the meter from there.

Next thing it'll do is add a console exporter for meter-related events. So then, finally, when we want to record metric information, we will interact with our fruit counter, add different fruit counter events, and then those will be exported to our console. Let's go ahead and run this. All right. So up here, where is our... I did not see anything show up. Let's set a read key down here to make sure we don't end too early. There it is, the whole activity things. I should just use a using block as opposed to using statement. Those get confusing to me. So here we are, these are the three FruitCounters that got exported out. So we have the timestamp of when that occurred. It'll tell me the value of that as well as any other different relevant information that have passed in as arguments to that counter.

And, yes. So the piece that is OpenTelemetry-specific here, again, is this component right here, but the recording of that information, and then setting up the sources and actually utilizing those sources, that is not OpenTelemetry-specific and so that has been baked into many different frameworks we have today. The SQL Server clients, ASP.NET, ASP.NET Core, HD clients, entity framework. They now have these hooks built into them today so we don't have to do anything to be able to record that information. We just need something now to listen to it and then push it out to something interesting.

All right. So let's look at a more complex example. And now I'll switch over to an application that's using NServiceBus to show how we can leverage OpenTelemetry inside of an application that uses NServiceBus. So close everything. All right. So this is a slightly more complex application. I've got a web application that is sending messages to other downstream services. Those downstream services will just basically log some information and then output some information on the screen. They will do request response and pub/sub, all behind the scenes. So it all kicks off in our controller.

Our controller will just, in this case, I have two different controller actions. It receives a message, it creates an NServiceBus command, and then uses an NServiceBus message session to send that message down over to NServiceBus. The other one does something a little bit different. This one sends a command and this one publishes an event. So I'll be showing the differences between these two different kinds of interactions.

Now, in terms of the dependencies I'm pulling in, there's a few additional packages I have here. First of all, it's pulling in the NServiceBus exemptions, that diagnostics package. And so what this does is add an NServiceBus feature to propagate those headers on messages out the door, and then also parse the W3C headers on incoming messages as well. It creates activity sources for incoming and ongoing messages and makes sure that the messages, the activities themselves, conform to the OpenTelemetry standards for what information should be recorded.

So that's all with this. Because again, that has nothing to do with OpenTelemetry. It's really just the activities' API. The OpenTelemetry package is also, and now this one actually references OpenTelemetry, but it's a very, very small thing that it's doing. It's just adding an activity source with the name that it defined over here. So it's like one line of code that this is in encapsulating, not really much at all to do with OpenTelemetry, but just a convenience method, really.

I have the normal NServiceBus extensions hosting. I'm using the JSON way of serializing and I'm using RabbitMQ as my transport. And then the next pieces are OpenTelemetry-specific. So OpenTelemetry, I have to define a few things. One is I need to be able to gather the different sources of tracing information, and then I need to export them out to something else. And then this last piece, the hosting extension, it's just a way to register the OpenTelemetry-specific dependencies, as well as the background service that is going to performing the recording and exporting of the instrumentation.

I am pulling in some specific instrumentation libraries. Every kind of relevant thing that you want to record will have a specific instrumentation library. So these components are listening to activity source events, and then pushing out those information over to these exporters. Now, when I want to actually see that information somewhere else, then these exporters push the recorded instrumentation out to some other provider console in Jaeger and Zipkin.

And so each of my different services, I pulled those in I've pulled in... Now here's the library that raises diagnostic source events. And then this one will bridge the gap to OpenTelemetry. And then, again, I will provide exporters for the different things I want to export to. The same thing over here. I am pulling in, in this case, HTTP instrumentation, this one calls a web API, via HTTP client. So I want to record any kind of web API call that I'm making. So let's just run the web application.

This is just a standard .NET Core, .NET 6 space web application. And it starts out getting my Swagger UI. To configure all this, when I go to my program CS, I have the normal NServiceBus configuration here with all the kind of NServiceBus-y things that y'all are used to. I am adding one specific thing for our diagnostic source settings. By default, it does not record the message body, for security purposes, but for local debugging, I am saying, "Go ahead and capture that message body." This is something you want to decide on a case-by-case basis. Do you want to record the message body? In this case I do, for demo purposes, but you'll want to decide for yourself what you want to do there.

Then in the startup is where I'm configuring OpenTelemetry. So the OpenTelemetry extensions hosting packages gives us this add OpenTelemetry tracing method that was different than the method we saw earlier. There was like SDK dot something. This one is now tagging itself directly on top of the AddService collection interface here. We give it a resource builder, which is just a way for us to name what OpenTelemetry resource this application is running under. And so I just tell it my endpoint name is going to be that. I tell the things I want to instrument. So I'm telling it to add ASP.NET Core instrumentation, SQL client instrumentation, and then NServiceBus instrumentation. Each of these just hook up activity source listeners behind the scenes. And so this one is just a one-line thing to say, "Listen to this activity source and record it."

Next, I'm doing the exporting. I'm configuring both Zipkin and Jaeger as exporters for OpenTelemetry. So these libraries here have no knowledge of the exporters. They only talk in terms of the system diagnostics APIs. So in the past, when I was having to create open AppInsights specific extensions, or AppInsights, listening for NServiceBus, I would have to create packages, or reference packages, that were specific for AppInsights and NServiceBus. With this, I don't have to worry about someone developing some specific extension for the tracing or logging provider that I'm interested in. Now it's just recording them through a standard API and these exporters now don't care about what the sources are. They just care about I can gather those common sources and push them out to these different providers.

So go back to my Swagger UI and I'll go ahead and execute this. So I'll just execute that. All right. So I just have the web application running. That sent a message. So let's go over to RabbitMQ, and we should see up, yep, we see a message sitting there. So I'm going to look at this message and see. There we go. Make it a little bit bigger. So let's go ahead and get the messages and see what the NServiceBus extension added on top of it.

So in this, you will see the kind of normal NServiceBus headers. Those are unchanged. What was added on here is a trace parent identifier. So this is the W3C standard for trace information for diagnostics. So all we're doing is just adding on top of this the W3C standard for the trace parent header.

So what I'll go ahead and do is stop the application. All right. Stop. And now I'll run the whole thing. So we have multiple different messages going through. I've got the web API running and I've got two background services running, as well. We already see down here that it's actually got one of the messages it's received as part of that, so let's go over to... Yep, it's received some silly messages coming back, setting databases and stuff, all sorts of fun stuff.

So let's go over to Zipkin and see if some traces showed up. Aha. So we see one over here. It says "There's nsbactivities.workerservice had a send activity a few seconds ago, and noticed it took a few seconds. And this is because I stopped the application, only just running the one, and then I restarted the application. So it took a little while for all of those different messages to finally get propagated.

All right. So looking at this, we can see there's the initial send right here. And then all the different information that we saw from the console are also showing up now as tags in Zipkin. So I've got the conversation ID, the destination, where this should be sent. These right here are the recommended standards from OpenTelemetry. It's defined as part of messaging-based activities and spans. These are the things that you should record.

On top of that, I went ahead and tacked on, as part of that package, all the NServiceBus headers that we can pull out, as well. So not just the standard OpenTelemetry ones, but all the NServiceBus space ones as well. So looking at what happened here, this worker service sent a message to this other process. This worker service then processed the message and then made an API call to the web application, then called into a database. This one sent another message that was processed over here, called this other process, called some Rabbit, this is MongoDB stuff. So there's a whole bunch of stuff. And then finally at the end, at the very end of the process, it replies back to the original message and the web application finally at the end, it says, "Okay, I replied back and here's my message payload that we saw in the console application itself."

So way over here, "Received back at you and this," and it's blah, blah, blah. So that was the final message payload that we received back in the overall UI. Jaeger, we should see something very similar. So going back and looking at the web application and find some traces. We see, here's that initial SaySomething. So that was the initial one. And then we had the WorkerService.send, and normally these are attached. Let me run one that's in process because, by stopping it, I didn't get the full trace coming through. Execute. All right. In our console. We got a message there and our second message here.

And then if I go back to Jaeger UI and search the web application again, we should see most recently, aha, here we go, our SaySomething. And this is the overall trace from the web application, sending down into a bunch of other processes. And then finally, at the very end, the overall reply got back over here. So this went through several queues, several processes, making API calls, calling into databases, having one single view of that entire trace throughout the entire request flow here. All right.

Now, one thing I want to of just add real quick is it can be kind of difficult in these traces to find something specific that happened. You can go look at these tags and say, "Oh, well, I had this specific target," or whatever it might be, or this specific process, but it can be hard to really nail down the individual request for something happened. So one of the nice things about those activity APIs is that they can be extended to any kind of user-specific information that you want.

So I'm going to actually go over here to my controller and add some customized information to my traces so that I can then search on that inside of my tracing utility. So right here and right here, I'm going to add some additional information to the currently executing activity. And the way we can do this in ASP.NET Core is through the features concept of ASP.NET Core to get the IHTTP activity feature. And what this will do is get us access to the currently executing activity for this specific request.

And I'm going to be modifying that activity to add some additional information. So what I'm doing here is adding some baggage. Now, baggage is interesting in that baggage actually will propagate throughout any subsequent activity and request. So this information, this operation ID, will actually be included in outgoing messages, outgoing HTTP, calls as an additional header called baggage, included what that original trace header. This is an additional standard from the W3C called the baggage standard. And then what this lets me do is propagate any additional context information for all these different traces.

So now when I run this, all right, so I'm going to say, "Hello hard to trace." And I'm going to execute this a bunch of times. Let's execute, execute, execute, execute, execute, execute, execute, execute. Okay. So now I put my application under serious click load, and then when I go back to search for traces, and if I go look for, again, the web application service, and I say, "Find traces," well, I'm in trouble now because there's a bunch of different traces that I'm looking at. So how do I find the one that's specific to mine?

Well, if I go look at one of these, like SaySomething. Let's see. That one was most recent. Yep. This one's good. So a few seconds ago. If I go look at, now any one of these different operations, I'll actually see the operation ID show up as part of the request way down here. So even though I set the activities baggage up here at the web application, that information is actually getting propagated down all the way down to all of the child spans.

And what's great about this is you can plug this information down to your logging utilities so that I can include this enriched information to say, if I want to pull out all the logs for this specific tag or this specific operation, which could be the logged-in user, it could be the machine, it could be a specific user ID, that could then be tagged as part of these different spans. So you can quickly nail down someone coming in with a bug, the exact set of requests that went into whatever that was.

So back over here, let me go back over to Swagger UI. And down here, I can see that the response for this was the operation ID. So I can take this and say, "I want to just find," and the tags, I want to say that "The operation.id equals this. Find the traces." Oops. I think I need quotes around this. Okay. Well, this worked before, but it's not working now. But that's fine. The idea is that we can now leverage, leverage our search here to really nail down the specific one we're looking for, as opposed to just having this sea of traces that make it quite difficult to find, which is the one we're we're looking for here.

So you can do tag, operation.id equals this. All right. Yeah. And those are all part of that same, at least the first one as at least. All right. So now want to quickly show... Actually, it looks like we have just five minutes left, so I'll pause and we'll go to some questions. I did have a more complex example that was a really complex set of sagas, but this one did use a set of saga so it hopefully gets the idea across here.

So just looking at some questions here, we have some questions around does it support different providers? Right now I'm just showing Zipkin and Jaeger. There are numerous different providers and exporters for OpenTelemetry. I've just showed those but if you go to the OpenTelemetry website, it's got a ton of different ones, besides just Zipkin and Jaeger. I picked Zipkin and Jaeger because I run them on my own machine, but there are definitely other ones that are supported out there.

We had some questions around what happens if the code throws an exception? What do the logs and traces look like? The underlying activity sources know how to catch exceptions and then report the exception as an exception in a trace. So let's go over to here and just say, "Throw new exception. Blammo." All right. And so when we see this trace, it will just have one at the bottom that'll just be, it's in an exception state. And so the activities themselves have different statuses and one of the statuses is error.

I don't actually remember which one of these it is so I'll just try this one. It's going through errors here. So this goes through retries so in our trace, we'll see retries show up. On query and start time. Here we go. Yes. So down here is where we can see that it's processing the message and somewhere in here, we should see... It's not in here. It'll be in the above one. After it goes through second-level retries, we'll see it show up as retrying the message over and over again.

So that's something that's actually built into the OpenTelemetry and activity APIs, is the ability to record any kind of exceptions or that the operation did not complete successfully. All right. Let's see. What else do we have? A question was, "Why would you use HB context features currents?" Let's go back over to our visual studio and go ahead and stop this. So the question was, "Why would we use this instead of using activity.current?" And the big reason is you're not guaranteed that the activity.current. Activity.currrent. We don't know if this activity is the same as this activity. So if you're intending to change and modify the activity that represents the incoming web request, this is how you absolutely know that this is the activity that corresponds to the web request.

Because anything could start activities and that modifies activity.currents. This guarantees that we're modify the activity that we intend to and not some other kind of random activity that could be going on there. The NServiceBus diagnostics package has a similar thing that you can get the NServiceBus activity versus some other activity that may be executing. And we especially saw this when we had infrastructure code calling out to SQL. I was modifying the SQL activity, not the NServiceBus activity, so the information looked wonky, it wasn't lining up correctly. And so that guarantees you're getting the correct activity for that feature.

What happens if the exporters aren't up, over here? So what happens if Zipkin or Yaeger are not up at that time? That's all done behind the scenes for you. It has an internal buffer for pushing those out, but if it can't communicate to those external providers, it just won't report those spans. So it won't bring your application down. It's just you'll have missing spans in your different providers there.

"Is there any way not to lose the data?" Not really. Most times, people don't want to store their trace information somewhere else temporarily, and just deploy these in a highly available format. So just treat that as you would like your other database and things like that.

One is, "How do you trace the current authenticated user?" Over here in our controller, if I wanted to add information about who was currently logged in, then I would just go into, what was it? Current identity. Httpcontext.identiy. Or, no. I don't even remember what it's called in here. Request. Oh, user. There it is. Yeah. So just add user information to your activity, however you want to, either through baggage or tags, and now you've got the current user information tacked on there. That was kind of the point of seeing this is you can add tags and baggage and add whatever information that is important to you as part of that.

Last question I've got before we stop for the day is, "If we use Azure functions for hosting endpoints, is there anything different we need to do in terms of setting up OpenTelemetry?" I don't believe so, because this is really just, once I've referenced the package here, that is the NServiceBus diagnostics, this package right here, that feature is enabled by default so it will get added as part of the normal kind of NServiceBus startup. Actually, using OpenTelemetry with this sort of tracing with Azure functions, that I don't know. The OpenTelemetry docs though, may have some more information specifically about how to hook up this to Azure functions. I think they already have a hook for adding surfaces.blog junk to it, but we can look that up.

All right. So one last little caveat with baggage. Baggage does propagate through every subsequent span. So this is good if you want downstream logs to also include this information. But if the information is only relevant to this one specific span or action, put it on a tag, not baggage. Because this will get tacked on to all subsequent HTP requests and messages as well. So we don't want that to just be the dumping ground, because there's a reason why it's called baggage, because it does travel along every subsequent request. All right. So thank you all very much. That's all I had today. Thank you for attending.

00:59:14 Ramon Smits

Yeah, Jimmy. Thanks. We also had a couple of questions earlier that we received by email. I'm not sure if we processed those questions already during the presentation.

Oh, yes. Yes. So one of them was around, you're already using X-ray to track performance. So, for something like that, you would just look to see if there's an X-ray OpenTelemetry provider. Some of those, you basically just Google, "AWS X-ray OpenTelemetry." I found that almost all the ones that I've seen do have that available for you. Basically, Google way for you to see, ""Can you do dot add X-ray exporter," right here.

The second one is, I think that we already answered this one, was "Distributed tracing with authenticated user info." Yes, you can do that. There's nothing that ties in user info directly with OpenTelemetry or, not even OpenTelemetry. This is not OpenTelemetry. This is the diagnostics API. So it's up to you what you want to stuff into the activity, what you want to put in tags, what do you want to put in the baggage? The information that shows up right now are the recommended standards from the OpenTelemetry Project. But that's just the starting point. You put whatever you want in there using these kinds of extensions to get access to the current activity, or for logs it's the same thing. We use logging scopes and serial log. We put the baggage on top of the serial log scopes, so that those show up in serial log as well.

The second one is, "Passing all OpenTelemetry attributes along a messaging environment." You typically don't do that. The OpenTelemetry attributes are only for exporting out. They're not for passing along for message to message. The things that travel across the network are the W3C standard attributes, not OpenTelemetry attributes. And so the W3C standard are for trace context and for baggage. Those define the correlation IDs that will propagate along, and then also defines the baggage that also gets passed along. But the intent is not to dump all the OpenTelemetry things on there. That is meant to be exported out as part of any one individual process.

01:01:50 Ramon Smits

Okay, great. Well, thanks, Jimmy, for your really nice presentation on OpenTelemetry and with using it with NServiceBus. That's all what we have for you, everyone, today. On behalf of Jimmy, this is Ramon saying goodbye for now and see you in the next particular live webinar. Thanks everyone for attending.

Using OpenTelemetry with NServiceBus

🔗Why watch?

🔗In this webinar you’ll learn about:

🔗Transcription

About Jimmy Bogard

Additional resources