The day we almost lost an invoice


The day started like any other for me as a solutions architect working on a healthcare invoice processing system. When I came into the office, I didn't anticipate that this day would be special. I only realized it when a customer called and asked about a specific invoice that should have been processed by our system. We couldn't find it! It just wasn't there. Sweat was dripping down my neck. Had we just lost an invoice, and with it, money?! We started analyzing our system, added more diagnostic information, and asked our customer to obtain the raw invoice data from his own customer so that we could process it again on our staging system. I knew this was going to be a long day. After digging deeper into our system and analyzing the customer's invoice, we found that the invoice payload contained a large amount of binary-encoded data. There was so much of it, in fact, that our infrastructure couldn't handle it.


Hitting the limits of your underlying messaging infrastructure, such as the maximum message size, is not a good position to be in. The moment a message exceeds the size limit, your queuing system stops processing it and may even reject it outright. Oversized messages also waste valuable system resources. In our case, hitting these limits meant that we could no longer reliably process invoices. Let me walk you through how we ended up in this situation and the solution we came up with, so that you can learn from my mistakes.

A failed approach

The invoice processing system was part of a larger enterprise architecture. Messaging had become our de facto standard for breaking apart the monolithic invoice processing system and for exchanging commands and events between the new subsystems. An important part of the invoice processing system was the ability to receive invoices in raw format and process them according to a complex set of business rules. In Switzerland, there is a de facto standard for medical invoices provided by a group called forum-datenaustausch. The standard defines an XML schema for invoices. Because we were already using messages with XML payloads, we decided to put the standardized XML format provided by forum-datenaustausch directly into the message body. We introduced a command called SubmitInvoice that basically looked like this:

<submitInvoice>
	<invoiceId />
	<invoicePayloadDefinedByForumDatenaustausch />
</submitInvoice>

When we designed the system, we also did rough capacity planning calculations for our queuing system. For these, we used the sample data provided by the standards body. The payloads were usually around 100 KB, with a maximum of 1 MB. At the time, we were using MSMQ as the transport layer, and we knew that a message could be at most 4 MB. So everything seemed fine. But one day we rediscovered an important detail in the invoice XML definition that we had completely forgotten, and we hit the maximum message size limit of the queuing system.
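A capacity check like ours can be sketched in a few lines. The Python below is a minimal illustration, not our production code. It wraps an invoice document in the original SubmitInvoice command and compares the serialized size against the transport limit. Several things are assumptions: the helper names, the simplified `<invoice>` structure (it is not the forum-datenaustausch schema), and treating MSMQ's 4 MB limit as 4 × 1024 × 1024 bytes.

```python
import xml.etree.ElementTree as ET

# Assumption: MSMQ's 4 MB limit, interpreted as 4 * 1024 * 1024 bytes.
MSMQ_MAX_MESSAGE_BYTES = 4 * 1024 * 1024


def build_inline_submit_invoice(invoice_id: str, invoice_xml: str) -> bytes:
    """Serialize the original SubmitInvoice command with the invoice embedded inline."""
    command = ET.Element("submitInvoice")
    ET.SubElement(command, "invoiceId").text = invoice_id
    payload = ET.SubElement(command, "invoicePayloadDefinedByForumDatenaustausch")
    # The whole invoice document travels inside the message body.
    payload.append(ET.fromstring(invoice_xml))
    return ET.tostring(command, encoding="utf-8")


def headroom(message: bytes, limit: int = MSMQ_MAX_MESSAGE_BYTES) -> int:
    """Bytes left before the transport limit; a negative value means the message is too large."""
    return limit - len(message)


if __name__ == "__main__":
    # Hypothetical, heavily simplified invoice structure for illustration only.
    sample = "<invoice><treatment>sample data</treatment></invoice>"
    message = build_inline_submit_invoice("inv-1", sample)
    print(len(message), headroom(message))
```

A check like this is only as good as the data you feed it. Sample payloads of 100 KB to 1 MB pass easily, which is exactly why real customer data matters.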

The request format allows the embedding of documents into the invoice in raw format. The complex type defined by the schema looks like this:

<complexType name="documentType">
	<choice>
		<element name="base64" type="base64Binary"/>
		<element name="url" type="anyURI"/>
	</choice>
	...
</complexType>

So a document can either point to a remote location or simply be embedded as a base64-encoded binary blob. That blob is not limited in size and can contain anything (even X-ray images, which can be several hundred MB). Even worse, documentType is part of a larger collection that looks like this:

<complexType name="documentsType">
	<sequence>
		<element name="document" type="invoice:documentType" minOccurs="1" maxOccurs="unbounded"/>
	</sequence>
	...
</complexType>

To put it simply, an invoice can contain an unbounded collection of base64-encoded documents. We really should have done our homework! We looked for a way to increase the maximum message size in MSMQ but quickly found out that 4 MB was a hard limit. Then we started investigating other transports, like ActiveMQ. ActiveMQ had a default maximum message size of 32 MB, and according to the documentation it was possible to raise that limit. That looked promising! After further investigation, however, we found that setting the maximum message size above 32 MB had a huge impact on performance. This is because, by default, ActiveMQ splits its transaction log into 32 MB files. Larger messages would have to be spread over multiple transaction log files. That is less than ideal at runtime, and also when the broker needs to recover after a restart. So we went back to the drawing board.
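The base64 encoding makes matters worse. Base64 turns every 3 bytes into 4 characters, so an embedded document grows by roughly a third before any XML markup is added. The following Python sketch computes the encoded size and the largest raw document that still fits. It again assumes that 4 MB means 4 × 1024 × 1024 bytes.

```python
# Assumption: MSMQ's 4 MB limit, interpreted as 4 * 1024 * 1024 bytes.
MSMQ_MAX_MESSAGE_BYTES = 4 * 1024 * 1024


def base64_encoded_size(raw_bytes: int) -> int:
    """Length of standard base64 output (with '=' padding, no line breaks)."""
    # Every started group of 3 raw bytes becomes 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


def max_raw_bytes(encoded_budget: int) -> int:
    """Largest raw size whose base64 encoding still fits into encoded_budget bytes."""
    return (encoded_budget // 4) * 3


def embedded_documents_size(document_sizes: list[int]) -> int:
    """Total base64 size of all embedded documents, ignoring the surrounding XML."""
    return sum(base64_encoded_size(n) for n in document_sizes)
```

Under these assumptions, a single 3 MB scan fills an entire MSMQ message by itself and leaves no room for the rest of the invoice. Two 2 MB documents already exceed the limit.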

Success at last

Enterprise Integration Patterns by Gregor Hohpe and Bobby Woolf provided the answer. It describes a pattern called Claim Check that addresses our exact scenario. The pattern is introduced like this:

The Content Enricher tells us how we can deal with situations where our message is missing required data items. The Content Filter lets us remove uninteresting data items from a message. Sometimes, we want to remove fields only temporarily. For example, a message may contain a set of data items that may be needed later in the message flow, but that are not necessary for all intermediate processing steps. We may not want to carry all this information through each processing step because it may cause performance degradation and makes debugging harder because we carry so much extra data.

How can we reduce the data volume of message sent across the system without sacrificing information content?


The basic idea is that you extract the large payload, put it into a highly available data store, and pass only the claim check (a reference to the stored payload) to subsequent components. We decided to adapt the Claim Check pattern and change the SubmitInvoice command to the following:

<submitInvoice>
	<invoiceId />
	<uriToTheLocationOfTheRawInvoice />
</submitInvoice>
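The pattern itself is small. Here is a minimal, in-memory Python sketch of the check-in/check-out cycle behind the new command. `InMemoryBlobStore` is a hypothetical stand-in for a highly available blob store, and the `blobs.example.invalid` base URI is a placeholder. A real system would use durable storage instead of a dictionary.

```python
import uuid
from dataclasses import dataclass


class InMemoryBlobStore:
    """Hypothetical stand-in for a highly available blob store (not durable!)."""

    def __init__(self, base_uri: str = "https://blobs.example.invalid/invoices/") -> None:
        self._base_uri = base_uri  # placeholder URI, not a real endpoint
        self._blobs: dict[str, bytes] = {}

    def check_in(self, payload: bytes) -> str:
        """Store the large payload and hand back a claim check (its URI)."""
        claim_check = self._base_uri + str(uuid.uuid4())
        self._blobs[claim_check] = payload
        return claim_check

    def check_out(self, claim_check: str) -> bytes:
        """Redeem a claim check for the payload it refers to."""
        return self._blobs[claim_check]


@dataclass(frozen=True)
class SubmitInvoice:
    """The slimmed-down command: an id plus a pointer, never the payload itself."""
    invoice_id: str
    uri_to_the_location_of_the_raw_invoice: str


def create_command(store: InMemoryBlobStore, invoice_id: str, raw_invoice: bytes) -> SubmitInvoice:
    # Check the payload in; only the short claim check goes into the message.
    return SubmitInvoice(invoice_id, store.check_in(raw_invoice))


def load_invoice(store: InMemoryBlobStore, command: SubmitInvoice) -> bytes:
    # A consumer redeems the claim check to get the full payload back.
    return store.check_out(command.uri_to_the_location_of_the_raw_invoice)
```

However large the invoice is, the command only ever carries an id and a URI of a few dozen characters.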

We first changed the command sender to upload the raw invoice payload into a blob storage infrastructure. It would then simply send the SubmitInvoice command with the invoice ID and the full URI of the invoice in blob storage.
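The order of operations in the sender matters: the payload has to be stored before the command that points to it is sent. Below is a hedged Python sketch of such a sender. `upload` and `send` are hypothetical callables standing in for a blob storage client and a messaging library. The XML element names follow the new SubmitInvoice command.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def build_submit_invoice(invoice_id: str, invoice_uri: str) -> bytes:
    """Serialize the claim-check version of the SubmitInvoice command."""
    command = ET.Element("submitInvoice")
    ET.SubElement(command, "invoiceId").text = invoice_id
    ET.SubElement(command, "uriToTheLocationOfTheRawInvoice").text = invoice_uri
    return ET.tostring(command, encoding="utf-8")


def read_submit_invoice(message: bytes) -> tuple[Optional[str], Optional[str]]:
    """Parse the command back into (invoice id, URI of the raw invoice)."""
    command = ET.fromstring(message)
    return command.findtext("invoiceId"), command.findtext("uriToTheLocationOfTheRawInvoice")


def submit_invoice(
    invoice_id: str,
    raw_invoice: bytes,
    upload: Callable[[str, bytes], str],  # hypothetical blob storage client, returns the URI
    send: Callable[[bytes], None],        # hypothetical messaging client
) -> str:
    # 1. Upload first; if this raises, no command is ever sent.
    uri = upload(invoice_id, raw_invoice)
    # 2. Only then send the small command carrying the claim check.
    send(build_submit_invoice(invoice_id, uri))
    return uri
```

If the upload fails, nothing is sent, so no consumer receives a claim check that points nowhere.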

This drastically reduced the overall message size in the system and therefore also improved performance and throughput in the messaging infrastructure. Large payloads no longer needed to be sent around the system; only their locations did. This also enabled subsequent components to stream the data directly from blob storage.
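Streaming means a consumer never needs to hold the whole payload in memory at once. Here is a minimal Python sketch. It assumes the blob storage client returns a file-like object with a `read(size)` method, as `io.BytesIO` or the response from `urllib.request.urlopen` do. Computing a SHA-256 checksum is only a placeholder for real business-rule processing, and the 64 KiB chunk size is an arbitrary choice.

```python
import hashlib
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # arbitrary choice: 64 KiB per read


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the payload piece by piece instead of reading it all at once."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        yield chunk


def process_invoice_stream(stream: BinaryIO) -> tuple[int, str]:
    """Stream the raw invoice once; the checksum is a placeholder for real processing."""
    sha = hashlib.sha256()
    size = 0
    for chunk in iter_chunks(stream):
        sha.update(chunk)
        size += len(chunk)
    return size, sha.hexdigest()
```

Memory use stays bounded by the chunk size, whether the invoice is 100 KB or contains several hundred MB of X-ray images.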

What about your system? Do you know the limits of your infrastructure? Have you done your capacity planning with real data? Are you trying to send large payloads over your queuing system? If you answered no to either of the first two questions, or yes to the last one, then you should probably go back to the drawing board. Don't wait for problems to happen. Learn from my mistakes before you lose your own customer's invoice, and think about the limits of your infrastructure beforehand. Always remember that not being able to reliably process messages means that, on a long enough timeline, you will lose messages. And losing messages often means losing money.


About the author: Daniel Marbach is a solution architect at Particular Software, Microsoft MVP for systems integration, coach and passionate blogger.