Fallacy #2: Latency is zero
This post is part of the Fallacies of Distributed Computing series.
The speed of light is actually quite slow. Light emitted from the sun this very instant will not reach us here on Earth for 8.3 minutes. It takes a full 5.5 hours for sunlight to reach Pluto and 4.24 years to reach our closest neighboring star, Proxima Centauri. And we cannot communicate at the speed of light; we must bounce data around between Ethernet switches, slowing things down considerably.
Meanwhile, human expectations of speed are pretty demanding. A 1993 Nielsen usability study found that, for web browsing, a 100 millisecond delay was perceived as reacting instantly, and a one second delay corresponded to uninterrupted flow. Anything more than that is considered a distraction.
What is latency?
Latency is the inherent delay present in any communication, regardless of the size of the transmission. The total time for a communication will be
TotalTime = Latency + (Size / Bandwidth)
While bandwidth is limited by infrastructure, latency is primarily bounded by geography and the speed of light. Geography we can control. The speed of light we cannot.
This latency occurs in every communication across a network. This includes, of course, users connecting to our web server, but it also governs the communications our web server must make to respond to that request. Web service calls, or even requests to the database, are all affected by latency. Frequently, this means that making many requests for small items can be drastically slower than requesting one item of the combined size.
Ignore at your own risk
Latency is something that should always be considered when developing applications. To neglect it can have disastrous consequences.
One of our developers once worked with a client that was building a system that dealt with car insurance. In the client’s specific country, automobile insurance coverage was required by law. The software needed to check if drivers had car insurance already. If they did not, they were automatically dumped into an expensive government-provided insurance pool.
The team decoupled this dependence on an external insurance check by stubbing it behind a web service. During development, this web service simply returned true or false immediately. Unbeknownst to the team, the production system used a dial-up modem to talk to the government system. Clearly, this was not going to work at scale.
This is an extreme example, but it illustrates the point. The time to cross the network in one direction can be small for a LAN, but for a WAN or the Internet, it can be very large — many times slower than in-memory access.
Careful with objects
In earlier days of object-oriented programming, there was a brief period when remote objects were fashionable. In this programming style, a local in-memory reference would be a proxy object for the “real” version on a server somewhere else. This would sometimes mean that accessing a single property on the proxy object would result in a network round trip.
Now, of course, we use data transfer objects (DTOs) that pack all of an object’s data into one container, shipping it across the network all at once and thus eliminating multiple round trips associated with remote objects. This way, the latency penalty only has to be paid once, rather than on each property access.
So, it’s important to be careful of how your data objects are implemented, especially those generated by O/RM tools. Because of object-oriented abstraction, it’s not always easy to know if a property will just access local data already in memory or if it will require a costly network round trip to retrieve.
SELECT N+1 queries are just one specific example of the second fallacy of distributed computing rearing its ugly head.
To combat latency, we can first optimize the geography so that the distance to be crossed is as short as possible. Once we have committed to making the round trip and paying the latency penalty, we can optimize to get as much out of it as possible.
Accounting for geography
For minimum latency, communicating servers should be as close together as possible. Shorter overall network round trips with fewer hops will reduce the effect of latency within our applications.
One strategy specifically for web content involves the use of a content delivery network (CDN) to bring resources closer to the client. This is especially useful with resources that don’t often change, such as images, videos, and other high-bandwidth items.
Latency typically isn’t much of a problem within an on-premises data center with gigabit Ethernet connections. But when designing for the cloud, special attention should be paid to having our cloud resources deployed to the same availability zone. If we are to fail over to a secondary availability zone, all of the related resources should fail over together so that we do not encounter a situation in which an application server in one zone is forced to communicate with a database in another.
Go big or don’t go
After accounting for geography, the best strategy to avoid latency involves optimizing how and when those communications take place.
When you’re forced to cross the network, it’s advisable to take all the data you might need with you. Clearly this is a double-edged sword because downloading 100 MB of data when you need 5 KB isn’t a good solution, either. At the very least, inter-object “chit-chat” should not need to cross the network. Ultimately, though, experience is required to analyze the use for the data. Access to that data can then be optimized to balance the need for small download size and minimal amount of round trips. This is related to the 3rd fallacy, bandwidth is infinite, which will be covered next.
However, a better strategy is to remove the need to cross the network in the first place. The latency delay of a request you don’t have to make is zero.
One obvious strategy is to utilize in-memory caching so that the latency cost is paid by the first request. This will be to the benefit of all those that come afterward; they can share the same saved response. But of course, caching isn’t always a possibility, and it introduces its own problems. As it has been said, “There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors.” Knowing when to invalidate a cached item, at least without asking the source and negating the benefit of the cache, is a hard thing to do.
Harness the power of events
With asynchronous messaging, you can publish events about information updates immediately, when they happen. Other systems that are interested in this data can then subscribe to receive these events and cache the data locally. This ensures that when the data is needed, it can be provided without requiring a network round trip.
Not every scenario requires this additional complexity, but when used appropriately, it can be very powerful. Instead of having to wait to request updated data, always-up-to-date data can be served instantly.
We can use a variety of strategies to minimize the number of times we must cross the network. We can carefully architect our system to minimize differences in geography, and we can optimize our network communications so that we return all the information that might be needed, minimizing cross-network chit-chat. This requires experience in carefully analyzing system use patterns in order to optimize how information is delivered between systems.
We can also use a variety of caching strategies, including in-memory cache and content delivery networks, to minimize repeated requests for information. CDNs in particular are useful for moving content “closer” to the end user and minimizing the latency for those specific items.
Publishing events when data changes can be instrumental in managing a distributed infrastructure. Being notified of changes can help disconnected systems remain in sync.
The speed of light may be slow, but it doesn’t have to keep you down.