
Managing Microservices

Explore the real-world difficulties encountered in testing, deploying and managing many small applications.

Transcription

00:02 James Lewis
Hello. Okay. All right. Good afternoon, everyone. Let's start that again. So, my name is James Lewis. I work for a company called ThoughtWorks, not sure if anyone in the room has come across ThoughtWorks. We're sort of an international consultancy, sort of a boutique consultancy, I suppose. There are about 3000 of us globally, something like that, about 300 in the UK, a few more in Europe. I'm going to talk today about microservices. It's a hot topic, and some of you might have come across the kind of ideas already.
00:55 James Lewis
First of all, I'd like to thank Udi in particular, see what I did there, for inviting me today. So thanks very much. I really appreciate it. The interesting thing is, some of the conversations that led to these ideas crystallizing actually happened four and a half or so years ago, at a workshop that I was at with Udi in Northern Italy. So, hopefully what I'm going to be talking about isn't too dissimilar to the themes that you've been talking about today already. If it is, then sorry. Alright.
01:36 James Lewis
Okay. So the talk is called managing microservices. In particular, the reason why I'm saying managing microservices is because, and we'll go through the characteristics of microservices in a moment, but basically you've got a bunch of small communicating services, however they communicate, and you end up having to manage a lot more operations, you end up with lots more operational complexity, and you have to manage that in some way. So what I'm going to be talking through for most of this talk are techniques and practices that have been developed over the last couple of years to help us manage the operational complexity of looking after lots of small systems talking to one another. So I guess, I'll get going.
02:22 James Lewis
Has anyone read the article that Martin Fowler and I published a few months ago? That's on Martin Fowler's site; it caused a bit of a stir, and all sorts of activity happened afterwards. There is a conference coming up, I think it's called µCon, Microservice Con, in a couple of months. It's been a really interesting time, because it feels like the idea of microservices is one whose time has come. It's an approach to building systems of systems, based out of these small collaborating services.
02:56 James Lewis
I mentioned I chatted with Udi some four and a half years ago about this. The kind of etymology, where this concept came from really, is bringing together a bunch of better practices from lots of different disciplines. So from architecture, from integration patterns, from how to manage systems in the Cloud, all these different things coming together in one kind of place, in one kind of architectural style, and them being called microservices. ThoughtWorks also publishes this thing called the Technology Radar, I won't do any more sales, promise, but the new one is coming out in July 2014. It's quite interesting. It's what the ThoughtWorks view on all sorts of tools and practices and technologies and things is, at the moment.
03:43 James Lewis
So, today's talk, what I'm going to talk through, are some of the characteristics of microservices, first off. And then I'm going to talk about how we build microservices, because actually designing them and writing the code is only one small part of this; how we put these things together, how we actually get them from our desktops into production, that's that section. Then deploying them and managing microservices, once they've gone live. And then I'll be talking about giants at the end, including Udi, I'll stop using his name there.
04:18 James Lewis
So what's the story so far? This is really where microservices came from. At this workshop, there was a whole bunch of people talking about how we're building software that's just way too big. We're building these big monolithic web applications, or big monolithic applications, and they're becoming really difficult to change over time. They're becoming really difficult to manage, becoming these big balls of mud, of spaghetti code, really tangled. Difficult to get lots of people working on them, because they are just treading on each other's feet. Difficult to test them often, difficult to change them, because if you make a change in one place, it has ripple effects everywhere else.
04:58 James Lewis
And also we have this two to five-year rewrite cycle with these things. So you build something, and a few years later you hit this wall where you can't make many changes anymore. It becomes far too difficult to make changes. So we have to rewrite whole systems, whole applications. And at this workshop, everyone was just saying, "We've got this big thing. We need to break it up. What's the best way of doing that?" So using things like the strangler pattern, and all these different approaches.
05:25 James Lewis
And it occurred to a few of us then that maybe there's something we're missing, and that's the size, that's how big these things are getting. So we started thinking about actually building smaller things and linking them together somehow, having them communicate with one another, whether that's via a bus, whether that's via RESTful integration techniques, resource-oriented architectural approaches, but the key thing is having small things that talk to one another. So that's really what microservices to me is really about. It's about taking these big applications, identifying what the bounded contexts are within them, and splitting them out and, crucially, as you can see up here, along with the data, so taking the data with them.
06:10 James Lewis
So for a long time, we've had this spider pattern, where you have a big database. We might have a service-oriented architecture, but everything talks to the same database. That's not what we're talking about here, because that in itself is a brittle approach. We take the data with it. So we split off the bounded contexts. I'm getting smiles in the front here, and at the back. So I think this is maybe striking a chord with some people. Actually one of my current clients is in exactly this position, where they've got lots of services talking to one database, and they still can't make any changes. So we take the data with it. This is the service-oriented approach where we talk about systems of record, single sources of truth, for data.
06:53 James Lewis
The first characteristic is this idea of these business capabilities that own their own data. And we've adopted, or maybe borrowed, this term from Alistair Cockburn, who is pretty well-known in the software industry. Several years ago now, he coined the term hexagonal architectures. Now, when he was talking about hexagonal architectures, he was really describing how we could build systems that allowed us to test them more easily. So he called them ports and adapters. So we have ports on the outside of our business capabilities, on the outside of our bounded contexts, as he'd say, which allows us to swap in different endpoints for testing purposes.
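As a rough illustration of that ports-and-adapters idea, here is a minimal sketch in Python. The names (a notifications port for a users capability, an SMTP adapter and a recording test adapter) are assumptions for illustration, not details from the talk.

```python
# Minimal ports-and-adapters sketch: the capability owns the port (an
# interface); adapters plug real or test implementations into it.
from abc import ABC, abstractmethod


class UserNotifications(ABC):
    """Port: how the (hypothetical) users capability talks to the outside world."""

    @abstractmethod
    def welcome(self, email: str) -> None:
        ...


class SmtpNotifications(UserNotifications):
    """Production adapter (real delivery details elided)."""

    def welcome(self, email: str) -> None:
        print(f"sending real welcome mail to {email}")


class RecordingNotifications(UserNotifications):
    """Test adapter: lets us exercise the capability without real I/O."""

    def __init__(self) -> None:
        self.sent = []

    def welcome(self, email: str) -> None:
        self.sent.append(email)


def register_user(email: str, notifications: UserNotifications) -> None:
    # The core business logic only ever sees the port, never a concrete adapter.
    notifications.welcome(email)


# Swapping the adapter behind the port is the "different endpoints for
# testing purposes" trick mentioned above.
fake = RecordingNotifications()
register_user("ada@example.com", fake)
assert fake.sent == ["ada@example.com"]
```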
07:36 James Lewis
We've repurposed this as the idea of having these hexagons that surround our bounded contexts, around our business capabilities. And they have these ports on the outside. And these ports should be very well understood mechanisms for integrating with these services, but crucially the data stays with this thing. This is an example from a project about four years ago now. So this is a context map we came up with. It was for a retail bank, I guess a loyalty management system really, but you can think of it as a retail bank. It's got all the usual sorts of things you might expect at a bank. You've got some way of managing transactions and new accounts.
08:18 James Lewis
You've got some rules that you have to run, reporting, access and entitlements, users and so on. So we separated out this big product into a bunch of these capabilities, which are defined by these thick black lines. And then from that, we decomposed them further, into microservices. This is a key thing though. You need to start at the top. You need to start by understanding what the business looks like. This is a phrase that my colleague, Ian Cartwright, coined: "There should be business and architecture isomorphism." So a business person should be able to look at a high-level map of the architecture in their organization, and actually see the business represented there.
09:02 James Lewis
And similarly, as technologists, we should be able to look at the business and see our architecture represented there. There should be this kind of isomorphic behavior, isomorphic relationship rather, between the two. So start at the top. We need to understand what business processes look like. We need to understand what the capabilities are at a macro level, come up with this town plan idea. And then we can drill down when we understand a bit more. And we did drill down in this case. So this is the capability that manages users. When we understood enough about the functional and cross-functional requirements, in this case, we were able to come up with a design within this capability, within this context.
09:42 James Lewis
And actually we had three services, and two different databases, within this capability. And this is just to manage users. This is how we create users in our system. The reason we had this was because we had some pretty strange non-functional, or cross-functional, requirements. We had to do some really high levels of batch loading overnight, and then during the day it was a fairly basic CRUD access pattern. So we decoupled the user service itself from the outside world using an event queue, and competing consumers, which are the processes, so we could scale those up and down. But what we ended up with, if anyone has seen the Lego movie, is three microservices.
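A minimal sketch of that shape: producers put user-creation work on a queue and a pool of competing consumers drains it, so the consumers can be scaled up for the overnight batch load and back down during the day. The in-process queue and the names are stand-ins for illustration; the real system would use a broker.

```python
# Competing consumers sketch: several workers drain the same work queue.
import queue
import threading

work = queue.Queue()


def consumer(worker_id: int) -> None:
    while True:
        user = work.get()
        if user is None:           # poison pill: shut this worker down
            work.task_done()
            return
        print(f"worker {worker_id} creating user {user}")
        work.task_done()


# Scale the number of competing consumers independently of the producer.
workers = [threading.Thread(target=consumer, args=(i,)) for i in range(3)]
for w in workers:
    w.start()

for name in ["alice", "bob", "carol"]:   # e.g. part of the overnight batch load
    work.put(name)
for _ in workers:
    work.put(None)
work.join()
```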
10:27 James Lewis
But the key point here is we understood from the top down first; this wasn't a bottom-up approach, it's a top-down approach. Whereas Cockburn talked about ports and adapters, in the microservices world we talk about sharing a uniform interface. Now this is a familiar term if you've ever designed systems using the RESTful architectural style, but you needn't necessarily choose the REST architectural style. The point is there should be a uniform interface. The uniform interface is one of the constraints in Fielding's REST paper. So what is the uniform interface that we use to communicate between these things?
11:06 James Lewis
Actually, in our case, we were building a RESTful, resource-oriented system. So we decided to choose a particular uniform interface, and we chose this Atom and JSON representation that was going to be common across all of our capabilities. So what we ended up with, as one of our ports, if you like, the interface to this capability, was user requests, where you'd post a request to create a user. And it was in a particular format, which is this Atom plus JSON format.
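For illustration only, here is roughly what such a "create user" entry might look like as an Atom-style entry rendered in JSON; the field names and the media type string are assumptions, not the project's actual wire format.

```python
# Hypothetical Atom-style entry, rendered as JSON, for a "create user" request.
import json

user_entry = {
    "title": "Create user",                       # Atom entries carry a title...
    "author": {"name": "registration-frontend"},  # ...and an author
    "content": {                                  # the domain payload sits in the content
        "username": "ada",
        "email": "ada@example.com",
    },
}

body = json.dumps(user_entry)
headers = {"Content-Type": "application/atom+json"}  # assumed media type name
# POSTing this body to the capability's user-requests collection is the port.
```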
11:41 James Lewis
And the interesting thing here is that if you start doing that, you can start to use these standard application protocols, so things like RESTful protocols or messaging protocols, to start to bridge what's called the semantic gap. Now this is a big problem, always has been, when we're doing integration in organizations. If you read this, it's a restatement of Noam Chomsky's famous "Time flies like an arrow, fruit flies like a banana" statement. The semantic gap is characterized by the difficulty that software agents have in understanding human language, in understanding the semantics of what's going on when we're actually communicating between systems.
12:24 James Lewis
Those two sentences are structurally identical, but it's quite difficult for software agents to understand what the semantic differences between them are. So, as I said, the semantic gap characterizes the difference between two descriptions of an object in two different places. An example would be, on a website, you might have a number of ways of creating a user, say. So you might be able to register a user, or you might be able to create a user. If I'm programming against that interface, I want to write a software agent that has a state machine, that is able to transition through a state machine, programming against this interface, against the outside of these contexts.
13:07 James Lewis
Then it's very difficult to understand which one I should choose. I can't do so automatically. So if you choose a standard application protocol, if we all choose one of them effectively, then we get to bridge this gap somewhat. We get to allow our software agents to make decisions deterministically about how they interact with our software. An example of this is actually RFC 5023. Obviously everyone knows what RFC 5023 is. It's actually AtomPub. So in our case, what we chose is AtomPub. We chose the Atom Publishing Protocol as the standard protocol by which we would communicate between different sets of services, different business capabilities.
13:52 James Lewis
What that gave us is this really standard mechanism for communicating. So in our case, we would post entries into an Atom collection. We would post a user into an Atom collection, and we knew that the application protocol meant that would lead to the creation of this user. Choosing a really simple protocol narrows the number of decisions that your applications have to make when they're talking to other services. That's a key part of microservices in my mind.
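A hedged sketch of what that interaction looks like on the wire, following AtomPub's (RFC 5023) create semantics: POST an entry to a collection, and a 201 Created with a Location header tells you where the new member resource lives. The collection URL, payload and media type are illustrative assumptions.

```python
# Sketch of an AtomPub-style "create by POSTing to a collection" interaction.
import json

import requests

COLLECTION = "https://users.internal.example/user-requests"  # hypothetical collection URL

entry = {
    "title": "Create user",
    "content": {"username": "ada", "email": "ada@example.com"},
}

response = requests.post(
    COLLECTION,
    data=json.dumps(entry),
    headers={"Content-Type": "application/atom+json"},  # assumed media type
    timeout=5,
)

# The standard protocol removes the guesswork: 201 Created means the entry was
# accepted, and the Location header says where the new member resource lives.
assert response.status_code == 201
new_user_url = response.headers["Location"]
```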
14:30 James Lewis
The next thing is, and this is the question we always get, how small is a microservice? To my mind, it's small, with a single responsibility. So what does that mean? The original statement of the single responsibility principle, I believe, is that things should have a single reason to change. It's very difficult to answer "What does responsibility mean?", so: things should have a single reason to change. Now, we were trying to restate that a few years ago. If you're on the outside and you've got all these capabilities, and you drill down, in our example with the users we have three little microservices, three applications, with two different databases. If you drill down into those, and this is familiar if you're in the object-oriented world, those applications are composed of objects which are communicating with one another via message passing.
15:24 James Lewis
And then as you drill down further, you get to an object, and our restatement, well, my restatement, of the single responsibility principle is that these objects should be no bigger than my head. And that's when I put it up against the monitor: the number of lines of code. I'm not going to demonstrate it in this jacket now. But if I put my head up against the screen, that's how big an object should be. If it gets bigger than that, and that's in normal size font, we're not talking about four point font or anything, normal size font, that's about how big an object should be.
15:54 James Lewis
Bigger than that, it starts to take on a lot more responsibility, starts to do more stuff, starts to have more than one reason to change. And that's okay, because even though I look a bit like Alexei Sayle and I've got a really big head, it's actually quite empty, so it's fine. I'm making a joke of it, with the monitor and the screen and stuff, but actually it is about size. For me it's about: can I fit it conceptually in my head? Can I understand everything about the object? Can I understand the entirety of it, all the things it does, in my head, at any one point? That's really what it is. But as you chunk up, can I understand everything about this little microservice, this application, in my head? It has a single responsibility and I can understand it conceptually. As you chunk up further, can I understand what the bounded context does, what this slightly bigger unit does? And finally, as you chunk up to the system of systems, does that have a single responsibility? Can I fit that inside my head?
16:57 James Lewis
This has to do with the comprehensibility of the system as a whole, at the different levels within your architecture. So as I said, this is more of the single responsibility. These applications are elephants all the way down until you hit the turtles. In Terry Pratchett's world it's turtles all the way down, and, I believe, in the Hindu belief system too. But as you go all the way down and all the way back up, these things should be understandable. We should be able to reason about them on their own; they should be comprehensible. The next thing is this idea of product teams organized around product lines.
17:35 James Lewis
I know this is a technology conference, sorry, but I'm going to drop some of the process stuff in, and the team stuff, because, back to Alistair Cockburn: he went to IBM, this is an interesting story people might know. And his job was to go around, compare different projects, and try to understand which methodologies produce the best results. So he went around lots and lots of different IBM projects. Some waterfall, some Crystal, some DSDM, lots of different styles and different methodologies, to work out which methodology had the best results.
18:07 James Lewis
And his findings were pretty instructive. His findings were that people are the first-order determinant of success when you're building software; it's always about people. Anything else is second or third order. People, it's all about people. So that's why I tend to talk about people a bit. Because when we're building software, we might like to think it's all about the technology, and the design, and the architecture, how we've got everything, but most of the time it's about people. And product teams organized around product lines. Well, this may be familiar to some people in the room. This is a fairly traditional view of an enterprise, of an organization.
18:41 James Lewis
You have the business over there, behind this wide chasm. You've got operations, you've got the project management office. You have some testers, you have some developers, in their own individual silos. They'll probably spin up a project in order to deliver some software, and then they'll go back into their silos. This is fairly standard. Does this look familiar to people? Nodding? Yeah. Some nodding around the room? I won't get you to put your hands up, it's just after lunch. I think I saw some people slip off to the pub as well, so I'm not sure who's at the back.
19:18 James Lewis
This is a pretty familiar approach to structuring an organization. This is what you get taught if you do an MBA, effectively. Now, there are a lot of problems with this, in terms of team communication and so on. And what agile teams have been doing for a while is to say: what we do is bring cross-functional teams together to deliver software together. So we take some developers and testers, some operations people, DevOps, cool and hipster, and we put them together in one team, cross-functionally, to deliver software over time. Continuous delivery, which we'll be talking about in a bit, goes one step further and says, "Actually, we probably need to bring the business in. We actually need to work in really small chunks and optimize to get really small chunks of work out through the door."
20:05 James Lewis
And this is interesting. You've got this agile way of working, with these agile cross-functional teams, but then you get this secondary thing laid on top, which is this thing called Conway's law. Now hands up, I will ask you to put your hands up, because you're going to fall asleep otherwise: who's heard of Conway's law in the room? So actually loads of people have heard of Conway's law, so I'll skip the next section. There is this one thing about Conway's law. This came from Melvin Conway in '68, I believe. He said that organizations which design systems are constrained to produce designs that mirror the communication structures of those organizations.
20:41 James Lewis
So he was looking at software in 1968. Microsoft validated this on some of their products when they did a study on them 20 years later. And then the Harvard Business Review eventually said, "Yes. We think this is appropriate. We think this is right." This is why we get database teams, and we get UI teams, and we get middleware teams. If you have a structure like that, by technical specialty, you end up with databases, middleware, and UI. That's basically what happens within the teams. The point is, the way you structure your organizations, the way you structure your cross-functional teams, has a direct impact on how you build your software, a direct impact on how coupled or decoupled the software will be.
21:23 James Lewis
I was reminded of this pretty strongly last week when I was doing some consulting, just up the road in Farringdon, where they had one team which had been based in London for a long time, building one part of the system. They had another team out in India, building another part of the system. And the part that had been built out in India over a long period of time was very tightly coupled together. It was all web service calls, WS-* stuff, very chatty. The bit that was built over here was actually a little bit better. It was doing message processing. I think they actually use NServiceBus. So it was hugely better. It was amazing. I'll take the money later.
22:05 James Lewis
No, it was actually quite a lot better, that part of it. But interestingly, the only real point of decoupling between any of the capabilities they were building was across that divide. They had really loose coupling between India and the UK. I've never seen a better example of it. There's another restatement of this, which is: if you ask seven people to write a compiler, you get a seven-pass compiler. So when I talk about product teams, and product lines, and cross-functional teams, it's not enough to just say we'll have a cross-functional team. What we should do is organize those teams around the capabilities themselves, around the chunks of software that we're building: cross-functional teams organized around the chunks of software.
22:45 James Lewis
And whether that's aligned to a product is a bit debatable, whether that's aligned to a set of services, or to a capability. But aligning teams in that way actually reinforces the design of the architectural boundaries around these things, which is an interesting idea, I think. The next thing is this idea of independently deployable and scalable. I really like this idea that architecture is about degrees of freedom. What we want to do when we're building systems is provide enough degrees of freedom that we can flex along the lines that we want to, over a long period of time.
23:23 James Lewis
We want to be able to scale. We want to be able to maintain. We want to be able to do all the other architecturally cool stuff and useful stuff, over time. We want to make it easy for ourselves to do that, and easy for the teams that we work with to do that. This idea of having lots of degrees of freedom in the system is another core concept, a core characteristic of microservices. So we've got these bounded contexts or capabilities. They are talking to one another. What might that look like? Well, we might want to be able to deploy them on lots of different machines. Or rather than put one big thing on each machine, we might want to be able to deploy more than one instance on one big machine.
24:05 James Lewis
The point is, with this kind of approach of splitting things up into smaller units, if you like, we get the options, we get the degrees of freedom, to choose the best approach as we go along. We might want to host two different types of them on one machine, but we get to choose. This idea of independently scalable, deployable, changeable, and replaceable applications: those are the degrees of freedom that we want to have. So here's a question. What's the difference between deploying one of these big monolithic things with a big database, and deploying and maintaining this microservices approach, this fine-grained service-oriented architecture?
24:48 James Lewis
Netflix calls this... I tried to do the Netflix example in Germany, and they were like, "We don't have Netflix. What are you talking about?" It works over here, fortunately. But Netflix calls this fine-grained SOA. So this is a fine-grained approach to SOA. What's the difference? Well, we actually asked. This was last year; we sent an email out to all our tech leads inside ThoughtWorks. We're not pretending that this is proper research, it's anecdotal at best, but we sent an email to all the tech leads: are you building something that looks a bit like a microservice architecture? And we had a number of respondents, about 17 or so, and about 10 said, "Yes we are."
25:27 James Lewis
So as of some time ago, two thirds of our teams were building something that looks a bit like this: a distributed system, based on a fine-grained service-oriented architecture, microservices. And then there was a bunch of follow-up questions. One of them was: how are you building this stuff, how are you provisioning it, how are you deploying it? And 80% of those teams were actually using virtualized environments. They were deploying into the Cloud. And I think that this is the core thing: if we're starting to build more distributed systems, we need to really focus on how we automate provisioning and program our infrastructure. As developers, that's a bit strange, having to think about all this stuff.
26:10 James Lewis
And not everyone wants to, but this is part and parcel of building systems in this way, and if you don't, well, you can't be a hipster like me. Independently deployable, scalable, changeable, replaceable applications, built, deployed, and scaled automatically. And that's the key thing: this should be automatic. This is the second part, really, of what I'm going to talk about now: building, deploying and managing microservices. So we've probably all seen this.
26:38 James Lewis
It's a basic build pipeline. You can tell I've got a bit of a Lego habit. All my disposable income goes on Lego. It's a bit of a nightmare. Hey, it works on my machine. I might add some code here. It works on my machine. I guess it's time to commit and push. We get this build pipeline. So it might look like compile and unit test, acceptance tests, integration tests, performance tests, and then finally maybe push out into production. It's a pretty standard build pipeline. If you've got lots of these and they're all doing one individual thing, how do we actually go about pushing this stuff through our build pipelines? How do we actually get these things out into production? Because you could say, maybe we'll push them all out at the same time, we'll put everything out in lockstep, but then you've lost some of the flexibility for changing things.
27:25 James Lewis
You have to start to worry about things like versioning. So, is there a way of us being able to push these services out individually, on their own? Yes there is, but it's just harder. This is again talking about the giants: Continuous Delivery by Dave Farley and Jez Humble. There's a lot of data, a lot of information, and a lot of patterns in there that help us decouple these applications, sorry, these pipelines, which allows us to push these things out independently. There's a few examples. You might have multiple applications which join at a particular point. You might have two services which come together to do integration testing. If tests fail for application A, application B can still proceed, because you can pull in the last good version, things like that.
28:11 James Lewis
You can do things like federated pipelines. So you actually have two different capabilities which are independent of one another, but maybe have an API dependency. So you can build them separately, but they've got a runtime dependency: I depend on the API being stable at a particular version. Now, a great way of building these things is to use techniques like consumer-driven contracts, consumer contract testing. If you've not come across this, it's an interesting, I think pretty cool, idea, because what you have is the consumer of a service, which is built in parallel to the actual service itself, and the consumer writes tests that describe how they use the service.
28:53 James Lewis
They say, "I hit this end points, and I expect this behavior when I hit this end point." That's what we do when we're integration testing against other systems. But the key thing here is, the consumers give the tests to the service, and the service runs on as part of that build. So if I'm the service, I might be running the tests for several different consumers, which will give me really fast feedback if I make a breaking change to the API set, it's interesting like that. I talked a bit more about this. So the consumer team pushes test scripts out towards the services and the services run all the test scripts, which means that my contract is effective to the set of all the behaviors my clients expect of me.
29:35 James Lewis
Now, these are the sorts of things that we've had to do in the past, when we're integrating between systems. The thing is, with microservices, when you've built a fine-grained service-oriented architecture, a lot of the additional complexity comes from integration, because on most projects the scary bits are the integration bits. These are the bits we fear; these are the bits I fear, anyway. With this approach, we get lots more of them. Fortunately, because they're mostly under our control, we can apply patterns to solve them.
30:03 James Lewis
This is from a real project. So we had a number of services all being built in parallel, and we had some semantic versioning going on, so they depended on specific versions of each other, which they could be built against. We had integration environments, and what's called a product build pipeline, which we built. These slides will be available afterwards for you to have a look at, so you don't have to take all of this in at once. But then we had separate applications which had runtime dependencies, API-level dependencies, and they provided consumer-driven tests, consumer-driven contracts, into our build pipelines.
30:40 James Lewis
As I say, there's a bit more on this in Continuous Delivery. It's a massively thick book, but probably one of the most important books written about software engineering in the last 20 years. So the next thing is: so far we can build them, we can get these binaries, we can get these things that we can push out. How do we actually deploy them? So, we use infrastructure automation. This is the thing that's really taken off massively in the last two years. If you knew how hard it is to get a picture of both a chef and a puppet, that's what that is. Hey. So, using tools like Chef and Puppet. Obviously, I'm guessing the vast majority of developers here spend most of their time on the .NET platform. Two years ago, the story around infrastructure automation on .NET was frankly abysmal: really hard to do a lot of this stuff, really hard to program your infrastructure.
31:34 James Lewis
These days, it's actually a lot better, much better. So a lot of the tooling I'm going to be talking about, the practices, you can do this on the .NET platform, even though they evolved mainly in the JVM ecosystem and the Unix ecosystem. You can do most of this stuff now. And in fact, we're working with a lot of clients who are actually doing this. So we use tools like Chef, which will provision your machines, install the software they need, and then install your own application software onto them, automatically. It's crucial to manage this complexity when we get lots of these things.
32:08 James Lewis
The next thing is this idea of phoenix infrastructure. A lot of people have come across the idea of phoenix servers, rather than snowflake servers. So snowflakes are... you've all seen it. You've got a Windows 2003 Server box. It's got IIS on it. It's been there for 15 years. It's got all this cruft built up on it. It's doing logging in a particular way, which is completely different from the server next to it. Someone once installed a particular set of development tools on it, because they thought it was useful for debugging. And over time our server farms diverge. So if they go away, if they fail, it's actually very difficult to recreate them. And there's an increased cognitive load on our operations people trying to manage these things. What the phoenix infrastructure pattern says is: we don't do that.
32:53 James Lewis
We use infrastructure automation and we should be able to recreate our infrastructure from scratch, all the time. So we should be able to wipe a box and then run a script which will rebuild it from scratch, install all the dependencies we need, and also install our applications on it. And when you move to the Cloud, this becomes even more interesting, because you get to do things like auto-scaling. You can scale automatically. You can provision compute on demand, all these kinds of things. And if you've got phoenix infrastructure, you can just turn boxes on and off, and they come up completely provisioned, automatically. Then you can turn them off and they go away, and you don't have to worry so much about this drift and the cognitive overload.
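A hedged sketch of what phoenix-style provisioning can look like in code, using boto3 against EC2 purely as an example; the AMI, instance type, tags and bootstrap script are assumptions. The point is that the whole box is described by a script, so it can be destroyed and recreated identically, and it is tagged in the same call that launches it.

```python
# Phoenix-style provisioning sketch: launch, tag and bootstrap in one script.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # hypothetical base image
    InstanceType="t3.small",
    MinCount=1,
    MaxCount=1,
    # Tag at launch, in the same API call, so nothing is ever left untagged
    # (which is exactly the "zombie" problem in the story that follows).
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [
            {"Key": "Name", "Value": "users-service"},
            {"Key": "role", "Value": "development-web"},
        ],
    }],
    # Hand off to configuration management on first boot (hypothetical script).
    UserData="#!/bin/bash\n/opt/bootstrap/run-configuration-management.sh\n",
)
# Because the whole definition lives in code, the box can be terminated and
# recreated identically at any time.
```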
33:37 James Lewis
Plus you get to use all the cool patterns around auto-scaling and so on, which we'll be talking a little bit about next. This is just an aside. This is a story from a client in Australia. They have this thing they call the zombie apocalypse. They're very heavily into automation, very heavily into using AWS. All of their development platforms, all of their QA and testing platforms, are on AWS. They've got about 700 instances typically running, which is quite a lot of compute. They came in one morning, and they'd lost about 5% of all of their machines. Development and testing infrastructure had just disappeared overnight, gone. What's going on there? So they rebuilt them. That's fine. And then they came in the next morning, and they'd lost about 97% of everything.
34:24 James Lewis
So they'd lost nearly 700 instances. They'd just turned themselves off, terminated overnight. It's a bit sucky. What the hell is going on? What they actually realized is they had these instances called zombies. If you've worked with the Cloud, what you tend to have to do is start an instance, then tag it to tell it what it is: "I'm a development web server." Sometimes the API call fails and you end up with these boxes just running randomly, and no one knows what they're doing, no one knows what they are. So they had a thing in the background called the zombie killer. And that process would just go through, identify things that weren't tagged, and turn them off. Pretty sensible thing to do.
35:03 James Lewis
You don't want to be wasting too much compute. Unfortunately, there was a bug in the API. So when they actually asked for the list of untagged instances, it gave them back every instance. So it said, "All of your instances are zombies." And the first night, when it was 5%, because of course the zombie killer itself was running in the Cloud, it worked its way down the list until it got to itself, which was quite near the top of the list, and then turned itself off, and everything else survived. And then the second night it was at the bottom of the list. It turned everything off. Sucky. And they call that the zombie apocalypse, for obvious reasons.
35:38 James Lewis
The reason I tell that story is, well, A, because I think it's pretty funny, but there's a warning in there. Also, when you actually talk to them, they said, "Do you know what, it actually forced us to do some really interesting work, because we realized that even though we were in the Cloud, we weren't able to recreate a lot of it." They lost their build servers, they lost their Git repos. They lost everything. Everything just disappeared and they couldn't recreate it automatically. So then they actually invested in doing that. But it's a warning: if you opt to go into the Cloud, stuff can happen.
36:08 James Lewis
Anyway. So, we saw earlier this basic build pipeline: works on my machine, time to commit and push, compile and unit test and so on. About 10 minutes left. So, how do you deploy into environments? When you push code, it's probably going to run locally on your machine, get deployed onto the build machine, deployed into the integration environment, the UAT environment, performance environments, and then finally into production. But when you start pushing out into bigger environments, if you've got 20 services that you're pushing, maybe independently, all talking to one another, potentially on 20 different boxes, it might only be when you push out to the environment with 20 boxes that you end up finding problems. So, as Michael Nygard, who wrote 'Release It!', says: test in a realistic environment as early as possible. It's even more important when you've got things talking to each other. So what we can do is use virtualization.
37:07 James Lewis
And this place where I was at last week, the .NET shop, they do this. They've got Vagrant images with tens of services on them, that they run on their development machines. So when they test locally, they stand up a VM and they can test locally on their development machines. So they pull the more complex environments back up the build pipeline, to enable them to test as realistically as possible. And you can do things like use Vagrant in the Unix world. You can use lightweight containers as well, but Vagrant is certainly usable.
37:38 James Lewis
And Vagrant is pretty simple. You just say vagrant box add, vagrant init, vagrant up, and you get your VMs running with all your services and stuff on them. Even I can do that. I should point out I still write code as well as being an architect. It just uses your provisioning tool under the hood to provision your VMs. You can do multiple virtual machines. So that's actually two VMs talking to one another on the same box. And you can do this declaratively. It's really simple to do, really straightforward, and it does all the NATing et cetera. So we can deploy locally. We've got multiple VMs. It's all very exciting.
38:17 James Lewis
It really does work on my machine. It's going to work like it does in production, probably, and that's pretty cool. But we can also specify dependencies declaratively, and there's tooling around doing this too. So this is actually Puppet, but you can do this in Chef; this is the infrastructure automation I was talking about. We can apply this environment definition, sorry, this infrastructure definition, to each of our environments as we go. So effectively, you push your configuration along the build pipeline, along with the code that you're writing. So that's pretty cool. So you've now got traceability. ITIL for the win. This is more ITIL-compliant than most people probably think.
39:01 James Lewis
And we abstract away the fact that maybe we're deploying locally, or to the data center, or wherever it is. So you may be using, say, an infrastructure-as-a-service provider, as this company in Australia is, just to do testing and integration, but you might then deploy out into a data center. It might be something that you want to do. As I said, I mentioned traceability. One of the things is, I would like complete traceability of all the artifacts from a check-in, from where I check in my code, whether it's configuration, infrastructure code, or my application code, all the way through. But I also want to use the same tool; actually, I want some level of abstraction over the tooling, so that I can use the same tool to deploy locally as well as externally.
39:52 James Lewis
So I want to use the same tool everywhere. And you can actually do that now as well. So you can build what are almost build DSLs, build domain-specific languages, over the top of your application infrastructure, to hide the fact that you might be deploying locally. Deploying locally starts Vagrant VMs and deploys the stuff onto those, so I can test it. A different command with a different argument might deploy into UAT or into QA or into integration or wherever it is. But you're using the same thing. You've got traceability all the way through. This is something called Fabric. It's a Python tool, which is interesting. This is saying, "I want to deploy this type of environment and this build number." Let's say it's build 486.
40:38 James Lewis
The environment is UAT: deploy it fully, go off and do it. And you end up with, skipping over the fact that this is basically a wrapper over SSH, environment-specific definitions. This is just some Python code. The difference here is you say this is going to run remotely and that's going to run locally. So yeah, it's really similar. I can execute stuff either in my local environment or in my remote environment. I don't know, it's terrible code, I'm sorry. I told you I wasn't particularly good. That's pretty cool. So you can actually use the same tooling to deploy everywhere. So you've got this traceability of configuration, traceability of how you install stuff, traceability of how you actually run things, all the way through from my machine to how the build server deploys into production.
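A hedged sketch of that kind of Fabric (1.x-style) task: one entry point, invoked as something like fab deploy:uat,486, that deploys a given build to a named environment, using local() for the local case and run() over SSH for remote hosts. Host names, paths and the artifact URL are assumptions for illustration, not the client's actual code.

```python
# Fabric 1.x-style deploy task: same steps everywhere, local() or run() as needed.
from fabric.api import execute, local, run, task

ENVIRONMENTS = {
    "local": [],                                    # this machine / a Vagrant VM
    "uat": ["app1.uat.internal", "app2.uat.internal"],
    "prod": ["app1.prod.internal", "app2.prod.internal"],
}


def install(build):
    # Runs over SSH on whichever host execute() hands us.
    artifact = "http://builds.internal/users-service/{0}.tar.gz".format(build)
    run("curl -sO " + artifact)
    run("tar xzf {0}.tar.gz -C /opt/users-service".format(build))


@task
def deploy(environment="uat", build="486"):
    """Usage: fab deploy:uat,486"""
    hosts = ENVIRONMENTS[environment]
    if not hosts:
        # Local deployment uses the same commands, just via local().
        local("curl -sO http://builds.internal/users-service/{0}.tar.gz".format(build))
        local("tar xzf {0}.tar.gz -C /opt/users-service".format(build))
    else:
        execute(install, build, hosts=hosts)
```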
41:29 James Lewis
This is what the outcome might look like. That might describe the production environment: say declaratively, this is what prod looks like, and the tool makes it happen for you. There's lots of tooling around this. Examples from Amazon and Rackspace are OpsWorks and Checkmate. Those are tools where you just write a manifest and say, "Start me up this 200-node cluster." So finally, deployment. The Fowler bomb: this stuff pushes additional complexity into your infrastructure, and it means that you need to do a lot of monitoring and logging if you've got distributed systems. So you need to really understand that monitoring is a first-class concern. As a developer, I've never really thought too much about that. Logging, I've thought about logging, mainly because everyone tells me that logging is important.
42:17 James Lewis
So I just log to an appender and store it on a disk somewhere. But monitoring, that's what production does, that's what operations do. Actually, it becomes really important, if you're building distributed systems, that you allow them to be self-monitoring. So you do things like give them status pages, so each of them reports what it's doing. You can report metrics at a well-known location. And you can use them locally to see what's going on when you're performance testing, but operations can also use them to work out what the system as a whole is doing. Status pages for applications are the way forward.
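A minimal sketch of such a status page, using Flask purely for illustration: the service reports a few of its own metrics and the health of a downstream dependency at a well-known location. The route, metric names and downstream URL are assumptions.

```python
# Self-monitoring sketch: a service exposing its own status at a well-known URL.
import time

import requests
from flask import Flask, jsonify

app = Flask(__name__)
started_at = time.time()
requests_served = 0          # in reality, incremented by the request handlers


@app.route("/internal/status")
def status():
    # Check a downstream dependency as part of our own health report.
    try:
        downstream_ok = requests.get(
            "http://accounts.internal/internal/status", timeout=1
        ).status_code == 200
    except requests.RequestException:
        downstream_ok = False

    return jsonify({
        "uptime_seconds": int(time.time() - started_at),
        "requests_served": requests_served,
        "downstream": {"accounts-service": "up" if downstream_ok else "down"},
    })


if __name__ == "__main__":
    app.run(port=8081)
```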
42:51 James Lewis
Now this is an open source tool called Graphite. Another tool that's become... God, I've totally forgotten the name of it; it was very big in the Ruby world. New Relic, that's become very big in the .NET world recently, hasn't it? Layering these kinds of tooling on top of your systems is totally crucial if you've got a distributed system. You need to be able to understand when things are failing, when things are slowing down, and how to fix them quickly. It's a movement from mean time between failures towards mean time to recovery. And if you're going to focus on mean time to recovery, you have to make sure that you know when things have failed, so you can recover fast. So layering on things like Graphite, making sure we're using New Relic, and also, as developers, making sure we know how to use them, we know what they mean, we know what they're doing. Downstream systems: you can do things like health checks, so if downstream systems fail, I want to know about it, again, as soon as possible. Mean time to recovery.
43:45 James Lewis
If I've got a status page that tells me, then we're onto a winner. This is from Yammer, actually. It's a library that allows you to do metrics and health checks, and there is a .NET port; I was using it in C# two projects ago. It was originally written in Java, which is why this is in Java, and it's a tool called Metrics. And there's a bunch of open source tooling that you can use, most of it pretty much agnostic about what infrastructure you're deploying onto.
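As a tiny example of the metrics side, this is roughly what reporting a single measurement to Graphite looks like using its plain-text protocol (a "path value timestamp" line sent to Carbon, conventionally on port 2003); the host and metric names are assumptions, and in practice a metrics library like the ones just mentioned does this for you.

```python
# Graphite plain-text protocol sketch: one "path value timestamp" line per metric.
import socket
import time


def send_metric(path, value, host="graphite.internal", port=2003):
    line = "{0} {1} {2}\n".format(path, value, int(time.time()))
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(line.encode("ascii"))


# e.g. report how long a user-creation request took, in milliseconds
send_metric("users-service.create_user.duration_ms", 42)
```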
44:17 James Lewis
Splunk is very expensive. And Zipkin. Zipkin is awesome. Zipkin is Twitter's monitoring solution for distributed systems. It gives you a network view. You know in Chrome or Firefox developer tools, you hit a URL and you can see the length of time it takes to make requests for assets and so on? Well, this gives you a similar view, but it traces the request throughout the distributed system. You can actually see a user hit the front end, that request go off to the servers over here, which might go through the servers over here, off to the servers over here, servers over here, and it actually shows you, almost like a Gantt chart, how long the whole thing has taken. It's pretty cool.
45:03 James Lewis
Oh, I shouldn't have done that. Is that going to come back, or do we need to... It's not going to come back. So I did mention, and this is the last bit, so I'm 38 seconds over, apologies, I did mention we're standing on the shoulders of giants with this stuff, particularly around microservices. The reason I focused on infrastructure and automation and building and deployment is because it's crucial. But shoulders of giants: obviously we've got Udi standing at the back, he's one of the giants. But then we've also got these guys, Ken Thompson and Dennis Ritchie, because a lot of the things we've been talking about in the microservice community go back to when Unix was invented.
45:47 James Lewis
Well, this is a PDP-11, back in, I think, the late 60s. A lot of the principles about modularity, decoupling, cohesion, simplicity, uniform interfaces, all these things actually go back that far. And this is just to prove that. These are the rules of Unix programming, from The Art of Unix Programming: modularity, clarity, composition, separation, simplicity, parsimony, transparency, robustness, representation. I particularly like rule 16, diversity, which is: distrust all claims for one true way. So just because I'm building systems with microservices, it might not be the right approach for you to take; make your own minds up. But this goes all the way back to then.
46:35 James Lewis
A bunch of books: The Art of Unix Programming, Domain-Driven Design (the blue book), REST in Practice, I'm a bit of a REST fan, Continuous Delivery, Enterprise Integration Patterns, and Release It!. And finally, I talked about degrees of architectural freedom. It's not as simple as saying I want all of these degrees of freedom when we're building systems. We're always making trade-offs. So these are some of the trade-offs we might make. We might trade off throughput versus cost: how many messages we can process per second versus how big our server farm is, these sorts of things. Portability versus deployability: it's easier to deploy onto a single platform than to deploy to many. But by building systems of systems, or rather microservices, we do get more options, we get more flex, though you have to layer in tooling, monitoring, and infrastructure automation on top of it.
47:30 James Lewis
For me these are pretty exciting times. The last slide. It's not exciting times because it's the last slide; it's an exciting time in terms of microservices, in terms of people starting to use this. It's a community-driven thing. This is not the Open Group saying, "Thou shalt use microservices." This is a community-driven thing, where people are just working out the best way of building these systems. There are the big companies, the Netflixes, the Amazons, the Twitters, the SoundClouds. There are the smaller companies, and people are just working out how to build this as we go. But as I said, it's a developer-led thing. It's a community-driven thing. It's not about someone standing at the front shouting, and you downloading the OmniGraffle templates or the Visio templates. It's you and I putting this stuff together. So if you want to get involved, there's another conference coming up. I think there's a user group that goes on as well. They invited me for some reason. So yeah, that's it. That's the end of the talk. Thank you very much.