Oct 05 2015

The Next 10 Years: Megalith

I've decided what I'm going to be working on for the next 10 years. It's epic and exciting, and I'm going to need your help. It's called Megalith.

Megalith is a symbol of the ideal I've been working towards my entire career. It's constantly evolving and very nuanced. I'm going to be upfront and say that I'm not going to be able to fully explain what Megalith is in this post. Instead, I'm going to start setting up context. To me, context is everything.

For the past few years I've been spending most of my working hours writing open source software related to a project I worked on in 2012 called Docker. Docker was created as a skunkworks collaboration between me and some talented engineers at dotCloud, now Docker. The company pivoted 100% to Docker and is now worth about a billion dollars. As an independent, I didn't stay with Docker, I moved on to the next problem. Docker was one piece of a grander vision.

To help pay for this lifestyle, I've experimented with sponsorships and even fell into lucky situations. Last year I tried to make it a little more sustainable by starting Glider Labs with a friend. We focused on consulting around Docker. We helped some of the first people to actually run Docker in production. We did this to learn and get a better grasp on the Real Problems. Something many vendors in this space don't really do. We learned a lot and as a result we ended up making a lot more open source software.

The problem is that there is a lot more to make. This ideal I have in mind that's been developing in my head for over 5 years is a massive undertaking. I've realized if I'm going to keep pursuing it, I need two things: a better vehicle for the work, and a unifying project to get help around it.

Building a better organization for this work

The software I write that people love comes from a compulsive drive that goes beyond and even against the idea of startups. With the exception of Docker and a few other collaborations, I've never made anything that people loved while working for a startup. Naming and evangelizing webhooks was not something anybody paid me to do. In fact, a lot of projects I've built or think should exist are too small to sustain a startup. Does that mean they shouldn't be built? Or that I should temporarily dedicate my life to maybe make one of them work as a startup?

Even a lifestyle business is quite a commitment to make work. My friend Alan Shreve made Ngrok, inspired by my tool Localtunnel. It's free and open source, but he's also bootstrapped a business out of it. This business is what he spends most his working hours on.

Given my goals and values I do prefer this approach, but it still poses a problem. The time spent writing lines of code to support a business, the time spent figuring out market fit, the time spent on support and operations … this is time not moving forward to me. It's extracting wealth out of something that already exists.

Why do this? So Alan can sustain himself and potentially fund other projects, right? In the meantime, I know for a fact that there's a lot of great open source software that he's not making.

His goal is passive income. For a lot of us independents, that's the dream. It may or may not realize in full, but it's certainly time consuming either way. In that way, it's sort of just a smaller variation of the startup lottery.

Meanwhile, in the same time, I've put out dozens of open source projects that solve problems or work towards dissolving larger problems in the long-term. I actually can't help it. It's compulsive like I said. The only way I see it stopping is if I leave the space altogether. I don't get paid to do 90% of these projects. They help bring me contract work, but seemingly only to take time away from supporting and building a community around those projects.

The other problem, for me, is that running a business causes you to make software differently. You think about building software you can sell, or that supports what you can sell. More than doing one thing well, you think about the features people will pay for. More than making it simple, you think about obscure enterprise and legacy use cases. Conventional knowledge says you must do those in some way at some point because that's How It Works.

The problem is worse for startups that take VC money. Even VCs that "get it" and let you focus on open source traction still expect you to eventually figure out how to monetize and make them millions. To varying degrees this often makes startups:

  • focus on enterprise customers, not regular developers
  • ship software that solves short-term problems, or yesterday's problems
  • prioritize sexy demoware without production hardening
  • increase perceived value with more hires and more partnerships
  • de-prioritize any effort outside the product that makes money

Hashicorp is one of the best examples of companies in this space that have done a good job at taking just enough VC and working against a lot of these forces. However, they still work within the framework. They still have a commitment to exit big someday. The implications of this are not insignificant.

I'm much more likely to bootstrap a company like Alan than take VC money. Not only is it just more my style, but a VC startup just won't play to my strengths. Though, building a bootstrap business doesn't seem to produce the most value for my time either. Or make me very happy.

I'd much rather find a new way. Not just because I want to play to my strengths, but because I know I'm not the only one this applies to. I also know that a different, better kind of open source software will result if done properly.

What I want is something of an independent R&D lab. I want us to re-capture the innovation and invention of Xerox PARC and Bell Labs, but focusing on open source. I want us to have the freedom to explore and build software right with like-minded people. Not to get rich, but to slow cook software. Systems software that further empowers individuals and small groups … enterprise customers of the future, not the past.

This is what I want Glider Labs to transition into. In fact, it's already been operating like this in a way. And I've been exploring and learning ways to make this work for years now. It's part business, part cooperative, part public service. But to make the leap to a lab that supports more than myself, it won't happen over night. And I can't do it by myself.

Sharing the vision, enabling participation

This isn't just about a new organization. It has to have some purpose, some initial unifying project. In order to start from nothing, there needs to be a clear mission of value. Not just boundless experimentation. Luckily, most of my work does fall under a certain theme driven by a nebulous but nonetheless motivating ideal. That seems like a good place to start.

I've been told if I just wrote down everything I want to build and why, people might be willing to help out. This is challenging both because of scope and its constant evolution. I figured if I just keep making projects people will start to see it, but other than a few people I'm not sure that's working out. So I'm going to try a more top-down approach.

The real project this post is about is a meta-project I'm calling Megalith. It's an umbrella project to help unify and bring a common goal to all the work I've been doing for the past 10 years, and over the next 10 years.

I know I can't do it alone, so the project is designed for participation. It will involve many more specific projects that are open source and independently useful. Many already exist. Most do not.

Whether or not the final ideal is achieved, it will be approached. Lots of value will be produced in the process. Not just software and contributions to existing open source, but guides and how-to knowledge of everything I've learned to lead me to my current conclusions, and everything we learn in the process.

Glider Labs and Megalith are separate but related parts of this venture. Megalith is the meta-project, Glider Labs is the organization. The idea is that they support each other. Megalith makes this new Glider Labs a reality, Glider Labs makes Megalith a reality.

Relevant to your interests?

The first step is to explain Megalith and try to communicate this idea in my head, or at least some manifestation of it. Then everything else will start to make sense. It's almost more about approach and values. It's about an idea of simple, composable, extensible tools to make modern end-to-end development and operations sane at both large and small scale. And making the world more programmable…

Anyway, it's more than I can get into here. I've set up an announcement mailing list you can subscribe to. Sign up and you'll get emails about what's next. I might even email you directly to say hi.

Feel free to get in touch with me, leave a comment below, or help out by sharing this post if it resonates with you. I'm pretty excited, especially since a lot of people have expressed interest so far.

Lastly, here's a silly video I made about it:

Subscribe for updates!


Oct 28 2014

Deis Breathes New Life into Dokku

Today I'm excited to announce that Dokku is now sponsored by my friends of the Deis project. This means that OpDemand, the company behind Deis, will be funding part-time development of Dokku and its components.

Remember Dokku?

A little over a year ago, I announced Dokku as an open source "Docker powered mini-Heroku." It quickly became the first killer application for Docker. Designed to be simple and hackable, Dokku enables web developers to run their own single-host PaaS that's directly compatible with Heroku.

As the project took off, I went on to tackle the challenges of a multi-host PaaS with the Flynn team. Even without me, the Dokku community continued to grow, thanks to the help of new maintainers and contributors. The experimental plugin system allowed all sorts of customizations and extensions of Dokku to flourish.

Over time, though, the wonderful volunteer maintainers of the project started to get burnt out. Handling issues across a dozen language runtimes and even more plugins is taxing. Many were upstream buildpack or Docker issues, or larger inherent problems of the project requiring stronger leadership to resolve.

Although Dokku is still used and loved today, without active maintainership and leadership, it was at risk of "bit rot". I came to the conclusion that it was in need of some love from the original author. Luckily, the Deis team was willing to help make this happen and is effectively saving the project from a slow death.

About Deis

Not long after I started collaborating with the Flynn team, another project called Deis came onto the scene. Both projects have the goal of being enterprise grade, multi-host PaaS solutions. Although technically competitive, as open source projects composed of great people, we openly share information and components. As an independent agent, I try to bridge silos and facilitate that kind of sharing and communication. I'd gone out to visit both teams to collaborate, talk shop, and have fun.

I eventually moved on from Flynn and started independently exploring distributed systems components in a Docker world. Deis continued to adopt and support many of my open source components. They always kept an open dialog with me and others in the Docker community. When I mentioned my plans to reinvigorate Dokku, they were quick to offer help.

The Sponsorship

The timing for this sponsorship is perfect. Deis now requires at least 3 hosts in a cluster, making Dokku the obvious recommendation for smaller deployments. The projects will focus on shared components even more. This sponsorship will also ensure a smooth migration to Deis if a Dokku user wants to go down that path.

What is Dokku expected to get? First, time and thought put into getting the project modernized and on path for a solid 1.0 release. Among other things, this involves redesigning aspects of the project to make it more sustainable as an open source project.

Much of the lessons of Flynn and Deis, as well as reflections on Dokku itself, will feed back into Dokku. My plan is to:

  • make it more robust and testable
  • improve code quality and standards
  • properly direct upstream issues upstream
  • improve documentation and basic support processes
  • add popular features, such as addons and Dockerfile build support

And if you can believe it, I plan to make it more modular and even simpler.

Yay, Dokku!

Along with Deis, I want to thank all the contributors and maintainers involved in Dokku. I especially want to thank asm89, rhy-jot, plieter, fcoury, and josegonzalez. The project would already be dead without them. If you want to get involved, I'll generally be in the #dokku channel on Freenode sharing updates as I progress. Most of my work will be in a new branch, but first it will take place in creating and updating components used by Dokku.

I'm only able to put a day or so of hours a week into the project, but steady, consistent effort and help from the community will ensure Dokku will be around for a long time!

Sep 10 2014

Automatic Docker Service Announcement with Registrator

No matter which service discovery system you use, it will not likely know how to register your services for you. Service discovery requires your services to somehow announce themselves to the service directory. This is not as trivial as it sounds. There are many approaches to do this, each with their own pros and cons.

In an ideal world, you wouldn't have to do anything special. With Docker, we can actually arrange this with a component I've made called Registrator.

Before I get to Registrator, let's understand what it means to register a service and see what kind of approaches are out there for registering or announcing services. It might also be a good idea to see my last posts on Consul and on service discovery in general.

Service Registration Data Model

Service registration involves a few different pieces of information that describes a service. At the very least, it will involve a service name, such as "web", and a locating IP and port. Often, there is a unique ID for a service instance ("web.2"). Some systems generate this automatically.

Around this, there might be extra information or metadata associated with a service. In some systems this could be key-value attributes. Or maybe just tags. Classic service discovery of the zero-configuration world would also include the protocol (HTTP, SMTP, Jabber, etc), but this isn't very useful information since in our case we already know the protocol of the service we're looking for.

When using etcd or Zookeeper it's up to you how your service directory works, both what information is stored and how to structure it. Specialized service discovery systems like Flynn's discoverd or Netflix's Eureka provide more structure around service semantics. Consul is sort of a hybrid, since it's really a specialized service discovery system built-in to a general configuration store.

Consul lets you define a service name, IP, port, optional service ID, and optional tags. In a future release, I believe it will tie in more with the key-value store to allow you to have arbitrary attributes associated with a service. Right now, Consul also lets you define a health check to use with its monitoring system, which is unique to Consul.

So far, that gives you an idea of the data involved in registering a single service, but that's not the complete model. A service "record" is a reference to an actual service, and it's important to understand what that actually is. Whether using containers or not, a service will always boil down to a long-running process, and a process may listen on several ports. This could imply multiple services.

One could argue that if a process listens on multiple ports for the same functional service, it might be a good idea to collapse it into a single service. Modeling it in this way ends up being either complicated (putting the other service ports in meta-data), or incomplete ("which port do I use for TLS?"). I've found it's simplest to just model each port a process listens on as a separate service, using the name to logically group them. For example, "webapp-http" and "webapp-https".

Registering In-process or Using a Coprocess

The most common strategy to register in service discovery is actually directly self-registering from the service process itself. From a "good solution" perspective, this might seem terrible. But it's common for a reason. Mostly, it's pragmatic, as many organizations build their specific services around their specific service discovery system. However, it does have other advantages.

Service discovery systems like Eureka and discoverd provide a library that can be used in your service to register itself, as well as lookup and discover other services from in-process. This provides opportunities like having balancing and connection pooling logic taken care of for you, without the extra hop of a reverse proxy. And in cases where heartbeats are used for liveness, the library can handle heartbeating for you.

The disadvantage of this approach as a reusable system is that libraries are hard provide across languages, so there might be limited language support for the library. Depending on how complex the library is, it may also be difficult to port for people that want to make the effort to expand language support.

Though, the biggest disadvantage is putting the responsibility on the service in the first place. This creates two problems. First, if you intend to make your services useful to anybody else, your service will be less portable across environments that use different discovery mechanisms. Netflix open source projects suffer from this, as people already complain it's too hard to use some of their components without using all of them. Second, third-party components and services like Nginx, Memcached, or pretty much any datastore will not register themselves.

While some software might provide hooks or extensions to integrate with your service discovery, this is pretty rare. And patching is not a scalable solution. Instead, the common solution for third-party services is to put the registering responsibility near the service.

If you're not directly registering in-process, the second most common approach is running another parallel process to register the service. This works best with a process manager like systemd that can ensure if the service starts, so does the paired registering service.

Some call this technique using a coprocess or a "sidekick". When working with containers, I usually use coprocess in reference to another process in the same container. A sidekick would be a separate container and process. Either way, this is a useful pattern even beyond service registration. I use it for other administrative services that support the main service, for example to re-configure the service. The open source PaaS Deis used this pattern for shipping out a service's logs. However, it seems to simplify they're moving to my tool logspout.

A variation of using a coprocess is process "wrapping", where you use a launcher process that will register and run the service as a child process. Flynn does this with sdutil. Some might say it can make starting services feel very complicated since you now have to configure the service as usual, on top of providing registration details to the launcher. At the end of the day, this is effectively the coprocess model launched with one command instead of two.

The Problem with a Coprocess for Registering

In whatever form it comes, a coprocess comes with two challenges: configuration and manageability.

With a parallel announcing process, you need to tell it what service or services it should announce, providing it all the information we talked about before. An interesting problem with any external registration solution is where that service description is stored. For example, if you were doing announcement in-process, it would at least already know what ports it exposes. However, it most likely wouldn't know what the operator wants to call it. Some systems will roll all this information up into higher-level system constructs, like "service groups" or some unit of orchestration. I prefer not to couple service discovery with orchestration. Instead, I'd rather service semantics live as close to the service process as possible.

A coprocess or sidekick for registering also means you'll have one for every service you start. There is no technical problem with this, but it introduces operational complexity. A system has to manage this, whether it's a process manager like systemd or full-on orchestration. That system likely has to be configured, adding more configuration, which may or may not be the right place to define the service. And now you need to be sure to always use this system to launch any service, since running a service by hand will not register the service.

In an ideal world, we don't worry about any of this. We just run a service and its ports somehow get registered as services. If we want to specify more details about the service, we can do this in a way that's packaged as close to the service as possible. And of course, we want an operator and automation friendly way to set or override that service definition at runtime.

How Docker Helps Achieve the Ideal

Running services in Docker provides a number of benefits, and those who believe Docker is just about container isolation clearly miss the point. Docker defines a standard unit of software that can have anything in it and yet have a standard interface of operations. This interface works with a runtime that gives you certain capabilities in managing and operating that unit of software. These capabilities and this common container model happen to have everything we need to automatically register services for any software.

The Docker container image includes default environment variables, which can be defined by the Dockerfile. This turns out to be the perfect place to describe the service it contains. The container author has the option to use the environment variables to include their idea of how the service should be described and registered, which will be shipped with the container wherever it goes. The operator can then set runtime environment variables to further define or redefine their own description of the service.

The Docker runtime makes these values easy to inspect programmatically. The runtime also produces events when a container starts or stops, which is generally when you want to register or deregister the services of the container.

All this together lets us provide automatic service registration for any Docker container using a little appliance I've made called Registrator.

Introducing Registrator

Registrator is a single, host-level service you run as a Docker container. It watches for new containers, inspects them for service information, and registers them with a service registry. It also deregisters them when the container dies. It has a pluggable registry system, meaning it can work with a number of service discovery systems. Currently it supports Consul and etcd.

There are a few neat properties of Registrator:

First, it's automatic. You don't have to do anything special other than have Registrator running and attached to a service registry. Any public port published is registered as a service.

Related but fairly significant, it requires no cooperation from inside the container to register services. If no service description is included and the operator doesn't specify any at runtime, it uses Docker container introspection for good defaults.

Next, it uses environment variables as generic metadata to define the services. Some people have asked how you can add metadata to Docker containers, but the answer is right in front of them. As mentioned this comes with the benefit of being able to define them during container authorship, as well as at runtime.

Lastly, the metadata Registrator uses could become a common interface for automatic service registration beyond Registrator and even beyond Docker. Environment variables are a portable metadata system and Registrator defines a very data-driven way to define services. That same data could be used by any other system.

In terms of previous work, Michael Crosby's project Skydock was a big inspiration on the direction of Registrator, so it might be worth looking into for reference. Registrator is a little more generic and made specifically for distributed systems, not as much for single host registries. For example, Registrator focuses on published ports and uses a host-level IP as opposed to local container IPs. For people interested in single-host discovery, Registrator has already inspired compatible alternatives, including Brian Lalor's docker-hosts.

In any case, I believe I've made the first general purpose solution to automatic service registration. Here's a video demo:


In retrospect, the problem we've solved here now seems very trivial, but we've never had this before. Like many good designs, it can take a while for all the pieces to come together and make sense in one's mind before it becomes obvious. Once it's obvious, it seems like it always was.

Combining auto-registration with a good service directory, you're almost to an ideal service discovery system. That last problem is about the other side of discovery: connecting to registered services. The next post will describe how this is also not as trivial as it sounds, and as usual, I will offer an open source solution.

Aug 20 2014

Consul Service Discovery with Docker

Consul is a powerful tool for building distributed systems. There are a handful of alternatives in this space, but Consul is the only one that really tries to provide a comprehensive solution for service discovery. As my last post points out, service discovery is a little more than what Consul can provide us, but it is probably the biggest piece of the puzzle.

Understanding Consul and the "Config Store"

The heart of Consul is a particular class of distributed datastore with properties that make it ideal for cluster configuration and coordination. Some call them lock servers, but I call them "config stores" since it more accurately reflects their key-value abstraction and common use for shared configuration.

The father of config stores is Google's Chubby, which was never made publicly available but is described in the influential Chubby paper. In the open source world we have Apache Zookeeper, the mostly defunct doozerd, and in the last year, etcd and Consul.

These specialized datastores are defined by their use of a consensus algorithm requiring a quorum for writes and generally exposing a simple key-value store. This key-value store is highly available, fault-tolerant, and maintains strong consistency guarantees. This can be contrasted with a number of alternative clustering approaches like master-slave or two-phase commit, all with their own benefits, drawbacks, and nuances.

You can learn more about the challenges of designing stateful distributed systems with the online book, Distributed systems for fun and profit. This image from the book summarizes where the quorum approach stands compared to others:

Quorum datastores such as our config stores seem to have many ideal properties except for performance. As a result, they're generally used as low-throughput coordinators for the rest of the system. You don't use them as your application database, but you might use them to coordinate replacing a failed database master.

Another common property of config stores is they all have mechanisms to watch for key-value changes in real-time. This feature is central in enabling use-cases such as electing masters, resource locking, and service presence.

Along comes Consul

Since Zookeeper came out, the subsequent config stores have been trying to simplify. Both in terms of user interface, ease of operation, and implementation of the consensus algorithms. However, they're all based on this very expressive, but lowest common denominator abstraction of a key-value store.

Consul is the first to build on top of this abstraction by also providing specific APIs around the semantics of common config store functions, namely service discovery and locking. It also does it in a way that's very thoughtful about those particular domains.

For example, a directory of services without service health is actually not a very useful one. This is why Consul also provides monitoring capabilities. Consul monitoring is comparable, and even compatible, with Nagios health checks. What's more, Consul's agent model makes it more scalable than centralized monitoring systems like Nagios.

A good way to think of Consul is broken into 3 layers. The middle layer is the actual config store, which is not that different from etcd or Zookeeper. The layers above and below are pretty unique to Consul.

Before Consul, HashiCorp developed a host node coordinator called Serf. It uses an efficient gossip protocol to connect a set of hosts into a cluster. The cluster is aware of its members and shares an event bus. This is primarily used to know when hosts come and go from the cluster, such as during a host failure. But in Serf the event bus was also exposed for custom events to trigger user actions on the hosts.

Consul leverages Serf as a foundational layer to help maintain its cluster. For the most part, it's more of an implementation detail. However, I believe in an upcoming version of Consul, the Serf event bus will also be exposed in the Consul API.

The key-value store in Consul is very similar to etcd. It shares the same semantics and basic HTTP API, but differs in subtle ways. For example, the API for reading values lets you optionally pick a consistency mode. This is great not just because it gives users a choice, but it documents the realities of different consistency levels. This transparency educates the user about the nuances of Consul's replication model.

On top of the key-value store are some other great features and APIs, including locks and leader election, which are pretty standard for what people originally called lock servers. Consul is also datacenter aware, so if you're running multiple clusters, it will let you federate clusters. Nothing complicated, but it's great to have built-in since spanning multiple datacenters is very common today.

However, the killer feature of Consul is its service catalog. Instead of using the key-value store to arbitrarily model your service directory as you would with etcd or Zookeeper, Consul exposes a specific API for managing services. Explicitly modeling services allows it to provide more value in two main ways: monitoring and DNS.

Built-in Monitoring System

Monitoring is normally discussed independent of service discovery, but it turns out to be highly related. Over the years, we've gotten better at understanding the importance of monitoring service health in relation to service discovery.

With Zookeeper, a common pattern for service presence, or liveness, was to have the service register an "ephemeral node" value announcing its address. As an ephemeral node, the value would exist as long as the service's TCP session with Zookeeper remained active. This seemed like a rather elegant solution to service presence. If the service died, the connection would be lost and the service listing would be dropped.

In the development of doozerd, the authors avoided this functionality, both for the sake of simplicity and that they believed it encouraged bad practice. The problem with relying on a TCP connection for service health is that it doesn't exactly mean the service is healthy. For example, if the TCP connection was going through a transparent proxy that accidentally kept the connection alive, the service could die and the ephemeral node may continue to exist.

Instead, they implemented values with an optional TTL. This allowed for the pattern of actively updating the value if the service was healthy. TTL semantics are also used in etcd, allowing the same active heartbeat pattern. Consul supports TTL as well, but primarily focuses on more robust liveness mechanisms. In the discovery layer I helped design for Flynn, our client library lets you register your service and it will automatically heartbeat for you behind the scenes.

This is generally effective for service presence, but it might not take the lesson to heart. Blake Mizerany, the co-author of doozerd and now maintainer of etcd, will stress the importance of meaningful liveness checks. In other words, there is no one-size-fits-all. Every service performs a different function and without testing that specific functionality, we don't actually know that it's working properly. Generic heartbeats can let us know if the process is running, but not that it's behaving correctly enough to safely accept connections.

Specialized health checks are exactly what monitoring systems give us, and Consul gives us a distributed monitoring system. Then it lets us choose if we want to want to associate a check with a service, while also supporting the simpler TTL heartbeat model as an alternative. Either way, if a service is detected as not healthy, it's hidden from queries for active services.

Built-in DNS Server

In my last post, I mentioned how DNS is not a sufficient technology for service discovery. I was very hesitant in accepting the value of a DNS interface to services in Consul. As I described before, all our environments are set up to use DNS for resolving names to IPs, not IPs with ports. So other than identifying the IPs of hosts in the cluster, the DNS interface at first glance seems to provide limited value, if any, for our concept of service discovery.

However, it does serve SRV records for services, and this is huge. Built-in DNS resolvers in our environments don't lookup SRV records, however, the library support to do SRV lookups ourselves is about as ubiquitous as HTTP. This took me a while to realize. It means we all have a client, even more lightweight than HTTP, and it's made specifically for looking up a service.

To me this makes SRV the best standard API for simple service discovery lookups. I hope more service discovery systems implement it.

In a later post in this series, we build on SRV records from Consul DNS to generically solve service inter-connections in Docker clusters. I don't think I would have realized any of this if Consul didn't provide a built-in DNS server.

Consul and the Ecosystem

Consul development is very active. In the past few months, they've had several significant releases, although it's still pre-1.0. Etcd is also actively being developed, though currently from the inside out, focusing on a re-design of their Raft implementation. The two projects are similar in many ways, but also very different. I hope they learn and influence each other, perhaps even share some code since they're both written in Go. At this point, though, Consul is ahead as a comprehensive service discovery primitive.

Unfortunately, Consul is much less popular in the Docker world. Perhaps this is just due to less of a focus on containers at HashiCorp, which is contrasted by the heavily container-oriented mindset of the etcd maintainers at CoreOS.

I've been trying hard to help bridge the Docker and Consul world by building a solid Consul container for Docker. I try to design containers to be self-contained, runtime-configurable appliances as much as possible. It was not hard to do this with Consul, which is now available on Github or Docker Hub.

Running Consul in Docker

Running a Consul node in Docker for a production cluster can be a bit tricky. This is due to the amount of configuration that the container itself needs for Consul to work. For example, here's how you might start one node using Docker (one command over several lines for readability):

$ docker run --name consul -h $HOSTNAME  \
    -p \
    -p \
    -p \
    -p \
    -p \
    -p \
    -p \
    -p \
    -d -v /mnt:/data \
    progrium/consul -server -advertise -join

The Consul container I built comes with a helper command letting you simply run:

$ $(docker run progrium/consul cmd:run -d -v /mnt:/data)

This is just a special command to generate a full Docker run command like the first one, hence wrapping it in a subshell. It's not required, but a helpful convenience to hopefully get people started with Consul in Docker much quicker.

One of the neat ways Consul and Docker can work together is by giving Consul as a DNS server to Docker. This transparently runs DNS resolution in containers through Consul. If you set this up at the Docker daemon level, you can also specify DNS search domains. That means the .services.consul can be dropped, allowing containers to resolve records with just the service name.

The project README has some pretty helpful getting started instructions as well as more detail on all these features. Here's a quick video showing how easy it is to get a Consul cluster up and running inside Docker, including the above DNS trick.


Once you have Consul running in Docker, you're close to having great service discovery, but as I mentioned in my last post, you're still missing those second two legs. Stay tuned for the next post on automatically registering containerized services with Consul.

Jul 29 2014

Understanding Modern Service Discovery with Docker

Over the next few posts, I'm going to be exploring the concepts of service discovery in modern service-oriented architectures, specifically around Docker. Many people aren't familiar with service discovery, so I have to start from the beginning. In this post I'm going to be explaining the problem and providing some historical context around solutions so far in this domain.

Ultimately, we're trying to get Docker containers to easily communicate across hosts. This is seen by some as one of the next big challenges in the Docker ecosystem. Some are waiting for software-defined networking (SDN) to come and save the day. I'm also excited by SDN, but I believe that well executed service discovery is the right answer today, and will continue to be useful in a world with cheap and easy software networking.

What is service discovery?

Service discovery tools manage how processes and services in a cluster can find and talk to one another. It involves a directory of services, registering services in that directory, and then being able to lookup and connect to services in that directory.

At its core, service discovery is about knowing when any process in the cluster is listening on a TCP or UDP port, and being able to look up and connect to that port by name.

Service discovery is a general idea, not specific to Docker, but is increasingly gaining mindshare in mainstream system architecture. Traditionally associated with zero-configuration networking, its more modern use can be summarized as facilitating connections to dynamic, sometimes ephemeral services.

This is particularly relevant today not just because of service-oriented architecture and microservices, but our increasingly dynamic compute environments to support these architectures. Already dynamic VM-based platforms like EC2 are slowly giving way to even more dynamic higher-level compute frameworks like Mesos. Docker is only contributing to this trend.

Name Resolution and DNS

You might think, "Looking up by name? Sounds like DNS." Yes, name resolution is a big part of service discovery, but DNS alone is insufficient for a number of reasons.

A key reason is that DNS was originally not optimized for closed systems with real-time changes in name resolution. You can get away with setting TTL's to 0 in a closed environment, but this also means you need to serve and manage your own internal DNS. What highly available DNS datastore will you use? What creates and destroys DNS records for your services? Are you prepared for the archaic world of DNS RFCs and server implementations?

Actually, one of the biggest drawbacks of DNS for service discovery is that DNS was designed for a world in which we used standard ports for our services. HTTP is on port 80, SSH is on port 22, and so on. In that world, all you need is the IP of the host for the service, which is what an A record gives you. Today, even with private NATs and in some cases with IPv6, our services will listen on completely non-standard, sometimes random ports. Especially with Docker, we have many applications running on the same host.

You may be familiar with SRV records, or "service" records, which were designed to address this problem by providing the port as well as the IP in query responses. At least in terms of a data model, this brings DNS closer to addressing modern service discovery.

Unfortunately, SRV records alone are basically dead on arrival. Have you ever used a library or API to create a socket connection that didn't ask for the port? Where do you tell it to do an SRV record lookup? You don't. You can't. It's too late. Either software explicitly supports SRV records, or DNS is effectively just a tool for resolving names to host IPs.

Despite all this, DNS is still a marvel of engineering, and even SRV records will be useful to us yet. But for all these reasons, on top of the demands of building distributed systems, most large tech companies went down a different path.

Rise of the Lock Service

In 2006, Google released a paper describing Chubby, their distributed lock service. It implemented distributed consensus based on Paxos to provide a consistent, partition-tolerant (CP in CAP theorem) key-value store that could be used for coordinating leader elections, resource locking, and reliable low-volume storage. They began to use this for internal name resolution instead of DNS.

Eventually, the paper inspired an open source equivalent of Chubby called Zookeeper that spun out of the Hadoop Apache project. This became the de facto standard lock server in the open source world, mainly because there were no alternatives with the same properties of high availability and reliability over performance. The Paxos consensus algorithm was also non-trivial to implement.

Zookeeper provides similar semantics as Chubby for coordinating distributed systems, and being a consistent and highly available key-value store makes it an ideal cluster configuration store and directory of services. It's become a dependency to many major projects that require distributed coordination, including Hadoop, Storm, Mesos, Kafka, and others. Not surprisingly, it's used in mostly other Apache projects, often deployed in larger tech companies. It is quite heavyweight and not terribly accessible to "everyday" developers.

About a year ago, a simpler alternative to the Paxos algorithm was published called Raft. This set the stage for a real Zookeeper alternative and, sure enough, etcd was soon introduced by CoreOS. Besides being based on a simpler consensus algorithm, etcd is overall simpler. It's written in Go and lets you use HTTP to interact with it. I was extremely excited by etcd and used it in the initial architecture for Flynn.

Today there's also Consul by Hashicorp, which builds on the ideas of etcd. I specifically explore Consul and lock servers more in my next post.

Service Discovery Solutions

Both Consul and etcd advertise themselves as service discovery solutions. Unfortunately, that's not entirely true. They're great service directories. But this is just part of a service discovery solution. So what's missing?

We're missing exactly how to get all our software, whether custom services or off-the-shelf software, to integrate with and use the service directory. This is particularly interesting to the Docker community, which ideally has portable solutions for anything that can run in a container.

A comprehensive solution to service discovery will have three legs:

  • A consistent (ideally), highly available service directory
  • A mechanism to register services and monitor service health
  • A mechanism to lookup and connect to services

We've got good technology for the first leg, but the remaining legs, despite how they sound, aren't exactly trivial. Especially when ideally you want them to be automatic and "non-invasive." In other words, they work with non-cooperating software, not designed for a service discovery system. Luckily, Docker has both increased the demand for these properties and makes them easier to solve.

In a world where you have lots of services coming and going across many hosts, service discovery is extremely valuable, if not necessary. Even in smaller systems, a solid service discovery system should reduce the effort in configuring and connecting services together to nearly nothing. Adding the responsibility of service discovery to configuration management tools, or using a centralized message queue for everything are all-to-common alternatives that we know just don't scale.

My goal with these posts is to help you understand and arrive at a good idea of what a service discovery system should actually encompass. The next few posts will take a deeper look at each of the above mentioned legs, touching on various approaches, and ultimately explaining what I ended up doing for my soon-to-be-released project, Consulate.