Dec 04 2015

Leadership, Guilt, and Pull Requests

I have a lot of open source projects. Even more with Glider Labs. Some of them are fairly popular. All of them get me excited. But most of them also bum me out. I'm going to share one of the reasons I've had to take a break for the past couple months, and why all my repositories are now looking for more maintainers.

Open source is hard. It seems easy, though. You just write a piece of software and put it on Github, right? Well, that's the easy part. Now comes maintenance. And very likely politics. Inevitably, guilt. Multiply that by the number of open source projects you have and their popularity. End result: open source can be a bummer.

Jacob Thornton (@fat), co-author of Bootstrap, gave a talk a few years back echoing the sentiment of many open source authors and maintainers. He calls it Cute Puppy Syndrome. It's not the best analogy, but it gets the point across. Open source projects, like puppies, are great fun when they start. As they get older and more mature, responsibility seems to outweigh their cuteness. One solution is to put your old dog up for adoption and get a new puppy. As you can tell from his delivery, this analogy is intended to be humorous:

He mentions that many authors of popular open source projects have gotten burnt out and look for an exit. Often handing projects off to maintainers, sometimes never to return. Not to avoid responsibility, but to stay sane. Still, much of the time, that sense of responsibility lingers. As Jacob expands on the puppy analogy:

If you have your puppy and it turns into a dog, you put it up for adoption, you give it to a maintainer. And then he over feeds it and it becomes fat and bloated. And you just sit there and you're really sad because you don't really have time to take care of your puppy any more, but you don't want to see it fat and bloated. So you're just real sad all the time.

Alternatively, you can let issues and PRs pile up. Guilt and sadness either way. At least opening the project up lets it survive and continue to provide value to a larger audience. You just have to let go of the project as it will now evolve in ways you might not agree with.

When I did this with Dokku, the new maintainers did a great job at keeping the project and community healthy. I can't thank them enough for that. I had to let go quite a bit, but the project would probably be dead without them.

In fact, there's something interesting about maintainers who didn't author the project. It's probably different from person to person and project to project, but the maintainers of Dokku don't have the guilt or burden that I do. They're happy to help, and as volunteers don't feel like they owe anybody anything. It's really the ideal situation. Perhaps authors shouldn't be maintainers after a certain point.

That said, even with these great maintainers, Dokku really only kept on an incremental path of maintaining the status quo. That's not necessarily a bad thing, but it meant Dokku wasn't able to develop further in the directions I had originally intended. I thought to myself, well, eventually I'll find time to do a system-wide refactoring to get it on the path I want, submitting the changes as PRs like any other contributor. That time never came, and the project continued to fall behind the evolving larger vision. The project I started was not living up to my own expectations for it.

Sadness. Guilt.

Then I did something different. It was so simple. I wrote a wiki page describing what I wanted and why I wanted it. For some reason it came as a surprise to me that the maintainers started moving the project in that direction! Did it happen exactly how I'd do it? Not always. But it still brought the project closer to what I wrote down.

This shouldn't have come as a surprise. In essence, this is leadership. There are different forms of leadership, but at the core is the idea of "saying what, not how". It can be very hard for programmers to get into this mindset because our medium is all about the how. Stepping back and writing what you want with flexibility towards how it's implemented takes practice.

This experiment with Dokku was far from perfect. In fact, that document is still incomplete. Project leadership is just as ongoing as maintenance. However, it's something worth getting better at. It's essential to authoring many open source projects and remaining happy enough to keep going. In the case of my projects, since there is always a bigger picture they fit into, it's even more important.

Dokku is just one of many projects, but it's one of the only projects where I'm not an active maintainer. Dokku isn't why I had to take a break; it was all the others.

Some of you might have seen my ramblings about Megalith. Some of you might even be able to follow them enough to see that most of my open source projects are all basically part of Megalith. Or that Megalith is basically all my projects. You can probably see how this leadership is critical to sustain all these projects while keeping them moving in roughly the same direction.

I don't write open source software to make money. In fact, even solving a particular problem is secondary to working towards a vision of how the world should be. Since that's really what's important to me, I should be spending my time on being an effective leader. At the very least, documenting what I want, the direction, why it's important, what design principles are involved, preferred architectural patterns, and so on. Then helping people understand, integrating their feedback, and letting go of a lot of the details.

To support this, I need to open up our projects to more maintainers. Going forward, I'll be trying a variation of the Pull Request Hack to get more people involved across all projects. If you submit a solid substantial PR or several solid minor PRs to any Glider Labs project, you'll be invited to have commit access across all projects.

Starting now, all public projects under my username or Glider Labs have an open call for maintainers. If you'd like to volunteer to help maintain any of these projects, just join our Slack and in #intros say you're interested in becoming a maintainer.

From there I'll do my best to provide guidance and leadership. Together we'll keep making great things!


Oct 05 2015

The Next 10 Years: Megalith

I've decided what I'm going to be working on for the next 10 years. It's epic and exciting, and I'm going to need your help. It's called Megalith.

Megalith is a symbol of the ideal I've been working towards my entire career. It's constantly evolving and very nuanced. I'm going to be upfront and say that I'm not going to be able to fully explain what Megalith is in this post. Instead, I'm going to start setting up context. To me, context is everything.

For the past few years I've been spending most of my working hours writing open source software related to a project I worked on in 2012 called Docker. Docker was created as a skunkworks collaboration between me and some talented engineers at dotCloud, now Docker. The company pivoted 100% to Docker and is now worth about a billion dollars. As an independent, I didn't stay with Docker, I moved on to the next problem. Docker was one piece of a grander vision.

To help pay for this lifestyle, I've experimented with sponsorships and even fell into lucky situations. Last year I tried to make it a little more sustainable by starting Glider Labs with a friend. We focused on consulting around Docker. We helped some of the first people to actually run Docker in production. We did this to learn and get a better grasp on the Real Problems. Something many vendors in this space don't really do. We learned a lot and as a result we ended up making a lot more open source software.

The problem is that there is a lot more to make. This ideal I have in mind that's been developing in my head for over 5 years is a massive undertaking. I've realized if I'm going to keep pursuing it, I need two things: a better vehicle for the work, and a unifying project to get help around it.

Building a better organization for this work

The software I write that people love comes from a compulsive drive that goes beyond and even against the idea of startups. With the exception of Docker and a few other collaborations, I've never made anything that people loved while working for a startup. Naming and evangelizing webhooks was not something anybody paid me to do. In fact, a lot of projects I've built or think should exist are too small to sustain a startup. Does that mean they shouldn't be built? Or that I should temporarily dedicate my life to maybe make one of them work as a startup?

Even a lifestyle business is quite a commitment to make work. My friend Alan Shreve made Ngrok, inspired by my tool Localtunnel. It's free and open source, but he's also bootstrapped a business out of it. This business is what he spends most of his working hours on.

Given my goals and values I do prefer this approach, but it still poses a problem. The time spent writing lines of code to support a business, the time spent figuring out market fit, the time spent on support and operations … to me, this is time not spent moving forward. It's extracting wealth out of something that already exists.

Why do this? So Alan can sustain himself and potentially fund other projects, right? In the meantime, I know for a fact that there's a lot of great open source software that he's not making.

His goal is passive income. For a lot of us independents, that's the dream. It may or may not be realized in full, but it's certainly time consuming either way. In that way, it's sort of just a smaller variation of the startup lottery.

Meanwhile, in that same time, I've put out dozens of open source projects that solve problems or work towards dissolving larger problems in the long term. I actually can't help it. It's compulsive like I said. The only way I see it stopping is if I leave the space altogether. I don't get paid to do 90% of these projects. They help bring me contract work, but seemingly only to take time away from supporting and building a community around those projects.

The other problem, for me, is that running a business causes you to make software differently. You think about building software you can sell, or that supports what you can sell. More than doing one thing well, you think about the features people will pay for. More than making it simple, you think about obscure enterprise and legacy use cases. Conventional wisdom says you must do those in some way at some point because that's How It Works.

The problem is worse for startups that take VC money. Even VCs that "get it" and let you focus on open source traction still expect you to eventually figure out how to monetize and make them millions. To varying degrees this often makes startups:

  • focus on enterprise customers, not regular developers
  • ship software that solves short-term problems, or yesterday's problems
  • prioritize sexy demoware without production hardening
  • increase perceived value with more hires and more partnerships
  • de-prioritize any effort outside the product that makes money

HashiCorp is one of the best examples of companies in this space that have done a good job at taking just enough VC and working against a lot of these forces. However, they still work within the framework. They still have a commitment to exit big someday. The implications of this are not insignificant.

I'm much more likely to bootstrap a company like Alan did than take VC money. Not only is it just more my style, but a VC startup just won't play to my strengths. Though, building a bootstrapped business doesn't seem to produce the most value for my time either. Or make me very happy.

I'd much rather find a new way. Not just because I want to play to my strengths, but because I know I'm not the only one this applies to. I also know that a different, better kind of open source software will result if done properly.

What I want is something of an independent R&D lab. I want us to re-capture the innovation and invention of Xerox PARC and Bell Labs, but focusing on open source. I want us to have the freedom to explore and build software right with like-minded people. Not to get rich, but to slow cook software. Systems software that further empowers individuals and small groups … enterprise customers of the future, not the past.

This is what I want Glider Labs to transition into. In fact, it's already been operating like this in a way. And I've been exploring and learning ways to make this work for years now. It's part business, part cooperative, part public service. But making the leap to a lab that supports more than myself won't happen overnight. And I can't do it by myself.

Sharing the vision, enabling participation

This isn't just about a new organization. It has to have some purpose, some initial unifying project. In order to start from nothing, there needs to be a clear mission of value. Not just boundless experimentation. Luckily, most of my work does fall under a certain theme driven by a nebulous but nonetheless motivating ideal. That seems like a good place to start.

I've been told if I just wrote down everything I want to build and why, people might be willing to help out. This is challenging both because of scope and its constant evolution. I figured if I just keep making projects people will start to see it, but other than a few people I'm not sure that's working out. So I'm going to try a more top-down approach.

The real project this post is about is a meta-project I'm calling Megalith. It's an umbrella project to help unify and bring a common goal to all the work I've been doing for the past 10 years, and over the next 10 years.

I know I can't do it alone, so the project is designed for participation. It will involve many more specific projects that are open source and independently useful. Many already exist. Most do not.

Whether or not the final ideal is achieved, it will be approached. Lots of value will be produced in the process. Not just software and contributions to existing open source, but guides and how-to knowledge of everything I've learned to lead me to my current conclusions, and everything we learn in the process.

Glider Labs and Megalith are separate but related parts of this venture. Megalith is the meta-project, Glider Labs is the organization. The idea is that they support each other. Megalith makes this new Glider Labs a reality, Glider Labs makes Megalith a reality.

Relevant to your interests?

The first step is to explain Megalith and try to communicate this idea in my head, or at least some manifestation of it. Then everything else will start to make sense. It's almost more about approach and values. It's about an idea of simple, composable, extensible tools to make modern end-to-end development and operations sane at both large and small scale. And making the world more programmable…

Anyway, it's more than I can get into here. I've set up an announcement mailing list you can subscribe to. Sign up and you'll get emails about what's next. I might even email you directly to say hi.

Feel free to get in touch with me, leave a comment below, or help out by sharing this post if it resonates with you. I'm pretty excited, especially since a lot of people have expressed interest so far.

Lastly, here's a silly video I made about it:


Oct 28 2014

Deis Breathes New Life into Dokku

Today I'm excited to announce that Dokku is now sponsored by my friends of the Deis project. This means that OpDemand, the company behind Deis, will be funding part-time development of Dokku and its components.

Remember Dokku?

A little over a year ago, I announced Dokku as an open source "Docker powered mini-Heroku." It quickly became the first killer application for Docker. Designed to be simple and hackable, Dokku enables web developers to run their own single-host PaaS that's directly compatible with Heroku.

As the project took off, I went on to tackle the challenges of a multi-host PaaS with the Flynn team. Even without me, the Dokku community continued to grow, thanks to the help of new maintainers and contributors. The experimental plugin system allowed all sorts of customizations and extensions of Dokku to flourish.

Over time, though, the wonderful volunteer maintainers of the project started to get burnt out. Handling issues across a dozen language runtimes and even more plugins is taxing. Many were upstream buildpack or Docker issues, or larger inherent problems of the project requiring stronger leadership to resolve.

Although Dokku is still used and loved today, without active maintainership and leadership, it was at risk of "bit rot". I came to the conclusion that it was in need of some love from the original author. Luckily, the Deis team was willing to help make this happen and is effectively saving the project from a slow death.

About Deis

Not long after I started collaborating with the Flynn team, another project called Deis came onto the scene. Both projects have the goal of being enterprise grade, multi-host PaaS solutions. Although technically competitive, as open source projects composed of great people, we openly share information and components. As an independent agent, I try to bridge silos and facilitate that kind of sharing and communication. I'd gone out to visit both teams to collaborate, talk shop, and have fun.

I eventually moved on from Flynn and started independently exploring distributed systems components in a Docker world. Deis continued to adopt and support many of my open source components. They always kept an open dialog with me and others in the Docker community. When I mentioned my plans to reinvigorate Dokku, they were quick to offer help.

The Sponsorship

The timing for this sponsorship is perfect. Deis now requires at least 3 hosts in a cluster, making Dokku the obvious recommendation for smaller deployments. The projects will focus on shared components even more. This sponsorship will also ensure a smooth migration to Deis if a Dokku user wants to go down that path.

What is Dokku expected to get? First, time and thought put into getting the project modernized and on path for a solid 1.0 release. Among other things, this involves redesigning aspects of the project to make it more sustainable as an open source project.

Many of the lessons from Flynn and Deis, as well as reflections on Dokku itself, will feed back into Dokku. My plan is to:

  • make it more robust and testable
  • improve code quality and standards
  • properly direct upstream issues upstream
  • improve documentation and basic support processes
  • add popular features, such as addons and Dockerfile build support

And if you can believe it, I plan to make it more modular and even simpler.

Yay, Dokku!

Along with Deis, I want to thank all the contributors and maintainers involved in Dokku. I especially want to thank asm89, rhy-jot, plieter, fcoury, and josegonzalez. The project would already be dead without them. If you want to get involved, I'll generally be in the #dokku channel on Freenode sharing updates as I progress. Most of my work will be in a new branch, but first it will take place in creating and updating components used by Dokku.

I'm only able to put a day or so a week into the project, but steady, consistent effort and help from the community will ensure Dokku is around for a long time!

Sep 10 2014

Automatic Docker Service Announcement with Registrator

No matter which service discovery system you use, it likely will not know how to register your services for you. Service discovery requires your services to somehow announce themselves to the service directory. This is not as trivial as it sounds. There are many approaches to doing this, each with their own pros and cons.

In an ideal world, you wouldn't have to do anything special. With Docker, we can actually arrange this with a component I've made called Registrator.

Before I get to Registrator, let's understand what it means to register a service and see what kind of approaches are out there for registering or announcing services. It might also be a good idea to see my last posts on Consul and on service discovery in general.

Service Registration Data Model

Service registration involves a few different pieces of information that describe a service. At the very least, it will involve a service name, such as "web", and a locating IP and port. Often, there is a unique ID for a service instance ("web.2"). Some systems generate this automatically.

Around this, there might be extra information or metadata associated with a service. In some systems this could be key-value attributes. Or maybe just tags. Classic service discovery of the zero-configuration world would also include the protocol (HTTP, SMTP, Jabber, etc), but this isn't very useful information since in our case we already know the protocol of the service we're looking for.

When using etcd or Zookeeper it's up to you how your service directory works, both what information is stored and how to structure it. Specialized service discovery systems like Flynn's discoverd or Netflix's Eureka provide more structure around service semantics. Consul is sort of a hybrid, since it's really a specialized service discovery system built-in to a general configuration store.

Consul lets you define a service name, IP, port, optional service ID, and optional tags. In a future release, I believe it will tie in more with the key-value store to allow you to have arbitrary attributes associated with a service. Right now, Consul also lets you define a health check to use with its monitoring system, which is unique to Consul.
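
To make that concrete, here's a rough sketch of registering a single service instance against a local Consul agent over its HTTP API, using the example values from above. The ID, name, tags, port, and TTL check are all made-up example values, and field names may differ slightly between Consul versions, so check the docs for yours:

$ curl -X PUT http://localhost:8500/v1/agent/service/register \
    -d '{
          "ID": "web.2",
          "Name": "web",
          "Tags": ["production"],
          "Port": 8000,
          "Check": {"TTL": "15s"}
        }'

There's a matching deregister endpoint for when the instance goes away, which is exactly the kind of bookkeeping the rest of this post is about automating.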

So far, that gives you an idea of the data involved in registering a single service, but that's not the complete model. A service "record" is a reference to an actual service, and it's important to understand what that actually is. Whether using containers or not, a service will always boil down to a long-running process, and a process may listen on several ports. This could imply multiple services.

One could argue that if a process listens on multiple ports for the same functional service, it might be a good idea to collapse it into a single service. Modeling it in this way ends up being either complicated (putting the other service ports in meta-data), or incomplete ("which port do I use for TLS?"). I've found it's simplest to just model each port a process listens on as a separate service, using the name to logically group them. For example, "webapp-http" and "webapp-https".

Registering In-process or Using a Coprocess

The most common strategy to register in service discovery is actually directly self-registering from the service process itself. From a "good solution" perspective, this might seem terrible. But it's common for a reason. Mostly, it's pragmatic, as many organizations build their specific services around their specific service discovery system. However, it does have other advantages.

Service discovery systems like Eureka and discoverd provide a library that can be used in your service to register itself, as well as lookup and discover other services from in-process. This provides opportunities like having balancing and connection pooling logic taken care of for you, without the extra hop of a reverse proxy. And in cases where heartbeats are used for liveness, the library can handle heartbeating for you.

The disadvantage of this approach as a reusable system is that libraries are hard to provide across languages, so there might be limited language support for the library. Depending on how complex the library is, it may also be difficult to port for people who want to make the effort to expand language support.

Though, the biggest disadvantage is putting the responsibility on the service in the first place. This creates two problems. First, if you intend to make your services useful to anybody else, your service will be less portable across environments that use different discovery mechanisms. Netflix open source projects suffer from this, as people already complain it's too hard to use some of their components without using all of them. Second, third-party components and services like Nginx, Memcached, or pretty much any datastore will not register themselves.

While some software might provide hooks or extensions to integrate with your service discovery, this is pretty rare. And patching is not a scalable solution. Instead, the common solution for third-party services is to put the registering responsibility near the service.

If you're not directly registering in-process, the second most common approach is running another parallel process to register the service. This works best with a process manager like systemd that can ensure that if the service starts, so does the paired registering service.
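
As a rough sketch of that pairing, a hypothetical systemd unit for the announcer might tie its lifecycle to the service it registers. The unit names, the announce command, and the address are all made up here:

# webapp-announce.service -- hypothetical companion unit
[Unit]
Description=Register webapp with the service registry
# Start and stop together with the service being announced
BindsTo=webapp.service
After=webapp.service

[Service]
# "announce" stands in for whatever registration tool you use
ExecStart=/usr/local/bin/announce webapp 10.0.1.5:8000
Restart=always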

Some call this technique using a coprocess or a "sidekick". When working with containers, I usually use coprocess in reference to another process in the same container. A sidekick would be a separate container and process. Either way, this is a useful pattern even beyond service registration. I use it for other administrative services that support the main service, for example to re-configure the service. The open source PaaS Deis used this pattern for shipping out a service's logs. However, to simplify, it seems they're moving to my tool logspout.

A variation of using a coprocess is process "wrapping", where you use a launcher process that will register and run the service as a child process. Flynn does this with sdutil. Some might say it can make starting services feel very complicated since you now have to configure the service as usual, on top of providing registration details to the launcher. At the end of the day, this is effectively the coprocess model launched with one command instead of two.

The Problem with a Coprocess for Registering

Whatever form it comes in, a coprocess brings two challenges: configuration and manageability.

With a parallel announcing process, you need to tell it what service or services it should announce, providing it all the information we talked about before. An interesting problem with any external registration solution is where that service description is stored. For example, if you were doing announcement in-process, it would at least already know what ports it exposes. However, it most likely wouldn't know what the operator wants to call it. Some systems will roll all this information up into higher-level system constructs, like "service groups" or some unit of orchestration. I prefer not to couple service discovery with orchestration. Instead, I'd rather service semantics live as close to the service process as possible.

A coprocess or sidekick for registering also means you'll have one for every service you start. There is no technical problem with this, but it introduces operational complexity. A system has to manage this, whether it's a process manager like systemd or full-on orchestration. That system likely has to be configured, adding more configuration, which may or may not be the right place to define the service. And now you need to be sure to always use this system to launch any service, since running a service by hand will not register the service.

In an ideal world, we don't worry about any of this. We just run a service and its ports somehow get registered as services. If we want to specify more details about the service, we can do this in a way that's packaged as close to the service as possible. And of course, we want an operator and automation friendly way to set or override that service definition at runtime.

How Docker Helps Achieve the Ideal

Running services in Docker provides a number of benefits, and those who believe Docker is just about container isolation clearly miss the point. Docker defines a standard unit of software that can have anything in it and yet have a standard interface of operations. This interface works with a runtime that gives you certain capabilities in managing and operating that unit of software. These capabilities and this common container model happen to have everything we need to automatically register services for any software.

The Docker container image includes default environment variables, which can be defined by the Dockerfile. This turns out to be the perfect place to describe the service it contains. The container author has the option to use the environment variables to include their idea of how the service should be described and registered, which will be shipped with the container wherever it goes. The operator can then set runtime environment variables to further define or redefine their own description of the service.
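
A minimal sketch of what that looks like in a Dockerfile, using SERVICE_NAME and SERVICE_TAGS as the metadata variables (these happen to be the names Registrator, introduced below, looks for):

FROM nginx
# Default service description, baked in by the image author
ENV SERVICE_NAME webapp-http
ENV SERVICE_TAGS production
EXPOSE 80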

The Docker runtime makes these values easy to inspect programmatically. The runtime also produces events when a container starts or stops, which is generally when you want to register or deregister the services of the container.
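
For example, assuming a running container named webapp, the stock Docker CLI can already surface everything a registrar needs:

$ docker inspect -f '{{.Config.Env}}' webapp             # baked-in plus runtime environment variables
$ docker inspect -f '{{.NetworkSettings.Ports}}' webapp  # published port mappings
$ docker events                                          # live stream of container start/stop/die events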

All this together lets us provide automatic service registration for any Docker container using a little appliance I've made called Registrator.

Introducing Registrator

Registrator is a single, host-level service you run as a Docker container. It watches for new containers, inspects them for service information, and registers them with a service registry. It also deregisters them when the container dies. It has a pluggable registry system, meaning it can work with a number of service discovery systems. Currently it supports Consul and etcd.
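
Running it looks something like this. This is a sketch only; the image name, socket path, and registry URI argument may differ between versions, so defer to the project README:

$ docker run -d --name registrator \
    -v /var/run/docker.sock:/tmp/docker.sock \
    -h $HOSTNAME \
    progrium/registrator consul://10.0.1.1:8500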

There are a few neat properties of Registrator:

First, it's automatic. You don't have to do anything special other than have Registrator running and attached to a service registry. Any public port published is registered as a service.

Related but fairly significant, it requires no cooperation from inside the container to register services. If no service description is included and the operator doesn't specify any at runtime, it uses Docker container introspection for good defaults.

Next, it uses environment variables as generic metadata to define the services. Some people have asked how you can add metadata to Docker containers, but the answer is right in front of them. As mentioned this comes with the benefit of being able to define them during container authorship, as well as at runtime.

Lastly, the metadata Registrator uses could become a common interface for automatic service registration beyond Registrator and even beyond Docker. Environment variables are a portable metadata system and Registrator defines a very data-driven way to define services. That same data could be used by any other system.
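
To make the environment variable approach concrete, here's a rough sketch of the runtime side: an operator publishes a port and overrides the baked-in description when starting a container, SERVICE_NAME and SERVICE_TAGS being the variable names Registrator understands:

$ docker run -d -p 8000:80 \
    -e "SERVICE_NAME=webapp-http" \
    -e "SERVICE_TAGS=staging" \
    nginx

With Registrator watching the Docker socket, this container should show up in the registry as webapp-http on the host's IP and published port, and be removed when it stops.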

In terms of previous work, Michael Crosby's project Skydock was a big inspiration on the direction of Registrator, so it might be worth looking into for reference. Registrator is a little more generic and made specifically for distributed systems, not as much for single host registries. For example, Registrator focuses on published ports and uses a host-level IP as opposed to local container IPs. For people interested in single-host discovery, Registrator has already inspired compatible alternatives, including Brian Lalor's docker-hosts.

In any case, I believe I've made the first general purpose solution to automatic service registration. Here's a video demo:

Onward…

In retrospect, the problem we've solved here now seems very trivial, but we've never had this before. Like many good designs, it can take a while for all the pieces to come together and make sense in one's mind before it becomes obvious. Once it's obvious, it seems like it always was.

Combining auto-registration with a good service directory, you're almost to an ideal service discovery system. That last problem is about the other side of discovery: connecting to registered services. The next post will describe how this is also not as trivial as it sounds, and as usual, I will offer an open source solution.

Aug 20 2014

Consul Service Discovery with Docker

Consul is a powerful tool for building distributed systems. There are a handful of alternatives in this space, but Consul is the only one that really tries to provide a comprehensive solution for service discovery. As my last post points out, service discovery is a little more than what Consul can provide us, but it is probably the biggest piece of the puzzle.

Understanding Consul and the "Config Store"

The heart of Consul is a particular class of distributed datastore with properties that make it ideal for cluster configuration and coordination. Some call them lock servers, but I call them "config stores" since it more accurately reflects their key-value abstraction and common use for shared configuration.

The father of config stores is Google's Chubby, which was never made publicly available but is described in the influential Chubby paper. In the open source world we have Apache Zookeeper, the mostly defunct doozerd, and in the last year, etcd and Consul.

These specialized datastores are defined by their use of a consensus algorithm requiring a quorum for writes and generally exposing a simple key-value store. This key-value store is highly available, fault-tolerant, and maintains strong consistency guarantees. This can be contrasted with a number of alternative clustering approaches like master-slave or two-phase commit, all with their own benefits, drawbacks, and nuances.

You can learn more about the challenges of designing stateful distributed systems with the online book, Distributed systems for fun and profit. This image from the book summarizes where the quorum approach stands compared to others:

Quorum datastores such as our config stores seem to have many ideal properties except for performance. As a result, they're generally used as low-throughput coordinators for the rest of the system. You don't use them as your application database, but you might use them to coordinate replacing a failed database master.

Another common property of config stores is they all have mechanisms to watch for key-value changes in real-time. This feature is central in enabling use-cases such as electing masters, resource locking, and service presence.

Along comes Consul

Since Zookeeper came out, the subsequent config stores have been trying to simplify, in terms of user interface, ease of operation, and implementation of the consensus algorithms. However, they're all based on the same very expressive, but lowest common denominator, abstraction of a key-value store.

Consul is the first to build on top of this abstraction by also providing specific APIs around the semantics of common config store functions, namely service discovery and locking. It also does it in a way that's very thoughtful about those particular domains.

For example, a directory of services without service health is actually not a very useful one. This is why Consul also provides monitoring capabilities. Consul monitoring is comparable, and even compatible, with Nagios health checks. What's more, Consul's agent model makes it more scalable than centralized monitoring systems like Nagios.

A good way to think of Consul is as 3 layers. The middle layer is the actual config store, which is not that different from etcd or Zookeeper. The layers above and below are pretty unique to Consul.

Before Consul, HashiCorp developed a host node coordinator called Serf. It uses an efficient gossip protocol to connect a set of hosts into a cluster. The cluster is aware of its members and shares an event bus. This is primarily used to know when hosts come and go from the cluster, such as during a host failure. But in Serf the event bus was also exposed for custom events to trigger user actions on the hosts.

Consul leverages Serf as a foundational layer to help maintain its cluster. For the most part, it's more of an implementation detail. However, I believe in an upcoming version of Consul, the Serf event bus will also be exposed in the Consul API.

The key-value store in Consul is very similar to etcd. It shares the same semantics and basic HTTP API, but differs in subtle ways. For example, the API for reading values lets you optionally pick a consistency mode. This is great not just because it gives users a choice, but it documents the realities of different consistency levels. This transparency educates the user about the nuances of Consul's replication model.
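
A quick sketch of what that looks like against a local agent. The query parameters are the consistency modes and blocking reads from Consul's HTTP API; the index value below is just a placeholder for the ModifyIndex returned by a previous read:

$ curl -X PUT -d 'bar' http://localhost:8500/v1/kv/foo      # write a value
$ curl http://localhost:8500/v1/kv/foo                      # default mode: reads served by the leader
$ curl 'http://localhost:8500/v1/kv/foo?consistent'         # strongest guarantee, extra round trip
$ curl 'http://localhost:8500/v1/kv/foo?stale'              # any server may answer, possibly stale
$ curl 'http://localhost:8500/v1/kv/foo?index=42&wait=30s'  # block until the value changes (a watch)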

On top of the key-value store are some other great features and APIs, including locks and leader election, which are pretty standard for what people originally called lock servers. Consul is also datacenter aware, so if you're running multiple clusters, it will let you federate clusters. Nothing complicated, but it's great to have built-in since spanning multiple datacenters is very common today.

However, the killer feature of Consul is its service catalog. Instead of using the key-value store to arbitrarily model your service directory as you would with etcd or Zookeeper, Consul exposes a specific API for managing services. Explicitly modeling services allows it to provide more value in two main ways: monitoring and DNS.

Built-in Monitoring System

Monitoring is normally discussed independent of service discovery, but it turns out to be highly related. Over the years, we've gotten better at understanding the importance of monitoring service health in relation to service discovery.

With Zookeeper, a common pattern for service presence, or liveness, was to have the service register an "ephemeral node" value announcing its address. As an ephemeral node, the value would exist as long as the service's TCP session with Zookeeper remained active. This seemed like a rather elegant solution to service presence. If the service died, the connection would be lost and the service listing would be dropped.

In the development of doozerd, the authors avoided this functionality, both for the sake of simplicity and because they believed it encouraged bad practice. The problem with relying on a TCP connection for service health is that it doesn't exactly mean the service is healthy. For example, if the TCP connection was going through a transparent proxy that accidentally kept the connection alive, the service could die and the ephemeral node may continue to exist.

Instead, they implemented values with an optional TTL. This allowed for the pattern of actively updating the value if the service was healthy. TTL semantics are also used in etcd, allowing the same active heartbeat pattern. Consul supports TTL as well, but primarily focuses on more robust liveness mechanisms. In the discovery layer I helped design for Flynn, our client library lets you register your service and it will automatically heartbeat for you behind the scenes.

This is generally effective for service presence, but it might not take the lesson to heart. Blake Mizerany, the co-author of doozerd and now maintainer of etcd, will stress the importance of meaningful liveness checks. In other words, there is no one-size-fits-all. Every service performs a different function and without testing that specific functionality, we don't actually know that it's working properly. Generic heartbeats can let us know if the process is running, but not that it's behaving correctly enough to safely accept connections.

Specialized health checks are exactly what monitoring systems give us, and Consul gives us a distributed monitoring system. Then it lets us choose whether we want to associate a check with a service, while also supporting the simpler TTL heartbeat model as an alternative. Either way, if a service is detected as not healthy, it's hidden from queries for active services.
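
As a sketch, assuming a service named "web" is already registered with the local agent, attaching a script check to it and then asking only for healthy instances looks roughly like this. The check name, script, and interval are made-up example values, and endpoints reflect Consul's HTTP API around the time of writing:

$ curl -X PUT http://localhost:8500/v1/agent/check/register \
    -d '{"Name": "web-health", "ServiceID": "web",
         "Script": "curl -s localhost:8000/health", "Interval": "10s"}'
$ curl 'http://localhost:8500/v1/health/service/web?passing'   # only instances whose checks pass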

Built-in DNS Server

In my last post, I mentioned how DNS is not a sufficient technology for service discovery. I was very hesitant in accepting the value of a DNS interface to services in Consul. As I described before, all our environments are set up to use DNS for resolving names to IPs, not IPs with ports. So other than identifying the IPs of hosts in the cluster, the DNS interface at first glance seems to provide limited value, if any, for our concept of service discovery.

However, it does serve SRV records for services, and this is huge. Built-in DNS resolvers in our environments don't look up SRV records; however, library support for doing SRV lookups ourselves is about as ubiquitous as HTTP. This took me a while to realize. It means we all have a client, even more lightweight than HTTP, and it's made specifically for looking up a service.

To me this makes SRV the best standard API for simple service discovery lookups. I hope more service discovery systems implement it.
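
For example, you can see the records with dig against a local agent. Consul's DNS interface listens on port 8600 by default (or wherever you bind it, as in the Docker setup below), and the SRV answer carries both the port and the target host:

$ dig @127.0.0.1 -p 8600 web.service.consul SRV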

In a later post in this series, we build on SRV records from Consul DNS to generically solve service inter-connections in Docker clusters. I don't think I would have realized any of this if Consul didn't provide a built-in DNS server.

Consul and the Ecosystem

Consul development is very active. In the past few months, they've had several significant releases, although it's still pre-1.0. Etcd is also actively being developed, though currently from the inside out, focusing on a re-design of their Raft implementation. The two projects are similar in many ways, but also very different. I hope they learn and influence each other, perhaps even share some code since they're both written in Go. At this point, though, Consul is ahead as a comprehensive service discovery primitive.

Unfortunately, Consul is much less popular in the Docker world. Perhaps this is just due to less of a focus on containers at HashiCorp, which is contrasted by the heavily container-oriented mindset of the etcd maintainers at CoreOS.

I've been trying hard to help bridge the Docker and Consul world by building a solid Consul container for Docker. I try to design containers to be self-contained, runtime-configurable appliances as much as possible. It was not hard to do this with Consul, which is now available on Github or Docker Hub.

Running Consul in Docker

Running a Consul node in Docker for a production cluster can be a bit tricky. This is due to the amount of configuration that the container itself needs for Consul to work. For example, here's how you might start one node using Docker (one command over several lines for readability):

$ docker run --name consul -h $HOSTNAME  \
    -p 10.0.1.1:8300:8300 \
    -p 10.0.1.1:8301:8301 \
    -p 10.0.1.1:8301:8301/udp \
    -p 10.0.1.1:8302:8302 \
    -p 10.0.1.1:8302:8302/udp \
    -p 10.0.1.1:8400:8400 \
    -p 10.0.1.1:8500:8500 \
    -p 172.17.42.1:53:53/udp \
    -d -v /mnt:/data \
    progrium/consul -server -advertise 10.0.1.1 -join 10.0.1.2

The Consul container I built comes with a helper command letting you simply run:

$ $(docker run progrium/consul cmd:run 10.0.1.1::10.0.1.2 -d -v /mnt:/data)

This is just a special command to generate a full Docker run command like the first one, hence wrapping it in a subshell. It's not required, but a helpful convenience to hopefully get people started with Consul in Docker much quicker.

One of the neat ways Consul and Docker can work together is by giving Consul as a DNS server to Docker. This transparently runs DNS resolution in containers through Consul. If you set this up at the Docker daemon level, you can also specify DNS search domains. That means the .service.consul suffix can be dropped, allowing containers to resolve records with just the service name.
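
On the Docker versions current as of this writing, that daemon-level setup looks something like this, using the docker0 bridge address the Consul container's DNS port was published on above:

$ docker -d --dns 172.17.42.1 --dns-search service.consul

The same --dns and --dns-search flags also work per container on docker run if you'd rather not change the daemon.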

The project README has some pretty helpful getting started instructions as well as more detail on all these features. Here's a quick video showing how easy it is to get a Consul cluster up and running inside Docker, including the above DNS trick.

Onward…

Once you have Consul running in Docker, you're close to having great service discovery, but as I mentioned in my last post, you're still missing those second two legs. Stay tuned for the next post on automatically registering containerized services with Consul.
