NOTES: DOCKER NETWORKING

I've been having a lot of fun recently with Docker containers, from packaging and running my own Python scripts, to building the Pocket Internet proof of concept at the recent RIPE Hackathon and, finally, designing a solution for integrating a multi-datacentre, multi-environment Docker Swarm with a Cisco ACI fabric and the rest of the network for one of my customers. Below you will find my notes accumulated from going through official documentation, blog posts and experimentation in the lab.

Docker networks

When installed, Docker creates three networks by default (only one of which, bridge, can be changed):

  • bridge: this is the docker0 network, to which containers are attached by default
    • it masquerades to the outside world (PAT)
    • it allows containers to talk to each other on the same host
    • there's no built-in service discovery or name resolution
    • can be disabled/reconfigured
  • none: a container attached to this network has no network access, only a loopback interface
  • host: attaches the container directly to the host's network stack

User-defined networks:

  • bridge
    • functionality as above, but can also be set up as an isolated internal network (no NAT, no gateway)
    • you can publish (proxy) ports to make selected containers reachable from the outside
  • docker_gwbridge
    • used in swarms (see below) or created on demand if there's no other bridge network that provides external connectivity to containers
  • overlay: used in swarms (see below)
  • macvlan
    • container interfaces are attached directly to a docker host (sub)interface
    • supports dot1q tagging (automatically created subinterfaces or manually attached)
    • each container gets its own unique MAC and IP address
    • no port mapping needed to expose services as containers are directly on the network (both good and bad!)
    • you may have to enable promiscuous mode on the parent interface
    • you will have to disable security features in the vSwitch if using VMs (multiple MACs behind the same adapter)
    • suffers from limitations in swarm mode:
      • service discovery and service name load balancing are local to each Docker host (you will need an external service to handle them; Docker itself can't)
      • IP subnets need to be split into ranges allocated to each docker host in the swarm to prevent overlapping
      • can be made to work for network connectivity in a swarm as detailed here
  • ipvlan
    • L2 mode functions similarly to macvlan but allocates the same MAC (from the parent host interface) to all containers
      • sharing the same MAC address will create problems with DHCP and SLAAC
    • L3 mode routes the packets between the parent interface and the subinterfaces (different subnets!)
      • it requires static routes (or a routing protocol running in the container); it does not decrement TTL and does not support multicast
  • custom network plugin - this can be a 3rd party module that uses the Docker API
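To make the macvlan-in-a-swarm point about splitting subnets concrete, here is a minimal Python sketch, assuming one shared /24 carved into equal-sized blocks, one per Docker host. It only builds the `docker network create` argument lists and does not run anything. The host names, VLAN 100, parent subinterface `eth0.100` and network name `vlan100` are hypothetical; `--subnet`, `--gateway`, `--ip-range` and `-o parent=` are standard `docker network create` options.

```python
import ipaddress


def split_macvlan_subnet(subnet, hosts):
    """Carve one macvlan subnet into non-overlapping per-host --ip-range blocks.

    Assumption: equal power-of-two blocks, so the host count is rounded up
    to the next power of two (3 hosts -> 4 blocks, one left spare).
    """
    net = ipaddress.ip_network(subnet)
    # Extra prefix bits needed to get at least len(hosts) blocks
    extra_bits = (len(hosts) - 1).bit_length()
    blocks = list(net.subnets(prefixlen_diff=extra_bits))
    return dict(zip(hosts, blocks))


def macvlan_create_cmd(name, subnet, gateway, ip_range, parent):
    """Build (but do not run) a docker network create command for one host."""
    return [
        "docker", "network", "create", "-d", "macvlan",
        "--subnet", subnet,        # the full L2 segment, identical on every host
        "--gateway", gateway,      # upstream router, identical on every host
        "--ip-range", ip_range,    # this host's private slice of the subnet
        "-o", f"parent={parent}",  # dot1q subinterface, e.g. eth0.100
        name,
    ]


if __name__ == "__main__":
    subnet, gateway = "192.168.100.0/24", "192.168.100.1"
    ranges = split_macvlan_subnet(subnet, ["docker-a", "docker-b", "docker-c"])
    for host, block in ranges.items():
        cmd = macvlan_create_cmd("vlan100", subnet, gateway, str(block), "eth0.100")
        print(f"{host}: {' '.join(cmd)}")
```

With three hosts, docker-a gets 192.168.100.0/26, docker-b 192.168.100.64/26 and docker-c 192.168.100.128/26, so each host's local IPAM can only hand out addresses from its own slice and containers on different hosts never collide.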

Other notes:

  • Docker provides an embedded DNS server that resolves container names to IPs on the same user-defined network (not on the default bridge)
  • You can publish a port, which instructs the daemon to map a container port to a port on the host machine for external connections (a free, high-numbered one if you don't specify it)
  • Docker uses the system's iptables to perform operations on the host (routing, port forwarding, NAT etc.)
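As an illustration of the iptables point, the Python sketch below builds two NAT rules of roughly the shape Docker installs for the default bridge (compare with `iptables -t nat -S` on a Docker host): a MASQUERADE rule for outbound PAT and a DNAT rule for a published port. The exact chains, match options and accompanying filter rules differ between Docker versions, so treat this as an approximation that only builds argument lists. `172.17.0.0/16` is the usual default `docker0` subnet; the container address and ports are hypothetical.

```python
def masquerade_rule(bridge_subnet="172.17.0.0/16", bridge_if="docker0"):
    """Outbound PAT: rewrite the source of traffic from the bridge subnet
    that leaves through any interface other than the bridge itself."""
    return ["iptables", "-t", "nat", "-A", "POSTROUTING",
            "-s", bridge_subnet, "!", "-o", bridge_if, "-j", "MASQUERADE"]


def publish_rule(host_port, container_ip, container_port,
                 proto="tcp", bridge_if="docker0"):
    """Published port: DNAT traffic arriving on host_port (from outside the
    bridge) to the container's IP and port, in Docker's DOCKER nat chain."""
    return ["iptables", "-t", "nat", "-A", "DOCKER",
            "!", "-i", bridge_if, "-p", proto, "-m", proto,
            "--dport", str(host_port),
            "-j", "DNAT", "--to-destination", f"{container_ip}:{container_port}"]


if __name__ == "__main__":
    # Roughly the NAT side of `docker run -p 8080:80 ...` for a container at 172.17.0.2
    print(" ".join(masquerade_rule()))
    print(" ".join(publish_rule(8080, "172.17.0.2", 80)))
```

The masquerade rule is shared by every container on the bridge, while a DNAT rule is added per published port, which is why unpublished container ports stay unreachable from outside the host.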

Docker Swarm

  • A swarm groups a pool of Docker hosts together, makes it easy to start containers across that pool and preserves local-like network connectivity within services
  • 2 types of traffic: control/management plane and application data plane
  • 3 networks
    • overlay (driver)
      • they span the Docker hosts (daemons) in the swarm, so containers on different hosts can communicate
      • you can attach services to one or more overlays
      • the Linux network namespace created for each overlay has static ARP entries for each running container, and its interface acts as a proxy for ARP queries
      • each overlay has a VXLAN id allocated to it (used for encapsulation)
    • ingress (special overlay)
      • facilitates load balancing between a service's tasks (containers) across nodes
      • requires published ports
      • when traffic arrives on the port, an IP from the list of available nodes is selected and traffic sent to it via the ingress overlay (!)
    • docker_gwbridge (bridge driver)
      • connects the overlays to an individual docker host's physical network
      • can't be used for inter-container communication
      • masquerades (NAT) to the outside world
  • Communication between docker hosts (daemons):
    • 7946 TCP/UDP for container network discovery
    • 4789 UDP for the container overlay network
    • control plane traffic between swarm nodes is encrypted (AES-GCM)
    • you can enable per-overlay encryption (--opt encrypted when creating the network) -> IPsec tunnels between all nodes where tasks are scheduled for services attached to that overlay network (so partial-mesh, on-demand tunnels)
  • MTU considerations
    • always: VXLAN encapsulation overhead (50 bytes)
    • optionally: IPsec encapsulation overhead (encrypted overlays)
    • overlay interfaces adjust automatically by lowering their MTU (1450 for non-encrypted overlays on a 1500-byte underlay)
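The 1450 figure falls straight out of the VXLAN header layout defined in RFC 7348. Here is a minimal Python sketch, assuming a plain IPv4 underlay with a 1500-byte MTU and no IPsec: it packs the 8-byte VXLAN header carrying a (hypothetical) VNI and subtracts the encapsulation overhead from the underlay MTU. IPsec overhead on encrypted overlays depends on the cipher and padding, so it is not modelled here.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN port, the one swarm overlays use


def vxlan_header(vni):
    """Pack the 8-byte VXLAN header from RFC 7348 for a given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte 0x08 (I bit: VNI is valid) + 24 reserved bits
    # Word 2: 24-bit VNI + 8 reserved bits
    return struct.pack("!II", 0x08 << 24, vni << 8)


def overlay_mtu(underlay_mtu=1500):
    """MTU left for the container's IP packets inside a VXLAN tunnel."""
    outer_ipv4 = 20
    outer_udp = 8
    vxlan = len(vxlan_header(0))  # 8 bytes
    inner_ethernet = 14  # the container's L2 header travels inside the tunnel
    return underlay_mtu - (outer_ipv4 + outer_udp + vxlan + inner_ethernet)


if __name__ == "__main__":
    print(vxlan_header(256).hex())  # hypothetical VNI 256
    print(overlay_mtu())            # 1500-byte underlay
    print(overlay_mtu(9000))        # jumbo-frame underlay
```

The four terms add up to the 50 bytes of VXLAN overhead, so a 1500-byte underlay leaves 1450 bytes for the container's packets; raising the underlay MTU (e.g. jumbo frames) is the usual way to avoid shrinking the overlay MTU.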


More, more, more

Cumulus wrote a very nice article on 5 ways to design your container network which lists options for connecting Docker Swarms to a fully routed DC fabric (with links to detailed, public(!), validated design documents). If you are able to push routing all the way to the Docker host these designs look very straightforward and I might have to dig a bit deeper into the design docs to understand what security options are available when you can't just let the services roam freely (or push policy all the way into the container).

Doing something similar is a project called bagpipe-bgp, which has now become part of OpenStack. A blog post about experimenting with it lives here.

Kubernetes

Kubernetes is all the rage now as the more powerful (tradeoff: steep learning curve, more complicated to run) and more scalable alternative to Docker Swarm, so it's worth a mention, but it's mostly out of scope for this article because:

And, as always, thanks for reading.

