Bypassing docker network isolation (hack)

I am the author of the netdata Docker image. I created the image before netdata became the popular real-time performance monitoring software it is today, with over 15 thousand stars on GitHub! Wow! Because of Docker’s network isolation, however, you would have to run the container in --net=host mode to monitor the host’s network devices.

The problem with network isolation

A positive side of network isolation is that the attack surface stays very small if somebody manages to exploit a weakness in a running container. For example, if netdata were running on the host network and someone managed to exploit it, they could join botnets and disrupt other people’s operations, and in the more destructive cases they could literally turn off your network interfaces with ifconfig [interface] down. I don’t know, do we still have passwords? I’ve used an SSH key to log into every server for at least a decade now, and I’d be hard pressed to guess what kind of password I set on any local user.

I guess in part it’s the principle of single responsibility that makes me avoid putting netdata on the host network. It’s meant to collect what it can from the /sys and /proc filesystems and, with read-only access to those, provide real-time insight into how your system is operating.

Network monitoring with netdata

How the proc filesystem works

In effect, the proc filesystem is not an actual filesystem: its bits and bytes are not stored anywhere but are retrieved from internal Linux kernel structures whenever somebody opens a file. So when you open and read /proc/net/dev, you’re really interfacing with the kernel, which returns the data as it stands at that moment. When netdata reads from the /proc filesystem, it relies on the kernel to retrieve data from whatever endpoints it is allowed to see.

So, plainly put, we can read which processes are running on the host with SYS_PTRACE and by traversing the /proc/{pid} structures, but when netdata tries to read anything under /proc/net/* the read will be rejected due to network isolation. It’s either a “you can do nothing” or a “you can do everything” type of world in this sense.
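To make the /proc/{pid} traversal a bit more concrete, here is a minimal sketch that lists process IDs and names using only shell builtins. The function name list_procs and its optional root argument are made up for this illustration, and this is not how netdata itself collects process data.

```shell
#!/usr/bin/env bash
# list_procs is a hypothetical helper: it prints "<pid> <name>" for every
# numeric directory under the given proc root (default: /proc).
list_procs() {
    local root="${1:-/proc}" dir name
    for dir in "$root"/[0-9]*; do
        # Processes can exit while we iterate, so skip unreadable entries.
        [ -r "$dir/comm" ] || continue
        read -r name < "$dir/comm" || continue
        echo "${dir##*/} $name"
    done
}
```

Inside a container started with -v /proc:/host/proc:ro, calling list_procs /host/proc would walk the host's process list instead of the container's own.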

Workaround test 1

My first test with providing netdata with /proc/net contents was optimistic. If you look closely, when you list files under the proc filesystem they are reported with a size of 0, but when you actually read them you get data which is most likely longer than 0 bytes.
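You can check this size mismatch yourself. The sketch below compares the size reported in the file metadata with the number of bytes you get by actually reading the file. The helper names reported_size and actual_size are made up for this example, and stat -c assumes GNU coreutils or busybox.

```shell
#!/usr/bin/env bash
# reported_size: the size the filesystem reports in the file's metadata.
reported_size() { stat -c %s "$1"; }

# actual_size: the number of bytes obtained by really reading the file.
actual_size() { wc -c < "$1" | tr -d ' '; }

# Only compare against the real proc file when it is available (Linux).
if [ -r /proc/net/dev ]; then
    echo "reported: $(reported_size /proc/net/dev) bytes"
    echo "read:     $(actual_size /proc/net/dev) bytes"
fi
```

For a regular file both numbers agree; for a proc file the reported size is what a size-trusting copy tool would work from.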

My first idea was just to run rsync on the /proc/net location and copy the files into another folder. It works on paper, but in practice it doesn’t. You get a folder with a bunch of files of size 0. You have to actually read a file to get its data, but rsync, I suppose, just sees “oh, it’s size zero, I will read 0 bytes”.

I didn’t actually check what rsync does, but either way, I guessed it would be prohibitive from a system viewpoint to exec a new rsync process several times per second.

Workaround test 2

Obviously, you can do cat /proc/net/dev > /fakenet/dev, as long as you create the fakenet folder beforehand. We can find out what files are under /proc/net with find -type f, and then traverse them and cat/pipe them into the new fake location.
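The find and cat idea can be sketched roughly like this. The function name copy_with_cat is hypothetical, and to keep the example short it only handles files directly under the source folder (-maxdepth 1), not subfolders.

```shell
#!/usr/bin/env bash
# copy_with_cat is a hypothetical helper: it copies every regular file found
# directly under SRC into DST by reading it with cat, which reads until
# end-of-file instead of trusting the reported size.
copy_with_cat() {
    local src="$1" dst="$2" file
    mkdir -p "$dst"
    find "$src" -maxdepth 1 -type f | while read -r file; do
        # Each iteration starts a new cat process; repeating that several
        # times per second is what makes this approach heavy.
        cat "$file" > "$dst/${file##*/}" 2>/dev/null || continue
    done
}
```

A single pass would be copy_with_cat /proc/net /fakenet. It runs one find plus one cat per file, so the process count grows with the number of files copied.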

Well, I did try it but there’s still the same problem as with rsync - any kind of fake proc filesystem copy would be load heavy, spawning a process multiple times per second.

Final full-bash solution

I created the final bash script that does a few things to optimize speed. Here are all the tricks explained:

OUTPUT="/dev/shm/fakenet/";

We write to the /dev/shm/fakenet location to optimize input/output. SHM stands for shared memory, which is where the contents are kept. Every time we write a file there, we’re using only RAM.
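If you want to confirm that /dev/shm is memory-backed on your machine, a quick check could look like the sketch below. The helper name fs_type is made up, and stat -f -c %T assumes GNU coreutils or busybox. On most Linux systems I would expect it to report tmpfs, which lives in memory but can be pushed to swap under memory pressure.

```shell
#!/usr/bin/env bash
# fs_type prints the filesystem type name of the mount that holds the path.
# Assumes a stat implementation where -f queries the filesystem (GNU/busybox).
fs_type() { stat -f -c %T "$1"; }

if [ -d /dev/shm ]; then
    echo "/dev/shm is on: $(fs_type /dev/shm)"
fi
```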

SOURCES="/proc/net/dev
...

I list individual sources, sure, to avoid an exec call for find, but also because the list of files which netdata monitors is known from its source code. Because we know which files netdata expects, we can copy only those files instead of the complete /proc/net location and subfolders. Yay for using only what we need.

OUTFILE="${NETFILE:10}"
echo "$(<$NETFILE)" > $OUTPUT$OUTFILE

These are the two lines which I’m most proud of in the whole script.

  • ${NETFILE:10} - skips the first 10 characters of /proc/net/dev, leaving $OUTFILE set as dev,
  • $(<$NETFILE) - reads /proc/net/dev and the double quotes around it keep the newline characters as-is,
  • As the echo command is a bash builtin (and not the same as /bin/echo), these two lines involve no exec calls.
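The three points above can be tried out in isolation. This sketch uses a temporary sample file instead of /proc so it runs anywhere bash is available; the variable names sample, unquoted and quoted are only for this demonstration.

```shell
#!/usr/bin/env bash
NETFILE="/proc/net/dev"
OUTFILE="${NETFILE:10}"            # drop the 10 characters of "/proc/net/"
echo "OUTFILE=$OUTFILE"

sample="$(mktemp)"
printf 'line one\nline two\n' > "$sample"

unquoted=$(echo $(<"$sample"))     # word splitting folds newlines into spaces
quoted=$(echo "$(<"$sample")")     # double quotes keep the line structure
echo "unquoted: $unquoted"
echo "quoted:   $quoted"
rm -f "$sample"

# Prints "builtin" when echo is executed by bash itself, without an exec.
type -t echo
```

Without the double quotes the two lines of the sample collapse into one, which is exactly what would mangle a multi-line file such as /proc/net/dev.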

I am amazed at how much bash code I write lately. And I’m fine with that. There’s an additional sleep 0.23, which pauses the syncing for that many seconds, so the proc filesystem is copied about 4 times per second. Of course, this means that reading from the fake proc filesystem is less accurate, and the data may be delayed by up to 250ms in the worst case. This will result in some jagged graphs, but if it’s a problem for you and you don’t mind some extra CPU cycles, you can decrease the sleep interval.
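To show how the pieces fit together, here is a minimal sketch of such a copy loop. It is not the actual fakenet.sh: the function names sync_once and sync_loop and their arguments are hypothetical, and for simplicity it names each copy after the base name of its source rather than stripping a fixed 10-character prefix.

```shell
#!/usr/bin/env bash
# A sketch of the copy loop, NOT the actual fakenet.sh.

# sync_once OUTPUT FILE...: copy each FILE into OUTPUT under its base name.
sync_once() {
    local output="$1" netfile
    shift
    for netfile in "$@"; do
        # Skip sources that do not exist or cannot be read on this host.
        [ -r "$netfile" ] || continue
        echo "$(<"$netfile")" > "$output/${netfile##*/}"
    done
}

# sync_loop INTERVAL OUTPUT FILE...: repeat sync_once forever.
sync_loop() {
    local interval="$1"
    shift
    mkdir -p "$1"
    while true; do
        sync_once "$@"
        # sleep is normally an external command, so this is one exec per pass.
        sleep "$interval"
    done
}
```

Something like sync_loop 0.23 /dev/shm/fakenet /proc/net/dev would then refresh the copy roughly four times per second. A smaller interval gives smoother graphs at the cost of more CPU time.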

Putting it all together

As I saved the fakenet.sh script to GitHub, we can download it and run it ourselves:

wget https://raw.githubusercontent.com/titpetric/netdata/master/fakenet.sh
chmod a+x fakenet.sh
nohup ./fakenet.sh >/dev/null 2>&1 &

We download it, make it executable, and run it in the background (&), adding nohup to keep it running after we log out of the system. It’s a bit of a lowest common denominator; you can run the script under screen if you like. Once the script is running, all that’s left is to start netdata:

docker run --cap-add SYS_PTRACE \
           -v /proc:/host/proc:ro \
           -v /sys:/host/sys:ro \
           -v /dev/shm/fakenet:/fakenet/proc/net \
           -p 19999:19999 --name netdata -d \
           titpetric/netdata

You can then visit netdata by going to http://your-ip-or-host.name:19999/. For more information about netdata itself, there’s the longer readme on titpetric/netdata which you can check out for more wisdom.

Caveat emptor

Obviously, reading from a partial copy of the proc filesystem is not exactly the same as reading from the real thing. Some data might still be missing (I’m missing net under individual docker containers, for example). But at least I know what’s up with my eth0 without exposing too much. For the more initiated, I’d recommend sticking with --net=host in LAN networks, and at least thinking about how to protect netdata in the DMZ if running on the host network. I guess it’s not very responsible to have it wide open to the internet, where everybody is screaming.

In the spirit of the Thanksgiving holiday I’m making a “black friday” deal for my book API Foundations in Go. The link includes a coupon taking 50% off your book purchase, bringing the minimum down to $5. The link is valid until November 26th, so hurry up and get it while you can.
