I wish I could like Traefik, but it really isn't easy.
The use case in our Hackerspace was to route our wildcard subdomains to different Docker containers. Traefik is also supposed to create TLS certificates automatically, but I had numerous problems with the Let's Encrypt functionality.
The debugging information is quite cryptic and the documentation seems all over the place to me, which is even more problematic given the number of breaking changes between the 1.x and 2.x versions. Because configuration happens automatically through Docker labels, a simple typo can cause your configuration to be silently ignored.
Also, plugging Traefik into complex docker-compose projects such as Sentry or GitLab is next to impossible because of networking: whatever I tried, Traefik just couldn't pick up containers and forward traffic to them unless I changed the definition of every single container in the docker-compose file to join an extra network. I don't feel it should be this complex.
Sometimes I feel we should just go back to using Nginx and writing our rules manually. The concept of Traefik is awesome, but the way one uses it is extremely cumbersome.
I worked on a project last year where we tried using Traefik on Kubernetes together with Let's Encrypt certs. It worked... sometimes.
We had significant issues with Traefik not allocating or renewing certs, resulting in some painful outages. The worst part was that there was no workaround; when adding a new domain to an ingress, it was completely incomprehensible why Traefik wasn't requesting a cert, or indeed why it wasn't renewing older ones that were close to expiration. We filed GitHub issues with concrete errors, but they were never addressed. At the time, I tried to debug Traefik to understand how it worked and maybe chase down some of those bugs. I don't like to speak ill of other people's code — let's just say that peeking under the covers made me realize perfectly why Traefik was so brittle and buggy.
We eventually ditched Traefik in favour of Google Load Balancer ingresses, combined with Cert-Manager for Let's Encrypt, and this combination worked flawlessly out of the box despite not being a 1.0 release at the time. The beauty of this setup is that the control plane (cert and ingress configuration) is kept separate from the data plane (web server), so the two can be maintained and upgraded/replaced separately.
I second this. It's incredibly complex to debug how Traefik understands its configuration, and the documentation and examples around the internet are very confusing because of the 1.x vs 2.x changes.
Yep. I believe part of the wonkiness comes from the way the configuration is stored. They have this weird design where the config is mapped to key/value stores using an abstraction. You can use a TOML file, YAML file, Etcd, Redis, etc. If you use Let's Encrypt, it also uses this mechanism (e.g. Etcd) to store the state.
It ends up being confusing and brittle, and exposes the underlying store as an API (you can modify Etcd directly and the changes are picked up). There's no intermediate layer that validates or controls the lifecycle of the config or state. You can end up in a situation where you break Traefik by pushing an invalid configuration, for example.
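To make that concrete: nothing stops any process with etcd access from writing straight into the config tree. A minimal sketch of the failure mode (the key path is illustrative, not Traefik's exact schema):

    // Any process with etcd access can push a key straight into
    // Traefik's config tree; no validation layer sits in between.
    package main

    import (
        "context"
        "log"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            log.Fatal(err)
        }
        defer cli.Close()

        ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
        defer cancel()

        // Nothing checks this value: etcd happily stores a malformed URL,
        // and the breakage only surfaces when Traefik tries to use it.
        _, err = cli.Put(ctx, "traefik/backends/web/servers/s1/url", "ht!tp://broken")
        if err != nil {
            log.Fatal(err)
        }
    }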
Can't they just take the text-based format and create a config tool that reads the TOML/YAML and then writes that configuration to etcd, Redis, or whatever else they support?
Ouch. We're currently using nginx but recently switched one service over to Traefik, and I'm afraid that what you describe is what will bite us in the end. I wrote "treafik" instead of "traefik" in one of the labels and only noticed it after hours of debugging. When it works, it works great. But getting it into that state...
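A cheap guard against that class of mistake (purely hypothetical, using the Docker Go SDK) would be a lint pass that flags container labels whose prefix is an anagram of "traefik":

    // List running containers via the Docker API and flag labels that
    // look like a misspelled "traefik." prefix ("treafik." would match).
    // The anagram check is deliberately crude.
    package main

    import (
        "context"
        "fmt"
        "log"
        "strings"

        "github.com/docker/docker/api/types"
        "github.com/docker/docker/client"
    )

    func main() {
        cli, err := client.NewClientWithOpts(client.FromEnv)
        if err != nil {
            log.Fatal(err)
        }
        containers, err := cli.ContainerList(context.Background(), types.ContainerListOptions{})
        if err != nil {
            log.Fatal(err)
        }
        for _, c := range containers {
            for label := range c.Labels {
                prefix := strings.SplitN(label, ".", 2)[0]
                // Same letters as "traefik" but not spelled "traefik": probably a typo.
                if prefix != "traefik" && sortedLetters(prefix) == sortedLetters("traefik") {
                    fmt.Printf("%s: suspicious label %q\n", c.ID[:12], label)
                }
            }
        }
    }

    // sortedLetters returns the bytes of s in ascending order.
    func sortedLetters(s string) string {
        b := []byte(s)
        for i := range b {
            for j := i + 1; j < len(b); j++ {
                if b[j] < b[i] {
                    b[i], b[j] = b[j], b[i]
                }
            }
        }
        return string(b)
    }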
I see where the OP is coming from, but I found debugging quite easy in practice: if something doesn't work, go to the Traefik dashboard and look for the element in question. If it's not there, the reason is normally fairly obvious.
I actually have the same setup and it's working perfectly fine, even with my config that binds only to specific IPv4+IPv6 addresses, plus lots of file-based configuration. I absolutely recommend using the TLS challenge with Let's Encrypt.
No problems with Docker (Compose) networks either, but I'm not using it with GitLab because I have enough IPs.
The biggest problem I see is the accumulation of certificates that will all be kept up-to-date, whether in use or not.
I also have a working system that I found very easy (for me) to setup.
Recently it all came crashing down when an old domain of mine expired and I was no longer able to update its DNS in DigitalOcean. That one (unused) domain failing stopped Traefik from renewing all my certificates. But I'm also still on 1.7 and really should update to 2.x.
> Rather than being pre-compiled and linked, however, plugins are executed on the fly by Yaegi, an embedded Go interpreter.
Woof, no thank you.
Go is basically incompatible with any kind of plugin-like dynamic linking. There are two reasonable models for doing something like plugins: the HashiCorp model, where plugins are separate processes that do some kind of inter-process communication with the host; or the Caddy model, where you select which plugins you want when downloading the binary, and they're built in at compile time.
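The Caddy model is easy to sketch: the host exposes a registry, each plugin package registers itself from init(), and you choose plugins by choosing what to blank-import before building. A minimal sketch (names are illustrative):

    // Compile-time plugin registry, "Caddy style".
    package plugin

    import "net/http"

    // Middleware wraps an http.Handler, like a proxy middleware would.
    type Middleware func(next http.Handler) http.Handler

    var registry = map[string]Middleware{}

    // Register is called from each plugin package's init().
    func Register(name string, m Middleware) {
        registry[name] = m
    }

    // Get looks up a compiled-in plugin by name at config time.
    func Get(name string) (Middleware, bool) {
        m, ok := registry[name]
        return m, ok
    }

A build then selects plugins with blank imports (e.g. _ "example.com/plugins/gzip"), which is exactly why the plugin set has to be fixed at compile time.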
Correct me if I'm wrong, but the Caddy model requires curation, doesn't it?
Plugins and scripting languages flourish when they democratize the process of adding features to a piece of code. To have prebuilt binaries you need a build matrix, and the complexity of the build matrix is somewhere between exponential and factorial.
This is a perverse incentive for the curators. The cost has to be justified, and as the friction grows you can only justify the things that you have a strong affinity for. Anything you don't understand or don't like gets voted off the island.
In the best addon ecosystems, the core maintainers put some safety rails on the system so the addons can't do anything too crazy. Then they watch the cream of the crop and start trying to include them in the base functionality (limiting the number of optional features the majority of their users have to manually pick). The hard part here is how to reward the people whose ideas you just co-opted, and I don't have a great answer for that (although money and/or a free license for life is a good start).
> Plugins and scripting languages flourish when they democratize the process of adding features to a piece of code . . . In the best addon ecosystems, the core maintainers put some safety rails on the system so the addons can't do anything too crazy. Then they watch the cream of the crop and start trying to include them in the base functionality (limiting the number of optional features the majority of their users have to manually pick).
Well, it's a cost/benefit judgment call, not a single valuation. And I think for situations like this, if you have to pick a side, it's generally better to pick the exclusionary walled garden over the bazaar -- I think the value of democratization is usually overstated, and the drawbacks underemphasized.
There is the "plugin" package, which seems really cool and fits the simplistic style of Go (tbh I haven't tried this module myself, only glanced at the documentation), but it does not work on Windows, which I think is the reason it is not used. The ticket about adding Windows support to the plugin package is one of the highest-rated issues on Go's GitHub, yet it is still open.
Any significant downside to the process-based model? I haven't benchmarked the memory consumption of a minimal Go process, but it should be well below what e.g. a minimal JVM application uses. With the right serialization format, IPC can be reasonably efficient.
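To illustrate what I mean by reasonably efficient IPC, a toy sketch of the process model: the host execs the plugin and they exchange newline-delimited JSON over stdin/stdout (./auth-plugin is a hypothetical plugin binary; hashicorp/go-plugin uses gRPC instead, but the shape is the same):

    package main

    import (
        "bufio"
        "encoding/json"
        "fmt"
        "log"
        "os/exec"
    )

    type request struct {
        Path string `json:"path"`
    }

    type response struct {
        Allow bool `json:"allow"`
    }

    func main() {
        // Hypothetical plugin binary: reads one JSON request per line on
        // stdin, writes one JSON response per line on stdout.
        cmd := exec.Command("./auth-plugin")
        stdin, err := cmd.StdinPipe()
        if err != nil {
            log.Fatal(err)
        }
        stdout, err := cmd.StdoutPipe()
        if err != nil {
            log.Fatal(err)
        }
        if err := cmd.Start(); err != nil {
            log.Fatal(err)
        }

        enc := json.NewEncoder(stdin)
        dec := json.NewDecoder(bufio.NewReader(stdout))

        // One round trip: host asks, plugin answers.
        if err := enc.Encode(request{Path: "/admin"}); err != nil {
            log.Fatal(err)
        }
        var resp response
        if err := dec.Decode(&resp); err != nil {
            log.Fatal(err)
        }
        fmt.Println("allowed:", resp.Allow)
    }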
I really wish Go plugins got some more love from the Go team. It looks like this is using Yaegi, a Go interpreter, which is probably the only reasonable choice: Go's plugin package requires that the plugin be compiled with exactly the same compiler version as the main binary, so you need to recompile every plugin for every new release, at least if you upgrade the compiler between releases, which you often do. It also doesn't work on Windows.
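For reference, the workflow that package gives you looks like this (the symbol name is illustrative); the Open call is what fails with a version-mismatch error whenever the toolchains differ:

    package main

    import (
        "log"
        "net/http"
        "plugin"
    )

    func main() {
        // The .so is built with: go build -buildmode=plugin
        // and must use the exact same toolchain as this binary.
        p, err := plugin.Open("middleware.so")
        if err != nil {
            // Typically: "plugin was built with a different version of package ..."
            log.Fatal(err)
        }
        sym, err := p.Lookup("Middleware")
        if err != nil {
            log.Fatal(err)
        }
        mw, ok := sym.(func(http.Handler) http.Handler)
        if !ok {
            log.Fatal("unexpected symbol type")
        }
        _ = mw // wire into the handler chain here
    }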
Indeed, Go plugins were our initial choice (https://github.com/traefik/traefik/pull/1865), but you've said it all: the workflow would have been bad to impossible for users.
Building a Go interpreter from scratch was not the easiest path, but it was the best solution in terms of UX.
There was a public design doc about the Go linker that addressed this issue at the end; my comment from the time and the post can be found here [0]. I guess there's some hope, but I haven't looked into it again, so I don't know whether anything is moving forward.
I've been wanting to use Traefik for a long time but there's this security issue[0] that's almost two(!) years old now that's been keeping me from deploying it in production. As far as I can tell, there's still no out-of-the-box solution that's not overly complicated and won't come back to haunt me a year or two from now.
You don't have to deploy traefik with docker. If you want traefik to monitor new docker containers to add routes for them, of course traefik needs to talk to the docker api to do so.
The docker api has no way to control access such that it's not equivalent to root access.
However, there's no real vulnerability. I'm happy to provide you a url hosted by traefik with the docker integration enabled, no docker socket proxy, etc, and if you can manage to actually escalate permissions, I'll give you 500 bucks. But, of course, you can't. That security issue is just a "defense in depth" issue, and it's an issue for docker, not traefik.
This would be like saying "traefik uses the linux kernel api to open files, but the linux kernel requires traefik validate what goes into that api or else it could allow file path traversal"... But traefik does validate filepaths and so no one makes that complaint.
Similarly, traefik does validate that only safe docker api calls are made and works hard to prevent any sort of remote code execution, so the issue is not a security issue, but a defense in depth proposal that is really a feature request for the docker project.
> If you want traefik to monitor new docker containers to add routes for them, of course traefik needs to talk to the docker api to do so.
Yes, but it wouldn't be necessary for the network-facing part of Traefik to talk to the Docker API. There could be a second container (without network access) whose only task is to talk to the Docker socket, generate a config, and write that config to a shared volume.
> However, there's no real vulnerability.
In the present situation Traefik (with Docker integration) is effectively running as root. I don't think it's up for debate that this is much worse than just running Traefik as a normal user (outside Docker). Besides, most users expect applications running in Docker containers to be more secure – not less secure – than running them on the bare system.
> This would be like saying "traefik uses the linux kernel api to open files, but the linux kernel requires traefik validate what goes into that api or else it could allow file path traversal"... But traefik does validate filepaths and so no one makes that complaint.
No. This would be like saying "Traefik has full access to the kernel and the entire OS and the only thing preventing a hacker from exploiting this is Traefik validating incoming network requests."
Do you also run your other web servers as root?
> Similarly, traefik does validate that only safe docker api calls are made
This is completely irrelevant. Once a hacker is inside the Traefik process (i.e. can execute code under Traefik's PID), they can access the Docker socket and therefore the entire system as they please.
> I don't think it's up for debate that this is much worse than just running Traefik as a normal user
I'm not arguing that there's not a better option. In fact, when I say this is a "defense in depth" issue, that's exactly what I mean. It would improve security. It would be better. But there is no active vulnerability to be exploited. If the system works as intended, this doesn't cause any issues, it's only an issue if there are other real vulnerabilities.
> Do you also run your other web servers as root?
Yes. Because capsh --cap-add NET_ADMIN is poorly understood and poorly used, I run many other servers as root.
Admittedly, many of them are written in other languages such as C and fork workers that drop privileges, because the C stdlib (libc) supports that easily.
Traefik is written in Go, where forking workers that drop privileges is much harder. Go doesn't use libc, and setuid doesn't actually work correctly [0], so of course it doesn't drop privileges like other software written in better languages.
> This is completely irrelevant. Once a hacker is inside the Traefik process (i.e. can execute code under Traefik's PID)
That's the entire point. It requires an attacker to exploit a real security issue, therefore this hardening you're talking about is a defense in depth.
The way you're talking about it, you make it sound like there's an active vulnerability, not just a defense in depth improvement. They're vastly different.
The developers are not ignoring a real vulnerability, and your all-or-nothing stance on this issue is un-nuanced to the point of harming your communication about it.
My previous offer is still on: 500 bucks says you can't exploit this if I link you to a Traefik configured in the very way you refuse to run.
I actually think we're completely on the same page. :)
> The way you're talking about it, you make it sound like there's an active vulnerability, not just a defense in depth improvement. They're vastly different.
Please note that I never termed it a vulnerability (whether active or not). I merely called it a security issue – which it was/is because it gives users a false sense of security. I also made that point very clear in my post on Github.
> My previous offer is still on: 500 bucks says you can't exploit this if I link you to a Traefik configured in the very way you refuse to run.
Being able to recognize and avoid potential security holes and being able to exploit them in practice are two different things. I think I'm fairly good at the former (in the sense that I'm able to avoid the most common pitfalls) but I have only limited experience with the latter. And while I am aware that in-depth hacking expertise would be very valuable even when examining certain security practices (it's definitely on my to-do list), I don't think it is required to point out basic flaws and potential attack vectors. So as much as I appreciate your offer, I don't think me being or not being able to hack your system implies anything with regards to how secure Traefik is.
>The docker api has no way to control access such that it's not equivalent to root access.
I thought this was a mistake many years ago already. The fact that the Docker daemon runs with root privileges is also something they should have solved a long time ago. Docker is pretty pathetic when it comes to security.
Create a private network that only connects Traefik and the proxy, and limit Traefik's access to only the GET requests it needs to operate. Now the socket is only exposed to a local container.
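In other words, something like a docker-socket-proxy. A minimal sketch of the idea, assuming the standard socket path:

    // Tiny reverse proxy that forwards only GET requests to the Docker
    // socket and rejects everything else. Run it on a private network
    // shared with Traefik; only this container mounts the socket.
    package main

    import (
        "context"
        "log"
        "net"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // The Host value is a dummy; the transport below always dials the socket.
        proxy := httputil.NewSingleHostReverseProxy(&url.URL{Scheme: "http", Host: "docker"})
        proxy.Transport = &http.Transport{
            DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
                return (&net.Dialer{}).DialContext(ctx, "unix", "/var/run/docker.sock")
            },
        }

        log.Fatal(http.ListenAndServe(":2375", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if r.Method != http.MethodGet {
                // Anything that could mutate state (create containers,
                // exec, etc.) is refused outright.
                http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
                return
            }
            proxy.ServeHTTP(w, r)
        })))
    }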
This just adds another layer of indirection. While it improves security, it is not the same as fixing the issue in the first place and making sure that no network-facing part of the system runs as root.
As you probably know, this security issue is not that simple to manage. It's mainly due to the fact that there is no way to have authorization on the Docker API. This is not the case on Kubernetes, for example, where you have RBAC to prevent this kind of issue. We have described this in detail in our documentation, and you have many solutions/workarounds to address it: https://doc.traefik.io/traefik/providers/docker/#docker-api-...
Yeah, I'm surprised that this is such a sticking point. There's nothing that anyone who isn't Docker Inc. can do to fix the problem that, by default, Docker is all or nothing. It would be nice if Docker could expose a read-only endpoint but c'est la vie.
The only solution I've seen/used that wasn't convoluted or brittle is running a little daemon to just shovel container metadata into Consul and going from there.
> As you probably know, this security issue is not that simple to manage.
I do think it's simple to manage: as I already mentioned elsewhere, it wouldn't be necessary for the network-facing part of Traefik to talk to the Docker API. There could be a second Traefik container (without network access) running a binary called, say, traefik-config-generator, whose only task is to talk to the Docker socket, generate a config, and write that config to a shared volume.
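A very rough sketch of what I have in mind (using the Docker Go SDK; the label key and the rendered output are schematic, not Traefik's real dynamic-config schema):

    // Sidecar with no listening socket at all: it polls the Docker API
    // and renders a file-provider config onto a volume shared with the
    // actual Traefik container, which watches it via the file provider.
    package main

    import (
        "context"
        "fmt"
        "log"
        "os"
        "strings"
        "time"

        "github.com/docker/docker/api/types"
        "github.com/docker/docker/client"
    )

    func main() {
        cli, err := client.NewClientWithOpts(client.FromEnv)
        if err != nil {
            log.Fatal(err)
        }
        for {
            containers, err := cli.ContainerList(context.Background(), types.ContainerListOptions{})
            if err != nil {
                log.Print(err)
                time.Sleep(5 * time.Second)
                continue
            }
            var b strings.Builder
            for _, c := range containers {
                // Label key simplified; real labels look like
                // traefik.http.routers.<name>.rule.
                if rule, ok := c.Labels["traefik.http.routers.rule"]; ok {
                    fmt.Fprintf(&b, "# %s\n# rule: %s\n", c.ID[:12], rule)
                }
            }
            if err := os.WriteFile("/shared/dynamic.yml", []byte(b.String()), 0o644); err != nil {
                log.Print(err)
            }
            time.Sleep(5 * time.Second)
        }
    }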
EDIT: Oh, I just realized you're the founder of Traefik! Thank you so much for your work! I would really appreciate your opinion on my suggestion – even if you think it's complete BS. :)
You've probably discounted this for some reason already, but why not use something more built for service discovery - e.g. Consul Catalog / k8s / etcd?
I am really happy to finally see them adding some functionality for custom middlewares in Traefik. However, it leaves a bad taste in my mouth that in order to use it, you have to sign up for their new SaaS, especially keeping in mind that this is the "most requested feature" of the community.
Using an interpreter locks you into the interpreter's language (e.g., no NodeJS, Python, etc plugins), but you are able to pass around memory and call procedures from the host program directly instead of having to write an IPC shim layer.
I tried, really tried, to use Traefik for a year. It worked sometimes, but the setup was complicated and the community support was very poor.
I eventually moved to caddy (https://caddyserver.com/) and it is fantastic. Works seamlessly and I got all my obvious and not so obvious questions answered.
So the Go middleware is interpreted? Curious how that works with a lazy/large request body reader and a lazy/large response body writer. What kind of overhead is added to each invocation of read/write there?
From what we measured on a gzip compression plugin, only a few percent of overhead, because the gzip part is compiled; only the plugin glue is interpreted.
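Roughly, a plugin exports CreateConfig and New; Yaegi interprets this glue, while everything it calls into (the actual gzip writer, net/http, etc.) runs compiled. A simplified sketch (names and config fields are illustrative):

    package headerplugin

    import (
        "context"
        "net/http"
    )

    // Config is populated from the dynamic configuration.
    type Config struct {
        HeaderName  string `json:"headerName"`
        HeaderValue string `json:"headerValue"`
    }

    // CreateConfig returns the default config.
    func CreateConfig() *Config {
        return &Config{HeaderName: "X-Demo"}
    }

    type headerPlugin struct {
        next   http.Handler
        config *Config
    }

    // New builds the middleware; the returned handler wraps the next one in the chain.
    func New(ctx context.Context, next http.Handler, config *Config, name string) (http.Handler, error) {
        return &headerPlugin{next: next, config: config}, nil
    }

    func (p *headerPlugin) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        w.Header().Set(p.config.HeaderName, p.config.HeaderValue)
        p.next.ServeHTTP(w, r)
    }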
I've never built Traefik successfully. The same goes for tag v2.3.0:
    $ go build ./...
    go: finding module for package github.com/traefik/traefik/v2/autogen/genstatic
    cmd/traefik/traefik.go:18:2: module github.com/traefik/traefik@latest found (v1.7.26), but does not contain package github.com/traefik/traefik/v2/autogen/genstatic
The only thing keeping me from switching is the removal of "distributed Let's Encrypt" in 2.0. I get that it's a non-issue for k8s setups with cert-manager, but people aren't always using k8s, and it's still a feature in the enterprise edition.
Could you elaborate on what you mean? While it'd be ideal to store certs in Vault, I have it running fine in orchestrated containerization with the cert storage on a distributed filesystem.