
A lot of this seems like the fault of the ALB, doesn't it? I had the same problem and eventually moved off of it to Cloudflare Tunnels pointed directly at the service load balancers, which update immediately when pods go bad. With a grace period for normal shutdowns, I haven't seen any downtime or errors during deploys.

The issue with the ALB setup (maybe I'm doing it wrong?) is that if a pod is removed suddenly, say because it crashes, then some portion of traffic gets errors until the ALB updates. And that can be an agonizingly long time, which seems to be because the ALB points at IP addresses in the cluster rather than at the service. It seemed like a shortcoming of the ALB; GKE doesn't have the same behavior.

I'm not an expert, but I found something that worked.



> A lot of this seems like the fault of the ALB, doesn't it?

I definitely think the ALB Controller should be taking a more active hand in termination of pods that are targets of an ALB.

But the ALB Controller is exhibiting the same symptom I keep running into throughout Kubernetes.

The amount of "X is a problem because the pod dies too quickly before Y has a chance to clean up/whatever, so we add a preStop sleep of 30 seconds" in the Kubernetes world is truly frustrating.


If you are referring to the 30-second kill time, that would be holding it wrong. As long as your process is PID 1, you can rig up your own process exit handlers, which completely resolves the problem.

Many people don’t run the main process in the container as PID 1, so this “problem” remains.

If it’s not feasible to stop something like a shell process from being the first thing that runs, exec will replace the shell process with the application process.
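For what it's worth, a minimal sketch of the PID 1 case in Go (assuming a Go service; the same idea works in any language): because the process is PID 1, the kubelet's SIGTERM reaches it directly, and it can hook the signal and clean up before exiting.

    package main

    import (
        "context"
        "log"
        "os/signal"
        "syscall"
    )

    func main() {
        // ctx is cancelled when the kubelet sends SIGTERM (or SIGINT locally).
        ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
        defer stop()

        // ... start the server / workers here ...

        <-ctx.Done()
        log.Println("SIGTERM received, cleaning up before exit")
        // ... flush buffers, close connections, etc., then return ...
    }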


> If you are referring to the 30-second kill time, that would be holding it wrong. As long as your process is PID 1, you can rig up your own process exit handlers, which completely resolves the problem.

Maybe I am holding it wrong. I'd love not to have to do this work.

But I don't see how being PID 1 or not helps (and yes, for most workloads it is PID 1).

The ALB controller is the one that would need to deregister a target from the target group, and it won't until the pod is gone. So we have to force it by having the app do the functional equivalent itself, by failing the readiness check.


Yeah, exactly. We just catch the TERM, clean up, and then shut down. But the rest of the top post in the thread is right on.


If I understand correctly, because the ALB does its own health checks, you need to catch TERM, keep returning non-ready for ~30s so the ALB has time to notice, then clean up and shut down.
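Roughly, yes. Here's a sketch of that sequence in Go, assuming an HTTP service whose ALB target-group health check probes /healthz; the path and the 30s/15s durations are illustrative assumptions, not values the controller prescribes. On SIGTERM it starts failing the health check, gives the ALB time to notice and deregister the target, then drains in-flight requests and exits.

    package main

    import (
        "context"
        "log"
        "net/http"
        "os/signal"
        "sync/atomic"
        "syscall"
        "time"
    )

    func main() {
        var draining atomic.Bool

        mux := http.NewServeMux()
        // /healthz is an assumed path; use whatever your ALB health check actually hits.
        mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
            if draining.Load() {
                // Start failing the health check so the ALB stops sending new traffic.
                http.Error(w, "draining", http.StatusServiceUnavailable)
                return
            }
            w.WriteHeader(http.StatusOK)
        })
        // ... register application handlers on mux ...

        srv := &http.Server{Addr: ":8080", Handler: mux}
        go func() {
            if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
                log.Fatal(err)
            }
        }()

        // Wait for SIGTERM from the kubelet.
        ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
        defer stop()
        <-ctx.Done()

        // 1. Go non-ready and give the ALB time to notice and deregister the target.
        //    30s is illustrative; tune it to your health-check interval and threshold.
        draining.Store(true)
        time.Sleep(30 * time.Second)

        // 2. Drain in-flight requests, then exit.
        shutdownCtx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
        defer cancel()
        _ = srv.Shutdown(shutdownCtx)
    }

Note that the whole sequence (the 30s of failing health checks plus the drain) has to fit inside the pod's terminationGracePeriodSeconds, otherwise the kubelet will SIGKILL the process before it finishes.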


Kubernetes was written by people who have a developer background, not an ops one, and is full of things like this. The fact that it became a standard is a disaster.


Maybe, or maybe orchestration and load balancing are hard. I think it's too simplistic to dismiss k8s development because the devs weren't ops people.

I don't know of a tool that does a significantly better job at this without having other drawbacks and gotchas, and even if one did, it wouldn't void the value k8s brings.

I have my own set of gripes with software production engineering in general and especially with k8s, having seen first-hand how much effort big corps have to put in just to manage a cluster, but it's disrespectful to dismiss this whole endeavour as disastrous.


The guys who wrote it are OK; they put in a lot of effort and that's fine. If I understand things correctly, they were also compensated well. But effort based on some wrong assumptions makes for a flawed product. A lot of people are then forced to use it because there is no alternative, or because the alternatives are easily dismissed, a behavior driven in turn by a certain amount of propaganda and marketing. And that part is a disaster. This is not personal, btw.


> A lot of this seems like the fault of the ALB, doesn't it?

People forget to enable pod readiness gates.


Pod Readiness Gates, unless I'm missing something, only help on startup.

Unless something has changed since I last went digging into this, you will still have the ALB sending traffic to a pod that's in terminating state unless you do the preStop bits I talked about at the top of the thread.

https://kubernetes-sigs.github.io/aws-load-balancer-controll...


> Pod Readiness Gates, unless I'm missing something, only help on startup.

They also allow graceful rollout of workloads.

> You will still have the ALB sending traffic to a pod that's in terminating state

The controller watches endpoints and will remove your pod from the target group on pod deletion.

You don't need the preStop scam as long as your workload respects SIGTERM and does lame-duck.


> You don't need the preStop scam as long as your workload respects SIGTERM and does lame-duck.

Calling it a scam is a bit much.

I think having to put the logic of how the load balancer works into the application is a mixing of concerns. This kind of orchestration does not belong in the app; it belongs in the supporting infrastructure.

The app should not need to know how the load balancer works with regards to scheduling.

The ALB Controller should be doing this. It does not, and so we use preStop until/unless the ALB controller figures it out.

Yes, the app needs to listen for SIGTERM and wait until its outstanding requests are completed before exiting, but not more than that.


Just curious:

- so if a pod goes into a terminating state

- with gates enabled, the ALB controller should remove it from the targets instantly, because it listens to the k8s API pod change stream?

In my experience there was ALWAYS some delay, even a small one, which caused 500s in high-frequency systems.

We solved that with an internal API gateway; AWS + iptables + CNI was always causing issues in every setup without it.



