> SiftDev flags silent failures, such as two microservices updating the same record within 50ms
I don't understand, what about that is a "silent failure"?
in order for your product to even know about it, wouldn't I need to write a log message for every single record update?
and if my architecture allows two microservices to update the same row in the same database...maybe it happening within 50ms is expected?
that could be an inefficient architecture for sure, but I'm confused as to whether your product is also trying to give me recommendations about "here's an architectural inefficiency we found based on feeding your logs to an LLM"
> You can then directly ask your logs questions like, “What's causing errors in our checkout service?” or “Why did latency spike at 2 AM?” and immediately receive insightful, actionable answers that you’d otherwise manually be searching for.
the general question I have with any product that's marketing itself as being "AI-powered" - how do hallucinations get resolved?
I already have human coworkers who will investigate some error or alert or performance problem, and come to an incorrect conclusion about the cause.
when that happens I can walk through their thought process and analysis chain with them and identify the gap that led them to the incorrect conclusion. often this is a useful signal that our system documentation needs to be updated, or log messages need to be clarified, or a dashboard should include a different metric, etc etc.
if I ask your product "what caused such-and-such outage" and the answer that comes back is incorrect, how do I "teach" it the correct answer?
> I don't understand, what about that is a "silent failure"?
Silent failures can be "allowed" behavior in your application that isn't actually labeled as an error but is still irregular. Think race conditions, deadlocks, silent timeouts, or even just mislabeled error logs.
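To make this concrete, here's a minimal sketch (the service name and endpoint are made up) of the "silent timeout" case: the failure is handled, but it's logged at INFO with no error marker, so nothing downstream ever treats it as a problem:

```python
import logging

import requests

log = logging.getLogger("payments")

def charge_card(order_id: str) -> bool:
    try:
        resp = requests.post(
            "https://payments.internal/charge",  # hypothetical endpoint
            json={"order_id": order_id},
            timeout=2,
        )
        return resp.ok
    except requests.Timeout:
        # Silent failure: the charge never happened, but this is logged
        # at INFO with no "error" keyword, so nothing alerts on it.
        log.info("charge request timed out for order %s, skipping", order_id)
        return False
```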
> in order for your product to even know about it, wouldn't I need to write a log message for every single record update?
That's right, and this may not always be feasible (or necessary!), but if your application can be impacted by errors like these, it may be worth logging anyway.
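As a rough sketch of what that could look like (the field names are illustrative, not a required schema), one structured line per write is enough for this kind of correlation:

```python
import json
import logging
import time

log = logging.getLogger("orders-service")

def log_record_update(table: str, record_id: str) -> None:
    # One structured line per write. "service", "table", and "record_id"
    # are what a correlator needs in order to notice two different
    # services touching the same row within a 50ms window.
    log.info(json.dumps({
        "event": "record_update",
        "service": "orders-service",  # hypothetical service name
        "table": table,
        "record_id": record_id,
        "ts_ms": int(time.time() * 1000),
    }))
```

The detection side is then just a group-by on (table, record_id) and a check for two distinct service values within 50ms of each other.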
> the general question I have with any product that's marketing itself as being "AI-powered" - how do hallucinations get resolved?
> and if my architecture allows two microservices to update the same row in the same database...maybe it happening within 50ms is expected?
> if I ask your product "what caused such-and-such outage" and the answer that comes back is incorrect, how do I "teach" it the correct answer?
For these concerns, human-in-the-loop feedback is our preliminary approach! We have our own feedback loop running internally to account for changes and false errors, but explanations from human input (even something as simple as "Not an error" or "Missed error" buttons) are very helpful.
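As a sketch of the shape of that feedback (hypothetical names, not our actual API), each correction becomes a labeled example tied to the original finding, which future analyses can retrieve as context instead of repeating the mistake:

```python
import time
from dataclasses import dataclass, field

@dataclass
class FindingFeedback:
    # All names here are illustrative, not a real schema.
    finding_id: str        # the flagged pattern or answer being corrected
    label: str             # "not_an_error" | "missed_error" | "wrong_cause"
    explanation: str = ""  # optional free-text from the engineer
    created_at: float = field(default_factory=time.time)

feedback = FindingFeedback(
    finding_id="chk-2025-01-12-0042",
    label="wrong_cause",
    explanation="Latency spike was a scheduled batch job, not the checkout service.",
)
```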
> when that happens I can walk through their thought process and analysis chain with them and identify the gap that led them to the incorrect conclusion. often this is a useful signal that our system documentation needs to be updated, or log messages need to be clarified, or a dashboard should include a different metric, etc etc.
Got it, I imagine it'll be very helpful for us to display our chain of thought in our dashboards too. Great feedback, thank you!
> Think race conditions, deadlocks, silent timeouts, or even just mislabeled error logs.
I agree that those are bad things.
but how does your product help me with them?
I have some code that has a deadlock. are you suggesting that I can find the deadlock by shipping my logs to a 3rd-party service that will feed them into an LLM?