
Since this is 2023 and we are releasing things that solve X and Y problems in YML, I do want to take the opportunity to question whether solving problem X or Y in YML is really the thing we should be building businesses around these days. I’ve spent the greater part of the last year or so undoing the pain of “reasonably complex GHA in YML” in my organization. It’s one of those things that sounds great conceptually and works really well for simple cases, but once your use case evolves beyond even remotely simple (for example, abstracting and maintaining this code in an engineering org in the tens of people, not even hundreds), it is a slow-growing cancer that ends up being a huge time suck, an unmaintainable, untestable mess, and technical debt for your org.


Did you perhaps have an alternative solution you don't mind sharing? In my opinion yaml is good enough for gitops. Easy to read, understand, modify.


I’ve been using Dagger for this replacement specifically. But that is specific to CI/CD caching and workflow execution. For things like workflow automation and orchestration I would reach for something like Prefect or Dagster. The point is to be able to do something in an actual programming language so that you not only get typing, readability, reusability, unit testability, local execution and language-specific tooling, but also that it doesn’t suck for end users to write, debug and maintain. This also gives end users an escape hatch when your abstraction is inevitably not going to be good enough for them.

Cuelang etc., as sibling comments mentioned, are decent enough, but the real scalable solutions here are made available in general-purpose programming languages.
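
To make the "actual programming language" point concrete, here is a minimal sketch of a pipeline written as plain typed Python. It is not Dagger's, Prefect's or Dagster's API; `Step`, `build`, `publish` and `run_pipeline` are hypothetical names made up for illustration, and the steps only manipulate a dictionary instead of doing real work.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, str]


@dataclass(frozen=True)
class Step:
    """One unit of work in a hypothetical pipeline (illustrative only)."""
    name: str
    run: Callable[[Context], Context]


def build(ctx: Context) -> Context:
    # A plain function, so a unit test can call it directly.
    return {**ctx, "artifact": f"app-{ctx['version']}.tar.gz"}


def publish(ctx: Context) -> Context:
    # Depends on the output of build(); a missing key fails loudly.
    return {**ctx, "published": ctx["artifact"]}


def run_pipeline(steps: List[Step], ctx: Context) -> Context:
    """Run the steps in order, threading the context through each one."""
    for step in steps:
        ctx = step.run(ctx)
    return ctx


PIPELINE = [Step("build", build), Step("publish", publish)]

if __name__ == "__main__":
    print(run_pipeline(PIPELINE, {"version": "1.2.3"}))
```

Because each step is an ordinary function, it can be run locally, type-checked and unit tested without any CI runner, which is the escape hatch being argued for here.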


"Workflows as code" is something we are thinking about too. I guess there are pros and cons to every approach, and eventually we would want to support both (but we need to start with something).


Check temporal.io, they are used by Uber, Netflix, Datadog...


Temporal is for managing events. How is it related to CI/CD? Genuinely curious.


It’s more generally a workflow manager and orchestration tool. It is kind of a more general version of the tools I mentioned, Dagster and Prefect. It can be used to spawn and manage CICD tasks asynchronously.

Though I will say that Temporal's use case is probably not really well mapped to CI/CD, though it could be used for it (which is why I didn't mention it). Its primary strength is robust, long-lived workflows with intelligent retries and the like. You typically want your CI/CD to be as fast as possible, and while you want retries and resilience etc., that's not as important as some other things (like being hermetic, reproducible, and cached).


Here's a Temporal v Prefect comparison I wrote: https://community.temporal.io/t/what-are-the-pros-and-cons-o...

tldr is Temporal is more general-purpose: for reliable programming in general, vs data pipelines. It supports many languages, and combining languages, has features like querying & signaling, and can do very high scale.

CI/CD is a common use case for Temporal—used by HashiCorp, Flightcontrol, Netflix: https://www.youtube.com/watch?v=LliBP7YMGyA


And what was the solution? How did you eventually address those issues? While I agree that GitHub Actions has its downsides, it's also widely used and simple to start with, which we thought was a good approach. Would you be more comfortable with 'Zapier for Monitoring' or an alternative to 'Datadog Workflow Automation'?


The complaint doesn’t seem to be about GitHub Actions, but YAML. I agree 100%. As soon as I saw that Keep is using YAML, I closed the tab.

Nope. Nope. Nope.

It’s like going back to Mongo without schemas and relational checks. We have perfectly good configuration languages with schemas, checks, imports, logic, etc. YAML is unacceptable in this profession.


Could you point to the mentioned configuration languages with schemas, checks, imports, logic etc?



These are interesting. I've seen both before but never understood - if I have a config that cannot be easily written out in yml, why would I force my team to learn either one of these DSLs instead of using our main dev language (say python) to generate the yml instead? What's the value proposition of jsonnet or cue?
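
A minimal sketch of that "generate it from our main language" approach, using only the standard library. It emits JSON, which YAML 1.2 parsers also accept, so no YAML library is assumed. The workflow shape loosely mimics a GitHub Actions file; the helper names `make_test_job` and `workflow` are made up for illustration.

```python
import json
from typing import Any, Dict, List


def make_test_job(python_version: str) -> Dict[str, Any]:
    """Build one job definition; field names mimic a CI workflow file."""
    return {
        "runs-on": "ubuntu-latest",
        "steps": [
            {"uses": "actions/checkout@v4"},
            {"run": f"echo testing on Python {python_version}"},
        ],
    }


def workflow(versions: List[str]) -> Dict[str, Any]:
    # A comprehension replaces copy-pasted blocks, one job per version.
    return {
        "name": "tests",
        "on": "push",
        "jobs": {
            f"test-py{v.replace('.', '')}": make_test_job(v) for v in versions
        },
    }


if __name__ == "__main__":
    # JSON output doubles as YAML, so it can be written straight to a file.
    print(json.dumps(workflow(["3.11", "3.12"]), indent=2))
```

This covers loops and reuse; what it does not give you is the schema checking that jsonnet or CUE advertise, which would have to come from other tooling.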


One very important thing to point out here is that you’re not just writing config with this YML. If you look at the example on the GitHub link it’s a workflow orchestration and execution context. There’s a code runtime involved and logic is executed in the YAML. That is where YAML falls apart.

If all you’re doing is defining configurations (for example Kubernetes manifests, Helm charts, etc.) then great. But that isn’t what this is.

To your original question, I would actually advocate using the general-purpose programming language for most use cases. Learning a new DSL, like you mentioned, is overhead from both a usability and maintainability perspective. I haven’t used jsonnet before, but I know that cuelang gives you some power tools around typing, config validation, templating, etc. It’s essentially purpose-made for configuration management and tooling, so it’s probably going to be really good at that. I don’t know if it’s worth using over a suite of language-specific tools like Pydantic + Jinja, though, because when you’re using a general-purpose language like Python you have a whole, much larger ecosystem of tools and libraries you also have access to and can pull from.
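
As a rough illustration of typed config validation in a general-purpose language, here is a sketch that uses standard-library dataclasses as a stand-in for Pydantic (Pydantic offers far more, such as type coercion and structured error reports). `AlertConfig`, its fields and `load` are hypothetical and not taken from any tool discussed in this thread.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class AlertConfig:
    """Hypothetical config for one alert; field names are illustrative."""
    name: str
    threshold: float
    interval_seconds: int = 60

    def __post_init__(self) -> None:
        # Fail at load time with a precise message, not midway through a run.
        if not self.name:
            raise ValueError("name must not be empty")
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError(f"threshold must be in [0, 1], got {self.threshold}")
        if self.interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")


def load(raw: Dict[str, Any]) -> AlertConfig:
    """Turn an untyped mapping (e.g. parsed from a file) into a checked object."""
    # Unknown or missing keys raise TypeError from the generated __init__.
    return AlertConfig(**raw)


if __name__ == "__main__":
    print(load({"name": "cpu", "threshold": 0.9}))
```

The error points at the exact field that is wrong, which is the property people tend to miss when a templated YAML file fails to validate.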


I agree with this for internal tools only intended to be used by a relatively small organization. Using a DSL may well be preferable for open source, and for larger organizations where not every team is proficient in your Turing-complete language of choice.

Some drawbacks of plain YAML, and of tools that use string templating to render YAML:

    - difficult to extend features not exposed by upstream
    - composition is often messy, resulting in duplication
    - validation is often impractical (at least identifying the exact source of the error… I’m looking at you Helm!)

Unrelated to OP, but you can leverage Tanka to extend Helm charts with functionality not provided by upstream.

https://tanka.dev/


I agree with you on that as well. The YAML aspect is somewhat of a 'low-level' concern that you shouldn't have to worry about unless you need something highly customized.

Now, let me reverse the question: what would make you keep the tab open?


“Need[ing] something highly customized” is not some uncommon occurrence for your end users. It’s an inevitability for a large portion of them.

Give me some well-supported libraries in common general-purpose languages to do this. Codegen is pretty good these days, and supporting 3 or 4 languages shouldn’t be an insurmountable achievement.


Got you, so you would imagine some TypeScript/Python/Go SDK that lets you define workflows?


Precisely. Let me interweave that into my implementation as I see fit. Maybe for some people that’s just literally pasting an example into GitHub Actions and saying “python keep.py myargs”, but it doesn’t have to be. It’s just another toolchain in the general-purpose environment.


Will definitely keep it in mind, thanks for the input. Btw, do you know of any other tools that do this?


AWS CDK has bindings for a few languages


I have been working on a data validation tool for a while. I even tried creating an extended YAML parser for data validation. You made me realize I wasted my time with that approach. Better now than later. I would love to talk to you before I throw away more code. Can we connect?


Hey! I've missed your reply here. Sent an email.


I'm the system architect and code quality gate in my company, and I feel you... my job is to keep things sane, consistent and extendable. GHA as well as Azure Logic Apps are both helpful at small scale but, omg, so far away from being reusable or even able to deploy the same damn thing to different stages from code. On GHA: I find that GHA looks just the same as Azure DevOps Pipelines, yet GHA doesn't hold your hand when designing and evaluating the steps.


Under the hood GHA is using the same backend as Azure DevOps Pipelines, so it would make sense that they look the same.


Yeah but people freak out when they see Gradle, Bash, Bazel, or even wacky raw Python.

The real competition is, what will LLMs write better? Because I have zero interest in learning new DSLs, I just want whatever will be most text based to use through an LLM.


Then, honestly, you want it to write something that is statically verifiable.


You probably want Python then. I think it's been well demonstrated that it is probably the language that the most effort has gone into training LLMs to work with, in multiple facets.



