It's a shame that they don't have you writing marketing copy! The docs are indee...

ZeroCool2u · 2025-07-25T14:00:25 1753452025

I'll just say Domino presents very much as a code first solution. So, if you want staff to be able to make dashboards _without_ code like using Looker Studio, then this isn't it.

The one other big thing that Domino isn't, is it's not a database or data warehouse. You pair it with something like BigQuery or Snowflake or just S3 and it takes a huge amount of the headache of using those things away for the staff you're describing. The best way to understand it is to just look at this page: https://docs.dominodatalab.com/en/cloud/user_guide/fa5f3a/us...

People at my work, myself included, absolutely love this feature. We have an incredibly strict and complex cloud environment and this makes it, so people can skip the setup nonsense and it will just work.

This isn't to say that you can't store data in Domino, it's just not a SQL engine. Another loved feature is their datasets. It's just EFS masquerading as an NFS, but Domino handles permissions and mounting. It's great for non-SQL file storage. https://docs.dominodatalab.com/en/cloud/user_guide/6942ab/us...

So, with those constraints in mind, I'd say it's great for what you're describing. You can deploy apps or API endpoints. You can create on-demand large scale clusters. We have people using Spark, Ray, Dask, and MPI. You can schedule jobs and you can interact with the whole platform programmatically.

dm3 · 2025-07-25T07:36:44 1753429004

Looks like we're in a similar situation. What is your current go-to for setting up lean incremental data pipelines?

For me the core of the solution - parquet in object store at rest and arrow for IPC - haven't changed in years, but I'm tired of re-building the whole metadata layer and job dependency graphs at every new place. Of course the building blocks get smarter with time (SlateDB, DuckDB, etc.) but it's all so tiresome.

benreesman · 2025-07-25T08:46:55 1753433215

Yeah, last time I had to do this was about a year ago and I used parquet and arrow on S3-compatible object stores and put a bunch of metadata in postgres and the whole thing. At that time we used Prefect for orchestration which was fine but IMHO not worth what it cost, I've also used flyte seriously and dabbled with other things, nothing that I can get really excited about recommending, it's all sort of fine but kinda meh. I used to work for a megacorp with extremely serious tooling around this and everything I've tried in open source makes me miss that.

On the front end I've always had reasonable outcomes with `wandb` for tracking runs once you kind get it all set up nicely, but it's a long tail of configuration and writing a bunch of glue code.

In this situation I'm dealing with a pretty medium amount of data and very modest model training needs (closer to `sklearn` than some mega-CUDA thing) and it feels like I should be able to give someone the company card and just get one of those things with 7 programming languages at the top of the monospace text box for "here's how to log a row", we do Smart Things and now you have this awesome web dashboard and you can give your quants this `curl foo | sh` snippet and their VSCode Jupyter will be awesome.

ZeroCool2u · 2025-07-25T14:12:38 1753452758

Just reading this as well and I neglected to mention that the Domino thing we use has Flyte (They call it Flows, but it's the same thing) and MLFlow built-in as well.