FWIW, we’re considering a version where enterprise users can host the metadata themselves. For the free tier, though, we didn’t think it made sense: for a workload small enough to fit into our free tier, it didn’t seem like anyone would want to be responsible for the metadata layer themselves. We’d love your feedback on that.
I'm also quite surprised there's no option for self-hosting metadata on the free tier. To be fair, in my experience, managing a metadata server for Prefect (which has a similar Bring Your Own Server model) is quite hard to get right. But in the end we decided to keep maintaining it, because our company prefers to keep all data on our own servers (including metadata).
But I'm still excited to try this, since now I can at least play around with (and learn) a partially Kafka-compatible system without the burden of maintaining all of Kafka's parts (and their costs). Thanks!
By an independent board do you mean that the H1B holder cannot themselves be on the company's board?
Re: 50% ownership, is a 50-50 split OK, or does it have to be less than 50%?
Fugue is a higher-level abstraction than Ray. It provides unified and non-invasive interfaces for using Spark, Dask, and pandas. Ray/Modin support is also on our roadmap.
It provides both a Python interface (not pandas-like) and Fugue SQL (standard SQL plus extra features). Users can choose whichever they are most comfortable with as the semantic layer for distributed computing; the two are equivalent.
With Fugue, most of your logic will be in simple Python/SQL that is framework- and scale-agnostic. From mindset to code, Fugue minimizes your dependency on any specific computing framework, including Fugue itself.
Please let me know if you want to learn more. Our Slack is linked in the README of the Fugue repo.
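To make the "equivalent semantic layers" idea concrete, here is a minimal sketch of one filter-and-aggregate step written twice, once as SQL and once as plain Python. It uses Python's built-in sqlite3 purely as a stand-in SQL engine; it is not Fugue SQL and does not call Fugue's API, and the table, column, and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy rows of (key, value); names and data are hypothetical.
ROWS: List[Tuple[str, int]] = [("a", 1), ("b", 5), ("a", 7), ("c", 2), ("b", 3)]


def total_by_key_sql(rows: List[Tuple[str, int]]) -> Dict[str, int]:
    """SQL version, run on in-memory SQLite as a stand-in SQL layer (not Fugue SQL)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (k TEXT, v INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        cur = con.execute("SELECT k, SUM(v) FROM t WHERE v > 1 GROUP BY k")
        return dict(cur.fetchall())
    finally:
        con.close()


def total_by_key_py(rows: List[Tuple[str, int]]) -> Dict[str, int]:
    """Same logic in plain Python: keep rows with v > 1, then sum v per key."""
    totals: Dict[str, int] = defaultdict(int)
    for k, v in rows:
        if v > 1:
            totals[k] += v
    return dict(totals)
```

Both functions return {'a': 7, 'b': 8, 'c': 2} for ROWS; the point is only that the same logic can be stated in either layer and produce identical results.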
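A minimal sketch of what "logic in simple Python" can look like: the function below imports nothing from Spark, Dask, pandas, or Fugue, so it can be unit-tested on plain lists of dicts. The field names are hypothetical. The commented-out call shows how such a function might be handed to an engine, assuming Fugue's `transform(df, using, schema=..., engine=...)` signature and its support for `List[Dict[str, Any]]` annotations; check the Fugue docs before relying on it.

```python
from typing import Any, Dict, List


def add_revenue(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Framework-agnostic logic: add a 'revenue' field (price * qty) to each row.

    The 'price' and 'qty' field names are hypothetical. Input rows are not
    mutated; new dicts are returned.
    """
    return [{**row, "revenue": row["price"] * row["qty"]} for row in rows]


# Assumed usage with Fugue (requires the fugue package; not run here):
# from fugue import transform
# out = transform(df, add_revenue, schema="*,revenue:double", engine=spark_session)

if __name__ == "__main__":
    print(add_revenue([{"price": 2.5, "qty": 4}]))
```

Because the function only depends on the standard library, the same code can be tested locally and then scaled out without rewriting it for a particular framework.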
Well, sort of. Overall, Fugue is a scaling engine like Ray. The specific link here, to yet another SQL access layer for a dataset, doesn't really have an analog in Ray, but it has some nice features.
I love these SQL layers, but they can obscure how they implement their transforms. So they can speed up writing filters and joins... until something breaks, and then you have to go atomic anyway.
Fugue SQL is one way; Fugue also has a functional API. Both can be translated to the underlying runtime, so you can choose based on your preference and actual needs.
I would say that, compared to Greykite, Darts really attempts to unify a wide variety of forecasting models under a common, simple, and user-friendly API. There are many differences, but for instance, AFAIK there's no deep learning model in Greykite (it focuses on two algorithms: its built-in algorithm and Prophet), whereas Darts tries to lower the barrier to using deep learning models for forecasting. Crucially for ML-based models, it also means being able to train on many (possibly thousands or more) time series, each possibly multi-dimensional.
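A minimal sketch of the two ideas in this comment, in plain Python: a single fit/predict interface that different models plug into, and a "global" model fitted on many series at once. The `Forecaster` protocol, both models, and `forecast_all` are hypothetical illustrations, not Darts' actual classes; Darts models use a broadly similar fit/predict style, but check its documentation for the real signatures. The pooled AR(1) model is deliberately trivial (no intercept, one coefficient shared by all series).

```python
from typing import List, Protocol, Sequence

Series = Sequence[float]


class Forecaster(Protocol):
    """Hypothetical common interface: every model exposes fit() and predict()."""

    def fit(self, series: List[Series]) -> "Forecaster": ...

    def predict(self, n: int, series: Series) -> List[float]: ...


class NaiveForecaster:
    """Local baseline: repeats the last observed value; fitting learns nothing."""

    def fit(self, series: List[Series]) -> "NaiveForecaster":
        return self

    def predict(self, n: int, series: Series) -> List[float]:
        return [float(series[-1])] * n


class PooledAR1:
    """Global model: one AR(1) coefficient fitted by least squares across ALL series."""

    def __init__(self) -> None:
        self.phi = 0.0

    def fit(self, series: List[Series]) -> "PooledAR1":
        num = den = 0.0
        for s in series:  # pool lagged pairs (x_t, x_{t+1}) from every series
            for prev, nxt in zip(s[:-1], s[1:]):
                num += prev * nxt
                den += prev * prev
        self.phi = num / den if den else 0.0
        return self

    def predict(self, n: int, series: Series) -> List[float]:
        out: List[float] = []
        last = float(series[-1])
        for _ in range(n):  # roll the shared coefficient forward n steps
            last = self.phi * last
            out.append(last)
        return out


def forecast_all(model: Forecaster, train: List[Series], target: Series, n: int) -> List[float]:
    """Caller code is identical regardless of which model is plugged in."""
    return model.fit(train).predict(n, target)
```

For example, fitting `PooledAR1` on `[[16.0, 8.0, 4.0, 2.0], [8.0, 4.0, 2.0]]` pools all five lagged pairs into one estimate (phi = 0.5), and the calling code stays the same when `NaiveForecaster` is swapped in.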
I really wish more design docs / project reports looked like this article, talking about the whys, goals / non-goals, options considered, prototypes built, challenges and solutions, learnings, and next steps. Kudos!
Although GitHub supports the commit-based review the author argues for (the author found this too, after writing the article), I don't think it completely solves the problem: we need to be able to review, test, and merge each commit individually without merging the entire PR. Since GitHub doesn't have first-class support for dependent PRs, we are still "stuck" with a broken workflow. An example workaround: https://wchargin.github.io/posts/managing-dependent-pull-req.... Are there any better solutions?
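One common convention for dependent ("stacked") PRs is that each PR targets the branch directly below it, so reviewers see only that PR's commits, and PRs are retargeted as the ones below them merge. Here is a minimal sketch of that bookkeeping, assuming a simple linear stack. The branch names are hypothetical, the code only computes targets (the actual rebasing is still done with git, e.g. `git rebase --onto`), and it is not necessarily the approach in the linked post.

```python
from typing import Dict, List


def pr_bases(stack: List[str], trunk: str = "main") -> Dict[str, str]:
    """Map each branch in a dependency stack to the branch its PR should target.

    `stack` is ordered bottom to top, e.g. ["feat-a", "feat-b", "feat-c"],
    where feat-b is built on top of feat-a. The bottom PR targets the trunk;
    every other PR targets the branch directly below it, so its diff shows
    only its own commits.
    """
    bases: Dict[str, str] = {}
    below = trunk
    for branch in stack:
        bases[branch] = below
        below = branch
    return bases


def after_merge(stack: List[str], merged: str, trunk: str = "main") -> Dict[str, str]:
    """Recompute PR targets once `merged` has landed in the trunk.

    The merged branch leaves the stack, and the PR that sat on it is retargeted
    to whatever was below it. The remaining branches still need to be rebased
    onto their new base with git; this function only computes where each PR
    should point.
    """
    return pr_bases([b for b in stack if b != merged], trunk)
```

For a stack `["feat-a", "feat-b", "feat-c"]`, `feat-a` targets `main`, `feat-b` targets `feat-a`, and `feat-c` targets `feat-b`; once `feat-a` merges, `feat-b` is retargeted to `main` while `feat-c` keeps targeting `feat-b`.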