Hacker News
Open source Business intelligence platform made with Python (apache.org)
212 points by nothrowaways on Dec 1, 2021 | 49 comments



I'm looking into Superset these days, so it's very nice to have found it here on HN.

I'm also looking for suggestions. A client of ours has a classic CSV of sales data, and I want to build a dashboard that makes the dataset easier for the client to explore. I've been looking at both Metabase and Superset, but I don't really know whether there are other products out there that could help with this. I'm obviously referring to open source products that can be easily deployed, so I'm ruling out tools like Tableau or Power BI.


Can you do the analysis/visualization in Python? If so, you could use https://github.com/datapane/datapane to build and share a report or dashboard (either as an HTML file, or publish it for free on datapane.com). I'm one of the people building it, so let me know if I can help!


Having briefly used datapane, I'd make it my initial choice if I had no other info about the parent's use case and just needed something with low overhead.

Edit: Metabase is probably a fair alternative if the number of queries and views is likely to grow, and even more so if you'd like the client to be able to "self serve".

I definitely would not consider Superset unless you know that you need the advanced features.


Try https://perspective.finos.org/ - it is easy to deploy, as it is a purely client-side browser widget, and it supports CSV and a dashboard-like experience out of the box.

Here's a CSV example: https://bl.ocks.org/texodus/02d8fd10aef21b19d6165cf92e43e668

Here's a dashboard example (with a lot of data): https://texodus.github.io/nypd-ccrb/


As far as I know, Metabase doesn't support CSV files. Google Data Studio might be an option, but I also have no idea whether it supports CSVs (I know it's not open source, but it's still free and online).


> Metabase doesn't support CSV files

The expectation is that data is loaded into your database, which is reasonable. None of these tools are built to ingest data from multiple disparate sources and perform their own aggregations (as Tableau does); rather, they depend on the data living in the database being queried.


There is a CSV driver [1] which allows you to map a CSV file on the Metabase server, or to provide a URL to the CSV. But I guess most people (including myself) think of "upload a CSV file via the UI" when it comes to CSV support.

[1] https://github.com/Markenson/csv-metabase-driver/releases/ta...


Uploading CSV support is on Metabase's roadmap.

https://www.metabase.com/roadmap/


Yep, I'm fully aware of that; I should have pointed it out, sorry. I have an ingestion process which populates a PostgreSQL db from the CSV.
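A minimal sketch of that kind of ingestion step, assuming a CSV with hypothetical `date`, `region` and `amount` columns. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a server; with PostgreSQL you'd use a driver such as psycopg2 (or PostgreSQL's own `COPY ... FROM`) instead.

```python
import csv
import io
import sqlite3


def load_sales_csv(csv_text: str, conn: sqlite3.Connection) -> int:
    """Load sales rows from CSV text into a `sales` table; return rows inserted."""
    # Hypothetical schema; sqlite3 stands in for PostgreSQL here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (date TEXT, region TEXT, amount REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["date"], r["region"], float(r["amount"])) for r in reader]
    # Parameterized inserts, so values are never spliced into the SQL string.
    conn.executemany("INSERT INTO sales (date, region, amount) VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


sample = "date,region,amount\n2021-11-01,North,120.5\n2021-11-01,South,80\n"
conn = sqlite3.connect(":memory:")
print(load_sales_csv(sample, conn))  # 2
print(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall())  # [('North', 120.5), ('South', 80.0)]
```

Once the table is populated, Metabase or Superset only needs a connection to the database; neither has to read the CSV itself.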


Would a simple PivotTable using ODBC suffice?
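For reference, the core of what a PivotTable does (group by one field across rows, another across columns, and sum a value) fits in a few lines of Python. This is a sketch over rows already pulled from the database; the field names are hypothetical, and the ODBC connection that would feed Excel isn't shown.

```python
from collections import defaultdict


def pivot(rows, row_key, col_key, value_key):
    """Sum `value_key` into a {row: {col: total}} grid, like a spreadsheet PivotTable."""
    grid = defaultdict(lambda: defaultdict(float))
    for r in rows:
        grid[r[row_key]][r[col_key]] += r[value_key]
    # Convert the nested defaultdicts back to plain dicts for display.
    return {row: dict(cols) for row, cols in grid.items()}


# Hypothetical sales rows.
sales = [
    {"region": "North", "month": "2021-10", "amount": 100.0},
    {"region": "North", "month": "2021-11", "amount": 120.0},
    {"region": "South", "month": "2021-11", "amount": 80.0},
    {"region": "North", "month": "2021-11", "amount": 30.0},
]
print(pivot(sales, "region", "month", "amount"))
# {'North': {'2021-10': 100.0, '2021-11': 150.0}, 'South': {'2021-11': 80.0}}
```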


If you're interested in BI tools but want more control as a developer, I'm working on a data IDE that is SQL GUI + (Jupyter-style) notebook + BI tool. It runs as a desktop app, so you can easily install it on a work laptop. If you want dashboards and recurring exports, you can run a server version of it.

https://github.com/multiprocessio/datastation


How is it different from zeppelin?


Thanks for asking! I hadn't seen it before but it looks pretty similar to Jupyter.

In both cases, the audience for traditional notebooks is mostly data scientists. In contrast, my target audience is backend developers and hands-on engineering managers who want to build operational and business dashboards and recurring email exports by combining data from multiple data sources.

So it comes with built-in setup for every major database. It can be run as a desktop app, which makes it easier to get running than web-based notebooks, or as a web app where you can make dashboards and recurring email exports. Eventually my goal is to add high-level connections to common APIs that developers/managers use, like GitHub, JIRA, Kubernetes controllers, etc., so you can build reports more easily across your services.

Also, the notebook interface on its own has felt to me like it doesn't treat querying databases as a first-class thing. In DataStation, the database query UI is separate from the programming UI. And, like a SQL GUI, there's built-in support for specifying (encrypted) credentials for your various databases and for querying them over SSH proxies.

That said, Zeppelin and Jupyter are certainly much more mature and extensible.


I really like Superset. In my last evaluation it appeared to be missing a handful of key features, such as correlated drill down/filtering between visualizations. I hope they can get those covered because then it's an easy product to integrate into my analytics stacks.


It’s got filtering at least. I haven’t seen drill down yet though.

In my opinion, it needs UI polish and direction: 9 times out of 10, the initial page of a visualization is an error, because the default visualization doesn't work with the selected table without some config. This is really confusing for users who are new to it and just want to graph some stuff.


> correlated drill down

could you elaborate on this?


Say you have a map and a bar chart in two different figures. With correlated drill down, if I select a province or state on the map, the bar chart also updates.
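A minimal in-memory sketch of that behaviour, with hypothetical `state` and `category` fields: both charts read the same records, and the map's selection becomes a filter applied before the bar chart re-aggregates. SQL-based tools like the ones discussed here would instead add the filter to the query they generate.

```python
from collections import Counter

# Hypothetical records shared by the map and the bar chart.
orders = [
    {"state": "CA", "category": "Books"},
    {"state": "CA", "category": "Toys"},
    {"state": "NY", "category": "Books"},
    {"state": "CA", "category": "Books"},
]


def bar_chart_data(records, selected_state=None):
    """Counts per category for the bar chart, restricted to the state picked on the map."""
    if selected_state is not None:
        records = [r for r in records if r["state"] == selected_state]
    return Counter(r["category"] for r in records)


print(bar_chart_data(orders))        # no map selection: Books 3, Toys 1
print(bar_chart_data(orders, "NY"))  # after clicking NY on the map: Books 1
```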


Even more advanced is brushing & linking, where you reproject a selection in one chart into another (assuming they are slicing along different dimensions - https://imgur.com/a/70ngkn3). I haven't seen too many open source products that can handle that (and not too many commercial BI products either).


Precisely. Altair/Vega-Lite are good examples of projects that do this well (but it's all client-side rendering, so YMMV).


Yeah, it would be quite easy to do with SQL-based BI tools, as long as you can invert the selection back into a WHERE clause. The tricky bit is overlaying the new data on top of the old (especially when you completely filter out some subset of the data, like a few bars from a bar chart).
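A sketch of that inversion for a range brush, assuming a hypothetical `items(category, price)` table: a brush over a price axis becomes a `BETWEEN` clause, and the linked category bar chart gets both the full and the selected counts so the selection can be drawn on top of the original bars. Categories the brush filters out entirely are kept with a zero rather than disappearing, which is the overlay problem mentioned in that comment.

```python
import sqlite3


def brush_to_where(column, lo, hi):
    """Invert a range brush on one chart's axis into a parameterized WHERE clause."""
    # `column` is interpolated (identifiers can't be bound as parameters),
    # so in a real tool it must come from known table metadata, never user text.
    return f"{column} BETWEEN ? AND ?", (lo, hi)


def linked_bars(conn, brush):
    """Return {category: (total, selected)} so the selection can overlay the full bars."""
    where, params = brush
    totals = dict(conn.execute("SELECT category, COUNT(*) FROM items GROUP BY category"))
    selected = dict(conn.execute(
        f"SELECT category, COUNT(*) FROM items WHERE {where} GROUP BY category", params))
    # Categories with no rows inside the brush still get a bar, with a zero overlay.
    return {cat: (n, selected.get(cat, 0)) for cat, n in sorted(totals.items())}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (category TEXT, price REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("Books", 12.0), ("Books", 30.0), ("Toys", 8.0), ("Games", 55.0)])
print(linked_bars(conn, brush_to_where("price", 10, 40)))
# {'Books': (2, 2), 'Games': (1, 0), 'Toys': (1, 0)}
```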


And this type of thing is why companies end up with Tableau. The open source projects are catching up though. Is this on the roadmap for Superset?


Metabase (https://www.metabase.com/) is a similar tool. It supports fewer DBs, but I had good experiences with it at a couple of startups; it's quite easy to build useful queries, even for non-technical people. I've yet to play with Superset, though I've been meaning to for a while. Could anybody who has experience with both share their thoughts on the pros/cons for different use cases?


We (https://timeflow.systems) use all of Metabase, Superset and heavier alternatives such as Tableau to build fairly advanced dashboards for customers.

Metabase is a great product. Simple to deploy and use, stable, easy for non techies and SQL analyst types. It tends to be my go-to.

Superset is a great product and we are lucky to have BI tools of this quality for free. It does however have a few extra rough edges and quirks where your query doesn't render as you would expect, especially with niche databases such as Druid and ClickHouse (via a third-party driver). I often find myself checking log files to see what SQL was actually issued from the front-end. It's also slightly more intimidating for end users.

Metabase is easier to deploy. I use a single binary vs (I think) a docker compose setup for Superset.

Both have a cloud offering; Preset.io is the Superset one, which we've just started using and are having a productive time with.

I would look at both Metabase and Superset before the heavier-weight commercial options if the choice was purely down to product features. Tableau, Looker et al. don't seem to bring much to the table above these, and they are both quite difficult and expensive commercial organisations to deal with.

With all of this said, building simple reports and dashboards over aggregated data is a fairly commodity task nowadays. Unless you are doing anything funky, all of these tools and tens of others will do the job.


> It does however have a few extra rough edges and quirks

These extend to the user interface and ux. Unless you need the more sophisticated analytics features of Superset then I'd definitely start with Metabase.


Same. I brought Metabase into our stack and it fills the big gap between Power BI and the technical database GUIs you need to install. I've yet to try Superset, but at first sight it looks a bit too scary for completely non-technical users.


I'm technical and I found Superset really difficult to get started with. It didn't help that they apparently renamed a bunch of things and every online reference used the old name.


I also would like to know about the user experience of non-technical users.


We have been using Metabase heavily for about 5 years now and are very happy with it. At my previous company, the BI/data science team investigated potentially replacing Metabase with a more powerful tool. They concluded that while, e.g., Superset would be better for themselves, they would stick with Metabase because of its lower barrier to entry for non-technical users.

While Metabase is not the most sophisticated tool out there, we had a great experience onboarding users who had only known Excel before. As a result, they answered the majority of basic questions about the data themselves, reducing the load on the BI/data staff.


I was going to ask why someone would choose Metabase or Superset over Power BI.

Power BI pricing ($10 per seat) is so nice compared to Tableau ($85 per seat, last I checked).

Then I realized that Metabase and Superset can both be run on premises, free of monthly costs. That can be huge when the number of users fluctuates greatly.
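A back-of-the-envelope sketch of why fluctuating headcount matters, using the seat prices quoted above. The user counts and the flat $100/month self-hosting bill are hypothetical, it assumes seats can be added and dropped month to month (that depends on the contract), and it ignores the staff time self-hosting takes.

```python
# Seat prices quoted in the comment above (USD per user per month).
POWER_BI_SEAT = 10
TABLEAU_SEAT = 85

# Hypothetical flat monthly bill for hosting Metabase or Superset yourself.
SELF_HOSTED_MONTHLY = 100


def yearly_seat_cost(users_per_month, seat_price):
    """Per-seat licensing: cost tracks the number of active users each month."""
    return sum(n * seat_price for n in users_per_month)


def yearly_flat_cost(users_per_month, monthly):
    """Self-hosting: cost is flat regardless of how many people log in."""
    return monthly * len(users_per_month)


# Hypothetical monthly user counts that swing with projects or seasonal staff.
users = [20, 20, 20, 80, 80, 80, 20, 20, 20, 150, 150, 20]
print(yearly_seat_cost(users, POWER_BI_SEAT))        # 6800
print(yearly_seat_cost(users, TABLEAU_SEAT))         # 57800
print(yearly_flat_cost(users, SELF_HOSTED_MONTHLY))  # 1200
```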


I'm not familiar with Power BI, but one place I was at leaned towards Tableau (vs Metabase) due to its "storytelling" features. That is, you could create a document with an intro and embed graphs/analytics within it. My cynical take is that this was a better way for the data team to justify its position/size relative to the value it contributed. I'm not saying there's no value in providing a story, context and explanation around the data (on the contrary); this was just one experience, but I'm sure this kind of politics was (and can be) a deciding factor in picking Tableau over Metabase long term.


Second for Metabase. It's by far the most user-friendly self-serve tool I've deployed for business end users. Metabase requires some training for users not familiar with Excel pivot tables, but it's incredibly approachable for non-technical users who want to create their own reports.

Superset and Redash are great for analysts and technical users looking for more control over the output, where there's no expectation that the end user consuming the data will want to "drill down" further or satisfy their own curiosity. But you can pull things off with these tools that Metabase isn't built for.


We're using Superset to enable our analysts to explore our clients' SEM/SEO/analytics data. It also posts alerts to Slack when, say, the daily session count of a website isn't what was expected given the historical data.
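As an illustration of that kind of check (not Superset's own alerting code), a simple z-score test against recent history might look like this; the session numbers and the 3-standard-deviation threshold are hypothetical.

```python
import statistics


def is_anomalous(history, today, threshold=3.0):
    """Flag `today` if it lies more than `threshold` standard deviations from the historical mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation; needs >= 2 points
    if stdev == 0:
        # Perfectly flat history: any change at all is unexpected.
        return today != mean
    return abs(today - mean) / stdev > threshold


# Hypothetical daily session counts for the past week.
daily_sessions = [1020, 980, 1005, 995, 1010, 990, 1000]
print(is_anomalous(daily_sessions, 1012))  # False: within normal variation
print(is_anomalous(daily_sessions, 400))   # True: sessions collapsed
```

A fixed z-score ignores weekly seasonality, so for traffic data you'd probably compare against the same weekday or a rolling window instead.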

Yeah, it's a little rough to get going, but once it is, we've found it to be a really powerful (and actively developed!) BI tool. It's even better with dbt + MetriQL [0], which can automatically sync Superset's dataset metadata directly with properties you set up in dbt.

Adding custom visualizations is much harder than it should be, but they're very much aware of that, and working to address it. Their Slack community is super-helpful, too.

[0]: https://metriql.com


Another option is redash.io. I prefer Superset myself, but it's good to have choices.


Tools like this are awesome, I wish I had a good reason to play with them.

Dumb question - what makes this project better / worse than competing BI / data viz tools like Jupyter notebook, Tableau, and MS Power BI with Excel?


A lot of these tools are similar. They allow you to query relational databases, run joins and group-bys, and then display the results as either data tables or charts. To varying degrees, they hide or expose the fact that they are just building SQL queries under the hood.
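A sketch of the core idea, assuming a hypothetical `sales(region, amount)` table: a point-and-click chart spec (one dimension, one aggregated metric) turns into a plain `GROUP BY` query. This is not how Metabase or Superset build their SQL internally, just an illustration of the kind of query a simple bar chart boils down to.

```python
import sqlite3


def build_query(table, dimension, metric, agg="SUM"):
    """Turn a chart spec (group by one column, aggregate another) into SQL."""
    allowed_aggs = {"SUM", "AVG", "COUNT", "MIN", "MAX"}
    if agg not in allowed_aggs:
        raise ValueError(f"unsupported aggregate: {agg}")
    # Identifiers are interpolated, so in a real tool they must come from
    # known table metadata rather than free-form user input.
    return (
        f"SELECT {dimension}, {agg}({metric}) AS value "
        f"FROM {table} GROUP BY {dimension} ORDER BY {dimension}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 120.0), ("South", 80.0), ("North", 30.0)])
sql = build_query("sales", "region", "amount")
print(sql)
print(conn.execute(sql).fetchall())  # [('North', 150.0), ('South', 80.0)]
```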

Notebook-based analysis is more programmatic. I could, for instance, pull in a query, apply statistical functions, build an ML model, perform logic, call APIs, etc.
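A sketch of that notebook-style flow using only the standard library: pull a query result, fit a least-squares trend by hand, then use the fit in ordinary code. The `weekly` table and its numbers are hypothetical; in a real notebook you'd more likely reach for pandas or scikit-learn.

```python
import sqlite3


def linear_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Step 1: pull in a query result (hypothetical weekly revenue table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly (week INTEGER, revenue REAL)")
conn.executemany("INSERT INTO weekly VALUES (?, ?)",
                 [(1, 10.0), (2, 12.0), (3, 14.0), (4, 16.0)])
rows = conn.execute("SELECT week, revenue FROM weekly ORDER BY week").fetchall()

# Step 2: apply a statistical function in code.
weeks, revenue = zip(*rows)
slope, intercept = linear_trend(weeks, revenue)

# Step 3: use the result in further logic, e.g. a naive forecast for week 5.
print(slope, intercept, slope * 5 + intercept)  # 2.0 8.0 18.0
```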

I think both have a place in many organisations, but the programmatic side is where there is more potential. Yet another dashboard is a bit uninspiring nowadays.


To add a contrary voice, I completely disagree. Data accessibility, organization and access to analyst time remain the biggest hurdles to corporate data literacy.

No business user is going to write notebook code; on the other hand, I've seen some great exploration, analysis, and dashboarding work done by business users with the right self-serve setup in Looker. dbt + Metabase or Superset seems like a great cheaper / more open alternative stack.

From what I've seen, ML and programmatic stuff is flashy, but enabling business users to easily get tables and bar charts is how you get everyone making more sound decisions.


Compared to Tableau — cost.


Was about to say the same. Last year I went through a huge industry comparison exercise for my former company. Spent 6 months evaluating BI tooling. Every option out there is ridiculously expensive. Looker quoted us at $1M/year, though they were using the sticker shock strategy to get us around a $200K/year price point.

I came away with the impression that BI is simply not accessible outside of enterprise level orgs. So I love seeing tools like this come into the market. You should also check out https://cube.dev - they're much less out-of-the-box, but super flexible for developers.


I have briefly played with the opensource version of Superset, Metabase and Redash, and ended up choosing Metabase for our use case.

Metabase has the most beautiful and intuitive UI of the three, and works well when your use case is supported, but further customization is difficult. Superset is good for customization, but setting up some standard chart types/dashboards takes more effort than in Metabase. Redash is sort of in between, and sometimes feels slow.


It's good to see they also support Google Sheets [1], which is essential IMO if you're in a budding SME team with mixed specialities.

Of course, it doesn't scale well, but that doesn't matter: it does the job well at small scale.

[1] https://superset.apache.org/docs/databases/google-sheets


I remember doing the research a few years ago. Redash was the final choice because it supports MongoDB.

https://github.com/getredash/redash


Good to see there are Apache projects with core languages other than Java.

Are there more of them?


<neckbeard> IIRC, the Apache project started around the Apache http server, written in C. Apache http server 1.0 was released ~7 months after Java 1.0. </neckbeard>


Apache TVM [1], which is a tensor compiler stack focussed on deep neural networks.

It is mostly written in C++ and Python, integrates external libraries like LLVM, OpenCL, CUDA, etc.

[1] https://tvm.apache.org/


Airflow is probably the best-known one, also built on Python.


Apache Arrow is another one (an in-memory analytics format); it supports a whole litany of languages via C++ bindings, too many to list.


Well, it looks like I'm building a VM today. Though we already have two BI platforms and it will get me on the bad side of the ops folks if I end up making them support another platform.


We're using it in our products, Superset is really good. Very active community, tons of movement. So much better than having to integrate Tableau, etc.


This was an Airbnb project! The successor to Airpal.



