I've seen a lot of products similar to this, and it looks like one of the nicer ones. But the use case in the video and screenshots makes me immediately think it wouldn't be viable in any of the organizations I've written SQL for in the past 10 years.
At least 95% of the challenge in querying data doesn't come from the difficulty of writing SQL statements IMO, but from the complexity/brokenness of the data itself. Especially with the move to unstructured data streams, most data warehouses seem to have become pretty bad when it comes to extracting truth out of the information within. Many business users know some SQL but blanch at creating coherent reports from what's available to them. What BI people are paid for seems to be having knowledge about the problems particular to a domain.
If your data is relatively clean and follows a good model, this would be a great way to help someone join, group, and aggregate their data without knowing SQL. I think a lot of people would use Tableau for this if it didn't also have a steep learning curve / high expense. However as soon as you get into data where you're writing case statements, coalescing null fields, matching on different data types, decoding, partitioning over streaming data etc., it doesn't help someone without knowledge of the caveats within the data sources themselves. Show me someone who doesn't know SQL using this to produce insight out of compromised data and I will be impressed.
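To make that concrete, here's a rough sketch of the kind of query I mean - nothing exotic, just the everyday cleanup work (all table and column names are made up):

    SELECT
        o.order_id,
        -- decode a magic status code into something readable
        CASE o.status_cd
            WHEN 'A' THEN 'Active'
            WHEN 'C' THEN 'Cancelled'
            ELSE 'Unknown'
        END AS status,
        -- nulls mean different things depending on which system wrote the row
        COALESCE(o.ship_date, o.est_ship_date, o.created_at) AS effective_ship_date,
        c.customer_name
    FROM orders o
    -- the customer reference is text in one system and an integer in the other
    LEFT JOIN customers c
        ON CAST(o.customer_ref AS INTEGER) = c.customer_id
    WHERE COALESCE(o.is_test, 0) = 0;  -- exclude test rows that were never flagged consistently

None of that is hard SQL; all of it requires knowing the quirks of the source systems.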
I fully agree with your experience here. We have a super hard mission at Chartio - it's not just about the interface but also how the data is set up. The interface, being as flexible as it is and enabling full schema browsing (instead of dataset browsing), is a pretty big part of allowing a more agile style of data modeling, though. We had that very much in mind, and we've written a book (soon to be published with Wiley) on proper modern data governance techniques.
Our next phase is to help people get to that cleaner source of truth much more quickly than traditional dimensional modeling approaches. Tools like Visual SQL and DBT (https://www.getdbt.com) are really changing the complexities here.
I'd love to see the data modeling book. I spend a lot of energy shouting the virtues of The Data Warehouse Toolkit into the void. You are right that it is outdated, but it isn't entirely (or even mostly) wrong.
My coworkers seem much more interested in making a bigger EMR or adding nodes to Redshift than designing a reasonable data mart because "star schemas don't scale". I'm interested to see what you come up with, it is a huge gap in the current literature.
I would love to be involved with that book. I'm a data engineer myself and I have built SQLBucket, a python library I have been told is similar to getdbt.com (although I'm not familiar with DBT, the similarities have been mentioned to me on various occasions).
Oh awesome! I haven't heard of SQLBucket but I'll check it out, as I love anything that encourages SQL-based modeling. I was writing my own and then DBT came out, and we push people there primarily. Send me a note - dave-at-chartio - and I'd love to chat sometime.
Very interesting article. The most interesting part is how, while designing your product, you realized that the spreadsheet surface is the most intuitive for users. They also like the pre-baked results you present quickly, so you can really see your customer's problem.
What is really good is assembling a library of visual queries for the customer. This is a good idea because many users have the same fundamental types of queries on their data. Once you have enough of the basic queries that the user can do useful work without programming, you can then find a way to let them customize.
Do you have data on how many similar queries customers use? That would tell you how to define the basic set of important operations.
I wish I could say that we had that insight from the beginning and that it worked right away, but we ended up failing into that realization after a lot of designs and prototypes.
So what you describe is somewhat built into what we have now. Users still have to choose what columns they want to look at (there's no real way for us to guess that), and then we do apply some knowledge about what type of data they're looking at to help them get to what they're likely looking for.
We've also tried at times to make default dashboards for data sources when people connect. We can do this to some extent with known schemas like GA, Salesforce, HubSpot, etc., but for databases that's proven to be a largely impossible task so far. Everyone's data is so different, and has such odd conditions to consider filtering by, that the auto dashboards end up being quite useless.
Yeah, doing ETL, everyone wants me to take their data and perform magic with it, but I really want to see the stored procedures and queries running on top of it before I can understand the best way to "auto dashboard" anything.
Finding out the distinct set of values in any column helps a lot, and referential integrity helps a lot, but without those queries it's pretty dang hard.
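For what it's worth, these are the kinds of profiling queries I end up running first (table and column names are hypothetical):

    -- what values actually show up in this column, and how often?
    SELECT status_code, COUNT(*) FROM orders GROUP BY status_code ORDER BY 2 DESC;

    -- quick referential-integrity check: orders pointing at customers that don't exist
    SELECT COUNT(*)
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL;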
If you aren't investing the work up front to make your data useful then no amount of tooling or magic beans will make writing meaningful SQL easy. If your reports (SQL) are hard to write it's because you have bad data.
Having said that, making data exploration easier is always a worthwhile exercise. The more business people can self-serve their questions the better. It falls to data people to make that possible, not just with tools but with the data itself.
SQL is easy like solving differential equations is easy. If you know how, you just work through it, but that's only because you already learned all the prerequisites and then spent a year getting good at it. Anyways, the point is that either way, you'll never have an entire organization working at this level with SQL or solving equations, and that's fine. The CEO doesn't need it; sales don't need it. They do need numbers though, so having a dumbed-down interface that can show some curves after clicking "net sales" and "country" lets them move forward with what matters to them, without having to learn the difference between an inner and outer join, which doesn't matter to them.
Not everyone needs to be a programmer even if they do need the value typically buried in programmer interfaces.
Questions to be asked of data are always an afterthought, coming after the systems that gather and produce the data have already been designed.
The idea that it’s ok to just give a BI tool to business people later, and not involve them at the beginning to inform the system and data model design, is maybe why we’re in the tar pit.
A really excellent process/tool would help business figure out what kind of questions they need answered, and work backwards to the data model and implementation.
Sometimes it's not for lack of trying, though. I've done a decent amount of data model design in my career and as I got more experienced, I learned to make a point of asking the business people what specific questions they'd need the data to answer. Very often I got only very vague answers that were of little to no use in informing my design. I've seen this repeatedly at a variety of companies over the years. The business people are often winging it as much as the engineers are.
What do you guys think about a system that would 'guide' the user through the process of creating a star schema data warehouse (Kimball style)?
The advantage of this approach is that you do the dirty laundry upfront, during the modeling and population phases. The end result is a data structure that is fool-proof, i.e. there is only one way to join facts/dimensions, and it's self-documenting. In fact, pre-joined views could automatically be created (and persisted if need be), giving the business user a clean structure to interact with.
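Something like this, where the only sensible join paths are already baked in (a hypothetical star schema, just to illustrate the idea):

    CREATE VIEW sales_flat AS
    SELECT
        d.calendar_date,
        p.product_name,
        s.store_region,
        f.units_sold,
        f.net_revenue
    FROM fact_sales f
    JOIN dim_date    d ON f.date_key    = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_store   s ON f.store_key   = s.store_key;

The business user only ever sees sales_flat; the join logic is decided once, by whoever did the modeling.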
The payoff from having some engineers do the T of ETL (transforming the data to be more queryable, denormalizing values, etc.) is insane if you have problems democratizing data analysis/reporting.
The dumb-simple version, for people who don't know what this looks like: make a separate database. Create tables based on what you often want to query (but don't try to keep the same shape as the base data!). Then figure out how to fill in those tables from the base data.
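Spelled out, a minimal sketch (Postgres-flavored, all names made up): a reporting table shaped for the questions rather than the source systems, refreshed from the base data.

    CREATE TABLE report_monthly_sales (
        month        DATE,
        region       TEXT,
        product_line TEXT,
        units_sold   BIGINT,
        net_revenue  NUMERIC(18,2)
    );

    INSERT INTO report_monthly_sales
    SELECT
        DATE_TRUNC('month', o.order_date) AS month,
        c.region,
        p.product_line,
        SUM(oi.quantity),
        SUM(oi.quantity * oi.unit_price)
    FROM orders o
    JOIN order_items oi ON oi.order_id   = o.order_id
    JOIN customers   c  ON c.customer_id = o.customer_id
    JOIN products    p  ON p.product_id  = oi.product_id
    GROUP BY 1, 2, 3;

Rerun the load on whatever schedule makes sense; the point is that the reporting table's shape is driven by the questions, not by the operational schema.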
We use SAS Enterprise Guide in my org. It has a visual SQL generator similar to what is previewed here; it is quite good for simple queries, but I find it falls down when you introduce complexity.
Joins between tables are one of the big pain points for me. I am not sure how the demoed product does joins, but in SAS EG you join tables by clicking and dragging a line between the two columns you want to join, and it gives you a pop-up to select Inner, Outer, Left, Right, etc. (although it uses plain English, i.e. rather than saying "left join" it says something like "All rows from Table 1 and matching rows from Table 2"), along with a visual Venn-diagram-style preview. This is frustrating for a few reasons. If a table is large, i.e. several hundred columns, you have to scroll for an eternity to find the columns you want to join on (a drop-down box you can navigate with keyboard shortcuts to jump to a column name would be a much better UI IMO). It is also difficult to do complex joins, e.g. "WHERE A.Col1 BETWEEN B.Col1 AND B.Col2". Maybe it's just my org, but this is a common feature of our data: we have a lot of event-driven stuff with start and end dates, plus raw time series data you need to aggregate between the event frames. SAS's time series stuff is very good, but the SQL side not so much...
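For context, the kind of join I mean looks like this when written by hand (hypothetical event and time-series tables, just to show the shape of it):

    SELECT
        e.event_id,
        AVG(t.reading) AS avg_reading,
        MAX(t.reading) AS max_reading
    FROM events e
    JOIN sensor_readings t
        ON t.reading_time BETWEEN e.start_time AND e.end_time
    GROUP BY e.event_id;

Trivial to type, but there's no obvious line to drag between two columns in a GUI to express that range condition.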
It is also very easy for unsuspecting users to join on unindexed columns, which leads to very poor database performance. In general there are a lot of performance footguns with generated SQL: a user can have a working query, change something in the GUI, and suddenly a query that ran in a subsecond takes several minutes. I have developed the habit that for complex queries I use the GUI to generate the SQL and then hand-edit it before running it to make sure it will be performant. Or I just hand-write it to begin with.
Pattern matching and filtering is another pain point. The query editor lets you use "LIKE" but does a poor job of explaining to the user how it works. People who only use the GUI are surprised when I explain that you use "%" and "_" to do character expansion; a lot of people blindly assume "*" is used and then complain their query isn't working.
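The thing I end up explaining over and over, on a hypothetical table:

    -- % matches any run of characters, _ matches exactly one character
    SELECT * FROM patients WHERE last_name LIKE 'Sm_th%';  -- Smith, Smyth, Smithson, ...

    -- what people try first: this only matches a name that is literally "Sm*",
    -- because * has no special meaning in standard SQL LIKE
    SELECT * FROM patients WHERE last_name LIKE 'Sm*';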
Microsoft Power BI also has a query editor. I've used it a little bit but do not have as much experience with it. From what I saw, it seemed a bit less flexible; it was easier to do the data extraction and transformation first and then load the result into Power BI.
Absolutely. I've been employed in healthcare IT for the last 5 years, doing data extracts for researchers and reporting. Many people don't quite appreciate the underlying complexities of extracting data from not only large EMR systems such as Cerner (with over 6000 schema-less tables), but integrating that with the countless other systems the hospital employs for things such as radiology, pathology, ICD coding, etc. A lot of the data isn't entered as you would expect, documentation is often lacking, and there are very few people who can tell you where exactly the data sits and how it is actually entered by medical staff doing various work all across the campuses. A simple research request becomes immensely complex when multiple systems are involved and you're using something as complex and as evolving as health data.
So I agree that while this tool may be handy for some, the real challenge with this sort of work is knowing where the (often unstructured) data sits, how it's entered and by whom, and how to extract meaning from it.
I think of other use cases for this product than the ones you mentioned. I don't see this as an end-user product for analysis. IMO a product like this can be used for iterating over backend design with other developers when discussing a new feature or model change. Imagine walking through a complex data model when discussing a new feature and needing to quickly visualize stuff. I write SQL by hand pretty quickly, but I would love to use a product like this in a team design discussion where we want to visualize what we would be pulling from the DB. Instead of having to hand-write all the joins and filters as you walk 5-6 tables, you can quickly and easily see what data is going to be provided to your DAO layer. You can spend way more time focusing on the discussion and less time waiting around for data access changes to be visualized. Then, in the end, the developer can take the auto-generated SQL and use it as a base for the DAO layer, and do the optimizations and cleanup to the query that you mentioned in your post.
So the agile world of data we're looking to bring about is one where your DAOs or marts are still regular schemas - not just flat tables or cubes. The developer teams would be creating these mart schemas, and then business users could do much of their own self-serve exploration and visualization with Visual SQL. I wrote about some of these modern architecture approaches here: https://chartio.com/blog/cloud-data-management-book-launch/
First, I want to applaud the effort here; this looks like a great tool for BI/dashboarding and I've signed up to demo it.
I'm not here to poo-poo your product at all, but I want to explain why I think this can't work in all cases - and that is totally fine!
The application that is my bread and butter is about 25 years old and started as a mid-90s PowerBuilder application. It's since migrated to a 3-tier .NET enterprise app, and the data model has been dragged along with it largely unchanged. The data model is full of cruft and unnatural keys that are the result of a ton of technical debt. The newer tables in the database are more sane and follow most SQL best practices, but there is a large portion that was written (and I know the original data architect, who SWEARS this is true) in no-vowel notation - unless the table starts with a vowel, or maybe throw a random e in there too. A few examples:
Sane column name -> actual column name
AccountDetailAccountCode -> AcctDtlAcctCdeID
AccountingPeriodID -> AccntngPrdID
PlannedMovementBatchID -> PlnndMvtBtchID
Provision -> prvsn
There are about 1000 tables in this database, so part of the problem with SQL is that you have to know about the table structure to know about the database. Another is that a lot of this data is constructed in stored procedures, functions or views.
Most people need at least 12-18 months to totally grasp the application and the data model in a real way. I still have to create database diagrams to find a path to the data I want, and I've been at it for 10+ years. What I would love to see is a way to show table relationships that is filterable in some way - i.e. not just because there is a key relationship. To me it's much more useful to know that table A is related to table B and that both tables have some threshold of row counts. I don't care that there is a key joining table A and table B if table B has 0 rows. Building that intelligent metadata about the database would be hugely helpful.
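In Postgres, for example, you can get a crude starting point for that from the system catalogs (a rough sketch; the row counts are planner estimates from pg_class.reltuples, so they're approximate):

    -- foreign-key relationships, dropping targets with ~0 rows
    SELECT
        con.conrelid::regclass  AS from_table,
        con.confrelid::regclass AS to_table,
        c.reltuples::bigint     AS to_table_estimated_rows
    FROM pg_constraint con
    JOIN pg_class c ON c.oid = con.confrelid
    WHERE con.contype = 'f'
      AND c.reltuples > 0
    ORDER BY to_table_estimated_rows DESC;

That still only covers declared keys, though; the relationships that live in stored procedures and application code are the ones a tool would really have to dig up.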
> At least 95% of the challenge in querying data doesn't come from the difficulty of writing SQL statements IMO, but from the complexity/brokenness of the data itself.
More and more of our consulting gigs are this - collating and cleaning crappy data, and then putting BI reporting on top of it.
I agree with this as well. Products like this seem to be designed primarily for well-structured and lovingly maintained databases. If you are lucky enough to have one of those, congrats!