I have gone through this book, and it’s the best primer I have found for graph drawing. It covers a wide range of topics, but it is written for an audience with a working knowledge of graph theory. The book is more of a collection of academic papers than a singular How To guide. The problems are discussed in abstract, and concrete implementations aren’t provided. You may need to go read other referenced papers at points.
Unfortunately the most ubiquitous graph type, hierarchical graphs, are an NP hard problem. The chapter outlines the Sugiyama framework for breaking down the problem, but each step is still difficult. Expect to sit down with the chapter over several sessions before you will be able grok how to draw a useful graph.
If you have extra information about vertices you can use it to generate coordinates. Also: you can always augment coordinate proposals with something like the Louvain community detection algorithms at different resolutions and then MDS or UMAP etc.
This! We do it all the time in fraud, genomics, social media, security, etc data tables, it is one of our favorite techniques.
We do one more step for UMAP and other dimensionality reducers in practice: connect the nearest neighbors to turn any data table into an explorable similarity graph. Users, devices, proteins, accounts, events, etc link together and cluster when similar, and this reveals fascinating structural relationships when there is a reason they vary.
> If you have extra information about vertices you can use it to generate coordinates
Could you expand on this with an example. Or is there a reference or algo that I could look at. What do you mean by having extra info to generate coordinates. I am currently looking into graph drawing algorithms for a network viz problem and part of it involves placing nodes in certain places and I am interested to see if what you are saying is something I could use
If node or edge attribute information is useful for clustering, such as not just account IDs but say entity or event sizes & time, you may want to see sub-cluster separation based on that. A naive initial UMAP approach (I think the OP's intuition) is to ignore your graph structure and run UMAP (or t-SNE, PCA, k-means, ...) on your independent entities. No graph data involved (initially). Then you will get a scatterplot of entities where those with similar values across their features are near each other. The notebook I linked shows how to compute the UMAP on such a table. Then, instead of showing a scatter plot, you can then draw edges between neighboring entities and play with it: it's an entity similarity graph! That's the more interesting part of the notebook, and a big step up over how we see most UMAP workflows go.
However, it sounds like you already have a graph, which is more structure for clustering than out-of-the-box UMAP (PCA, ...) leverages. When you also have a graph of edges between those entities, such as causal events, and maybe even weights, you can also combine the similarity graph + yours, and run the graph clustering on the combined result. The clustering would then use both the node feature similarity and your existing graph knowledge. The UMAP similarity edges are likely more correlative & speculative than your physical graph's edges, so when running graph clustering, it often helps to assign different edge weights based on your confidence in each edge.
There are more tricks you can play here. A big one for property graphs is propagating node/edge features to nearby nodes (ex: "total connected bytes/events/$/etc.") so that the UMAP has surrounding graph information available. This starts to generalize to graph neural nets (b/c you're doing label propagation), and something we're actively looking into and are happy to chat w/ folks about: feel free to email or swing by our Slack :)
Really appreciate the response, I have not used UMAP before, thanks for giving some of the background and context. Looked at graphistry and it looks awesome, I think I do have some use cases where it will be useful :)
Roberto is fun! I didn't appreciate his work until way later =/
Funny enough, if someone is into modern versions of these graph techniques, we have been building out an end-to-end GPU visual graph intelligence platform used by all sorts of security/fraud/misinfo/genomics/supply chain teams, and have 2 data viz openings precisely on this: fullstack data viz + WebGL engine. Info over at https://www.graphistry.com/careers .
Once you load in say 100K-1B fraud events or cancer proteins and try to make the experience fully interactive, GPUs are letting us tackle all sorts of challenges that were unrealistic back then. With all graph neural net stuff happening in the last ~2 years, our assumptions of what tools can enable analysts to tease out are changing yet again. Exciting days!
Cytoscapejs works pretty well for me as a react component, I use it to draw elk graphs, which is apparently a step up from sugiyama (graphviz).
There's also dagre-d3 which uses sugiyama. I have a different react project that uses that. I think I like cytoscapejs better, although the animated transitions from graph state to graph state are a bit clunkier.
I'm not aware of anything better for open source web that allows you to edit/animate, but I haven't tried gephi/graphology/sigma.js.
yFiles as well. In fact, parts of the book have been written by people involved in the early development of yFiles. I'm sure the dead tree variant of that book sits somewhere in a shelf in the office.
I remember yEd from university 2 decades ago so got excited by the idea of yFiles. Until I calculated [1] the license price - for a single developer on a single project it'd be $14000+$4200 maintenance. Wow.
Unfortunately the most ubiquitous graph type, hierarchical graphs, are an NP hard problem. The chapter outlines the Sugiyama framework for breaking down the problem, but each step is still difficult. Expect to sit down with the chapter over several sessions before you will be able grok how to draw a useful graph.