> Overall, this can mean significantly higher throughput and lower energy, but getting those benefits requires very careful design and completely different computer architectures.
What effect will this have on our programming languages and programming idioms? To some extent, our low-level programming languages have influenced CPU design, and vice versa, but it's not clear what effect an architectural change like this would have.
Ideally, none. We're leaning toward FPGAs and CGRAs to accelerate tight inner loops, which means the burden falls on compilers instead: they will have to translate a control-flow behavioral description like C into a data-flow description that maps onto the array. That compilation process is honestly not a solved problem; it's why we have Verilog instead of just compiling C to circuitry. I've taken a crack at it (in the form of QDI circuit synthesis from CHP), and every sub-problem is either NP-hard or NP-complete.
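To give a concrete flavor of the mapping step, here's a toy sketch; the graph, the grid, and the cost model are all made up for illustration. Even placing a three-node dataflow graph onto a grid of processing elements is a search over placements, and the exhaustive version below blows up combinatorially, which is why real CGRA mappers rely on heuristics rather than this kind of brute force.

```python
# Toy sketch: placing a tiny dataflow graph onto a 2x2 grid of processing
# elements (PEs) by exhaustive search. The cost model (total Manhattan
# distance between communicating PEs) is invented for illustration; real
# mappers must also handle routing, timing, and resource constraints.
from itertools import permutations

# Dataflow graph for y = (a + b) * (a - b): node -> list of consumer nodes
edges = {"add": ["mul"], "sub": ["mul"], "mul": []}
nodes = list(edges)

# 2x2 grid of PEs, identified by (row, col)
pes = [(r, c) for r in range(2) for c in range(2)]

def wire_cost(placement):
    """Sum of Manhattan distances along every producer->consumer edge."""
    total = 0
    for src, dsts in edges.items():
        for dst in dsts:
            (r1, c1), (r2, c2) = placement[src], placement[dst]
            total += abs(r1 - r2) + abs(c1 - c2)
    return total

# Try every assignment of nodes to distinct PEs and keep the cheapest.
best = min(
    (dict(zip(nodes, chosen)) for chosen in permutations(pes, len(nodes))),
    key=wire_cost,
)
print(best, "cost =", wire_cost(best))
```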
Though all of this assumes we solve the memory bottleneck... which might come about with upcoming work on 3D integration and memristors? Who knows.
The challenge isn't extracting a dataflow graph from an imperative, control-flow representation. Compilers already do this, and every major compiler builds an expression DAG at some point. The challenge is mapping a generic dataflow graph onto the actual dataflow hardware primitives, which impose constraints of their own and can force non-trivial mappings.
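For what it's worth, the extraction step really is routine. Here's a toy, hashing-based sketch (not any particular compiler's implementation) of turning straight-line assignments into a shared expression DAG; identical subexpressions land on the same node, which is exactly the structure a dataflow backend would start from.

```python
# Toy sketch of building an expression DAG from straight-line code, in the
# spirit of a compiler's local value-numbering / CSE pass. Identical
# subexpressions hash to the same node, so the result is a DAG, not a tree.

def operand(x, env, node_of, dag):
    """Return the node id for a variable's current value, or a leaf node."""
    if x in env:
        return env[x]
    key = ("leaf", x)
    if key not in node_of:
        node_of[key] = len(dag)
        dag.append(key)
    return node_of[key]

def build_dag(stmts):
    """stmts: list of (dest, op, lhs, rhs) tuples. Returns (dag, env)."""
    node_of, env, dag = {}, {}, []
    for dest, op, lhs, rhs in stmts:
        key = (op, operand(lhs, env, node_of, dag),
                   operand(rhs, env, node_of, dag))
        if key not in node_of:          # reuse the node if we've seen this op
            node_of[key] = len(dag)
            dag.append(key)
        env[dest] = node_of[key]
    return dag, env

# y = (a + b) * (a - b); both temporaries share the leaf nodes for a and b.
dag, env = build_dag([
    ("t1", "+", "a", "b"),
    ("t2", "-", "a", "b"),
    ("y",  "*", "t1", "t2"),
])
for i, node in enumerate(dag):
    print(i, node)
```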
Register allocation is also NP-complete, but we have plenty of efficient approximations that work well enough in practice: linear scan for fast compilation, or graph coloring and puzzle solving for slower compilation that produces better code. Is there a comparable tradeoff available in this case?
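For reference, the fast end of that tradeoff is only a few dozen lines. A rough sketch in the spirit of Poletto & Sarkar's linear scan (assuming live intervals have already been computed by liveness analysis; the spill heuristic is the simple "spill whatever ends last"):

```python
# Rough sketch of linear-scan register allocation. Intervals are
# (start, end, name) live ranges, assumed precomputed. Whatever doesn't
# fit in a register is spilled; we spill the interval that ends latest.

def linear_scan(intervals, num_regs):
    intervals = sorted(intervals)          # process in order of start point
    active = []                            # (end, name, reg), sorted by end
    free = list(range(num_regs))
    assignment, spills = {}, []

    for start, end, name in intervals:
        # Expire intervals that ended before this one starts.
        while active and active[0][0] <= start:
            free.append(active.pop(0)[2])
        if free:
            reg = free.pop()
        elif active[-1][0] > end:
            # Spill the live interval that ends furthest in the future
            # and steal its register for the current interval.
            _, victim, reg = active.pop()
            spills.append(victim)
            del assignment[victim]
        else:
            spills.append(name)            # current interval is the victim
            continue
        assignment[name] = reg
        active.append((end, name, reg))
        active.sort()
    return assignment, spills

# Example: a[0..8], b[1..3], c[2..9], d[4..6] with 2 registers -> c spills.
regs, spilled = linear_scan(
    [(0, 8, "a"), (1, 3, "b"), (2, 9, "c"), (4, 6, "d")], 2)
print(regs, spilled)
```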