How do the architectural bottlenecks of modified von Neumann architectures, with their debuggable instruction pipelines, limit computational performance when scaling to larger amounts of off-chip RAM?
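A back-of-the-envelope sketch of the scaling problem (illustrative numbers only, nothing measured): if the shared CPU-RAM bus bandwidth stays fixed while off-chip RAM grows, a full pass over memory takes proportionally longer, and the compute ceiling stays pinned to the bus bandwidth no matter how much RAM or how many cores you add.

```python
# Back-of-the-envelope: why adding off-chip RAM doesn't add throughput
# when a single shared bus is the bottleneck. All numbers are illustrative.

BUS_BANDWIDTH_GBPS = 100   # assumed shared CPU<->RAM bandwidth, GB/s
BYTES_PER_MAC = 2          # assumed average bytes crossing the bus per multiply-accumulate

for ram_gb in (16, 64, 256, 1024):
    # Time to stream the whole working set across the bus once:
    seconds_per_full_pass = ram_gb / BUS_BANDWIDTH_GBPS
    # Peak MACs/s achievable if every operand must cross the bus:
    bandwidth_bound_macs = BUS_BANDWIDTH_GBPS * 1e9 / BYTES_PER_MAC
    print(f"{ram_gb:5d} GB RAM: one full pass takes {seconds_per_full_pass:6.2f} s; "
          f"bandwidth-bound ceiling ~{bandwidth_bound_macs/1e9:.0f} GMAC/s regardless of RAM size")
```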
For whatever reason, Hynix hasn't turned their PIM into a usable product. LPDDR-based PIM is insanely effective for inference; I can't stress this enough. An NPU+LPDDR6 PIM would kill GPUs for inference.
> The simplest method therefore would be to use TOPS/W for digital approaches in future, but to use TOPS-B/W for analogue in-memory computing approaches!
> [ {Frequency, NNs employed, Precision, Sparsity and Pruning, Process node, Memory and Power Consumption, Utilization}: factors for more representative variants of the TOPS/W metric ]
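Roughly what a "more representative" figure might look like, as a hedged sketch: derate peak TOPS by achieved utilization, normalize for precision (the TOPS-B/W idea), and charge memory power against the budget. The function name and all numbers below are illustrative placeholders, not from the quoted article.

```python
# Sketch of an "effective TOPS/W" that folds in the factors listed above
# (precision, utilization, memory power). All values are illustrative placeholders.

def effective_tops_per_watt(peak_tops, utilization, precision_bits,
                            core_power_w, memory_power_w, reference_bits=8):
    """Derate peak TOPS by achieved utilization, normalize to a reference
    precision, and charge memory power against the power budget."""
    delivered_tops = peak_tops * utilization
    # Normalize ops to a common precision so 4-bit and 8-bit claims are comparable.
    normalized_tops = delivered_tops * (precision_bits / reference_bits)
    total_power_w = core_power_w + memory_power_w
    return normalized_tops / total_power_w

# Datasheet-style number vs. a derated one:
print("peak:     ", 100 / 10, "TOPS/W")   # 100 TOPS at 10 W core power
print("effective:", round(effective_tops_per_watt(
    peak_tops=100, utilization=0.35, precision_bits=8,
    core_power_w=10, memory_power_w=15), 2), "TOPS/W")
```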
Tomasulo's algorithm also centralizes on a common data bus (the CDB, onto which every result is broadcast), which is a bottleneck in the same way the CPU-RAM data bus is a bottleneck that must scale with the amount of RAM behind it.
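A toy sketch of that serialization (made-up tags and values, not real hardware): in a Tomasulo machine only one result can own the CDB per cycle, so completions drain one at a time however many functional units finished.

```python
# Minimal sketch of Tomasulo-style result broadcast on a single Common Data Bus (CDB).
# Toy model, not a real pipeline: it only shows that one broadcast happens per cycle,
# so completion is serialized on the bus no matter how many units finish at once.

from collections import deque

ready_results = deque([
    ("ADD1", 7), ("MUL1", 42), ("LOAD2", 13), ("ADD2", 9),   # all "finished" this cycle
])
reservation_stations = {"RS1": "MUL1", "RS2": "LOAD2", "RS3": "ADD1"}  # tag each RS waits on

cycle = 0
while ready_results:
    cycle += 1
    tag, value = ready_results.popleft()           # only ONE result can own the CDB per cycle
    for rs, waiting_on in reservation_stations.items():
        if waiting_on == tag:                      # every reservation station snoops the broadcast
            print(f"cycle {cycle}: {rs} captures {tag}={value} from the CDB")
print(f"{cycle} cycles just to drain {cycle} results over one bus")
```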
Can in-RAM computation handle error correction without redundant computation and consensus algorithms?
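For reference, conventional ECC memory already corrects errors without redundant computation or consensus: it stores a few check bits per word and computes a syndrome on read. A minimal Hamming(7,4) sketch of the idea follows; real DIMMs use wider SECDED codes such as (72,64), and an in-RAM compute unit could in principle apply the same scheme locally.

```python
# Sketch: single-error correction with a Hamming(7,4) code, the same family of
# codes ECC memory uses (real DIMMs use wider SECDED codes such as (72,64)).
# No duplicated computation, no consensus: check bits are computed on write
# and a syndrome is computed on read.

def encode(d):                      # d: list of 4 data bits
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4               # parity over codeword positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4               # parity over positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4               # parity over positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]   # codeword positions 1..7

def correct(c):                     # c: received 7-bit codeword
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3 # points at the flipped position (1-based), 0 if clean
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome

word = [1, 0, 1, 1]
stored = encode(word)
stored[4] ^= 1                      # simulate a single bit flip in the DRAM array
recovered, syndrome = correct(stored)
print(recovered == word, "corrected bit at position", syndrome)
```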
Can on-chip SRAM be built at lower cost?
Von Neumann architecture: https://en.wikipedia.org/wiki/Von_Neumann_architecture#Von_N... :
> The term "von Neumann architecture" has evolved to refer to any stored-program computer in which an instruction fetch and a data operation cannot occur at the same time (since they share a common bus). This is referred to as the von Neumann bottleneck, which often limits the performance of the corresponding system. [4]
> The von Neumann architecture is simpler than the Harvard architecture (which has one dedicated set of address and data buses for reading and writing to memory and another set of address and data buses to fetch instructions).
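A toy model of that shared-bus penalty versus a Harvard-style split, with invented numbers: the instruction rate is capped by bus bandwidth divided by the bytes each instruction drags across the bus, and separating fetch from data traffic roughly removes the fetch bytes from the denominator.

```python
# Toy model of the shared-bus (von Neumann) bottleneck vs. split (Harvard-style) buses.
# Numbers are illustrative, not measurements.

BUS_GBPS = 50      # assumed bandwidth of one bus, GB/s
FETCH_BYTES = 4    # bytes of instruction fetch per executed instruction
DATA_BYTES = 8     # bytes of operand traffic per executed instruction

shared_bus_ips = BUS_GBPS * 1e9 / (FETCH_BYTES + DATA_BYTES)   # fetch and data share one bus
split_bus_ips  = BUS_GBPS * 1e9 / max(FETCH_BYTES, DATA_BYTES) # each stream gets its own bus

print(f"shared bus:  ~{shared_bus_ips/1e9:.1f} G instructions/s ceiling")
print(f"split buses: ~{split_bus_ips/1e9:.1f} G instructions/s ceiling")
```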
Modified Harvard architecture > Comparisons: https://en.wikipedia.org/wiki/Modified_Harvard_architecture
C-RAM: Computational RAM > DRAM-based PIM Taxonomy, See also: https://en.wikipedia.org/wiki/Computational_RAM
SRAM: Static random-access memory https://en.wikipedia.org/wiki/Static_random-access_memory :
> Typically, SRAM is used for the cache and internal registers of a CPU while DRAM is used for a computer's main memory.
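On the "can on-chip SRAM be built at lower cost" question above, the core issue is bitcell area: a 6T SRAM cell is roughly an order of magnitude larger per bit than a 1T1C DRAM cell. A very rough area sketch with assumed cell sizes (the two are built on different processes, so this only illustrates relative area, not real die cost):

```python
# Rough, order-of-magnitude sketch of why large on-chip SRAM is expensive:
# a 6T SRAM bitcell is roughly an order of magnitude larger per bit than a
# 1T1C DRAM bitcell. Cell areas and the periphery overhead are assumptions.

SRAM_CELL_UM2 = 0.025   # assumed 6T SRAM bitcell area, um^2 (leading-edge logic node)
DRAM_CELL_UM2 = 0.002   # assumed 1T1C DRAM bitcell area, um^2 (DRAM node)

def array_area_mm2(megabytes, cell_um2, overhead=2.0):
    bits = megabytes * 8 * 2**20
    return bits * cell_um2 * overhead / 1e6   # um^2 -> mm^2, with periphery overhead

for mb in (32, 256):
    print(f"{mb:4d} MB: SRAM ~{array_area_mm2(mb, SRAM_CELL_UM2):6.0f} mm^2 "
          f"vs DRAM ~{array_area_mm2(mb, DRAM_CELL_UM2):5.0f} mm^2")
```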