
Why do we ever need to transpose a matrix?

Isn't it better to simply combine the transposition with whatever next operation one wishes to do with the matrix?




You're right that a good graph compiler will do this for you. There may still be times, though, such as when you're interfacing with another library, where you'll need to switch a matrix between row-major and column-major layouts.
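For instance, a sketch in NumPy of that interfacing case: handing a row-major array to code that expects column-major input means the bytes genuinely have to be rearranged.

```python
import numpy as np

A = np.arange(6, dtype=np.float64).reshape(2, 3)  # row-major (C order) by default

# A library expecting column-major (Fortran order) input needs the
# bytes physically reordered; asfortranarray copies here because
# the two layouts genuinely differ.
A_f = np.asfortranarray(A)

assert np.array_equal(A, A_f)                          # same logical matrix
assert A.flags['C_CONTIGUOUS'] and A_f.flags['F_CONTIGUOUS']  # different memory order
```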


Serious linear algebra libraries accept a flag that tells them whether elements are stored column-major or row-major.
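A minimal illustration of what such a flag buys you, using NumPy's `order` argument to stand in for it (an assumption for illustration, not any particular BLAS interface): no data moves, only the interpretation of the buffer changes.

```python
import numpy as np

buf = np.arange(6, dtype=np.float64)  # one flat buffer of 6 elements

# The "flag" decides how the same bytes are read:
M_row = buf.reshape(2, 3, order='C')  # row-major view:    [[0,1,2],[3,4,5]]
M_col = buf.reshape(2, 3, order='F')  # column-major view: [[0,2,4],[1,3,5]]

# Both are views over the same memory; no transpose was performed.
buf[0] = 42.0
assert M_row[0, 0] == 42.0 and M_col[0, 0] == 42.0
```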


The next operation might need the data in column-major order to read it fast, so you might have to transpose first. And these may be concurrent stages of a processing pipeline.
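One way to see the cost difference (a sketch; stride numbers assume 8-byte floats): on a row-major array, neighbours within a row are adjacent in memory, while neighbours within a column are a full row apart, so walking a column touches a new cache line on essentially every element.

```python
import numpy as np

A = np.zeros((1000, 1000), dtype=np.float64)  # row-major by default

# (row stride, column stride) in bytes: stepping along a row jumps 8,
# stepping down a column jumps 8000.
print(A.strides)    # (8000, 8)

# .T is free (a view with the strides swapped), but it only relabels
# the problem: now the rows of A.T are the slow direction.
print(A.T.strides)  # (8, 8000)
```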


Now I'm curious: how many times do you have to fully read the matrix on a GPU before the total cost of reading columns exceeds that of a one-off transpose followed by sequential row reads? I know it depends on lots of things; I'm after a rough estimate.


It's quite rare. Usually problems are tiled anyway, and you can amortize the cost of having data in the "wrong" layout by loading coalesced in whatever layout is best for your data and then transposing inside your tile, which gives you access to much faster memory.
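That tiling pattern can be sketched in plain Python (the tile size of 32 is an arbitrary assumption; on a GPU the per-tile transpose would happen in shared memory after a coalesced load):

```python
import numpy as np

def tiled_transpose(a, tile=32):
    """Transpose by tiles: read each tile in source order
    (the coalesced direction on a GPU), transpose it locally,
    and write it to the mirrored position in the output."""
    m, n = a.shape
    out = np.empty((n, m), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            block = a[i:i+tile, j:j+tile]      # read in source layout
            out[j:j+tile, i:i+tile] = block.T  # local transpose, then write
    return out
```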


The one pure transpose case that does come up occasionally is an in-place non-square transpose, where there is a rich literature of very fussy algorithms. If someone managed to make any headway with compiler optimization there, I'd be interested.
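For reference, the core idea behind those fussy algorithms is cycle-following on the flat buffer: for an m×n row-major matrix, the element at flat index k must move to index (k·m) mod (mn−1), with the first and last elements fixed. A naive sketch (using an O(mn)-bit visited array, which the serious algorithms work hard to avoid):

```python
def transpose_in_place(a, m, n):
    """In-place transpose of an m x n matrix stored row-major in the
    flat list `a`. Element k moves to (k * m) % (m*n - 1); indices
    0 and m*n - 1 stay put. Follows each permutation cycle once."""
    size = m * n
    if size <= 2:
        return a
    visited = [False] * size
    for start in range(1, size - 1):
        if visited[start]:
            continue
        cur, val = start, a[start]
        while True:
            nxt = (cur * m) % (size - 1)   # destination of index cur
            a[nxt], val = val, a[nxt]      # drop value, pick up displaced one
            visited[cur] = True
            cur = nxt
            if cur == start:
                break
    return a
```

For example, transposing the 2×3 matrix stored as `[0, 1, 2, 3, 4, 5]` in place yields `[0, 3, 1, 4, 2, 5]`, the 3×2 result in row-major order.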


This could make Mojo look even better, as it would be more compute-heavy and the last-step thread reduction would be less relevant.





