
hmm. after my engineering degree put all of the vector math in the form

k = Wx

seeing

k = xW

is jarring. Is there a reason for using horizontal vectors? Common for data science docs?



It’s mostly a convention. In many deep learning frameworks (PyTorch, TensorFlow, etc.), inputs are stored with the “batch × length × hidden-dim” shape, effectively making the token embeddings row vectors. Multiplying “xW” is then the natural shape-wise operation. On the other hand, classical linear algebra references often treat vectors as column vectors and write “Wx.”
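
To make the shape bookkeeping concrete, here's a minimal sketch (PyTorch assumed; the dimension sizes are made up for illustration):

    import torch

    batch, length, hidden, out_dim = 2, 5, 16, 8
    x = torch.randn(batch, length, hidden)   # batch x length x hidden-dim
    W = torch.randn(hidden, out_dim)         # weights stored (in, out)

    k = x @ W                                # "xW": contracts the hidden dim
    assert k.shape == (batch, length, out_dim)

Incidentally, PyTorch's nn.Linear stores its weight as (out_features, in_features) and computes x @ W.T internally, so both conventions coexist even within one framework.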


Isn't batch-first a PyTorch thing? I started with TensorFlow and it's batch-last.


TFv1 or TFv2? AFAIK it's batch-first in TFv2


You are in the right here. Horizontal vectors are common in (some) deep learning docs, but column vectors are the literature standard elsewhere.


It can also be more efficient to compute k = xW (storing the weights transposed) than k = Wx.
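
If you want to sanity-check that on your own machine, here's a rough timing sketch (PyTorch assumed; the shapes are arbitrary, and results depend heavily on hardware and BLAS backend):

    import time
    import torch

    batch, hidden, out_dim = 4096, 1024, 1024
    x = torch.randn(batch, hidden)
    W_row = torch.randn(hidden, out_dim)   # layout for k = xW
    W_col = torch.randn(out_dim, hidden)   # layout for k = Wx (column convention)

    def bench(fn, iters=100):
        fn()                               # warm-up
        t0 = time.perf_counter()
        for _ in range(iters):
            fn()
        return (time.perf_counter() - t0) / iters

    print("xW:", bench(lambda: x @ W_row))
    print("Wx:", bench(lambda: (W_col @ x.T).T))  # batch of column vectors

Mathematically the two layouts are equivalent up to a transpose of W, so any difference is purely about memory access patterns.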



