How does LoRA save more than 50% of the memory usage? I see that the weight updates have a much lower memory footprint by virtue of being low rank, but you still need the dense weights for the forward pass, don't you?
I'm not an expert, but I believe it only saves memory in the final model, after training is done, by merging the low-rank LoRA adapter matrices with the original weight matrices.
For example, if an original layer has N inputs and N outputs (an NxN weight matrix), LoRA adds a small trainable bypass alongside it: a 16xN matrix that acts on the input, followed by an Nx16 matrix. Only those two small matrices are trained, and afterwards their product (an NxN update) is added back into the original frozen weights, so the final model is once again a single NxN matrix.
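To make the shapes concrete, here is a rough NumPy sketch of that merge step. It is just the arithmetic, not an actual LoRA implementation: rank 16, N = 1024 picked arbitrarily, and the usual alpha/r scaling factor left out for brevity.

    import numpy as np

    N, r = 1024, 16              # layer width and LoRA rank (illustrative values)

    W = np.random.randn(N, N)    # original dense weight, frozen during training
    A = np.random.randn(r, N)    # trainable down-projection (r x N)
    B = np.zeros((N, r))         # trainable up-projection (N x r), typically zero-initialized

    # During training the layer computes W @ x + B @ (A @ x); only A and B get gradients.
    # After training, the low-rank update is folded back into the dense weight:
    W_merged = W + B @ A         # still N x N, so inference costs the same as the original layer

    print(W_merged.shape)        # (1024, 1024)

So the merged checkpoint is no smaller than the original; what you save is storing and shipping only the tiny A and B matrices per fine-tune (and, during training, their gradients and optimizer states) instead of a full copy of W.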