Score code on line count and runtime golf. Shorter code, faster runtime, and less time to completion are all better.
Code that's 4 KB, took slightly less time to write, and runs slightly faster than 400-byte code that took another 30 minutes to write still doesn't get the best score.
I kind of think metrics are not the answer; instead, one needs taste. Performance is obviously multidimensional, both in what one measures (latency vs throughput) and as a function of the input. For example, the solution you imagine is slightly faster in the test could avoid (or introduce) different worst-case or asymptotic behaviour.
I'd argue we shouldn't be doing this at all, but if we have to answer to whatever insanely arbitrary metric a project/product/eng leader wants, this is probably a better metric than code length.