I personally wouldn't do any of these for an average project. They seem like micro-optimizations. But this is missing the largest Go optimizations I can think of, and the ones I actually would do: string vs. []byte, and memory allocation.
A lot of Go programs spend a lot of time converting between []byte and string. The conversion is actually really slow, and keeping your string data as byte slices is much faster; since many Go libraries can work with either, the conversion is often not necessary.
Also, memory allocations and garbage collection are not free. If you have a hot path that gets hit on every request or in a loop, reusing the same slices can save a lot of time, assuming you can do so without introducing bugs and/or concurrency issues.
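A minimal sketch of the slice-reuse idea (the function and names here are illustrative, not from the repo): resetting a buffer with `buf[:0]` keeps its capacity, so after the first few iterations the loop stops allocating entirely.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendRecord writes "name:id" into buf and returns the grown slice.
// strconv.AppendInt formats directly into the buffer, avoiding the
// intermediate string that strconv.Itoa would allocate.
func appendRecord(buf []byte, id int, name string) []byte {
	buf = append(buf, name...)
	buf = append(buf, ':')
	return strconv.AppendInt(buf, int64(id), 10)
}

func main() {
	var buf []byte
	for i, name := range []string{"alice", "bob"} {
		buf = buf[:0] // reset length, keep capacity: no new allocation after warm-up
		buf = appendRecord(buf, i, name)
		fmt.Println(string(buf)) // alice:0, then bob:1
	}
}
```

The same pattern generalizes to any per-request scratch buffer, as long as the buffer isn't shared across goroutines.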
Dumb question, but can you give an example of using []byte over string?
I've got a project where strings aren't exactly crucial, but we're storing a ton of them in memory. We store them as string and, frankly, rarely access most of them - though we do some string comparisons on the hot path.
Using []byte sounds interesting, but troublesome at the same time. Namely, a []byte is awkward to read characters from, as you'd eventually have to decode runes from it anyway, no?
The key takeaway is that there is a performance cost to converting between []byte and string, and minimizing the number of conversions is reasonably low-hanging fruit when looking for performance optimizations.
>Using []byte sounds interesting, but troublesome at the same time. Namely, a []byte is awkward to read characters from, as you'd eventually have to decode runes from it anyway, no?
Tons of string manipulation just needs to split on delimiters like \n, ., ;, or -, none of which need runes.
Besides that, string comparisons don't need runes either (assuming what you're comparing is normalized to the same bytes - which it will be, if you apply the same input processing to anything you store and to the strings you query with).
[]byte is crucial if you need to frequently modify the contents. The classic example is building a string with a loop that appends a new character on each iteration. Using += to concatenate strings inside such a loop will be slow; the Right Thing To Do is to preallocate a []byte with sufficient capacity, append to it in a loop, and then (if necessary) convert to a string only at the end.
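A minimal sketch of that pattern (function names here are made up for illustration): the `+=` version allocates a fresh string on every iteration, while the `[]byte` version allocates once up front and converts once at the end.

```go
package main

import "fmt"

// buildSlow concatenates with +=, allocating a new string each iteration.
func buildSlow(n int) string {
	s := ""
	for i := 0; i < n; i++ {
		s += "x"
	}
	return s
}

// buildFast preallocates capacity once and appends bytes in place.
func buildFast(n int) string {
	buf := make([]byte, 0, n)
	for i := 0; i < n; i++ {
		buf = append(buf, 'x')
	}
	return string(buf) // single conversion at the end
}

func main() {
	fmt.Println(buildSlow(5) == buildFast(5)) // true: same result, fewer allocations
}
```

`strings.Builder` in the standard library wraps essentially the same trick behind a nicer API, if you'd rather not manage the slice yourself.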
Isn't there a premade library that will abstract these optimizations away? e.g. a rope library. Looks like there's a few libraries out there that implement this data structure. (https://en.wikipedia.org/wiki/Rope_(data_structure))
Yes, it's just []rune, which has just a bit of special sauce in the runtime that allows you to convert it directly to a string [1], and some functions that convert UTF-8 to and from it. However rune is a 32-bit number sufficient for storing a Unicode code point. Generally you're dealing in UTF-8 and prefer to just pass that through code. I think I've only ever use a []rune once, for when I was doing some very heavy-duty Unicode-aware code, and I still ended up refactoring it into using UTF-8 directly in the end.
[1]: https://play.golang.org/p/5DEzw85J5Ob Note this is a conversion; the string is UTF-8 and the runes are 32-bit ints, so this creates a new string.
There's some context missing here. AFAIK the author of the repositories was specifically doing micro-benchmarks here to inform the design of his https://github.com/cornelk/hashmap library. I think for that use case (designing a high-performance lock-free data structure), that kind of micro-benchmarking makes a lot of sense.
That hash map looks good and I'm thinking we'd probably benefit from using it on the hot path of some code we have that needs to be highly scalable.
(Both links were submitted to HN, but only this one seems to have landed on the front page.)
Somewhat related; I was building a game a few years ago and passing 4x4 matrices (128 bytes) by value was faster than pass by reference. I don't know exactly how many bytes can be passed by value before it becomes slower, but please don't read this and mindlessly assume pass by value is slower.
This is not a very good example of benchmarking. First, all of the benchmarks need to be run multiple times (e.g. -count=5 at least) to get reliable results. Second, benchmarking like this is really meant for analyzing a single operation; it confuses the analysis when a repetition of 1024 operations is benchmarked, so that the numbers have to be divided by 1024 to get comparable figures. The Go benchmarking library can reliably measure operations a few instructions long, so there is no need to repeat them N times per measurement. Sometimes you will want to see how an operation changes based on something like length, e.g. hashing 8 versus 128 characters, but in those cases the operation is "hashing" and you should have separate benchmarks to measure performance at different lengths - which is a great use case for sub-benchmarks with b.Run(). Finally, as others have mentioned, most of these are micro-optimizations that make your code less maintainable; e.g. you should always iterate over a slice with range unless you have some logical reason not to.
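A sketch of what that looks like: one hash per iteration, with the input length varied per benchmark rather than via an inner repetition loop. In a normal _test.go file each size would hang off `b.Run(fmt.Sprintf("len=%d", size), ...)`; here `testing.Benchmark` drives the same bodies outside `go test` so the example is self-contained (the function name is mine, not from the repo).

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// benchSHA256 returns a benchmark measuring a single hash of a size-byte
// input per iteration - no inner loop, so ns/op needs no dividing.
func benchSHA256(size int) func(b *testing.B) {
	data := make([]byte, size)
	return func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sha256.Sum256(data)
		}
	}
}

func main() {
	for _, size := range []int{8, 128} {
		r := testing.Benchmark(benchSHA256(size))
		fmt.Printf("len=%d: %d ns/op\n", size, r.NsPerOp())
	}
}
```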
Yep! 1024. Good eye! Some tests, like the atomic ones, use an even larger constant (which explains the crazy 355971 nsec/op result for an atomic increment):
Yes, that one really caught my eye, because if an atomic int increment took ~356 microseconds, I have some code that would have exploded a long time ago.
I'm not sure what the idea is there, the benchmark system can accommodate small operations, I've used it before on things like single increments or single interface calls and gotten reasonable answers.
Also benchmarking atomic int increment without any contention is not necessarily "useless", but certainly not a full picture if one is investigating using it in a contended data structure. (My usage of it is mostly just counters, where I expect thousands upon thousands of instructions to be between the increment of any given one of them, meaning there's probably no contention to speak of even on production multi-core servers, so I have no idea how they perform under serious contention. But AFAIK it's just the hardware-supported atomic instructions, so what I don't know about is the hardware, not really Go per se.)
> I'm not sure what the idea is there, the benchmark system can accommodate small operations, I've used it before on things like single increments or single interface calls and gotten reasonable answers.
Yeah, I've been wondering if I'm missing something, because adding a second redundant loop to b.N is common in go repos, but seems pointless and needlessly obscuring.
These results should be taken with a big grain of salt. A lot of stuff true in this code will not be true at all in YOUR code. This benchmarking code, for instance, is trivial and many parts will get optimized by the compiler. This likely won't be true in your more complex code.
That can be a real pitfall for benchmarking. I recall looking at some promotional material touting the performance advantages of Rust against some other language (C?). It involved a loop, so I kept increasing the loop count to try to get the Rust execution to take the same time as the other language. It took me a while, but I finally figured out that the way the code was structured, the Rust compiler could see that the results of the computation were never used, and the loop was simply optimized away.
Benchmarks measure the execution time of algorithms - or, if you're not careful, the ability of the compiler to optimize away unnecessary code.
(IMO the 'performance comparison' that was vastly in favor of Rust was a bit disingenuous.)
Unless it is a hot path and the CPU is your bottleneck, most of these optimizations would make the code less maintainable / less safe, and I wouldn't personally do any of them.
What good is maintainable, easily extendable, neatly structured code when it's slow and doesn't meet the execution time criteria imposed by your client?
For most products the "time" that matters is "time to market" and "time to resolution" on bugs. Most business uses cases don't require the kind of performance we expect of them.
Depending on industry, time is relative. These optimizations will save you milliseconds on tight loops. In reality most apps aren't spending their time on tight loops. They are spending their time waiting for I/O or using inefficient algorithms. Neither of which these optimizations solve for.
If you are underperforming spec by 100ms it is likely you have one big change you need to make to save 100ms not 100 small micro-optimizations.
As you stated yourself: Most projects, not all.
The 1% of projects that are performance-critical still matter to thousands of engineers.
Just because most people don't have to do this doesn't mean there aren't still people who have to optimize wherever possible.
I hope you understand that optimization without profiling is worthless. Once profiling identifies something like your data structure as a bottleneck, it might be worth investigating that hint.
And these kind of articles are a nice summary of work being done, giving an overview over techniques and small tweaks that you might not have thought about previously.
It is good work by the author and he deserves to be recognized for it, not dismissed with the bland statement "don't waste your time with optimization."
Please use defer when there is only one code path as well unless it is a hot function and you have measured a noticeable performance decrease.
Code will get changed over time, and one code path often becomes two or more. Chances are, the next developer who adds code will not realize that the resource needs to be freed or closed, and will simply forget to add the defer. The only place where defer really gets in your way is error handling (because it can get verbose if you handle errors in a defer).
In the grand scheme of things, defer is not noticeably slower.
The 5300+ ns/op is for BenchMarkSize (1024) defer operations, so we are actually paying about 5 ns per defer. Not sure why the author chose to report the ns/op number that way.
"so we are actually paying about 5ns/op for defer"
Yeah, it's in the class of "things you should know about when optimizing tight loops", but not "things you should always be worrying about"; remember 5 ns is on the order of a branch mispredict or an L2 cache miss [1]. If you haven't already squeezed out all the main-memory accesses you can from your algorithm at ~100 ns a pop, or if you're in code dealing with networks (routinely microseconds even within datacenters; milliseconds if you have to leave), or files, or anything else like that, micro-optimizing defers is not going to have any visible effect on your speed, but it can very badly damage your code's correctness and ease of writing and modifying.
The other thing to know about defers is that they are not declarations, they are instructions; every time the program counter moves past one, another defer call is added to the stack of calls to make while exiting the function. So you do have to be careful about deferring in a loop, not so much because it's "dangerous" but just because it's easy to be unaware that the defers are not declarations. If you intend to do it, it's fine, it can just bite you if you do it accidentally.
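A sketch of the loop pitfall and the usual fix (function names are mine): the first version keeps every file open until the whole function returns; wrapping the body in a closure releases each file at the end of its own iteration.

```go
package main

import (
	"fmt"
	"os"
)

// Each executed defer statement pushes another pending call onto the
// function's deferred stack, so deferring inside a loop accumulates N
// open files that are only closed when processAllLeaky returns.
func processAllLeaky(paths []string) error {
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			return err
		}
		defer f.Close() // piles up: nothing closes until the function exits
		// ... read f ...
	}
	return nil
}

// Wrapping the body in a function scopes each defer to one iteration.
func processAll(paths []string) error {
	for _, p := range paths {
		if err := func() error {
			f, err := os.Open(p)
			if err != nil {
				return err
			}
			defer f.Close() // runs when this closure returns, once per iteration
			// ... read f ...
			return nil
		}(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(processAll(nil) == nil) // true: no paths, nothing to open
}
```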
Thanks, that is what I was looking for. I used defer just as convenience and wasn't aware (anymore) that it does something different than just pushing the function call to the end. Maybe it's time to take the Golang tour again ;-)
Defer is for reclaiming resources, or other cleanup at the end of a function which cannot otherwise be handled by the language, and (usually?) this is not going to be in your hot path; for example opening and closing a file many many times is not really a good thing to do anyway.
That repo doesn't encourage anything. It's presenting data that the author found useful for another project they were working on, which presumably did benefit from micro-optimization. You are right that micro-optimization is usually not needed, but the repo and its author never claimed otherwise.
Your assessment of the situation encourages mindless ignoring of optimization.
It always depends on the situation. Stuff that is done once in a blue moon... I don't care, as long as it's reasonably fast.
For stuff that's on the hot path you sometimes have to pull up your sleeves and optimize where possible.
At work I just had to implement a custom heap because the one provided by the standard library didn't fit our problem.
Please don't discourage people creating this kind of content. It matters to far more people than you might have in mind.
Most of the time it doesn't matter - except for when it does. Your statement is overreaching and is a kind of premature optimization in itself: it chooses to skip understanding when it is important to optimize in this manner and when it isn't.
The author's use case is very beneficial to optimize in this manner.
There are several ways to iterate over a slice in Go.
They do have some performance differences, depending on the element types of the slices. See the example at the end of this article for details: https://go101.org/article/value-copy-cost.html
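A quick sketch of the three common forms (the element type is contrived to make the difference visible): the two-value `range` copies each element into the loop variable, which costs real time when elements are large.

```go
package main

import "fmt"

type big struct{ a, b, c, d [16]int64 } // 512-byte elements

func main() {
	s := make([]big, 3)
	sum := 0

	// Index-only range: no element copy, cheap regardless of element size.
	for i := range s {
		sum += int(s[i].a[0])
	}

	// Two-value range: copies each 512-byte element into v per iteration.
	for _, v := range s {
		sum += int(v.a[0])
	}

	// Classic index loop: equivalent in cost to the index-only range.
	for i := 0; i < len(s); i++ {
		sum += int(s[i].a[0])
	}

	fmt.Println(sum) // 0: the slice is zero-valued
}
```

For small element types (ints, pointers, small structs) the compiler generates essentially the same code for all three, which is why `range` is the right default.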
Sure, "premature optimization is the root of all evil" and so on. But what I find much worse, is that hashing algorithms are put side-by-side without any context, making it look like MD5 is better than SHA-2. That's downright irresponsible.