I did something similar to this. I wanted "lazy loaded" SVG images. Generating an average color over an area gave horrific results: an area with equal amounts of red, green, and blue would just come out gray, for example.
My solution was to convert a target pixel area into a histogram in HSL color space, select the most populated color area, and take an average of that populated area. You could then do smart things like render edges where the change crossed a threshold, so you got clean edges. Once you blurred the SVG it was a fraction of the size of a downscaled raster image (using imagemagick).
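For anyone curious, here is roughly the shape of that idea as a minimal sketch (not my original code; it assumes Pillow and uses hue bins as the "color areas"):

    import colorsys
    from PIL import Image

    def dominant_color(region: Image.Image, bins: int = 36):
        """Bin a region's pixels by hue, then average only the fullest bin."""
        pixels = list(region.convert("RGB").getdata())
        buckets = [[] for _ in range(bins)]
        for r, g, b in pixels:
            h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
            buckets[int(h * bins) % bins].append((r, g, b))
        fullest = max(buckets, key=len)                   # most populated hue bucket
        n = len(fullest)
        return tuple(sum(c) / n for c in zip(*fullest))   # average of that bucket only

The real version histogrammed the full HSL space rather than hue alone, but picking the populated area first and only then averaging is the important part.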
There are all kinds of very interesting pitfalls when dealing with color. RGB is great for transmitting data but horrible for applying operations. For example, your eye converts RGB in a way such that yellow essentially becomes a primary. Another is that the eye perceives photons in a logarithmic fashion, so adding two colors and halving them doesn't make an average. This article has some great info in it: https://blog.asmartbear.com/color-wheels.html
> .. eye perceives photons in a logarithmic fashion, so adding two colors and halving them doesn't make an average.
Are you using linear values or the usual funky gamma space¹?
With linear values, value 1 means x photons and value n means n·x photons. Always. You can resize, blur, etc. and get the expected results, as if squinting or looking from a different distance. Works fine in RGB.
However, with linear color you need to cover a lot of range for decent results, so instead of using 16+ linear bits per color channel, floats, or ugly 8 linear bits, people squished the values together unevenly, fitting a nice representative range of intensities into 5-6 or 8 bits, which we call "gamma space". Memory was very expensive and this way the colors looked nice.
We have been using 8-bit colors (24bit RGB, 8bit grayscale) for so long that most people don't realize the difference between linear and "gamma" space values (I didn't for a long time). They say colors are weird and merrily average a couple of gamma values, because it kind-of gets the job done and no one's got time for this. It doesn't matter if your pictures get darker and off-color each time you resize them in your typical program.
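To make that concrete, here is a tiny sketch of naive averaging versus averaging in linear light (using the common gamma ≈ 2.2 approximation rather than the exact sRGB curve):

    def to_linear(v):                 # 0..255 gamma-encoded -> 0..1 linear
        return (v / 255) ** 2.2

    def to_gamma(v):                  # 0..1 linear -> 0..255 gamma-encoded
        return round(255 * v ** (1 / 2.2))

    black, white = 0, 255
    naive = (black + white) // 2                                   # 127: noticeably too dark
    linear = to_gamma((to_linear(black) + to_linear(white)) / 2)   # ~186
    print(naive, linear)

A 50/50 checkerboard of black and white pixels viewed from a distance looks like the ~186 gray, not the 127 one.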
Working with colors properly would take more time and/or memory, so it's not really welcome everywhere, and compatibility is a big issue. Many artistic effects even depend on the quirks of gamma-space "calculation". Instead, we add all sorts of workarounds, e.g. font darkening and lightening, because antialiased fonts come out a fraction of a pixel thinner when dark but thicker when light if the AA "averaging" is done in gamma space.
¹ Gamma space is a properly defined transformation, but you should really treat gamma-space values as palette color.
There's a great article that explains how to use math to address the issue you mention: https://www.picturecorrect.com/tips/photoshop-computes-color... Thankfully, using interpreted languages has the added benefit of usually handling numbers better, so converting between color spaces is pretty painless.
Professionally I've handled things more conservatively. Monitors with deep color support are only now becoming more popular, and most server-side tech seems to support traditional color depth best. Now is the time to start thinking about deeper-color images by default; webp is great, and support only recently became mainstream with iOS 14.
> For example, your eye converts RGB in a way such that yellow essentially becomes a primary. Another is that the eye perceives photons in a logarithmic fashion, so adding two colors and halving them doesn't make an average.
Coming from an audio background, where we have the Fletcher-Munson equal-loudness curves, I'm really curious to learn more about how the eye perceives light. Does anyone have other sources for this, please?
Interesting, thanks for the links! The color vision link had a nice chart of sensitivity across the spectrum from infrared to ultraviolet.
Curious though, how serious of an impact would time of day have on this curve? If someone took the test in the early morning, would yellow be more striking/harsh than if they viewed it at night? Would blue be more striking after a day of saturated sunlight?
Edit: Blue and yellow are similarly placed on the chart, apparently accounting for the morning/night sensitivity
I’m totally speculating about this before googling anything, but I suspect that time of day alone is not a huge factor, other than right after waking you’ll have more blood in your eyes and everything’s reddish for a few minutes, but it goes away quickly.
Illuminant is a big factor in appearance, so time of day matters a lot in the sense that if the sun is the primary illuminant, it changes color based on the angle in the sky and atmospheric conditions.
It’s an interesting question, and hard to answer due to perception and adaptation - our system is really good at compensating for things like illuminant and brightness and “color surround” (background colors). We adapt pretty fast to changes in condition (think about how long it took ... before Covid ... to adapt after walking out of a midday movie in a theater), and we’re better at seeing relative color differentials than absolute colors, so the physiological perception of blue isn’t likely to change after a day of sunlight (I guess). But just the memory of the day’s colors or yesterday’s color might affect what you think you see...
Looking up L*a*b* color could be a good start if you're unfamiliar. Color is naturally three-dimensional (e.g. R×G×B), and Lab is a projection of color where L attempts to capture perceived lightness (every color with the same L should look equally light), while a and b cover the other two dimensions (green-to-red and blue-to-yellow... you can read the wp page as well as I can). The goal is that a given distance in any direction, anywhere in the volume, corresponds to an equal perceptual difference to the eye, something sorely missing from RGB (where e.g. 8 bits/channel is overkill for some colors but can result in banding in others, as well as being cumbersome to process in, as GP mentioned).
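If it helps, a quick sketch of what "perceptual distance" looks like in practice (assuming scikit-image is available; the two colors here are arbitrary):

    import numpy as np
    from skimage.color import rgb2lab, deltaE_cie76

    # two colors as sRGB in 0..1, shaped as 1x1 "images"
    c1 = np.array([[[0.20, 0.40, 0.80]]])   # a blue
    c2 = np.array([[[0.20, 0.50, 0.80]]])   # a slightly greener blue

    # Euclidean distance in L*a*b*; roughly, a delta E around 2-3 is
    # near the threshold of a just-noticeable difference
    print(deltaE_cie76(rgb2lab(c1), rgb2lab(c2)))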
If the R, G, and B components are equal, it doesn't matter whether you use a linear or non-linear color space: the color will always be gray. The brightness might be different, but I'm not even sure that's true.
Light, and therefore color, is a wave when it comes to how we sense it. Like sound, if you take two frequencies and add them together, you don't get an average. You get a more complex wave that produces a complex sound (a chord), somewhat akin to a secondary or tertiary color.
If you're talking about how we sense light, it's more of a vector of 3-4 scalar values. You have activation levels for your rods and each of your cones.
Light is physically a wave, but that's not especially relevant to how you sense it.
Just for clarification, are you saying that the eye translates visual data differently if the image were displayed in RGB, as opposed to HSL?
Edit: Is the visual result not summed by the brain, no matter how it is displayed? Whether displaying RGB or HSL, the brain will interpret that in a way that its API can comprehend, yes?
That's not what he's saying. What he's saying is that if you do math in RGB space, it doesn't work out as you expect.
So the simplest example of math is taking the average of two colors. If you plot two colors in RGB space and take the midpoint of the line connecting them, the resulting color tends not to look anything like "halfway" between the two colors. If you instead did the process in HSL, the result makes more sense.
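A quick illustration with Python's stdlib colorsys (which calls the space HLS; note this naive hue average also ignores hue wrap-around):

    import colorsys

    red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

    rgb_mid = tuple((a + b) / 2 for a, b in zip(red, green))
    print(rgb_mid)    # (0.5, 0.5, 0.0): a dark, muddy olive

    h1, l1, s1 = colorsys.rgb_to_hls(*red)
    h2, l2, s2 = colorsys.rgb_to_hls(*green)
    hsl_mid = colorsys.hls_to_rgb((h1 + h2) / 2, (l1 + l2) / 2, (s1 + s2) / 2)
    print(hsl_mid)    # (1.0, 1.0, 0.0): full-brightness yellow

Both inputs are bright, fully saturated colors, so "halfway" should arguably also be bright and saturated; the RGB midpoint isn't.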
Is anyone else surprised that stitchfix has been successful for long enough that they can keep such a skilled and productive data science team around? Especially considering how much churn subscription box companies have, I'm shocked that they're apparently doing so well.
They need a skilled data science team. That's the only way for them to have been so successful. When you're a small company, using your data science team to optimize every interaction is key. It's not just the image analysis, but also logistics, inventory, shipping, etc.
I look at posts like this as a very small view into one of the problems they've worked on. This post really offered little by way of rationale for why they are extracting colors. The earlier post offers a bit more, but not much.
Instead, I think that a post like this describes how they approach a challenge and how they automate work that would require manual annotation. I actually think of this analysis more as decision support (making manual review easier) as opposed to full-on automation.
A very nice writeup, but (no offense) "tens of seconds" per image? Maybe I haven't understood the technical challenges well enough, but processing an image of 60,000 pixels should surely be much faster than that. What would the cost center of such a process even be - building the superpixels?
We got good results for that specific failure mode with random sampling.
How many samples to take requires some tuning, but you can use the image data itself to help with that, by exiting the random sampling once the deviation of the sampled pixels falls below a factor proportional to samples/total.
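Something like this, roughly (my reading of the stopping rule, not the original code; batch size and factor are made-up defaults):

    import random
    import statistics

    def sampled_mean_color(pixels, batch=200, factor=0.05):
        """pixels: flat list of (r, g, b) tuples; returns an approximate mean color."""
        total = len(pixels)
        seen = []
        while len(seen) < total:
            seen.extend(random.choice(pixels) for _ in range(batch))
            # per-channel spread of everything sampled so far
            spread = max(statistics.pstdev(ch) for ch in zip(*seen))
            # stop once the spread is below a factor proportional to samples/total
            if spread <= factor * 255 * len(seen) / total:
                break
        n = len(seen)
        return tuple(sum(ch) / n for ch in zip(*seen))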
In game development there are already pretty mature photogrammetry techniques that reverse engineer the lighting and shadow conditions of a scene in order to "subtract" these from objects in the scene, so you can get the neutral albedo of their surfaces (this is needed for physically based rendering, so that the texture in the game engine doesn't have highlights or shadows baked into it and can instead be lit dynamically by the engine based on the in-game surroundings). It'd be interesting to see if a technique like that could be applied here too. And on the flip side, if you want to do this using ML, rendered images from a game engine/Blender could be used to generate large amounts of training data with perfectly defined "ground truth" colors already known.
I always thought these techniques only work because they use multiple viewpoints of the same object (with the same lighting), can infer topography from these, and then, knowing that, can "figure out" the lighting.
Would be nice to know how well this would work with just very few images, maybe even with different positions, let alone a single image.
That reminds me of photogrammetry for environments in games. Sometimes when you take photos of an environment, you include an object of known size and a spectrum of known colours for reference.
For the shadows, maybe consider a method where you first take a picture with light from one direction, then another from a different direction. This could be done automatically by toggling multiple light sources quickly.
You could also cluster first purely on colour (using the xy part of the xyY colourspace). For greyscale clothing you might then need to see if some colour has clearly separate shades, but that's going to be tricky without a proper understanding of shadows.
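Something along these lines, maybe (a sketch assuming scikit-image and SciPy are available; k=4 is an arbitrary choice):

    import numpy as np
    from scipy.cluster.vq import kmeans2
    from skimage.color import rgb2xyz

    def chromaticity_clusters(rgb_img, k=4):
        """rgb_img: float array (H, W, 3) in 0..1. Clusters on xy chromaticity only."""
        xyz = rgb2xyz(rgb_img).reshape(-1, 3)
        total = xyz.sum(axis=1, keepdims=True)
        total[total == 0] = 1e-9                 # guard against pure black pixels
        xy = xyz[:, :2] / total                  # x = X/(X+Y+Z), y = Y/(X+Y+Z)
        centroids, labels = kmeans2(xy, k, minit="++")
        return centroids, labels.reshape(rgb_img.shape[:2])

Dropping the Y (luminance) axis is what makes most of the shading variation disappear, which is also why the greyscale-garment case still needs extra care, as you say.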