What Color Is This? Part 2 (stitchfix.com)
182 points by jkubicek on Oct 13, 2020 | 41 comments


I did something similar to this. I wanted "lazy loaded" SVG images. Generating an average color over an area would produce horrific results - an area with equal amounts of red, green, and blue would come out gray, for example.

My solution was to convert a target pixel area into a histogram in HSL color space, select the most populated color bin, and take an average over that bin. You could then do smart things like render edges wherever the change crossed a threshold, so you got clean edges. Once you blurred the SVG it was a fraction of the size of a downscaled raster image (using imagemagick).
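
A minimal sketch of that idea in Python (Pillow + NumPy; the bin counts and the function name are just illustrative, not the original code):

    # Rough sketch of the HSL-histogram approach described above (illustrative only).
    import colorsys
    import numpy as np
    from PIL import Image

    def dominant_color(path, bins=(18, 8, 8)):
        """Histogram the pixels in HSL space and average the most populated bin."""
        rgb = np.asarray(Image.open(path).convert("RGB"), dtype=float) / 255.0
        # colorsys works per pixel; fine for a sketch, vectorize for real use
        hls = np.array([colorsys.rgb_to_hls(*px) for px in rgb.reshape(-1, 3)])
        hist, edges = np.histogramdd(hls, bins=bins, range=((0, 1),) * 3)
        h_i, l_i, s_i = np.unravel_index(hist.argmax(), hist.shape)
        # average only the pixels that landed in the winning bin
        mask = np.ones(len(hls), dtype=bool)
        for dim, idx in enumerate((h_i, l_i, s_i)):
            lo, hi = edges[dim][idx], edges[dim][idx + 1]
            mask &= (hls[:, dim] >= lo) & (hls[:, dim] <= hi)
        h, l, s = hls[mask].mean(axis=0)
        return tuple(round(c * 255) for c in colorsys.hls_to_rgb(h, l, s))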

There are all kinds of very interesting pitfalls when dealing with color. RGB is great for transmitting data but horrible for applying operations. For example, your eye processes RGB in a way that makes yellow essentially behave like a primary. Another is that the eye perceives photons logarithmically, so adding two colors and halving them doesn't produce a perceptual average. This article has some great info in it: https://blog.asmartbear.com/color-wheels.html


> ... the eye perceives photons logarithmically, so adding two colors and halving them doesn't produce a perceptual average.

Are you using linear values or the usual funky gamma space¹?

With linear values, value 1 means x photons and value n means n·x photons. Always. You can resize, blur, etc., and you get the expected results, as if squinting or looking from a different distance. Works fine in RGB.

However, with linear color you need to cover a lot of range for decent results, so instead of using linear 16+ bits per color channel, floats or ugly 8 bits, people unevenly squished them together, making a nice representative range of intensities in 5-6 or 8 bits, which we call "gamma space". Memory was very expensive and this way colors were nice.

We have been using 8-bit colors (24bit RGB, 8bit grayscale) for so long that most people don't realize the difference between linear and "gamma" space values (I didn't for a long time). They say colors are weird and merrily average a couple of gamma values, because it kind-of gets the job done and no one's got time for this. It doesn't matter if your pictures get darker and off-color each time you resize them in your typical program.
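
To make the averaging problem concrete, a tiny sketch (approximating the sRGB transfer curve as a plain gamma of 2.2 for brevity): the naive average of gamma-encoded black and white is 0.5, but the encoded value of the true mid intensity is about 0.73, so naive averaging comes out too dark.

    # Averaging in gamma space vs. linear space (sRGB approximated as gamma 2.2).
    def to_linear(v):   # gamma-encoded 0..1 -> linear light
        return v ** 2.2

    def to_gamma(v):    # linear light -> gamma-encoded 0..1
        return v ** (1 / 2.2)

    a, b = 0.0, 1.0                                        # black and white, gamma-encoded
    naive = (a + b) / 2                                    # 0.5, noticeably too dark
    proper = to_gamma((to_linear(a) + to_linear(b)) / 2)   # ~0.73, the real mid-gray
    print(naive, proper)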

Working with colors properly would take more time and/or memory, so it's not really welcome everywhere, and compatibility is a big issue. Many artistic effects even depend on the quirks of gamma-space "calculation". Instead, we add all sorts of workarounds, e.g. font darkening and lightening, because antialiased fonts come out a fraction of a pixel thinner when dark and thicker when light if the AA "averaging" is done in gamma space.

¹ Gamma space is a properly defined transformation, but you should really treat gamma-space values as palette color.


There's a great article that explains how to use math to address the issue you mention: https://www.picturecorrect.com/tips/photoshop-computes-color... Thankfully, interpreted languages usually have the added benefit of handling numbers better, so converting between color spaces is pretty painless.

Professionally I've handled things more conservatively. Monitors with deep color support are only now becoming more popular, and most server-side tech seems to support traditional color depth best. Now is the time to start thinking about deeper-color images on by default; WebP is great, and support only recently became mainstream with iOS 14.


> For example, your eye processes RGB in a way that makes yellow essentially behave like a primary. Another is that the eye perceives photons logarithmically, so adding two colors and halving them doesn't produce a perceptual average.

Coming from an audio background, where we have the Fletcher-Munson equal-loudness curves, I'm really curious to learn more about how the eye perceives light. Does anyone have other sources for this, please?


Oh yeah there are tons of sources, whole books on color physiology and perception. Wikipedia really isn’t a bad place to start - see the cone response curves at the top of this section https://en.wikipedia.org/wiki/Color_vision#Physiology_of_col...

A somewhat similar idea to the Fletcher-Munson curves is the "just-noticeable difference" https://en.wikipedia.org/wiki/Just-noticeable_difference which can be mapped across color differences https://en.wikipedia.org/wiki/Color_difference to figure out how people perceive the brightness of a given color https://en.wikipedia.org/wiki/Brightness


Interesting, thanks for the links! The color vision link had a nice chart of sensitivity ratings across the spectrum from infrared to ultraviolet.

Curious though, how serious of an impact would time of day have on this curve? If someone took the test in the early morning, would yellow be more striking/harsh than if they viewed it at night? Would blue be more striking after a day of saturated sunlight?

Edit: Blue and yellow are similarly placed on the chart, apparently accounting for the morning/night sensitivity


I’m totally speculating about this before googling anything, but I suspect that time of day alone is not a huge factor, other than right after waking you’ll have more blood in your eyes and everything’s reddish for a few minutes, but it goes away quickly.

Illuminant is a big factor in appearance, so time of day matters a lot in the sense that if the sun is the primary illuminant, it changes color based on the angle in the sky and atmospheric conditions.

It’s an interesting question, and hard to answer due to perception and adaptation - our system is really good at compensating for things like illuminant and brightness and “color surround” (background colors). We adapt pretty fast to changes in condition (think about how long it took ... before Covid ... to adapt after walking out of a midday movie in a theater), and we’re better at seeing relative color differentials than absolute colors, so the physiological perception of blue isn’t likely to change after a day of sunlight (I guess). But just the memory of the day’s colors or yesterday’s color might affect what you think you see...


Looking up CIELAB (L*a*b*) color could be a good start if you're unfamiliar. Color is naturally three-dimensional (e.g. R×G×B), and Lab is a projection of color where L attempts to account for perceived lightness, while a and b cover the other two dimensions (green-to-red and blue-to-yellow... you can read the wp page as well as I can). The goal is that a given distance in any direction, anywhere in the volume, corresponds to an equal perceptual difference to the eye, an aspect sorely missing from RGB (where e.g. 8 bits/channel is overkill for some colors but can cause banding in others, besides being cumbersome to do math in, as GP mentioned).

https://en.wikipedia.org/wiki/CIELAB_color_space https://en.wikipedia.org/wiki/Colour_banding
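
If you want to play with it, here's a rough sketch of sRGB -> CIELAB plus the simple ΔE*76 distance (D65 white point, no edge-case handling; a real library such as skimage or colormath is the better choice):

    import numpy as np

    # sRGB (0..1) -> XYZ -> CIELAB, D65 white point. Sketch only.
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    WHITE = np.array([0.95047, 1.0, 1.08883])

    def srgb_to_lab(rgb):
        rgb = np.asarray(rgb, dtype=float)
        lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
        xyz = (M @ lin) / WHITE
        f = np.where(xyz > (6 / 29) ** 3, np.cbrt(xyz), xyz / (3 * (6 / 29) ** 2) + 4 / 29)
        return np.array([116 * f[1] - 16,          # L*
                         500 * (f[0] - f[1]),      # a*
                         200 * (f[1] - f[2])])     # b*

    def delta_e76(c1, c2):
        """Plain Euclidean distance in Lab, a rough proxy for perceptual difference."""
        return np.linalg.norm(srgb_to_lab(c1) - srgb_to_lab(c2))

    print(delta_e76([1, 0, 0], [0, 1, 0]))  # red vs. green: a large distance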


Phew, thank you for the links. My brain is spent on colors for the night!


This video from Captain Disillusion does a great job of explaining both the biological mechanism and the consequences it has on displays:

https://youtu.be/FTKP0Y9MVus


> Generating an average color over an area would produce horrific results - an area with equal amounts of red, green, and blue would come out gray, for example.

To be fair it sounds like you did the averaging in sRGB instead of a linear colorspace.


If the R, G, and B values are equal, it doesn't matter whether you use a linear or non-linear color space; the color will always be gray. The brightness might be different, but I'm not even sure that's true.


Light, and therefore color, is a wave when it comes to how we sense it. Like sound, if you take two frequencies and add them together, you don't get an average. You get a more complex wave that produces a complex sound (a chord), somewhat akin to a secondary or tertiary color.


If you're talking about how we sense light, it's more of a vector of 3-4 scalar values. You have activation levels for your rods and each of your cones.

Light is physically a wave, but that's not especially relevant to how you sense it.


Just for clarification, are you saying that the eye translates visual data differently if the image were displayed in RGB, as opposed to HSL?

Edit: Is the visual result not summed by the brain, no matter how it is displayed? Whether displaying RGB or HSL, the brain will interpret that in a way that its API can comprehend, yes?


That's not what he's saying. What he's saying is that if you do math in RGB space, it doesn't work out as you expect.

So the simplest example of math is taking the average of two colors. If you plot two colors in RGB space and take the midpoint of the line connecting them, the resulting color tends not to look anything like "halfway" between the two colors. If you instead do the process in HSL, the result makes more sense.
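
For instance (stdlib colorsys, purely illustrative): the componentwise RGB midpoint of yellow and blue is a flat gray, while the HSL midpoint keeps full saturation and lands on a green, which is much closer to an intuitive "halfway". (The naive hue average here ignores the wrap-around at 360°, which a real implementation has to handle.)

    import colorsys

    def mid_rgb(c1, c2):
        # componentwise average in RGB
        return tuple((a + b) / 2 for a, b in zip(c1, c2))

    def mid_hsl(c1, c2):
        # average hue/lightness/saturation and convert back
        # (naive hue average; real code must handle the 0/360 wrap-around)
        h1, l1, s1 = colorsys.rgb_to_hls(*c1)
        h2, l2, s2 = colorsys.rgb_to_hls(*c2)
        return colorsys.hls_to_rgb((h1 + h2) / 2, (l1 + l2) / 2, (s1 + s2) / 2)

    yellow, blue = (1.0, 1.0, 0.0), (0.0, 0.0, 1.0)
    print(mid_rgb(yellow, blue))  # (0.5, 0.5, 0.5): flat gray
    print(mid_hsl(yellow, blue))  # (0.0, 1.0, 0.5): a saturated green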


Is anyone else surprised that stitchfix has been successful for long enough that they can keep such a skilled and productive data science team around? Especially considering how much churn subscription box companies have, I'm shocked that they're apparently doing so well.


They need a skilled data science team. That's the only way for them to have been so successful. When you're a small company, using your data science team to optimize every interaction is key. It's not just the image analysis, but also logistics, inventory, shipping, etc.


Earlier this year Stitchfix had around 8,000 employees and 115 data scientists, and operated at about $55k of revenue per employee.

They are neither small nor particularly successful.

[1] https://www.forbes.com/sites/stevenli1/2020/02/17/stitch-fix...


Do they? I'm pretty sure an intern (or mTurk) could do this manually and more accurately for much less than the salary of whoever wrote this post.


I look at posts like this as a very small view into one of the problems they've worked on. This post really offered little by way of rationale for why they are extracting colors. The earlier post offers a bit more, but not much.

Instead, I think that a post like this describes how they approach a challenge and how they automate work that would require manual annotation. I actually think of this analysis more as decision support (making manual review easier) as opposed to full-on automation.


I'm not surprised.

1. It's a pretty common misconception that StitchFix is subscription only. A lot of their business is ad hoc: you just pay for a single "fix".

2. Also, as far as I understand it, they are just as much a tech company as they are a retailer, if not more.

3. I know someone who works there, and from what they tell me StitchFix seems to really have their act together and just be well run in general.


They do really well in areas like middle America where the options are limited. I am not surprised they are doing so well.


This is what funding allows. You don't desperately need to be a super success because you have reserves feeding that growth.


Bruh you just don’t get how funding works


A very nice writeup, but (no offense) "tens of seconds" per image? Maybe I haven't understood the technical challenges well enough, but processing an image of 60,000 pixels should surely be much faster than that. What would the expensive part of such a process even be - building the superpixels?


Exactly what I was thinking. Some years ago I also tried this and had great results by just resizing the image first to something like 100x100 pixels.


resizing is quite disruptive to patterns

we got good results for that specific failure mode with random sampling.

how many samples to take requires some tuning, but you can use the image data itself to do that tuning: exit the random sampling once the deviation of the sampled pixels drops below a factor proportional to samples/total
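
something along these lines (NumPy sketch; the batch size and stopping threshold are made-up knobs you'd tune, and I'm using the standard error of the mean as one concrete way to express that stopping rule):

    import numpy as np

    def sampled_mean_color(img, batch=500, max_frac=0.2, tol=2.0, seed=0):
        """Estimate the mean color by random sampling, stopping early once the
        standard error of the sampled pixels is small enough."""
        rng = np.random.default_rng(seed)
        pixels = img.reshape(-1, img.shape[-1]).astype(float)
        total = len(pixels)
        taken = np.empty((0, pixels.shape[-1]))
        while len(taken) < max_frac * total:
            taken = np.vstack([taken, pixels[rng.integers(0, total, size=batch)]])
            sem = taken.std(axis=0) / np.sqrt(len(taken))  # standard error per channel
            if sem.max() < tol:
                break
        return taken.mean(axis=0)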


Probably the result of using python for number crunching.


In game development there are already pretty mature photogrammetry techniques that reverse engineer the lighting and shadow conditions of a scene in order to "subtract" them from objects in the scene, so you can get the neutral albedo of their surfaces. This is needed for physically based rendering: the texture in the game engine shouldn't have highlights or shadows baked into it, so that lighting can be applied dynamically by the engine based on the in-game surroundings. It'd be interesting to see if a technique like that could be applied here too. And on the flip side, if you want to do this using ML, rendered images from a game engine/Blender could be used to generate large amounts of training data with perfectly defined "ground truth" colors already known.


I always thought that these techniques only work because they use multiple viewpoints of the same object (with the same lighting), can infer topography from those, and then, knowing that, can "figure out" the lighting. Would be nice to know how well this works with just a very few images, maybe even from different positions, let alone a single image.


I also had to implement something similar to this and wrote about it: https://www.mikealche.com/software-development/how-to-implem...


Is all of this because the images they have of their own products aren't taken with enough fill light to remove all shadows?

Like without any shadows doesn’t this become a more trivial problem?


If they could control the photo taking process, couldn't they also just have the photo takers label the colors as well?


That reminds me of photogrammetry for environments in games. Sometimes when you take photos of an environment, you include an object of known size and a spectrum of known colours for reference.


Good point. Just colour gun boom boom.


Really wish those lovely visualizations in the "Part 1" had pause buttons... some of the transitions are just too fast.


For the shadows, maybe consider a method where you first take a picture with light from one direction, then another with light from a different direction. This could be done automatically by toggling multiple light sources quickly.
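
A crude way to combine such a pair (just an idea, not anything from the article; the file names are hypothetical) is a per-pixel max, since a region shadowed under one light is usually lit under the other:

    import numpy as np
    from PIL import Image

    # two shots of the same garment, lit from different directions (file names made up)
    left = np.asarray(Image.open("lit_from_left.png").convert("RGB"), dtype=float)
    right = np.asarray(Image.open("lit_from_right.png").convert("RGB"), dtype=float)

    # a pixel shadowed under one light is usually lit under the other,
    # so the per-pixel maximum gives a rough shadow-reduced composite
    composite = np.maximum(left, right).astype(np.uint8)
    Image.fromarray(composite).save("shadow_reduced.png")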


Or, using a proper lighting setup, so as to minimize shadows. This would likely solve most of their shadow-related problems.


You could also cluster first purely on colour (using the xy part of the xyY colourspace). For greyscale clothing you might then need to see if some colour has clearly separate shades, but that's going to be tricky without a proper understanding of shadows.
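
A rough sketch of that (linearize sRGB, project to xy chromaticity, then k-means; the cluster count is a guess you'd tune per garment):

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_on_chromaticity(rgb_img, k=3):
        """Cluster pixels on xy chromaticity only, ignoring luminance (Y)."""
        rgb = rgb_img.reshape(-1, 3).astype(float) / 255.0
        lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
        M = np.array([[0.4124, 0.3576, 0.1805],
                      [0.2126, 0.7152, 0.0722],
                      [0.0193, 0.1192, 0.9505]])
        xyz = lin @ M.T
        xy = xyz[:, :2] / (xyz.sum(axis=1, keepdims=True) + 1e-12)  # x = X/(X+Y+Z), y = Y/(X+Y+Z)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(xy)
        return labels.reshape(rgb_img.shape[:2])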


One could also remove the expected shadow values (and hues) depending on the color of the clothing and the color of the light, at least approximately.

Should work as long as there aren't complicated interactions of the material with light that hits it.



