What Color Is This? Part 2 (stitchfix.com)
182 points by jkubicek on Oct 13, 2020 | 41 comments


I did something similar to this. I wanted "lazy loaded" SVG images. Generating an average color over an area would produce horrific results - an area with equal amounts of red, green, and blue would come out gray, for example.

My solution was to convert a target pixel area into a histogram in HSL color space, select the most populated color bin, and take an average over that bin. You could then do smart things like render edges wherever the change crossed a threshold, so you got clean edges. Once you blurred the SVG it was a fraction of the size of a downscaled raster image (using imagemagick).
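
A minimal sketch of that idea in Python (Pillow + NumPy; the bin counts and the function name are just illustrative, not the original code):

    # Rough sketch of the HSL-histogram approach described above (illustrative only).
    import colorsys
    import numpy as np
    from PIL import Image

    def dominant_color(path, bins=(18, 8, 8)):
        """Histogram the pixels in HSL space and average the most populated bin."""
        rgb = np.asarray(Image.open(path).convert("RGB"), dtype=float) / 255.0
        # colorsys works per pixel; fine for a sketch, vectorize for real use
        hls = np.array([colorsys.rgb_to_hls(*px) for px in rgb.reshape(-1, 3)])
        hist, edges = np.histogramdd(hls, bins=bins, range=((0, 1),) * 3)
        h_i, l_i, s_i = np.unravel_index(hist.argmax(), hist.shape)
        # average only the pixels that landed in the winning bin
        mask = np.ones(len(hls), dtype=bool)
        for dim, idx in enumerate((h_i, l_i, s_i)):
            lo, hi = edges[dim][idx], edges[dim][idx + 1]
            mask &= (hls[:, dim] >= lo) & (hls[:, dim] <= hi)
        h, l, s = hls[mask].mean(axis=0)
        return tuple(round(c * 255) for c in colorsys.hls_to_rgb(h, l, s))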

There are all kinds of very interesting pitfalls when dealing with color. RGB is great for transmitting data but horrible for applying operations. For example, your eye processes RGB in a way that makes yellow essentially behave like a primary. Another is that the eye perceives photons logarithmically, so adding two colors and halving them doesn't produce a perceptual average. This article has some great info in it: https://blog.asmartbear.com/color-wheels.html


> ... the eye perceives photons logarithmically, so adding two colors and halving them doesn't produce a perceptual average.

Are you using linear values or the usual funky gamma space¹?

With linear values, value 1 means x photons and value n means n·x photons. Always. You can resize, blur, etc., and you get the expected results, as if squinting or looking from a different distance. Works fine in RGB.

However, with linear color you need to cover a lot of range for decent results, so instead of using linear 16+ bits per color channel, floats or ugly 8 bits, people unevenly squished them together, making a nice representative range of intensities in 5-6 or 8 bits, which we call "gamma space". Memory was very expensive and this way colors were nice.

We have been using 8-bit colors (24bit RGB, 8bit grayscale) for so long that most people don't realize the difference between linear and "gamma" space values (I didn't for a long time). They say colors are weird and merrily average a couple of gamma values, because it kind-of gets the job done and no one's got time for this. It doesn't matter if your pictures get darker and off-color each time you resize them in your typical program.
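
To make the averaging problem concrete, a tiny sketch (approximating the sRGB transfer curve as a plain gamma of 2.2 for brevity): the naive average of gamma-encoded black and white is 0.5, but the encoded value of the true mid intensity is about 0.73, so naive averaging comes out too dark.

    # Averaging in gamma space vs. linear space (sRGB approximated as gamma 2.2).
    def to_linear(v):   # gamma-encoded 0..1 -> linear light
        return v ** 2.2

    def to_gamma(v):    # linear light -> gamma-encoded 0..1
        return v ** (1 / 2.2)

    a, b = 0.0, 1.0                                        # black and white, gamma-encoded
    naive = (a + b) / 2                                    # 0.5, noticeably too dark
    proper = to_gamma((to_linear(a) + to_linear(b)) / 2)   # ~0.73, the real mid-gray
    print(naive, proper)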

Working with colors properly would take more time and/or memory, so it's not really welcome everywhere, and compatibility is a big issue. Many artistic effects even depend on the quirks of gamma-space "calculation". Instead, we add all sorts of workarounds, e.g. font darkening and lightening, because antialiased fonts come out a fraction of a pixel thinner when dark and thicker when light if the AA "averaging" is done in gamma space.

¹ Gamma space is a properly defined transformation, but you should really treat gamma-space values as palette color.


There's a great article that explains how to use math to address the issue you mention: https://www.picturecorrect.com/tips/photoshop-computes-color... Thankfully, interpreted languages usually have the added benefit of handling numbers better, so converting between color spaces is pretty painless.

Professionally I've handled things more conservatively. Monitors with deep color support are only now becoming more popular, and most server-side tech seems to support traditional color depth best. Now is the time to start thinking about deeper-color images on by default; WebP is great, and support only recently became mainstream with iOS 14.


> For example, your eye processes RGB in a way that makes yellow essentially behave like a primary. Another is that the eye perceives photons logarithmically, so adding two colors and halving them doesn't produce a perceptual average.

Coming from an audio background, where we have the Fletcher-Munson equal-loudness curves, I'm really curious to learn more about how the eye perceives light. Does anyone have other sources for this, please?


Oh yeah there are tons of sources, whole books on color physiology and perception. Wikipedia really isn’t a bad place to start - see the cone response curves at the top of this section https://en.wikipedia.org/wiki/Color_vision#Physiology_of_col...

A somewhat similar idea to the Fletcher-Munson curves is the "just-noticeable difference" https://en.wikipedia.org/wiki/Just-noticeable_difference which can be mapped across color differences https://en.wikipedia.org/wiki/Color_difference to figure out how people perceive the brightness of a given color https://en.wikipedia.org/wiki/Brightness


Interesting, thanks for the links! The color vision link had a nice chart of sensitivity ratings across the spectrum from infrared to ultraviolet.

Curious though, how serious of an impact would time of day have on this curve? If someone took the test in the early morning, would yellow be more striking/harsh than if they viewed it at night? Would blue be more striking after a day of saturated sunlight?

Edit: Blue and yellow are similarly placed on the chart, apparently accounting for the morning/night sensitivity


I’m totally speculating about this before googling anything, but I suspect that time of day alone is not a huge factor, other than right after waking you’ll have more blood in your eyes and everything’s reddish for a few minutes, but it goes away quickly.

Illuminant is a big factor in appearance, so time of day matters a lot in the sense that if the sun is the primary illuminant, it changes color based on the angle in the sky and atmospheric conditions.

It’s an interesting question, and hard to answer due to perception and adaptation - our system is really good at compensating for things like illuminant and brightness and “color surround” (background colors). We adapt pretty fast to changes in condition (think about how long it took ... before Covid ... to adapt after walking out of a midday movie in a theater), and we’re better at seeing relative color differentials than absolute colors, so the physiological perception of blue isn’t likely to change after a day of sunlight (I guess). But just the memory of the day’s colors or yesterday’s color might affect what you think you see...


Looking up CIELAB (L*a*b*) color could be a good start if you're unfamiliar. Color is naturally three-dimensional (e.g. R×G×B), and Lab is a projection of color where L attempts to account for perceived lightness, while a and b cover the other two dimensions (green-to-red and blue-to-yellow... you can read the wp page as well as I can). The goal is that a given distance in any direction, anywhere in the volume, corresponds to an equal perceptual difference to the eye, an aspect sorely missing from RGB (where e.g. 8 bits/channel is overkill for some colors but can cause banding in others, besides being cumbersome to do math in, as GP mentioned).

https://en.wikipedia.org/wiki/CIELAB_color_space https://en.wikipedia.org/wiki/Colour_banding
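
If you want to play with it, here's a rough sketch of sRGB -> CIELAB plus the simple ΔE*76 distance (D65 white point, no edge-case handling; a real library such as skimage or colormath is the better choice):

    import numpy as np

    # sRGB (0..1) -> XYZ -> CIELAB, D65 white point. Sketch only.
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    WHITE = np.array([0.95047, 1.0, 1.08883])

    def srgb_to_lab(rgb):
        rgb = np.asarray(rgb, dtype=float)
        lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
        xyz = (M @ lin) / WHITE
        f = np.where(xyz > (6 / 29) ** 3, np.cbrt(xyz), xyz / (3 * (6 / 29) ** 2) + 4 / 29)
        return np.array([116 * f[1] - 16,          # L*
                         500 * (f[0] - f[1]),      # a*
                         200 * (f[1] - f[2])])     # b*

    def delta_e76(c1, c2):
        """Plain Euclidean distance in Lab, a rough proxy for perceptual difference."""
        return np.linalg.norm(srgb_to_lab(c1) - srgb_to_lab(c2))

    print(delta_e76([1, 0, 0], [0, 1, 0]))  # red vs. green: a large distance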


Phew, thank you for the links. My brain is spent on colors for the night!


This video from Captain Disillusion does a great job of explaining both the biological mechanism and the consequences it has on displays:

https://youtu.be/FTKP0Y9MVus


> Generating an average color over an area would produce horrific results - an area with equal amounts of red, green, and blue would come out gray, for example.

To be fair it sounds like you did the averaging in sRGB instead of a linear colorspace.


If the R, G, and B values are equal, it doesn't matter whether you use a linear or non-linear color space; the color will always be gray. The brightness might be different, but I'm not even sure that's true.


Light, and therefore color, is a wave when it comes to how we sense it. Like sound, if you take two frequencies and add them together, you don't get an average. You get a more complex wave that produces a complex sound (a chord), somewhat akin to a secondary or tertiary color.


If you're talking about how we sense light, it's more of a vector of 3-4 scalar values. You have activation levels for your rods and each of your cones.

Light is physically a wave, but that's not especially relevant to how you sense it.


Just for clarification, are you saying that the eye translates visual data differently if the image were displayed in RGB, as opposed to HSL?

Edit: Is the visual result not summed by the brain, no matter how it is displayed? Whether displaying RGB or HSL, the brain will interpret that in a way that its API can comprehend, yes?


That's not what he's saying. What he's saying is that if you do math in RGB space, it doesn't work out as you expect.

So the simplest example of math is taking the average of two colors. If you plot two colors in RGB space and take the midpoint of the line connecting them, the resulting color tends not to look anything like "halfway" between the two colors. If you instead do the process in HSL, the result makes more sense.
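
For instance (stdlib colorsys, purely illustrative): the componentwise RGB midpoint of yellow and blue is a flat gray, while the HSL midpoint keeps full saturation and lands on a green, which is much closer to an intuitive "halfway". (The naive hue average here ignores the wrap-around at 360°, which a real implementation has to handle.)

    import colorsys

    def mid_rgb(c1, c2):
        # componentwise average in RGB
        return tuple((a + b) / 2 for a, b in zip(c1, c2))

    def mid_hsl(c1, c2):
        # average hue/lightness/saturation and convert back
        # (naive hue average; real code must handle the 0/360 wrap-around)
        h1, l1, s1 = colorsys.rgb_to_hls(*c1)
        h2, l2, s2 = colorsys.rgb_to_hls(*c2)
        return colorsys.hls_to_rgb((h1 + h2) / 2, (l1 + l2) / 2, (s1 + s2) / 2)

    yellow, blue = (1.0, 1.0, 0.0), (0.0, 0.0, 1.0)
    print(mid_rgb(yellow, blue))  # (0.5, 0.5, 0.5): flat gray
    print(mid_hsl(yellow, blue))  # (0.0, 1.0, 0.5): a saturated green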


Is anyone else surprised that stitchfix has been successful for long enough that they can keep such a skilled and productive data science team around? Especially considering how much churn subscription box companies have, I'm shocked that they're apparently doing so well.


They need a skilled data science team. That's the only way for them to have been so successful. When you're a small company, using your data science team to optimize every interaction is key. It's not just the image analysis, but also logistics, inventory, shipping, etc.


Earlier this year Stitchfix had around 8,000 employees and 115 data scientists, and operated at about $55k of revenue per employee.

They are neither small nor particularly successful.

[1] https://www.forbes.com/sites/stevenli1/2020/02/17/stitch-fix...


Do they? I'm pretty sure an intern (or mTurk) could do this manually and more accurately for much less than the salary of whoever wrote this post.


I look at posts like this as a very small view into one of the problems they've worked on. This post really offered little by way of rationale for why they are extracting colors. The earlier post offers a bit more, but not much.

Instead, I think that a post like this describes how they approach a challenge and how they automate work that would require manual annotation. I actually think of this analysis more as decision support (making manual review easier) as opposed to full-on automation.


I'm not surprised.

1. It's a pretty common misconception that StitchFix is subscription only. A lot of their business is ad hoc: you just pay for a single "fix".

2. Also, as far as I understand it, they are just as much a tech company as they are a retailer, if not more.

3. I know someone who works there, and from what they tell me StitchFix seems to really have their act together and just be well run in general.


They do really well in areas like middle America where the options are limited. I am not surprised they are doing so well.


This is what funding allows. You don't desperately need to be a super success because you have reserves feeding that growth.


Bruh you just don’t get how funding works


A very nice writeup, but (no offense) "tens of seconds" per image? Maybe I haven't understood the technical challenges well enough, but processing an image of 60,000 pixels should surely be much faster than that. What would the expensive part of such a process even be - building the superpixels?


Exactly what I was thinking. Some years ago I also tried this and had great results by just resizing the image first to something like 100x100 pixels.


resizing is quite disruptive to patterns

we got good results for that specific failure mode with random sampling.

how many samples to take requires some tuning, but you can use the image data itself to do that tuning: exit the random sampling once the deviation of the sampled pixels drops below a factor proportional to samples/total
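
something along these lines (NumPy sketch; the batch size and stopping threshold are made-up knobs you'd tune, and I'm using the standard error of the mean as one concrete way to express that stopping rule):

    import numpy as np

    def sampled_mean_color(img, batch=500, max_frac=0.2, tol=2.0, seed=0):
        """Estimate the mean color by random sampling, stopping early once the
        standard error of the sampled pixels is small enough."""
        rng = np.random.default_rng(seed)
        pixels = img.reshape(-1, img.shape[-1]).astype(float)
        total = len(pixels)
        taken = np.empty((0, pixels.shape[-1]))
        while len(taken) < max_frac * total:
            taken = np.vstack([taken, pixels[rng.integers(0, total, size=batch)]])
            sem = taken.std(axis=0) / np.sqrt(len(taken))  # standard error per channel
            if sem.max() < tol:
                break
        return taken.mean(axis=0)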


Probably the result of using python for number crunching.


In game development there are already pretty mature photogrammetry techniques that reverse engineer the lighting and shadow conditions of a scene in order to "subtract" them from objects in the scene, so you can get the neutral albedo of their surfaces. This is needed for physically based rendering: the texture in the game engine shouldn't have highlights or shadows baked into it, so that lighting can be applied dynamically by the engine based on the in-game surroundings. It'd be interesting to see if a technique like that could be applied here too. And on the flip side, if you want to do this using ML, rendered images from a game engine/Blender could be used to generate large amounts of training data with perfectly defined "ground truth" colors already known.


I always thought that these techniques only work because they use multiple viewpoints of the same object (with the same lighting), can infer topography from those, and then, knowing that, can "figure out" the lighting. Would be nice to know how well this works with just a very few images, maybe even from different positions, let alone a single image.


I also had to implement something similar to this and wrote about it: https://www.mikealche.com/software-development/how-to-implem...


Is all of this because the images they have of their own products aren't taken with enough fill light to remove all shadows?

Like without any shadows doesn’t this become a more trivial problem?


If they could control the photo taking process, couldn't they also just have the photo takers label the colors as well?


That reminds me of photogrammetry for environments in games. Sometimes when you take photos of an environment, you include an object of known size and a spectrum of known colours for reference.


Good point. Just colour gun boom boom.


Really wish those lovely visualizations in the "Part 1" had pause buttons... some of the transitions are just too fast.


For the shadows, maybe consider a method where you first take a picture with light from one direction, then another with light from a different direction. This could be done automatically by toggling multiple light sources quickly.
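
A crude way to combine such a pair (just an idea, not anything from the article; the file names are hypothetical) is a per-pixel max, since a region shadowed under one light is usually lit under the other:

    import numpy as np
    from PIL import Image

    # two shots of the same garment, lit from different directions (file names made up)
    left = np.asarray(Image.open("lit_from_left.png").convert("RGB"), dtype=float)
    right = np.asarray(Image.open("lit_from_right.png").convert("RGB"), dtype=float)

    # a pixel shadowed under one light is usually lit under the other,
    # so the per-pixel maximum gives a rough shadow-reduced composite
    composite = np.maximum(left, right).astype(np.uint8)
    Image.fromarray(composite).save("shadow_reduced.png")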


Or, using a proper lighting setup, so as to minimize shadows. This would likely solve most of their shadow-related problems.


You could also cluster first purely on colour (using the xy part of the xyY colourspace). For greyscale clothing you might then need to see if some colour has clearly separate shades, but that's going to be tricky without a proper understanding of shadows.
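
A rough sketch of that (linearize sRGB, project to xy chromaticity, then k-means; the cluster count is a guess you'd tune per garment):

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_on_chromaticity(rgb_img, k=3):
        """Cluster pixels on xy chromaticity only, ignoring luminance (Y)."""
        rgb = rgb_img.reshape(-1, 3).astype(float) / 255.0
        lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
        M = np.array([[0.4124, 0.3576, 0.1805],
                      [0.2126, 0.7152, 0.0722],
                      [0.0193, 0.1192, 0.9505]])
        xyz = lin @ M.T
        xy = xyz[:, :2] / (xyz.sum(axis=1, keepdims=True) + 1e-12)  # x = X/(X+Y+Z), y = Y/(X+Y+Z)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(xy)
        return labels.reshape(rgb_img.shape[:2])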


One could also remove the expected shadow values (and hues) depending on the color of the clothing and the color of the light, at least approximately.

Should work as long as there aren't complicated interactions of the material with light that hits it.



