
Speaking of SD, I wonder if 1.4 will be the last truly open release. Emad said 1.5 would be released a while ago, but it's been held up for "compliance" reasons. Maybe they got legal threats over using artists' works and stock images. If so, that would be sad to see.

In a way it reminds me of people who make unofficial remakes of games but get cease and desists if they show gameplay while in development. The correct move is to fully develop the game and release it; then if you get C&Ds, it's too late, the game is already available to download.



There was an AMA with Emad yesterday on Discord, and he got asked this. The promise is that 1.5 will be released within the following week.

The slowdown has numerous causes. They got legal threats, death threats, and threats from some congresswoman to have them banned by the NSA [1].

Stability.ai workers (except for one) have a clause that they can open-source anything they're working on. They do, and supposedly will, open-source everything because they want to build an ecosystem, not a cash grab in the mold of DALL-E.

Also, they don't have one central place for all their projects, and they will scale from 100 to 250 employees over the following year, so things should speed up.

[1] https://eshoo.house.gov/media/press-releases/eshoo-urges-nsa...


> banned by the NSA

Note that this NSA is the National Security Advisor, not the National Security Agency.


This is quite a distinction to make, and I'm not sure why GP didn't make it. The National Security Agency is what everyone thinks of as "the NSA".


Make what you will of it, but as of yesterday this was his answer to one of my readers: https://twitter.com/EMostaque/status/1579204017636667392

> No actually dev decision. Generative models are complex to release responsibly and team still working on release guidelines as they get much better, 1.5 is only a marginal FID improvement.
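
For context, FID here is the Fréchet Inception Distance, a standard metric that compares feature statistics of generated images against real ones (lower is better). A minimal sketch of computing it, assuming torchmetrics with the image extras is installed, and using random tensors as stand-ins for the real and generated sets:

    import torch
    from torchmetrics.image.fid import FrechetInceptionDistance

    # FID compares Inception feature statistics of two image sets;
    # random uint8 tensors stand in for real/generated images here.
    fid = FrechetInceptionDistance(feature=64)
    real = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
    fake = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
    fid.update(real, real=True)
    fid.update(fake, real=False)
    print(fid.compute())  # scalar; a "marginal improvement" is a small drop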


Honestly, this whole "responsible AI" thing is a sad last attempt to run away from the inevitable. The reality is that our politicians will end up in fake photos, videos, and audio recordings; people who can't even draw a straight line will be able to make crazy online memes with any imaginable image in just seconds, no matter how offensive it is to some people; and there is absolutely nothing we can do about it.

When these companies and OS projects create "responsible" or restricted AIs, they are at the same time creating demand for AIs that have no limitations, and eventually open-source or even commercial AIs will respond to that demand.

I hope that while they still play this "responsible" game, they are at least using the time to figure out how we can live with this kind of advanced AI in a future where everything is fake/false by default.


It's a farce.

Stable Diffusion cost less than $1 million to train.

I'm dumbfounded to see all the gatekeeping around these ML models considering this.

Within 24 months someone will train a stronger model with a $20,000 home setup and this gatekeeping ridiculousness will be dead for good.

I'm excited to see what $1 million gets you in 24 months; possibly a multi-trillion-parameter NLP god. :p


This.

"Responsible" is just PR talk for "We're gonna try and NOT release it for as long as possible so that we can milk money from you by blackboxing the models and having you pay for tokens".


I believe the current Stable Diffusion license is "responsible" enough.


Sounds like some middle manager is trying to put his foot down and saying things like "no more releases till we have designed and tested a 37-step release sign-off procedure".


My take is that the "genie is out of the bottle".

Single-source "massive models" may be more difficult to get out, but Emad said they're working on licensing a ton of content to train future models. Even then, anyone can train new models now: the output from Dreambooth and Textual Inversion is already impressive, and it seems like just the beginning.

Going to be an interesting road ahead.


Question from an ML-illiterate:

Are there known ways the training for these models could be distributed/decomposed? E.g. SETI-style distribution of a homogeneous, centrally defined task, or, much more exciting, recombination of several different models / sets of weights? (I'm just throwing words around here without really understanding them.)

I'm imagining a world in which one group of enthusiasts could work together to train a model on all the images on Wikipedia, another group could train a model that understands hands really well, and then later yet another group could combine the work of the other two without doing all that training from scratch.

Is that even remotely plausible?


It’s almost certainly not worth the bother.

Setting up a distributed community training system would take significant effort and time, and it would be extremely prone to abuse, errors, and uncertainty about the results.

You could get better quality, more quickly, by simply running a Kickstarter and paying for dedicated GPU time.


Not sure how illiterate you are; you're asking good questions. But FWIW, if you watch the Corridor Digital video you should be able to grasp how much transfer learning is possible: https://www.youtube.com/watch?v=W4Mcuh38wyM
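
On the "combine sets of weights" part: hobbyists already experiment with naive checkpoint interpolation between fine-tunes of the same base model. A minimal PyTorch sketch (the file names are made up, and both checkpoints are assumed to be plain state dicts with identical architecture):

    import torch

    # Two hypothetical fine-tunes of the SAME base model; merging only
    # makes sense when the architectures (and tensor shapes) match.
    ckpt_a = torch.load("wikipedia_finetune.ckpt", map_location="cpu")
    ckpt_b = torch.load("hands_finetune.ckpt", map_location="cpu")

    alpha = 0.5  # 1.0 keeps only A, 0.0 keeps only B

    # Naive linear interpolation of matching tensors.
    merged = {
        k: alpha * ckpt_a[k] + (1 - alpha) * ckpt_b[k]
        for k in ckpt_a
        if k in ckpt_b and ckpt_a[k].shape == ckpt_b[k].shape
    }

    torch.save(merged, "merged.ckpt")

It's crude (it can average away what made each fine-tune good), but it shows why recombination without retraining from scratch is at least plausible.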


Yes it is!


It's definitely out of the bottle, especially when training-capable cards are getting within reach of regular people.

Sort of hinted at it upthread, but it would be interesting if this eventually brings competition to the GPU compute space (AMD, Intel?).


Just train on the output of existing models minus any photos with watermarks - being twice removed is sure to make it even harder to claim copyright :)


I wonder if the same will happen to Midjourney or Dall-E. I have generated images on Midjourney that literally had a 'Shutterstock' watermark plastered across them. This watermark was conspicuously missing when the image was upscaled.


Not for DALL-E, because OpenAI is the very source of all these "ethical concerns" about the unwashed masses having access to these tools.


I've seen models amplify textures into noise, which gravitates toward periodic noise, which has a goodly chance of turning into a watermark. Usually the watermark is gibberish proto-letters, but yeah, I've seen straight-up Shutterstock watermarks appear too.


I've had stock photo watermarks show up repeatedly in SD generations as well.


And in "paintings" I generate with SD I often see a squiggle in the corner that is presumably the result of a signature in the training set.


If you browse LAION (e.g. using this [0]), which SD uses, you find all sorts of owned IP there (Shutterstock, ArtStation, fan art from DeviantArt and Tumblr, etc.) because it's a research dataset. LAION has a disclaimer that it does not claim ownership and that any troubles you run into by using the dataset are your own.

[0] https://rom1504.github.io/clip-retrieval
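
If you want to poke at the dataset programmatically, the same project ships a small Python client. A minimal sketch of querying the hosted LAION index (the backend URL, index name, and result keys are taken from the project's docs and may have changed):

    from clip_retrieval.clip_client import ClipClient

    # Hosted LAION knn index; parameters per the clip-retrieval docs.
    client = ClipClient(
        url="https://knn5.laion.ai/knn-service",
        indice_name="laion5B",
        num_images=5,
    )

    results = client.query(text="a red sports car on a mountain road")
    for r in results:
        # Each hit includes (at least) a similarity score, source URL,
        # and caption.
        print(r.get("similarity"), r.get("url"), r.get("caption"))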



