You'd have to be doing something where the unified memory is specifically necessary, and where it's okay that it's slow. If all you want is to run large LLMs slowly, you can already do that with split CPU/GPU inference on a normal desktop with a 3090, with the added benefit that a smaller model that fits entirely in the 3090's VRAM will be blazing fast compared to the same model on the Spark.
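Rough back-of-envelope sketch of why: token generation is roughly memory-bandwidth-bound, so tokens/sec is about effective bandwidth divided by bytes streamed per token. Every number below (bandwidths, model sizes, the 60/40 split) is my own approximation for illustration, not a benchmark:

```typescript
// Decode-speed estimate: tokens/sec ≈ memory bandwidth / model bytes read per token.
// All figures are rough assumptions, not measured numbers.
interface Device {
  name: string;
  bandwidthGBs: number; // approximate memory bandwidth in GB/s
}

const rtx3090: Device = { name: "RTX 3090 (GDDR6X)", bandwidthGBs: 936 };
const spark: Device = { name: "DGX Spark (unified LPDDR5X)", bandwidthGBs: 273 };
const desktopRam: Device = { name: "Desktop DDR5, dual channel", bandwidthGBs: 80 };

// The quantized weights are streamed through memory once per generated token.
function tokensPerSecond(modelSizeGB: number, device: Device): number {
  return device.bandwidthGBs / modelSizeGB;
}

// A ~13 GB quantized model fits entirely in 24 GB of VRAM:
console.log(tokensPerSecond(13, rtx3090).toFixed(0), "tok/s on the 3090");
console.log(tokensPerSecond(13, spark).toFixed(0), "tok/s on the Spark");

// A ~40 GB model does not fit in 24 GB; the layers left in system RAM
// dominate the per-token time in a split CPU/GPU setup:
const gpuShare = 24 / 40;
const secondsPerToken =
  (gpuShare * 40) / rtx3090.bandwidthGBs +
  ((1 - gpuShare) * 40) / desktopRam.bandwidthGBs;
console.log((1 / secondsPerToken).toFixed(1), "tok/s with CPU/GPU split");
```

With those assumed numbers the small model lands around 70 tok/s on the 3090 versus ~20 tok/s on the Spark, while the big split model crawls along at a few tok/s either way.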
Perplexity's AI scraping has regularly passed the DDoS threshold, sometimes hitting thousands of requests per second - for data they should have at least cached!
Every fiber has a parent. I suppose the behavior, though possibly not the implementation, is that the ancestry chain is walked up until the nearest context provider is found. That will be the fiber associated with rendering the provider that supplied the context value.
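A conceptual sketch of that behavior (not React's actual implementation, which pushes provider values onto a stack during render); the field names here are invented for illustration:

```typescript
// Conceptual model only: walk a fiber's ancestry until the nearest matching
// provider is found. `parent`, `providerFor`, and `value` are simplified,
// made-up fields, not React's real fiber structure.
interface FiberNode<T = unknown> {
  parent: FiberNode | null;  // the fiber this one was rendered under
  providerFor?: object;      // set if this fiber is a provider for some context
  value?: T;                 // the value that provider supplied
}

function nearestContextValue<T>(fiber: FiberNode, context: object, defaultValue: T): T {
  let node = fiber.parent;
  while (node !== null) {
    if (node.providerFor === context) {
      return node.value as T; // closest enclosing provider wins
    }
    node = node.parent;       // keep walking up the ancestry chain
  }
  return defaultValue;        // no provider above this fiber
}
```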