That's like saying sentience cannot emerge from a few amino acids tumbled together, yet here we are. There is a lot of higher-dimensional information encoded in those "text and images scraped off the internet". I still don't think that's enough for AGI (or ASI), but we know of plenty of very complex things that are made of simple parts.
OTOH, text and images have only been around for a little while. The real question is whether text and images can contain enough information for AGI, or whether a physical world to interact with is needed.