> For instance, they assume, objects are tightly packed. They're not required to. They should be to save space but are not required to.
The PDF 2.0 spec says in section 7.5.3, "The body of a PDF file shall consist of a sequence of indirect objects representing the contents of a document." I'd read that as establishing the entire contents of the file body. Of course, real-world PDFs might have all sorts of garbage that a practical parser should be prepared for, but I don't think that it's condoned by the standard.
> Moreover, it is possible to place objects inside other objects. It's not advised but not prohibited.
I think the standard tokenization would prevent any string "obj" inside of an indirect object from actually being a keyword obj that starts a new indirect object. (And if the file body as a whole weren't tokenized from start to end, then "a sequence of indirect objects" would be nonsensical.)
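To make the tokenization point concrete, here's a rough Python sketch of my own (not code from the spec). It lexes a byte string from the start and reports only the offsets where "obj" appears as a standalone keyword token. The function name and the sample bytes are made up for illustration. It deliberately ignores stream data, which a real parser would have to skip using /Length, so treat it as a simplification rather than a conforming lexer.

```python
# PDF whitespace and delimiter characters, per the spec's lexical conventions.
WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"


def obj_keyword_offsets(data: bytes) -> list[int]:
    """Return offsets where `obj` occurs as a keyword token.

    Hypothetical helper. Assumption: the input contains no stream data.
    """
    offsets = []
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c == ord("%"):
            # Comment: runs to the end of the line.
            while i < n and data[i] not in b"\r\n":
                i += 1
        elif c == ord("("):
            # Literal string: balanced parentheses, backslash escapes.
            depth, i = 1, i + 1
            while i < n and depth > 0:
                if data[i] == ord("\\"):
                    i += 1  # skip the escaped byte
                elif data[i] == ord("("):
                    depth += 1
                elif data[i] == ord(")"):
                    depth -= 1
                i += 1
        elif data[i:i + 2] in (b"<<", b">>"):
            # Dictionary delimiters are two-character tokens.
            i += 2
        elif c == ord("<"):
            # Hex string: runs to the closing '>'.
            end = data.find(b">", i)
            i = n if end == -1 else end + 1
        elif c == ord("/"):
            # Name object: '/' followed by regular characters.
            i += 1
            while i < n and data[i] not in WHITESPACE and data[i] not in DELIMITERS:
                i += 1
        elif c in WHITESPACE or c in DELIMITERS:
            i += 1
        else:
            # Regular token: a number or a keyword such as obj / endobj.
            start = i
            while i < n and data[i] not in WHITESPACE and data[i] not in DELIMITERS:
                i += 1
            if data[start:i] == b"obj":
                offsets.append(start)
    return offsets


sample = (
    b"1 0 obj\n"
    b"<< /Title (2 0 obj inside a string) /obj 1 >>\n"
    b"% 3 0 obj in a comment\n"
    b"endobj\n"
    b"4 0 obj\n"
    b"<6f626a>\n"
    b"endobj\n"
)
print(obj_keyword_offsets(sample))
```

A naive substring search finds "obj" in the string, the name, the comment, and inside "endobj" as well. The lexer only reports the two places where it is an actual keyword, which is why I don't think an object can meaningfully start inside another one.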