I’m surprised to see the highlights don’t include another common detail of the parsing algorithm that often trips people up: table rows and cells (tr/th/td) must be in one of thead/tbody/tfoot. If they’re not, they’re implicitly nested into a tbody. As in:
<table>
<!-- <tbody> -->
<tr>
<th>Column one</th>
<th>Column two</th>
</tr>
<tr>
<td>Row one col one</td>
<td>Row one col two</td>
</tr>
<!-- </tbody> -->
</table>
I’ve frequently seen it cause a variety of issues with VDOM libraries, and even plain DOM libraries with a notion of declarative templates, ranging from hydration mismatch logs (meh) to actual logic errors (corruption of the real DOM when nodes aren’t where they’re expected to be).
Other implied/omitted tags like body can cause similar issues too, but I think that’s become a far less common “mistake” (all of these are totally valid since at least HTML5) in recent years.
Another interesting table quirk: tr/td/th outside of a <table> will never appear in the DOM. You can make up your own tags and they'll appear anywhere, but those three are magic and can only exist inside a table.
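Both table quirks can be observed outside a browser with html5lib, a Python implementation of the WHATWG parsing algorithm (a third-party package, `pip install html5lib`). A minimal sketch:

```python
# Demonstrates two HTML tree-construction quirks using html5lib,
# a Python implementation of the WHATWG HTML parsing algorithm.
import html5lib

# 1. <tr>/<td> directly inside <table> get wrapped in an implicit <tbody>.
tree = html5lib.parse("<table><tr><td>x</td></tr></table>",
                      namespaceHTMLElements=False)
table = tree.find(".//table")
assert [child.tag for child in table] == ["tbody"]

# 2. A <td> outside any <table> is dropped entirely: its text content
#    survives, but the element itself never reaches the DOM.
orphan = html5lib.parse("<div><td>orphan</td></div>",
                        namespaceHTMLElements=False)
assert orphan.find(".//td") is None
```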
Forms are also weird: if you leave off the closing tag, an implicit one is included in the DOM. However, if you have inputs further down the page, technically outside the form, they are still included in the submitted form data.
Also fun: you can't have a form inside a form, but last time I ran into this, sticking a form inside a form inside a form still left you with a form inside a form in the DOM anyway.
Perhaps a more intuitive name would be "round-trip serialization HTML". That is, if you use the browser to parse and print some HTML, it matches the source code.
Or in other words, it's formatted the same way that the browser would do it. So, you use the browser to pretty-print the HTML page, and save the code as the source. It's not hard at all and could be done automatically.
Round-trip tests are often used to check that a deserialization routine outputs data that can be serialized again and no data is lost. It even lets you change the serialization format, provided that you change the parser and printer to match.
I expect these sorts of tests are a lot more useful with fuzzing, though. Finding one example that works mostly just tells you that the browser's HTML printing code isn't completely broken; a single test of that sort is only useful for catching stupid bugs quickly.
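The round-trip property can be sketched in a few lines of Python, with JSON standing in for HTML since the idea is format-agnostic (all names here are made up for illustration):

```python
import json
import random
import string

def round_trips(value) -> bool:
    """A value is a serialization fixed point if parsing its printed
    form reproduces the value exactly: parse(print(x)) == x."""
    return json.loads(json.dumps(value)) == value

# One hand-picked example only shows the printer isn't completely broken...
assert round_trips({"a": [1, 2, None], "b": "text"})

# ...so fuzz random inputs to get meaningful coverage.
def random_value(depth=0):
    kinds = [
        lambda: random.randint(-1000, 1000),
        lambda: "".join(random.choices(string.printable, k=5)),
        lambda: None,
    ]
    if depth < 2:  # bound nesting so generation terminates
        kinds.append(lambda: [random_value(depth + 1) for _ in range(3)])
    return random.choice(kinds)()

for _ in range(100):
    assert round_trips(random_value())
```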
This is called print-read consistency in the Lisp world: an object is printed in such a way that the syntax can be read to produce a similar object, or else is given a deliberately unreadable notation like #<...>, where the #< combination is required to produce a read error.
In Python, there is a distinction between the text representation of an object, and the result of converting the object to a string. Classes can implement both methods independently, and it's not uncommon to have a repr method that returns something that you could (at least in theory) evaluate as literal Python code. This is very useful for debugging and logging, although not nearly as cool or powerful as the Lisp equivalent.
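A minimal illustration of that split (the `Point` class is hypothetical):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Aim for something that eval()s back into an equal object.
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self):
        # Human-friendly form for display.
        return f"({self.x}, {self.y})"

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

p = Point(3, 4)
print(str(p))             # (3, 4)
print(repr(p))            # Point(3, 4)
assert eval(repr(p)) == p  # the repr round-trips via eval
```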
My favorite example of the technical failings of HTML: https://research.securitum.com/mutation-xss-via-mathml-mutat... is an HTML sanitizing vulnerability that came about because some HTML not only doesn't survive a parse-stringify cycle, but the generated DOM tree does not survive a stringify-parse cycle!
There actually may be! Depending on what you’re trying to do and what’s inconsistent between your markup and the actual DOM. As noted in my earlier comment, implicit insertion/wrapping of certain elements can cause structural changes which lead to actual code errors or unexpected behavior.
> the real reason to code in Fixed-Point HTML is simply the satisfaction of knowing that you and the browser are in total agreement about the HTML.
Interesting idea. I've been trying to achieve something similar, but in reverse: rather than make my source match the browser, make the browser match my source by making it not ignore spacing.
i.e. the basics are `white-space: pre;` on the body element and fixed-width, fixed-size fonts. But I still want an HTML document so I can opt in to HTML where it matters. My reasons are: A) avoiding a pre-processor and build-toolchain complexity, sticking to nice simple static files; B) getting something similar to WYSIWYG, but as source code; and C) I like fixed-width fonts and plain-text formatting (reducing decisions is helpful for focus).
Before now I've explicitly reduced the size of my HTML docs (nothing critical/production-facing, all passion projects) by omitting certain HTML tags (e.g. the DOCTYPE, closing tags, etc.) because I know modern browsers will still render them correctly.
This means there are minuscule savings from a bandwidth-serving perspective. I wonder what the trade-off is between the HTTP call and document parse/paint.
E.g. is it correct to assume the browser will parse/paint the HTML content, fixing incorrectly closed tags on the fly, faster than the few extra milliseconds it would take to serve fixed-point HTML from the server?
Thanks [I'm the author]. I tested with Chrome 105 on macOS and it succeeded. Possibly there are OS/plugin/etc issues?
Of course, I know there is no guarantee that every browser's innerHTML implementation will produce exactly the same result, but so far I haven't found any variation (Chrome, FF, Safari, Edge).
In Firefox 105.0.1 on macOS, the button also always fails when I click it.
EDIT:
In my case, it appears to be some extra "<div style=\"position: static !important;\"></div>" text added before the closing </body> tag. I suspect this is introduced by a plugin, probably LastPass.
Doesn’t make sense. What’s wrong with <br />? It’s a hell of a lot easier to parse than having an exception for <br>, which is then transformed into <br></br>.