Neat write-up, and thanks for putting me on to jsonld.js - looks useful.
I'm building https://simplescraper.io and we're trying to create heuristics that update CSS selectors whenever a website changes. People become unhappy when a scrape task that ran smoothly on Monday suddenly returns nothing on Tuesday, so while it's a tough nut to crack, it's super important.
We use a combination of XPath, historical data, and data type (the value may change, but the type and length often stay the same or similar) to narrow down the candidates.
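For illustration, here is a minimal sketch of that kind of scoring in Python. It assumes each candidate element on the changed page has already been extracted as an (XPath, text) pair and that a few past values for the field are stored. The names (`Candidate`, `infer_type`, `best_match`), the type buckets, the weights and the threshold are all hypothetical, not Simplescraper's actual implementation.

```python
import re
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher
from statistics import mean


@dataclass
class Candidate:
    xpath: str  # absolute XPath of a candidate element on the changed page
    text: str   # its extracted text value


def infer_type(value: str) -> str:
    """Bucket a value into a coarse data type (hypothetical buckets)."""
    v = value.strip()
    if v == "":
        return "empty"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", v):
        return "date"
    if re.fullmatch(r"[$€£]?\s?-?\d[\d,]*(\.\d+)?", v):
        return "number"
    return "text"


def segments(xpath: str) -> list[str]:
    """Split '/html/body/div[2]' into ['html', 'body', 'div[2]']."""
    return [s for s in xpath.split("/") if s]


def score(c: Candidate, old_xpath: str, history: list[str]) -> float:
    """Combine XPath similarity, type match and length similarity."""
    if not history:
        raise ValueError("need at least one historical value")
    # Expected type = most common type seen in past scrapes.
    expected_type = Counter(infer_type(v) for v in history).most_common(1)[0][0]
    expected_len = mean(len(v.strip()) for v in history)
    # Structural similarity: fraction of matching path segments.
    xpath_sim = SequenceMatcher(None, segments(old_xpath), segments(c.xpath)).ratio()
    type_match = 1.0 if infer_type(c.text) == expected_type else 0.0
    n = len(c.text.strip())
    longest = max(n, expected_len)
    len_sim = 1.0 if longest == 0 else min(n, expected_len) / longest
    # Hypothetical weights; in practice these would be tuned on real breakages.
    return 0.5 * xpath_sim + 0.3 * type_match + 0.2 * len_sim


def best_match(candidates, old_xpath, history, min_score=0.5):
    """Return the highest-scoring candidate, or None if nothing is close enough."""
    scored = [(score(c, old_xpath, history), c) for c in candidates]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= min_score else None


if __name__ == "__main__":
    old_xpath = "/html/body/div[2]/span[1]"      # selector that broke
    history = ["$19.99", "$18.50", "$21.00"]     # past values for the field
    candidates = [
        Candidate("/html/body/div[3]/span[1]", "$20.49"),
        Candidate("/html/body/div[3]/h1", "Blue Widget"),
        Candidate("/html/body/footer/p", "Contact us"),
    ]
    best = best_match(candidates, old_xpath, history)
    print(best.xpath if best else None)  # prints /html/body/div[3]/span[1]
```

The price moved from `div[2]` to `div[3]`, so no exact XPath matches, but the candidate that keeps most of the path, is still a number, and has a similar length wins. The `min_score` cutoff means the sketch returns nothing rather than silently picking a wrong element, which is usually the safer failure mode for a scraper.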
Of course, there are more sophisticated methods using machine learning etc., but it's fun to try different approaches to solve this problem.