Vertical search engines that scrape a large number of sites already do something like this, and some may already have the tools described in the article. That said, most use regular expressions instead of XPath because of malformed markup; CSS selectors are another option.
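To illustrate why regexes survive where XPath-style parsing doesn't, here's a minimal sketch (the markup and pattern are invented for the example): a strict parser rejects the unclosed tags outright, while a regex keyed on an attribute still pulls the values out.

```python
import re
import xml.etree.ElementTree as ET

# Malformed markup: the unclosed <li> tags are common in the wild.
html = """
<ul>
  <li class="price">$19.99
  <li class="price">$4.50
</ul>
"""

# An XPath query needs a parsed tree first, and a strict parser
# refuses to build one from this input.
try:
    ET.fromstring(html)
except ET.ParseError:
    print("strict parser choked on the markup")

# A regular expression doesn't care about well-formedness.
prices = re.findall(r'<li class="price">\s*(\$[\d.]+)', html)
print(prices)  # ['$19.99', '$4.50']
```

In practice a lenient HTML parser (the kind browsers use) splits the difference, which is why CSS-selector libraries built on such parsers are a workable middle ground.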
In my opinion, the other side of the problem with uptake of the semantic web is that the tools and formats used to describe and access the data are rather heavyweight. It would be nice if there were a simple way to define new data types as well as store and access the data. Perhaps something like Google Base with a bit of server-side JavaScript for scraping thrown in?
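For illustration, here is one hypothetical shape such a lightweight system could take (the type definition, `insert` helper, and in-memory store are all invented for this sketch, not an existing product): a "type" is just a named field list, and records are checked against it on insert.

```python
import json

# Hypothetical lightweight data type: nothing but a name and a field list.
recipe_type = {"name": "recipe", "fields": ["title", "cook_minutes"]}

store = []  # stand-in for whatever backend would actually persist records

def insert(record_type, record):
    """Accept a record only if it supplies every field the type declares."""
    missing = [f for f in record_type["fields"] if f not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    store.append({"type": record_type["name"], "data": record})

insert(recipe_type, {"title": "Pancakes", "cook_minutes": 15})
print(json.dumps(store))
```

The point of the sketch is the low ceiling: declaring a type and storing data against it takes a few lines, with no ontology or schema language in sight.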
We're working on something remarkably close to what the article describes, but we're not taking the search engine approach you allude to. Rather, we're trying to make it a useful multi-tool close in spirit to systems like Pachube: a component that coders, bloggers, site authors, or anyone else can drop into their projects.
And you're right: semantic formats are very heavyweight. There are a lot of useful things that can be done with semi-semantic data before we achieve full linked data across the web, if that ever happens (you could argue for and against such visions, IMHO). Check it out at http://scrapmetl.com/ and give us a shout on Twitter (@Maciek416 and @corban) if you're interested in playing.