After reading the “Arch philosophy applied to a browser” thread, I had an idea for a browser.
It hasn’t really taken shape yet, which is why I hadn’t posted about it before.
What I had in mind is a directory tree into which processed sites are loaded, kind of like ii, the IRC client from suckless.org.
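A rough sketch of what such a tree could look like, in Python. Everything here is an assumption for illustration: the helper names `site_dir` and `store_page` are made up, the layout is one directory per host and path segment, and the processed text goes into a file called `out` (borrowed from the way ii keeps an `out` file per channel directory).

```python
import os
from urllib.parse import urlsplit


def site_dir(root, url):
    """Map a URL to a directory: root/host/path/segments (hypothetical layout)."""
    parts = urlsplit(url)
    # Drop empty segments and anything that could climb out of the tree.
    segments = [s for s in parts.path.split("/") if s and s not in (".", "..")]
    return os.path.join(root, parts.netloc, *segments)


def store_page(root, url, lines):
    """Write processed lines to an 'out' file inside the page's directory."""
    directory = site_dir(root, url)
    os.makedirs(directory, exist_ok=True)
    out_path = os.path.join(directory, "out")
    with open(out_path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return out_path
```

With this layout, `store_page("sites", "https://example.org/forum/thread1", entries)` would end up in `sites/example.org/forum/thread1/out`, which can then be read with cat, grep, tail and friends. Query strings are ignored in this sketch.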
As for parsing, it would be great to have a program that analyzes the structure of HTML files and produces, for instance, a text file with one entry (post/menu/list item) per line.
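To make the “one entry per line” idea concrete, here is a minimal sketch using `html.parser` from the Python 3 standard library (not htmllib). The class name `EntryExtractor`, the function `extract_entries`, and the choice of `li` as the default entry tag are assumptions for illustration. It also assumes every entry element has a closing tag, and a nested list is folded into its outer entry.

```python
from html.parser import HTMLParser


class EntryExtractor(HTMLParser):
    """Collect the text of selected elements, one entry per element."""

    def __init__(self, entry_tags=("li",)):
        super().__init__(convert_charrefs=True)
        self.entry_tags = set(entry_tags)
        self.depth = 0      # nesting depth inside entry elements
        self.buffer = []    # text pieces of the current entry
        self.entries = []   # finished entries

    def handle_starttag(self, tag, attrs):
        if tag in self.entry_tags:
            if self.depth == 0:
                self.buffer = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.entry_tags and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Collapse all whitespace so each entry fits on one line.
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.entries.append(text)

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_entries(html, entry_tags=("li",)):
    """Return a list of entry texts found in the given HTML string."""
    parser = EntryExtractor(entry_tags)
    parser.feed(html)
    parser.close()
    return parser.entries
```

Printing `"\n".join(extract_entries(page))` gives the one-line-per-entry text file. Real pages would need smarter rules for deciding what counts as a post or a menu item; this only shows the mechanics.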
I have worked on an HTML parser using Python’s htmllib, but it wasn’t a big success.
I think curl is great (easier than Python’s urllib2) for getting pages and making requests. The xml2 package in the AUR includes html2, which can be used for analysis, although I have my doubts about the usefulness of its obscure output (writing a substitute is the next step in this project).
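If the processing side ends up in Python anyway, curl can still do the fetching. A minimal sketch, assuming curl is installed and on the PATH; the helper names `curl_command` and `fetch` are made up for this example.

```python
import subprocess


def curl_command(url):
    """Build the argument list for a quiet curl call that follows redirects."""
    # -s: no progress meter, -S: still print errors,
    # -L: follow redirects, -f: non-zero exit status on HTTP errors
    return ["curl", "-sS", "-L", "-f", url]


def fetch(url):
    """Return the page body as text; raises CalledProcessError if curl fails."""
    result = subprocess.run(curl_command(url), capture_output=True, check=True)
    # Assumes UTF-8; undecodable bytes are replaced rather than raising.
    return result.stdout.decode("utf-8", errors="replace")
```

The text returned by `fetch` could then be handed to whatever replaces html2 for the analysis step.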