Browser I

After reading the “Arch philosophy applied to a browser” thread, I had an idea for a browser.

It hasn’t really taken shape yet, so I couldn’t really post about it.

What I had in mind is a directory tree into which you load processed sites. Kind of like ii, the IRC client from

As for parsing, it’d be great to have a program that analyzes the structure of HTML files and produces, for instance, a text file with one entry (post, menu or list item) per line.
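A rough sketch of what that one-entry-per-line output could look like, assuming Python 3’s standard html.parser module. The ENTRY_TAGS set and the names EntryExtractor and extract_entries are made up for illustration; a real page would need a smarter idea of what counts as an entry.

```python
from html.parser import HTMLParser

# Tags treated as "entries". This set is an assumption for illustration only.
ENTRY_TAGS = {"li", "p", "h1", "h2", "h3"}


class EntryExtractor(HTMLParser):
    """Collect the text of each entry tag as one whitespace-normalized line."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.entries = []
        self._open = False   # True while inside an entry tag
        self._chunks = []    # text pieces of the entry being read

    def _flush(self):
        # Emit the current entry (if any) and reset the state.
        if self._open:
            line = " ".join("".join(self._chunks).split())
            if line:
                self.entries.append(line)
        self._open = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ENTRY_TAGS:
            # A new entry also ends the previous one, so unclosed <li> tags work.
            self._flush()
            self._open = True

    def handle_endtag(self, tag):
        if tag in ENTRY_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._open:
            self._chunks.append(data)


def extract_entries(html):
    """Return a list of entry lines found in an HTML string."""
    parser = EntryExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # catch an entry left open at the end of the input
    return parser.entries


if __name__ == "__main__":
    page = "<ul><li>Home</li><li>About <b>us</b></li></ul><p>First post</p>"
    print("\n".join(extract_entries(page)))
```

Nested entry tags are flattened rather than handled properly (text after an inner entry closes is dropped), which is probably good enough for a first pass.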

I have worked on an HTML parser using Python’s htmllib, but it wasn’t a big success.

I think curl is great (easier than Python’s urllib2) for getting pages and making requests. The xml2 package in the AUR includes html2, which can be used for analysis, although I have my doubts about the usefulness of its obscure output (writing a substitute is the next step in this project).
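To tie curl to the directory tree idea, here is a hedged sketch in Python that only shells out to curl. The layout (a "sites" root, one directory per host, index.html for paths ending in a slash) and the function names are my own assumptions, not something that is settled.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlsplit


def site_path(url, root="sites"):
    """Map a URL to a file in a per-host directory tree (hypothetical layout)."""
    parts = urlsplit(url)
    # Drop empty and dot segments so the result stays below the root.
    segments = [s for s in parts.path.split("/") if s and s not in (".", "..")]
    if not segments or parts.path.endswith("/"):
        segments.append("index.html")
    return Path(root, parts.netloc, *segments)


def curl_command(url, dest):
    """Build the curl call: fail on HTTP errors, stay quiet, follow redirects."""
    return ["curl", "-fsSL", "-o", str(dest), url]


def fetch(url, root="sites"):
    """Download url with curl into the tree and return the local path."""
    dest = site_path(url, root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(curl_command(url, dest), check=True)
    return dest
```

Calling fetch("https://example.org/forum/") would try to write sites/example.org/forum/index.html. One known weakness of this layout: a page saved as a plain file (say /forum) blocks a later /forum/thread from creating a directory with the same name.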

3 Responses to Browser I

  1. emallson says:

    Since HTML is (supposed to be) just XML with standardized tags, you should be able to use your favourite XML parser.

  2. procyon says:

    Thanks, I will keep that in mind because I haven’t looked into other XML parsers. Right now it’s going pretty well with just sed and awk.

    …Though a lot of information is lost, so I am thinking of secondary passes which might use something more efficient (since the base program is in bash, they will be easy to add).

