
Hpricot is a fast, flexible HTML parser written in C. It's designed to be very accommodating (like Tanaka Akira's HTree) and to have a very helpful library (like the ones some JavaScript libs, such as jQuery and Prototype, give you).
Hpricot can also be handy for reading broken XML files, since many of the same principles apply. If a quote is missing, Hpricot tries to figure it out. If tags overlap, Hpricot works on sorting them out.