
Jericho HTML Parser is a simple but powerful Java library that allows analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing any unrecognised or invalid HTML verbatim.
It also provides high-level HTML form manipulation functions.
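The sketch below illustrates both kinds of use. It assumes Jericho 3.x (package `net.htmlparser.jericho`) is on the classpath. The class `JerichoSketch` and its methods `hrefs` and `relabelLinks` are hypothetical names used only for illustration. The example first lists link targets, then rewrites link text through an `OutputDocument`. Text outside the replaced segments, including a JSP-style server tag, is copied through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.OutputDocument;
import net.htmlparser.jericho.Source;

// Hypothetical helper class; assumes Jericho 3.x is on the classpath.
public class JerichoSketch {

    // Analysis: collect the href attribute of every <a> element in the document.
    static List<String> hrefs(String html) {
        Source source = new Source(html);
        List<String> result = new ArrayList<>();
        for (Element link : source.getAllElements(HTMLElementName.A)) {
            result.add(link.getAttributeValue("href"));
        }
        return result;
    }

    // Manipulation: replace the content of each <a> element with new text.
    // OutputDocument only rewrites the segments passed to replace(); all other
    // text, including the server-side tag, is reproduced as it was.
    static String relabelLinks(String html, String label) {
        Source source = new Source(html);
        OutputDocument output = new OutputDocument(source);
        for (Element link : source.getAllElements(HTMLElementName.A)) {
            output.replace(link.getContent(), label);
        }
        return output.toString();
    }

    public static void main(String[] args) {
        String html = "<p><%= user %> <a href=\"a.html\">one</a> <a href='b.html'>two</a></p>";
        System.out.println(hrefs(html));
        System.out.println(relabelLinks(html, "link"));
    }
}
```

Editing through `OutputDocument` instead of rebuilding the markup from a tree is what lets untouched parts of the page, valid or not, survive byte for byte.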