A simple but powerful Java library for parsing and modifying HTML documents, including analysis of arbitrary HTML forms to determine the structure of submitted data.
The Jericho HTML Parser is an open source library released under the GNU Lesser General Public License (LGPL). You are therefore free to use it in commercial applications subject to the terms detailed in the licence document.
For downloads, support and updates, visit the SourceForge.net project page at http://sourceforge.net/projects/jerichohtml/
For a summary of features and a comparison with some other Java HTML parsers, visit the homepage at http://jerichohtml.sourceforge.net
The typical method for modifying a document is as follows. See the description of the {@link au.id.jericho.lib.html.OutputDocument} class for sample code.
If the document only needs to be analysed rather than modified, only the first two steps listed above are required. See the description of the {@link au.id.jericho.lib.html.FormFields} class for sample code.