Sat Apr 10 2004, 1:26 PM
Works as a filter for HTML pages. Extracts forms into a line-by-line format that is easy to parse with tools like grep, cut, sed, or awk. If the page was fetched by curl with the "-i" argument (which includes the HTTP response headers), cookies are extracted as well. Useful for scripts that have to request a page, fill in the form fields there, and then submit the form back.
FULLCOOKIE:PREF=ID=05db53036ce1112c:TM=1081596654:LM=1081596654:S=3kyARetFXE5FOnzf; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
COOKIE:PREF=ID=05db53036ce1112c:TM=1081596654:LM=1081596654:S=3kyARetFXE5FOnzf
FORM:1:f|METHOD:|ACTION:/search
FORM:1:f|INPUT:hidden|NAME:hl|VALUE:en
FORM:1:f|INPUT:hidden|NAME:ie|VALUE:ISO-8859-1
FORM:1:f|INPUT:|NAME:q|VALUE:
FORM:1:f|INPUT:submit|NAME:btnG|VALUE:Google Search
FORM:1:f|INPUT:submit|NAME:btnI|VALUE:I'm Feeling Lucky

FULLCOOKIE stands for the verbatim Set-Cookie: header; COOKIE stands for only the cookie itself, stripped of the expiry time and other attributes.
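The relation between the two cookie lines can be illustrated with a minimal Python sketch. It is not the tool's own code; the function name cookie_pair is hypothetical, and the sketch assumes the COOKIE line is simply everything before the first semicolon of the header value.

```python
def cookie_pair(set_cookie_value: str) -> str:
    """Return only the leading name=value pair of a Set-Cookie header value.

    Assumption: attributes such as expires, path and domain always follow
    the first semicolon, so cutting there leaves the bare cookie.
    """
    return set_cookie_value.split(";", 1)[0].strip()


# Header value taken from the sample output above.
full = (
    "PREF=ID=05db53036ce1112c:TM=1081596654:LM=1081596654:"
    "S=3kyARetFXE5FOnzf; expires=Sun, 17-Jan-2038 19:14:07 GMT; "
    "path=/; domain=.google.com"
)

print("FULLCOOKIE:" + full)
print("COOKIE:" + cookie_pair(full))
```

The short form is what a script would typically send back in a Cookie: request header.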
http://slashdot.org/

FORM:1:|METHOD:POST|ACTION:/users.pl
FORM:1:|INPUT:TEXT|NAME:unickname|VALUE:
FORM:1:|INPUT:HIDDEN|NAME:returnto|VALUE:/
FORM:1:|INPUT:HIDDEN|NAME:op|VALUE:userlogin
FORM:1:|INPUT:PASSWORD|NAME:upasswd|VALUE:
FORM:1:|INPUT:CHECKBOX|NAME:login_temp|VALUE:yes
FORM:1:|INPUT:SUBMIT|NAME:userlogin|VALUE:Log in

FORM:2:|METHOD:|ACTION://slashdot.org/pollBooth.pl
FORM:2:|INPUT:hidden|NAME:qid|VALUE:1089
FORM:2:|INPUT:radio|NAME:aid|VALUE:1
FORM:2:|INPUT:radio|NAME:aid|VALUE:2
FORM:2:|INPUT:radio|NAME:aid|VALUE:3
FORM:2:|INPUT:radio|NAME:aid|VALUE:4
FORM:2:|INPUT:radio|NAME:aid|VALUE:5
FORM:2:|INPUT:radio|NAME:aid|VALUE:6
FORM:2:|INPUT:radio|NAME:aid|VALUE:7
FORM:2:|INPUT:radio|NAME:aid|VALUE:8
FORM:2:|INPUT:submit|NAME:|VALUE:Vote

FORM:3:|METHOD:get|ACTION:http://freshmeat.net/search/
FORM:3:|INPUT:hidden|NAME:link|VALUE:freshmeat.net
FORM:3:|INPUT:text|NAME:q|VALUE:

FORM:4:|METHOD:GET|ACTION://slashdot.org/search.pl
FORM:4:|INPUT:TEXT|NAME:query|VALUE:
FORM:4:|INPUT:SUBMIT|NAME:|VALUE:Search

If more forms are present in the page, they are separated by an empty line.
For every FORM tag, the form sequence number is incremented. The form name, if one is given, is extracted from the tag. The FORM line and all subsequent form elements are then labeled as FORM:number:name.
Pipes (|) are used as separators, for easy use with cut or with e.g. the PHP function explode(), with little chance of interfering with the content of the variables.
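As an illustration of consuming this format, here is a small Python sketch that splits each line on the pipe and groups the results by form number. The function names parse_line and group_forms and the dictionary layout are hypothetical choices for this example, and the sketch assumes that no value contains a pipe character.

```python
def parse_line(line):
    """Split one FORM line into (number, form_name, fields)."""
    label, *parts = line.split("|")
    # The label looks like FORM:number:name; the name may be empty.
    _, number, name = label.split(":", 2)
    fields = {}
    for part in parts:
        # Split on the first colon only, so URLs in ACTION stay intact.
        key, _, value = part.partition(":")
        fields[key] = value
    return int(number), name, fields


def group_forms(lines):
    """Collect FORM lines into a dict keyed by form number."""
    forms = {}
    for line in lines:
        if not line.startswith("FORM:"):
            continue  # skip cookie lines and the empty separator lines
        number, name, fields = parse_line(line)
        form = forms.setdefault(number, {"name": name, "inputs": []})
        if "METHOD" in fields:
            form["method"] = fields["METHOD"]
            form["action"] = fields.get("ACTION", "")
        else:
            form["inputs"].append(fields)
    return forms


sample = [
    "FORM:1:f|METHOD:|ACTION:/search",
    "FORM:1:f|INPUT:hidden|NAME:hl|VALUE:en",
    "FORM:1:f|INPUT:|NAME:q|VALUE:",
    "",
    "FORM:2:|METHOD:|ACTION://slashdot.org/pollBooth.pl",
]
forms = group_forms(sample)
print(forms[1]["action"], [i["NAME"] for i in forms[1]["inputs"]])
```

Splitting each field on the first colon only is what keeps an ACTION such as http://freshmeat.net/search/ in one piece.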
For the FORM tag, the METHOD (usually GET/POST) is extracted, together with the ACTION specified (the URL the query will be submitted to).
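A script that submits the form back has to turn METHOD and ACTION into an actual request. The following Python sketch shows one way to do that with the standard urllib.parse module. The function name build_request is hypothetical, and the sketch relies on two assumptions: an empty METHOD means GET (the HTML default), and the ACTION is resolved relative to the URL of the fetched page.

```python
from urllib.parse import urlencode, urljoin, urlsplit, urlunsplit


def build_request(page_url, method, action, fields):
    """Return (method, url, body) for submitting a form.

    page_url is the URL the page was fetched from; fields maps
    input names to the values to send.
    """
    method = (method or "GET").upper()  # empty METHOD defaults to GET
    target = urljoin(page_url, action)  # handles /path, //host/path, full URLs
    encoded = urlencode(fields)
    if method == "GET":
        # For GET, the encoded fields become the query string of the URL.
        url = urlunsplit(urlsplit(target)._replace(query=encoded))
        return method, url, None
    # For POST, the encoded fields travel in the request body.
    return method, target, encoded


print(build_request("http://slashdot.org/", "",
                    "//slashdot.org/pollBooth.pl",
                    {"qid": "1089", "aid": "3"}))
print(build_request("http://slashdot.org/", "POST", "/users.pl",
                    {"unickname": "me", "op": "userlogin"}))
```

The resulting URL and body could then be handed to curl or any other HTTP client; multipart submissions are not covered by this sketch.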
TODO: ENCTYPE support and support of submitting files.
The classical INPUT tags are simple; the TYPE, NAME, and VALUE are extracted.
For SELECT and OPTION tags, the field name is extracted from the SELECT tag and used for the subsequent OPTIONs. A SELECTED marker is appended if the option specifies one.
TEXTAREA behaves in a way equivalent to INPUT.