Ticket #204 (new defect)

Opened 5 years ago

Parsing of missing closing HTML tags duplicates HTML elements

Reported by: nothere44@…
Owned by: ashinn
Priority: major
Milestone:
Component: ashinn/html-parser.plt
Keywords:
Cc:
Version: (1 2)
Racket Version: 4.2.1

Description

(html->sxml "<div><input name=a><input name=b></div>") produces
((div (input (@ (name "a")) (input (@ (name "b")))) (input (@ (name "b")))))

Note that the input element named b appears twice. My guess is that, because the inputs are unclosed, the parser does not close them until it reaches the closing </div>. At that point it works out that everything still open should be closed, finds two open input elements, and closes both. However, it has already parsed the second input as part of the first input's body, and since it does not reparse that body, the second input ends up both nested inside the first input and emitted again as its sibling. I doubt this is intended behaviour.

This came up on yahoo.com (god, why can't they write HTML). Browsers rendered the site as intended, treating the markup as if it were <div><input name="a"/><input name="b"/></div>, since input is a void element that never has content or a closing tag. The html->sxml parser did not.
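For comparison, here is a minimal sketch of the browser-style behaviour described above. It is written in Python with the standard library's html.parser so that it runs on its own. It is not the html-parser.plt implementation and says nothing about how that parser works internally. The names VOID_ELEMENTS, TreeBuilder and html_to_tree are hypothetical, and the nested-list output only loosely imitates SXML. The idea is that a void element such as input is never pushed onto the stack of open elements, so it cannot pick up later siblings as children.

```python
from html.parser import HTMLParser

# Elements that never have content or a closing tag (void elements in HTML).
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TreeBuilder(HTMLParser):
    """Hypothetical helper: builds SXML-like nested lists from HTML."""

    def __init__(self):
        super().__init__()
        self.root = []            # top-level list of nodes
        self.stack = [self.root]  # open containers, innermost last

    def handle_starttag(self, tag, attrs):
        node = [tag]
        if attrs:
            node.append(["@"] + [[name, value] for name, value in attrs])
        self.stack[-1].append(node)
        # A void element is complete as soon as its start tag is seen,
        # so it is never pushed and can never acquire children.
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].append(data)


def html_to_tree(html):
    """Parse an HTML string and return the list of top-level nodes."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


if __name__ == "__main__":
    print(html_to_tree("<div><input name=a><input name=b></div>"))
```

With this sketch, the markup from the description and the explicitly self-closed form produce the same tree: a single div with two sibling input nodes, each appearing once.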
