Ticket #204 (new defect)
Opened 15 years ago
Parsing of missing closing HTML tags duplicates HTML elements
Reported by: | nothere44@… | Owned by: | ashinn |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | ashinn/html-parser.plt | Keywords: | |
Cc: | | Version: | (1 2) |
Racket Version: | 4.2.1 |
Description
(html->sxml "<div><input name=a><input name=b></div>") produces
((div (input (@ (name "a")) (input (@ (name "b")))) (input (@ (name "b")))))
Note that the input element named b is produced twice. My guess is that, because the inputs are unclosed, the parser doesn't close them until it hits the closing </div>. At that point it works out that everything still open should be closed, finds two open input elements, and closes both. However, it has already parsed the second input as part of the first input's body, and since it doesn't reparse that body, the second input is emitted twice: once nested inside the first input and once as its sibling. I doubt this is intended behaviour.
This came up on yahoo.com (god, why can't they write HTML). Browsers rendered the site as intended (as if it were <div><input name="a"/><input name="b"/></div>), but the html->sxml parser did not.
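For reference, here is a minimal sketch of the behaviour I would expect, assuming the parser treats input as a void element (one that can never have children) and closes it as soon as it opens. This is not html-parser's code: the token format and the names void-elements, make-element, read-children and tokens->sxml are all made up for illustration, and the handling of mismatched close tags is deliberately simplistic.

```racket
#lang racket

;; Hypothetical token format (NOT html-parser's own):
;;   (open name attrs)  or  (close name)

;; Assumed list of void elements; a real parser would need a fuller list.
(define void-elements '(input br img hr meta link))

;; Build an SXML element, omitting the (@ ...) list when there are no attributes.
(define (make-element name attrs children)
  (append (list name)
          (if (null? attrs) '() (list (cons '@ attrs)))
          children))

;; Read sibling nodes until the close tag for `parent` or the end of input.
;; Returns two values: the nodes read and the tokens that remain.
(define (read-children tokens parent)
  (let loop ([tokens tokens] [acc '()])
    (match tokens
      ['() (values (reverse acc) '())]
      [(cons (list 'close name) more)
       (if (eq? name parent)
           (values (reverse acc) more)
           ;; Simplification: a close tag that doesn't match is ignored.
           (loop more acc))]
      [(cons (list 'open name attrs) more)
       (if (memq name void-elements)
           ;; Void element: emit it immediately, with no body.
           (loop more (cons (make-element name attrs '()) acc))
           (let-values ([(kids remaining) (read-children more name)])
             (loop remaining (cons (make-element name attrs kids) acc))))])))

(define (tokens->sxml tokens)
  (let-values ([(nodes remaining) (read-children tokens #f)])
    nodes))

;; Token stream corresponding to the HTML in this ticket.
(define ticket-tokens
  '((open div ())
    (open input ((name "a")))
    (open input ((name "b")))
    (close div)))

(tokens->sxml ticket-tokens)
;; => ((div (input (@ (name "a"))) (input (@ (name "b")))))
```

With this approach the two inputs come out as siblings inside the div and each appears exactly once, which matches how browsers rendered the page.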