Hi there!
I am having trouble parsing some HTML that I don't control and that contains unclosed tags.
An example:
<html>
<head>
<title>Hello World</title>
<link href="test.css">
</head>
<body>
<h1>Hello World</h1>
</body>
</html>
As you can see, the <link> tag is not properly closed. This causes the parser to put everything after it inside the <link> tag (as its children) and to add a "shadow" </link></head><body></body></html> at the end.
This makes it impossible to traverse the DOM.
I'd like to have a way to configure how such cases are handled, for example by specifying which tags cannot contain content (auto-closing tags), or by a setting that makes the parser automatically close any still-open child tags once their parent tag is closed.
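To illustrate the behaviour I have in mind, here is a rough sketch in Python built on the standard library's html.parser. It is not this library's API. The names Node, TreeBuilder, parse and the void_tags parameter are made up for the example, and the tag list is my assumption of what a sensible default could be. It shows both ideas: tags in void_tags never become a parent, and a closing tag also closes any still-open descendants.

```python
from html.parser import HTMLParser

# Assumed default: elements that HTML defines as void (no content, no end tag).
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class Node:
    """Hypothetical minimal DOM node, just enough to show the tree shape."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree and tolerates unclosed tags."""

    def __init__(self, void_tags=VOID_TAGS):
        super().__init__()
        self.void_tags = void_tags
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        # Option 1: configured void tags never become the current parent.
        if tag not in self.void_tags:
            self.current = node

    def handle_endtag(self, tag):
        # Option 2: find the nearest open ancestor with this name and close
        # it together with every unclosed descendant below it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent
        # Otherwise this is a stray end tag (e.g. </link>); ignore it.

    def handle_data(self, data):
        self.current.text += data


def parse(html, void_tags=VOID_TAGS):
    """Parse html and return the synthetic #document root node."""
    builder = TreeBuilder(void_tags)
    builder.feed(html)
    builder.close()
    return builder.root
```

With the HTML from my example, parse(html) gives a head node containing title and link, with body as a sibling of head rather than a child of link. Passing void_tags=frozenset() disables the first option, and the second one alone still produces the same tree, because </head> closes the open link.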
Any help would be appreciated!