Better handling for unclosed tags #1088

Open
@ricardoboss

Hi there!

I am having trouble parsing some HTML that is outside my control and contains unclosed tags.
An example:

<html>
<head>
	<title>Hello World</title>

	<link href="test.css">
</head>
<body>
	<h1>Hello World</h1>
</body>
</html>

As you can see, the <link> tag has no closing tag. The parser therefore treats everything after it as a child of the <link> element and appends a "shadow" </link></head><body></body></html> at the end.

The resulting tree no longer matches the structure of the document, which makes it impossible to traverse the DOM.

I'd like a way to configure how such cases are handled. One option would be to specify which tags cannot contain content (auto-closing tags). Another would be a setting that makes the parser automatically close any still-open tags once their parent tag has been closed.
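
To illustrate both ideas, here is a rough sketch in Python built on the standard library's `html.parser`. It is not the API of this library, and the names `VOID_TAGS`, `Node`, `LenientTreeBuilder` and `parse` are made up for the example. The `void_tags` argument stands in for the "tags that cannot contain content" setting, and `handle_endtag` shows the "close open children when the parent closes" behaviour.

```python
from html.parser import HTMLParser

# Assumption: a configurable set of tags that can never contain content.
# The default mirrors the usual HTML void elements.
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class Node:
    """Minimal tree node (hypothetical, for illustration only)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class LenientTreeBuilder(HTMLParser):
    def __init__(self, void_tags=VOID_TAGS):
        super().__init__()
        self.void_tags = void_tags
        self.root = Node("#document")
        self.current = self.root  # innermost element that is still open

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        # Idea 1: a void tag is attached but never becomes the open element,
        # so following content cannot end up inside it.
        if tag not in self.void_tags:
            self.current = node

    def handle_endtag(self, tag):
        # Idea 2: walk up to the nearest open element with this name;
        # everything still open below it is closed implicitly.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent
        # An end tag without a matching open element is ignored.

    def handle_data(self, data):
        if data.strip():
            self.current.text += data.strip()


def parse(html, void_tags=VOID_TAGS):
    builder = LenientTreeBuilder(void_tags)
    builder.feed(html)
    builder.close()
    return builder.root


SAMPLE = """<html>
<head>
    <title>Hello World</title>

    <link href="test.css">
</head>
<body>
    <h1>Hello World</h1>
</body>
</html>"""

document = parse(SAMPLE)
html_node = document.children[0]
print([child.tag for child in html_node.children])
```

With the example document from above, `<link>` ends up as an empty child of `<head>`, and `<body>` as a sibling of `<head>`, which is the tree I would expect to be able to traverse.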

Any help would be appreciated!
