w3lib

Overview

This is a Python library of web-related functions, such as:

remove comments or tags from HTML snippets
extract base url from HTML snippets
translate entities in HTML strings
encode multipart/form-data
convert raw HTTP headers to dicts and vice-versa
construct HTTP auth header
RFC-compliant url joining
sanitize urls (like browsers do)
extract arguments from urls

Modules

The w3lib package consists of four modules:

w3lib.url - functions for working with URLs
w3lib.html - functions for working with HTML
w3lib.http - functions for working with HTTP
w3lib.form - functions for working with web forms

Requirements

Python 2.5, 2.6 or 2.7

Install

pip install w3lib

Documentation

For more information, see the code and tests. The functions are all documented with docstrings.

License

The w3lib library is licensed under the BSD license.

History

The code of w3lib was originally part of the Scrapy framework but was later split out of Scrapy, with the aim of making it more reusable and providing a useful library of web functions that does not depend on Scrapy.
