
There's more...
Regular expressions can also be passed as input to the .find() and .find_all() methods. For example, this search matches both the h2 and h3 tags:
>>> import re
>>> page.find_all(re.compile('^h(2|3)'))
[<h2>Sample Web Page</h2>, <h3><a name="contents">CONTENTS</a></h3>, <h3><a name="basics">1. Creating a Web Page</a></h3>, <h3><a name="syntax">2. HTML Syntax</a></h3>, <h3><a name="chars">3. Special Characters</a></h3>, <h3><a name="convert">4. Converting Plain Text to HTML</a></h3>, <h3><a name="effects">5. Effects</a></h3>, <h3><a name="lists">6. Lists</a></h3>, <h3><a name="links">7. Links</a></h3>, <h3><a name="tables">8. Tables</a></h3>, <h3><a name="install">9. Installing Your Web Page on the Internet</a></h3>, <h3><a name="more">10. Where to go from here</a></h3>]
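The same idea can be tried without downloading a page. The following is a minimal sketch, not the recipe's own code: the HTML snippet and the variable names (html, soup, heading, and so on) are made up for illustration, and it assumes the built-in 'html.parser' backend. Beautiful Soup checks a compiled pattern against each tag name with the pattern's search() method, so the sketch adds a closing $ anchor to match h2 and h3 exactly.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML, made up for this illustration
html = """
<html><body>
<h1>Title</h1>
<h2>Section</h2>
<h3>Subsection</h3>
<h4>Detail</h4>
<p>Some text</p>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')

# The pattern is applied to tag names; ^ and $ restrict it to exactly h2 or h3
heading = re.compile('^h[23]$')

first = soup.find(heading)            # only the first matching tag
all_matches = soup.find_all(heading)  # every matching tag, in document order

first_name = first.name
matched_names = [tag.name for tag in all_matches]
print(first_name)
print(matched_names)
```

Here .find() returns a single tag (or None when nothing matches), while .find_all() always returns a list.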
Another useful option for the find methods is filtering by CSS class with the class_ parameter. This will be shown later in the book.
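As a brief preview, here is a small sketch of a class_ search. The HTML and the class names (item, featured, note) are invented for this example and do not come from the sample page used in the recipe; the parser choice of 'html.parser' is also an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, made up for this illustration
html = """
<ul>
  <li class="item featured">First</li>
  <li class="item">Second</li>
  <li class="note">Third</li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')

# The trailing underscore is needed because class is a reserved word in Python.
# A tag matches if any one of its CSS classes equals the given value.
items = soup.find_all('li', class_='item')
item_texts = [li.get_text() for li in items]
print(item_texts)
```

The first element carries two classes, and it is still found because one of them is item.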
The full Beautiful Soup documentation can be found here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/.