Parsing HTML with BeautifulSoup in Python: A Worked Example

2020-01-04 17:05:53

Source: reposted

Contributed by: a site user

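Introduction

Beautiful Soup provides a few simple, Pythonic functions for navigating, searching, and modifying a parse tree. It is a toolkit that parses a document and hands you the data you want to extract; because it is simple, a complete application takes very little code.

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You do not need to think about encodings unless the document does not declare one and Beautiful Soup cannot detect it automatically; in that case you only need to state the original encoding.

Beautiful Soup works on top of Python parsers such as lxml and html5lib, so you can try different parsing strategies or trade flexibility for speed.

This article walks through how to parse HTML with BeautifulSoup in Python. The details follow: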
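As a minimal sketch of stating the original encoding, the snippet below passes `from_encoding` to the `BeautifulSoup` constructor. The GBK-encoded sample bytes are a made-up input for illustration, and the built-in `html.parser` is used only so the sketch runs without extra packages.

```python
from bs4 import BeautifulSoup

# Hypothetical input: GBK-encoded bytes with no charset declaration.
raw = "<p>你好，世界</p>".encode("gbk")

# from_encoding tells Beautiful Soup how to decode the incoming bytes.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")

# Inside the soup everything is Unicode text.
text = soup.p.get_text()
print(text)

# Encoding a tag for output yields UTF-8 bytes by default.
utf8_bytes = soup.p.encode("utf-8")
print(utf8_bytes)
```

If the document declares its own charset, or the bytes are already UTF-8, the `from_encoding` argument is usually unnecessary.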

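1. Install beautifulsoup4

pip install beautifulsoup4
pip install lxml
pip install html5lib

lxml and html5lib are parsers that Beautiful Soup can use; Python's built-in html.parser works without installing either.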
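The parser is chosen by the second argument to `BeautifulSoup`. The sketch below tries the built-in `html.parser` plus the two optional parsers and skips any that are not installed; the small HTML string and the `titles` dictionary are illustrative names, not part of the original example.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# A small sample document (made up for this sketch).
html = "<html><head><title>The Website Title</title></head><body><p>Hi</p></body></html>"

titles = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        soup = BeautifulSoup(html, parser)
    except FeatureNotFound:
        # Raised when the requested parser package is not installed.
        continue
    titles[parser] = soup.title.string

print(titles)
```

On well-formed input like this the parsers agree; they tend to differ mainly in how they repair broken markup and in speed.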

2. The HTML

<!-- This is the example.html file. -->
<html>
<head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body>
</html>

Save the HTML above as example.html.

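3. Start parsing

import bs4

# Open the file saved in step 2 and parse it with the html5lib parser.
with open('example.html', encoding='utf-8') as exampleFile:
    exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html5lib')

# select() takes a CSS selector and returns a list-like collection of matches.
elems = exampleSoup.select('#author')
type(elems)  # only displayed when run in an interactive session
print(elems[0].getText())

Output: Al Sweigart

BeautifulSoup finds elements with the select method, which accepts CSS selectors much like jQuery does.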
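A few more selectors can be tried against the same markup. This is a sketch rather than part of the original walkthrough: it embeds the HTML as a string instead of reading example.html, and it uses the built-in `html.parser` so that it runs without html5lib installed.

```python
from bs4 import BeautifulSoup

# Same markup as example.html, inlined so the sketch is self-contained.
html = """<html><head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# Class selector: <p> elements with class "slogan".
slogan = soup.select("p.slogan")[0].get_text()

# Attribute selector: <a> elements that have an href attribute.
link = soup.select("a[href]")[0]["href"]

# Child combinator: <strong> directly inside a <p>.
strong = soup.select("p > strong")[0].get_text()

# select_one() returns the first match (or None) instead of a list.
author = soup.select_one("#author").get_text()

# Tag selector: every <p> element in the document.
paragraph_count = len(soup.select("p"))

print(slogan, link, strong, author, paragraph_count)
```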