Installing and Using the Python BS4 Library in Detail

2020-02-15 22:42:51

Source: reposted

Contributed by: a site user

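Beautiful Soup, usually referred to as the bs4 library, supports Python 3 and is an excellent third-party library for writing web crawlers. It is simple and smooth to use, and Chinese-speaking developers often call it "delicious soup" (美味汤). When this article was written, the latest version of bs4 was 4.6.0. The sections below cover only the most basic usage; for the full details, see the official Beautiful Soup Documentation.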

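Installing the bs4 library

Part of Python's strength is that, as an open-source language, it has many developers writing third-party libraries for it. When we want to implement a feature, we can concentrate on that feature and leave the underlying details to a library. The bs4 library is exactly that kind of helper for writing crawlers.

Installation is simple: run pip from the command line.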

$ pip install beautifulsoup4

Next, check whether bs4 was installed successfully:

$ pip list

If beautifulsoup4 appears in the output, the bs4 library has been installed successfully.
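The same check can also be done from Python itself. The sketch below is an illustration and not part of the original article: the helper name installed_version is hypothetical, and it relies on importlib.metadata from the standard library, which assumes Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "beautifulsoup4" is the distribution name used on the pip command line.
bs4_version = installed_version("beautifulsoup4")
if bs4_version is None:
    print("beautifulsoup4 is not installed")
else:
    print("beautifulsoup4", bs4_version, "is installed")
```

Note that the distribution is named beautifulsoup4, while the module imported in code is bs4.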

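Basic usage of the bs4 library

We start with a brief look at how to use bs4, setting aside for now the question of how to fetch pages from the web. Suppose the HTML we need to parse is the snippet below. It is an excerpt from Alice in Wonderland and is used as the example several times (referred to from here on as the Alice document):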

<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
</body></html>

Now we use the bs4 library to parse this HTML.

# Import the bs4 module
from bs4 import BeautifulSoup

# Make the soup; `html` is a string holding the Alice document above
soup = BeautifulSoup(html, 'html.parser')

# Print the parsed document with indentation
print(soup.prettify())

# Output (attribute order and whitespace may vary between versions):
# <html>
#  <head>
#   <title>
#    The Dormouse's story
#   </title>
#  </head>
#  <body>
#   <p class="title">
#    <b>
#     The Dormouse's story
#    </b>
#   </p>
#   <p class="story">
#    Once upon a time there were three little sisters; and their names were
#    <a class="sister" href="http://example.com/elsie" id="link1">
#     Elsie
#    </a>
#    ,
#    <a class="sister" href="http://example.com/lacie" id="link2">
#     Lacie
#    </a>
#    and
#    <a class="sister" href="http://example.com/tillie" id="link3">
#     Tillie
#    </a>
#    ; and they lived at the bottom of a well.
#   </p>
#   <p class="story">
#    ...
#   </p>
#  </body>
# </html>
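Once the soup is built, individual pieces of the document can be pulled out of it. The following is a minimal sketch, not taken from the original article, that uses a shortened copy of the Alice document; the variable names (html, title, links, third_name) are chosen here for illustration only.

```python
from bs4 import BeautifulSoup

# A shortened copy of the Alice document used in this article
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Text of the <title> tag
title = soup.title.string

# href attribute of every <a> tag, in document order
links = [a.get("href") for a in soup.find_all("a")]

# Text of the tag whose id is "link3"
third_name = soup.find(id="link3").get_text()

print(title)       # The Dormouse's story
print(links)
print(third_name)  # Tillie
```

Here find_all returns every matching tag, while find returns only the first match, which is why the id lookup yields a single tag.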
