python-beautifulsoup beginner tutorial

Admin 2020-03-17 14:07

find

table = mysoup.find('table', attrs={'class': 'GridTableContent'})

tbody = table.find('tbody')

find_all is used in much the same way as find, but it returns every match instead of only the first

trList = tbody.find_all('tr')
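
The chain above can be tried on its own with a small inline document. This is only a sketch: the markup and cell values are invented for illustration, and only the tag names and the GridTableContent class come from the lines above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class name comes from the example above.
html = """
<table class="GridTableContent">
  <tbody>
    <tr><td>1</td><td>alpha</td></tr>
    <tr><td>2</td><td>beta</td></tr>
  </tbody>
</table>
"""
mysoup = BeautifulSoup(html, "html.parser")

# find returns the first match, find_all returns every match.
table = mysoup.find("table", attrs={"class": "GridTableContent"})
tbody = table.find("tbody")
trList = tbody.find_all("tr")

# Collect the text of every cell, row by row.
rows = [[td.get_text() for td in tr.find_all("td")] for tr in trList]
print(rows)
```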

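6. Searching the document tree

6.1 find_all(name, attrs, recursive, text, limit, **kwargs)

The example above gave a brief look at find_all. This section covers more of its uses, namely filters. Filters run through the entire search API and can be applied to a tag's name, to a node's attributes, and so on.

(1) The name parameter:

String filter: finds tags whose name matches the string exactly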

a_list = bs.find_all("a")
print(a_list)

Regular expression filter: if a regular expression is passed, BeautifulSoup4 matches tag names with its search() method

from bs4 import BeautifulSoup 
import re 
file = open('./aa.html', 'rb') 
html = file.read() 
bs = BeautifulSoup(html,"html.parser") 
t_list = bs.find_all(re.compile("a")) 
for item in t_list: 
   print(item)
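
Because search() matches anywhere in the tag name, a pattern such as "a" can hit more tags than expected. The sketch below uses an invented inline document to compare an unanchored pattern with an anchored one.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical document used only for this illustration.
html = "<html><head><title>t</title></head><body><a href='#'>x</a><p>y</p></body></html>"
bs = BeautifulSoup(html, "html.parser")

# Unanchored: any tag whose name contains the letter "a".
anywhere = [tag.name for tag in bs.find_all(re.compile("a"))]
# Anchored at both ends: only the <a> tag itself.
exact = [tag.name for tag in bs.find_all(re.compile("^a$"))]
print(anywhere, exact)
```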

List: if a list is passed, BeautifulSoup4 returns every node that matches any item in the list

t_list = bs.find_all(["meta","link"])
for item in t_list:
    print(item)

Function: pass a function, and each tag is kept or dropped according to its return value

from bs4 import BeautifulSoup 
file = open('./aa.html', 'rb') 
html = file.read() 
bs = BeautifulSoup(html,"html.parser") 
def name_is_exists(tag): 
    return tag.has_attr("name") 
t_list = bs.find_all(name_is_exists) 
for item in t_list: 
    print(item)
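
To see which tags such a function keeps, here is a small sketch on an invented inline document. The helper name has_name_attr is hypothetical; it does the same check as name_is_exists above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tags carry a name attribute, one does not.
html = (
    '<head><meta name="keywords" content="k"><meta charset="utf-8"></head>'
    '<body><a name="top">x</a></body>'
)
bs = BeautifulSoup(html, "html.parser")

def has_name_attr(tag):
    # Called once per Tag; a truthy return value keeps the tag.
    return tag.has_attr("name")

matched = [(tag.name, tag["name"]) for tag in bs.find_all(has_name_attr)]
print(matched)
```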

(2) The kwargs parameter:

from bs4 import BeautifulSoup
import re
file = open('./aa.html', 'rb')
html = file.read()
bs = BeautifulSoup(html, "html.parser")
# Find tags with id="head"
t_list = bs.find_all(id="head")
print(t_list)
# Find tags whose href attribute contains http://news.baidu.com
t_list = bs.find_all(href=re.compile("http://news.baidu.com"))
print(t_list)
# Find every tag that has a class attribute (class is a Python keyword, so it is written class_)
t_list = bs.find_all(class_=True)
for item in t_list:
    print(item)
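
The three keyword filters can be checked against a small inline document instead of aa.html. The markup below is an assumption made for illustration; only the id value and the news.baidu.com host are taken from the example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for aa.html.
html = """
<div id="head" class="top">
  <a href="http://news.baidu.com/a" class="nav">news</a>
  <a href="http://map.baidu.com">map</a>
</div>
"""
bs = BeautifulSoup(html, "html.parser")

by_id = bs.find_all(id="head")                  # exact attribute value
by_href = bs.find_all(href=re.compile("news"))  # regex searched inside the value
with_class = bs.find_all(class_=True)           # any tag that has a class attribute
print(len(by_id), len(by_href), [t.name for t in with_class])
```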

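(3) The attrs parameter:

Not every attribute can be searched with the keyword style above, for example HTML data-* attributes:

t_list = bs.find_all(data-foo="value")

Running this code raises a SyntaxError, because data-foo is not a valid Python keyword argument name. Instead, use the attrs parameter and pass a dictionary to search for tags with such attributes: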

t_list = bs.find_all(attrs={"data-foo":"value"})
for item in t_list:
    print(item)
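
A short sketch of the attrs form on an invented inline document. It also covers the HTML name attribute, which cannot be passed as a keyword either because find_all uses name for the tag name. The input elements are an assumption added for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = (
    '<div data-foo="value">first</div>'
    '<div data-foo="other">second</div>'
    '<input name="email"><input name="phone">'
)
bs = BeautifulSoup(html, "html.parser")

# A dictionary key may be any string, so data-foo is fine here.
by_data = bs.find_all(attrs={"data-foo": "value"})
# name= would be read as the tag name, so the attribute goes through attrs.
by_name = bs.find_all(attrs={"name": "email"})
print([t.get_text() for t in by_data], [t.name for t in by_name])
```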

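(4) The text parameter:

The text parameter searches the string content of the document. Like the name parameter, it accepts a string, a regular expression, or a list. In Beautiful Soup 4.4.0 and later this parameter is named string, and text is the older name.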

from bs4 import BeautifulSoup 
import re 
file = open('./aa.html', 'rb') 
html = file.read() 
bs = BeautifulSoup(html, "html.parser") 
t_list = bs.find_all(attrs={"data-foo": "value"}) 
for item in t_list: 
    print(item) 
t_list = bs.find_all(text="hao123") 
for item in t_list: 
    print(item) 
t_list = bs.find_all(text=["hao123", "地图", "贴吧"]) 
for item in t_list: 
    print(item) 
t_list = bs.find_all(text=re.compile(r"\d"))
for item in t_list: 
    print(item)

When the text has to satisfy a more specific condition, we can also pass a function:

def length_is_two(text):
    return text and len(text) == 2
t_list = bs.find_all(text=length_is_two)
for item in t_list:
    print(item)
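
The following sketch runs the same length check on an invented inline document. It assumes Beautiful Soup 4.4.0 or later and therefore uses string=, the newer name of the text argument. The link texts are reused from the examples above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; no whitespace between tags, so only link texts are strings.
html = "<a>新闻</a><a>hao123</a><a>地图</a>"
bs = BeautifulSoup(html, "html.parser")

def length_is_two(text):
    # Guard against None, then keep strings of exactly two characters.
    return text is not None and len(text) == 2

# With only a string filter, find_all returns the matching strings, not tags.
two_chars = [str(s) for s in bs.find_all(string=length_is_two)]
print(two_chars)
```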

(5) The limit parameter:

Pass limit to cap the number of results. If a search would match 5 items and limit=2 is set, only the first 2 are returned

from bs4 import BeautifulSoup 
import re 
file = open('./aa.html', 'rb') 
html = file.read() 
bs = BeautifulSoup(html, "html.parser") 
t_list = bs.find_all("a",limit=2) 
for item in t_list: 
    print(item)
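
The 5-versus-2 case described above can be reproduced with generated links. This is a minimal sketch, and the hrefs and link texts are made up.

```python
from bs4 import BeautifulSoup

# Build five hypothetical links: link0 .. link4.
html = "".join(f'<a href="/p{i}">link{i}</a>' for i in range(5))
bs = BeautifulSoup(html, "html.parser")

all_links = bs.find_all("a")
first_two = bs.find_all("a", limit=2)  # stops after two matches, in document order
print(len(all_links), [a.get_text() for a in first_two])
```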

Besides the regular forms above, find_all also has shorthand forms:

# These two are equivalent:
# t_list = bs.find_all("a") => t_list = bs("a")
t_list = bs("a")
# These two are equivalent:
# t_list = bs.a.find_all(text="新闻") => t_list = bs.a(text="新闻")
t_list = bs.a(text="新闻")
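
A quick way to confirm the equivalence is to compare the two call styles on the same document. The sketch below uses invented markup and string=, the newer name of text in Beautiful Soup 4.4.0 and later.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = '<div><a href="/n">新闻</a><a href="/m">地图</a></div>'
bs = BeautifulSoup(html, "html.parser")

# Calling the soup (or a tag) like a function is the same as calling find_all on it.
short_form = bs("a")
long_form = bs.find_all("a")
hrefs = [a["href"] for a in short_form]

# bs.a is the first <a>; calling it searches inside that tag only.
inner = [str(s) for s in bs.a(string="新闻")]
print(hrefs, inner)
```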

6.2 find()

find() returns the first Tag that matches, which makes it the method to use when only one Tag is needed. Calling find_all() with limit=1 and taking the first element also works, but it is needlessly verbose.

from bs4 import BeautifulSoup
file = open('./aa.html', 'rb')
html = file.read()
bs = BeautifulSoup(html, "html.parser")
# Returns a list containing a single result
t_list = bs.find_all("title", limit=1)
print(t_list)
# Returns the single Tag itself
t = bs.find("title")
print(t)
# Returns None when nothing is found
t = bs.find("abc")
print(t)

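As the output shows, find_all still returns a list even with limit=1, so it is far less convenient than find when only one value is needed. Keep in mind, though, that find returns None when nothing matches.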
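
Since find can return None, it is usually worth checking the result before using it. A minimal sketch with an invented inline document:

```python
from bs4 import BeautifulSoup

# Hypothetical document: it has a <title> but no <abc> tag.
html = "<html><head><title>Example</title></head><body></body></html>"
bs = BeautifulSoup(html, "html.parser")

as_list = bs.find_all("title", limit=1)  # a list, even with limit=1
title = bs.find("title")                 # the Tag itself
missing = bs.find("abc")                 # None, because nothing matches

# Check for None before calling Tag methods on the result.
missing_text = missing.get_text() if missing is not None else ""
print(len(as_list), title.get_text(), repr(missing_text))
```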

As covered earlier in the introduction to BeautifulSoup4, bs.div returns the first div tag. To get the first div inside that first div, we can write:

t = bs.div.div
# equivalent to
t = bs.find("div").find("div")
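
Both spellings return None at the first step that finds nothing, so a chain that goes one level too deep raises AttributeError. The sketch below shows this on invented markup. The id values are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one div nested inside another.
html = '<div id="outer"><p>x</p><div id="inner">y</div></div>'
bs = BeautifulSoup(html, "html.parser")

t1 = bs.div.div
t2 = bs.find("div").find("div")

# The inner div has no div of its own, so the third step is None
# and the fourth attribute access fails.
try:
    bs.div.div.div.div
    deep_chain_ok = True
except AttributeError:
    deep_chain_ok = False
print(t1["id"], t1 == t2, deep_chain_ok)
```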
