Python: A Summary of Methods for Counting English Word Occurrences in a Plain Text File [Tested and Working]

2020-02-15 22:31:29
Source: reposted
Contributed by: a reader

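This article uses worked examples to show how to count how many times each English word appears in a plain text file with Python. It is shared here for reference. Details follow: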

Version 1: slow, because it builds every word one character at a time in a Python loop

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
path = 'test.txt'

words_dict = {}  # word -> number of occurrences
word = []        # characters of the word currently being read

with open(path, encoding='utf-8', newline='') as f:
    for letter in f.read():
        if letter.isalnum():
            word.append(letter)
        elif letter.isspace():  # whitespace: space, \t, \n, ...
            if word:
                w = ''.join(word).lower()  # convert to lowercase
                words_dict[w] = words_dict.get(w, 0) + 1
                word = []
        # any other character (punctuation) is dropped, not treated as a separator

# handle the last word if the file does not end with whitespace
if word:
    w = ''.join(word).lower()  # convert to lowercase
    words_dict[w] = words_dict.get(w, 0) + 1

for k, v in words_dict.items():
    print(k, v)

Output:

we 4
are 1
busy 1
all 1
day 1
like 1
swarms 1
of 6
flies 1
without 1
souls 1
noisy 1
restless 1
unable 1
to 1
hear 1
the 7
voices 1
soul 1
as 1
time 1
goes 1
by 1
childhood 1
away 2
grew 1
up 1
years 1
a 1
lot 1
memories 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence 1
regardless 1
shackles 1
mind 1
indulge 1
in 1
world 1
buckish 1
focus 1
on 1
beneficial 1
principle 1
lost 1
themselves 1
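Version 1 does all of its work in a Python-level loop over individual characters. The sketch below is not the author's code. It restates the same rule (whitespace separates words, letters and digits are kept, every other character is dropped, and everything is lowercased) using str.split() and collections.Counter. The function name count_words_v1_rule is hypothetical. It takes a string instead of a file so the rule is easy to try out.

```python
from collections import Counter


def count_words_v1_rule(text: str) -> Counter:
    """Count words with the same rule as Version 1 (hypothetical helper).

    - str.split() with no argument splits on runs of whitespace,
      like the letter.isspace() branch in Version 1.
    - Inside each chunk only alphanumeric characters are kept, so
      punctuation is dropped rather than treated as a separator.
    - Chunks with no alphanumeric characters at all are skipped.
    """
    counts = Counter()
    for chunk in text.split():
        word = ''.join(ch for ch in chunk if ch.isalnum()).lower()
        if word:
            counts[word] += 1
    return counts


sample = "We are busy, we ARE... don't stop -- ever."
for word, n in count_words_v1_rule(sample).items():
    print(word, n)
```

Note that "don't" is counted as "dont". Because Version 1 drops punctuation instead of splitting on it, contractions and hyphenated words are glued together. Text such as "end.Next" with no space becomes a single word, "endnext".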

Version 2:

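Drawback: the whole file is read into memory at once, so performance is poor for large files. (Version 1 also calls f.read() and has the same drawback.) In addition, word_list.count(word) scans the entire list once for every distinct word, which becomes slow as the vocabulary grows.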

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import re

path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    data = f.read()

word_reg = re.compile(r'\w+')
# word_reg = re.compile(r'\w+\b')  # alternative; the \b adds nothing after a greedy \w+
word_list = word_reg.findall(data)
word_list = [word.lower() for word in word_list]  # convert to lowercase
word_set = set(word_list)  # count each distinct word only once

# Long form:
# words_dict = {}
# for word in word_set:
#     words_dict[word] = word_list.count(word)
# Concise form:
words_dict = {word: word_list.count(word) for word in word_set}
for k, v in words_dict.items():
    print(k, v)

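Output (the order differs from Version 1, and may vary between runs, because the dictionary is built by iterating over a set):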

on 1
also 1
souls 1
focus 1
soul 1
time 1
noisy 1
grew 1
lot 1
childish 1
like 1
voices 1
indulge 1
swarms 1
buckish 1
restless 1
we 4
hear 1
childhood 1
as 1
world 1
themselves 1
are 1
bottom 1
memories 1
the 7
of 6
flies 1
without 1
have 2
day 1
busy 1
to 1
eroded 1
regardless 1
unable 1
innocence 1
up 1
a 1
in 1
mind 1
goes 1
by 1
lost 1
principle 1
once 1
away 2
years 1
beneficial 1
all 1
shackles 1
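Both costs of Version 2 (reading the whole file with f.read() and calling list.count() once per distinct word) can be avoided. The following is a possible alternative, not the author's code. It assumes words are still matched with the same \w+ pattern. It iterates over the file one line at a time and lets collections.Counter tally every word in a single pass. Because \w never matches a newline, no word can span two lines, so the counts should match Version 2. The names count_words and count_words_in_file are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

WORD_RE = re.compile(r'\w+')  # same pattern as Version 2


def count_words(lines: Iterable[str]) -> Counter:
    """Tally lowercase WORD_RE matches from an iterable of text lines.

    An open file object can be passed directly: iterating over a file
    yields one line at a time, so the whole file is never in memory.
    """
    counts = Counter()
    for line in lines:
        # \w+ never matches '\n', so splitting the input by lines is safe.
        counts.update(word.lower() for word in WORD_RE.findall(line))
    return counts


def count_words_in_file(path: str) -> Counter:
    """Hypothetical convenience wrapper: open a UTF-8 file and count its words."""
    with open(path, encoding='utf-8') as f:
        return count_words(f)
```

For example, count_words_in_file('test.txt').most_common() returns (word, count) pairs from most to least frequent. Memory use is bounded by the longest line rather than the file size. A file with one enormous line would still be read in full.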
