Finding nestable string groups with regular expressions in Python

2020-01-04 16:38:50


Source: reposted

Contributed by: a site user

I came across a small requirement online that calls for regular expressions. The original requirement is:

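Find the sentences in a text that contain "因为……所以" ("because … so"), and print each one aligned on those two words: the 3 characters before "因为", everything between the two words, and the 3 characters after "所以". If another "因为"/"所以" pair appears between an "因为" and its "所以", it must be found too and printed on a separate line. The output format is: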

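<line number> <3 characters before> *因为* <everything in between> &所以& <3 characters after>   (a punctuation mark counts as one character)

For example: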

2 还不是 *因为* 这里好， &所以& 没有人

The implementation is as follows:

# encoding: utf-8
import re


def getPairStriList(filename):
    pairStrList = []
    # '\u56e0\u4e3a' and '\u6240\u4ee5' are the Unicode escapes for "因为" and "所以"
    pattern = re.compile('.{3}\u56e0\u4e3a.*\u6240\u4ee5.{3}')
    with open(filename, encoding='utf-8') as textFile:
        for lineNo, line in enumerate(textFile, 1):
            result = pattern.search(line)
            while result:
                resultStr = result.group()
                pairStrList.append((lineNo, resultStr))
                # Search again inside the match, skipping the outer "因为" at the
                # front and the outer "所以" at the end, to find a nested pair
                result = pattern.search(resultStr, 2, len(resultStr) - 2)
    # Format each match: wrap "因为" in * * and "所以" in & &, prefix the line number
    output = []
    for lineNo, s in pairStrList:
        s = s[:3] + s[3:5].replace('\u56e0\u4e3a', ' *\u56e0\u4e3a* ', 1) + s[5:]
        s = s[:-5] + s[-5:].replace('\u6240\u4ee5', ' &\u6240\u4ee5& ', 1)
        output.append(str(lineNo) + ' ' + s)
    return output


if __name__ == '__main__':
    for s in getPairStriList('test.txt'):
        print(s)
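The search loop above depends on two details of Python's re module. The greedy .* makes the first match run from the first "因为" to the last "所以" that still has three characters after it, and the pos/endpos arguments of Pattern.search narrow the next search to the inside of that match. A small sketch of both, using a made-up sample line (hypothetical, not from the original article) and a lazy .*? variant that is shown for comparison only:

```python
import re

# The article's pattern, written with literal characters instead of \u escapes
greedy = re.compile('.{3}因为.*所以.{3}')
# Lazy variant, for comparison only; the article does not use it
lazy = re.compile('.{3}因为.*?所以.{3}')

# Hypothetical sample line: one 因为...所以 pair nested inside another
line = '还不是因为他说因为下雨所以迟到了所以没有人'

# Greedy .* runs on to the LAST 所以 that still has 3 characters after it
outer = greedy.search(line).group()
print(outer)  # 还不是因为他说因为下雨所以迟到了所以没有人

# Lazy .*? stops at the FIRST 所以, so it would cut the outer pair short
print(lazy.search(line).group())  # 还不是因为他说因为下雨所以迟到了

# pos=2: a match must start at index 2 or later, so ".{3}因为" can no longer
# use the outer 因为 at index 3. endpos=len-2: only one character is left after
# the outer 所以, so ".{3}" cannot follow it. What remains is the nested pair.
inner = greedy.search(outer, 2, len(outer) - 2).group()
print(inner)  # 为他说因为下雨所以迟到了
```

One consequence of the greedy approach worth keeping in mind: two separate, non-nested pairs on the same line (因为…所以…因为…所以) come back as a single outer match rather than as two entries.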

PS: Next, let's look at nesting groups in Python regular expressions.

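Because a group is itself a complete regular expression, groups can be nested inside other groups to build more complex expressions. The following example nests groups: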

#python 3.6
#蔡军生
#http://blog.csdn.net/caimouse/article/details/51749579
#
import re


def test_patterns(text, patterns):
    """Given source text and a list of patterns, look for
    matches for each pattern within the text and print
    them to stdout.
    """
    # Look for each pattern in the text and print the results
    for pattern, desc in patterns:
        print('{!r} ({})\n'.format(pattern, desc))
        print(' {!r}'.format(text))
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            prefix = ' ' * (s)
            print(
                ' {}{!r}{} '.format(prefix,
                                    text[s:e],
                                    ' ' * (len(text) - e)),
                end=' ',
            )
            print(match.groups())
            if match.groupdict():
                print('{}{}'.format(
                    ' ' * (len(text) - s),
                    match.groupdict()),
                )
        print()
    return

Example (the function above is imported from a module named re_test_patterns_groups):

#python 3.6
#蔡军生
#http://blog.csdn.net/caimouse/article/details/51749579
#
from re_test_patterns_groups import test_patterns

test_patterns(
    'abbaabbba',
    [(r'a((a*)(b*))', 'a followed by 0-n a and 0-n b')],
)

The output is:

'a((a*)(b*))' (a followed by 0-n a and 0-n b)

 'abbaabbba'
 'abb'        ('bb', '', 'bb')
    'aabbb'   ('abbb', 'a', 'bbb')
         'a'  ('', '', '')
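In the output above, the groups of each match are listed in the order of their opening parentheses, so the outer group comes first. A short sketch that reads the same nested groups directly from a match object, then repeats the pattern with named groups ((?P<name>...) is standard re syntax; the names rest, a_run and b_run are made up for illustration) so that groupdict(), which test_patterns also prints when it is non-empty, has something to show:

```python
import re

text = 'abbaabbba'

# Groups are numbered by the position of their opening "(":
# group 1 is the outer ((a*)(b*)), group 2 is (a*), group 3 is (b*)
matches = list(re.finditer(r'a((a*)(b*))', text))
second = matches[1]  # the match 'aabbb' starting at index 3

print(second.group(0))  # aabbb
print(second.groups())  # ('abbb', 'a', 'bbb')
print(second.span(1), second.span(2), second.span(3))  # (4, 8) (4, 5) (5, 8)

# Same nesting with named groups (names are illustrative only)
named = re.compile(r'a(?P<rest>(?P<a_run>a*)(?P<b_run>b*))')
m = named.search(text, 3)  # start searching at index 3, like the second match
print(m.group('rest'), m.group('a_run'), m.group('b_run'))  # abbb a bbb
print(m.groupdict())
```

Named groups still keep their numbers, so m.group(1) and m.group('rest') return the same text.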

Summary

That covers using regular expressions in Python to find nestable string groups. I hope it helps; if you have any questions, leave a comment and I will reply as soon as I can.
