linux最快的文本搜索神器ripgrep(grep的最好代替者)

2024-08-28 00:03:59

字体：大中小

来源：转载

供稿：网友

前言

说到文本搜索工具，大家一定会知道 grep, 它是 linux 最有用并最常用的工具之一。
但如果要再一个大的工程项目中搜索某个关键词，大家也一定知道它比较耗时。
所以就有了很多替代工具，之前最出名的是 Ack，Ag
而最近又有了新的替代者 Ripgrep, 这个工具和 Ack/Ag 一样都使用了多线程的方法，但 rg 比它们更快

简介

ripgrep 是一个以行为单位的搜索工具，它根据提供的 pattern 递归地在指定的目录里搜索。它是由 Rust 语言写成，相较与同类工具，它的特点就是无与伦比地快。
几个特点如下：

自动递归搜索（grep 需要-R）自动忽略.gitignore 中的文件以及 2 进制文件可以搜索指定文件类型（rg -tpy foo限定 python 文件， rg -Tjs foo排除 js 文件) 支持大部分 grep 的 feature(常用的都有) 支持各种文件编译（UTF-8， UTF-16， latin-1, GBK, EUC-JP, Shift_JIS 等等）支持搜索常见压缩文件(gzip, xz, lzma, bzip2, lz4) 自动高亮匹配的结果更少的命令名称 rg (grep 是四个字符) 不支持多行搜索和花哨的正则

安装 ripgrep

先安装 RUST

curl https://sh.rustup.rs -sSf | sh

然后一路 enter 就好了

用 RUST 安装 rigpre

git clone https://github.com/BurntSushi/ripgrepcd ripgrepcargo build --releasecp ./target/release/rg /usr/local/bin/

最后一步根据你的情况把它放到某个在 PATH 里的路径里

使用

搜索结果展示

用法总体格式

USAGE:  rg [OPTIONS] PATTERN [PATH ...]  rg [OPTIONS] [-e PATTERN ...] [-f PATTERNFILE ...] [PATH ...]  rg [OPTIONS] --files [PATH ...]  rg [OPTIONS] --type-list  command | rg [OPTIONS] PATTERN

输入参数

ARGS:  <PATTERN>      A regular expression used for searching. To match a pattern beginning with a      dash, use the -e/--regexp flag.      For example, to search for the literal '-foo', you can use this flag:        rg -e -foo      You can also use the special '--' delimiter to indicate that no more flags      will be provided. Namely, the following is equivalent to the above:        rg -- -foo  <PATH>...      A file or directory to search. Directories are searched recursively. Paths specified on      the command line override glob and ignore rules.

options	Description	other
-A, --after-context <NUM>	显示匹配内容后的<NUM>行	会覆盖--context
-B, --before-context <NUM>	显示匹配内容前的<NUM>行	会覆盖--context
-b, --byte-offset	显示匹配内容在文件中的字节偏移	和-o 一起使用，只打印偏移
-s, --case-sensitive	大小写敏感	会覆盖-i(ignore case), -S(smart case)
--color <WHEN>	什么时候使用颜色，默认 auto	如果--vimgre 被使用，那么默认值是 never
	可选项有： never, auto, always, ansi
--colors <COLOR_SPEC>...	设定输出颜色：	color: red, blue, green, cyan
	`{type}:{attribute}:{value}`	magenta, yellow, white, black
	`{type}`: path, line, column, match	style: nobold, bold, nointense
	`{attribute}`: fg, bg, style	intense, nounderline, underline
	`{value}`: a color or a text style	Example:
	`{type}:none`会清空{type}的颜色设定	rg --colors 'match:fg:magenta' --colors 'line:bg:yellow' foo
		扩展颜色集可以被`{value}`使用，如果终端支持 ANSI color
		描述方法是'x'(256-color) 或 'x,x,x'(24-bit true color)
		x 是 0-255 之间的数值,默认是 10 进制， 0x 前缀是 16 进制
		比如: rg --colors 'match:bg:0,128,255'
		或者等价的：rg --colors 'match:bg:0x0,0x80,0xFF'
		在使用 extended color code 时 intense 和 nointense 是无效的
--column	第一次匹配所在列数(从 1 开始)	能够被--no-column 取消掉
-C, --context <NUM>	显示匹配内容的前面和后面的<NUM>行	它会覆盖-B 和-A 选项
--context-separator <SEPARATOR>	在输出中用来分隔非连续的行	x7F 或t 可被使用，默认是--
-c, --count	只显示匹配的行数	如果只有一个文件给 ripgrep，那只打印匹配行数
		可以用--with-filename 来强制打印文件名
		它会覆盖--count-matches 选项
--count-matches	只显示匹配的次数	可以用--with-file 来强制在只有一个文件时也输出文件名
--debug	显示调试信息
--dfa-size-limit <NUM+SUFFIX?>	regex DFA 的上限，默认 10M
-E, --encoding <ENCODING>	描述文本编码, 默认是 auto	https://encoding.spec.whatwg.org/#concept-encoding-get
-f, --file <PATTERNFILE>...	从文件中读入 pattern, 一行一 pattern	可以被多次使用或和-e 一起组合使用，所以有组合会被匹配
--files	打印所有将被搜索的文件	以`rg <options> --files [PATH...]`方式使用，不能加 pattern
-l, --files-with-matches	只打印有匹配的文件名	覆盖--files-without-match
--files-without-match	只打印无匹配的文件名	覆盖 --file-with-matches
-F, --fixed-strings	把 pattern 当成常规文字而非 regex	可以用--no-fixed-strings 来禁止这个选项
-L, --follow	会递归搜索链接，默认关闭	可以用--no-follow 来关闭
-g, --glob <GLOB>...	通配符文件或文件夹，可以用!来取反	可以多次使用，会匹配.gitignore 的通配符规则
-h, --help	打印帮助信息
--heading	打印文件名到匹配内容的上方而不是同一行	这是默认行为，可以用--no-heading 来关闭
--hidden	搜索隐藏文件和文件夹	默认忽略, 可用--no-hidden 关闭
--iglob <GLOB>...	同--glob, 但这个大小写不敏感
-i, --ignore-case	pattern 大小写不敏感	可通过-s/--case-sensitive 或-S/--smart-case 覆盖这个选项
--ignore-file <PATH>...	忽略路径，格式同.gitignore, 可多个	多个--ignore-file 标记时，后面优先级高
	在命令上时，使用-g 来达到同样效果
-v, --invert-match	反向匹配
-n, --line-number	显示文件行数，默认打开
-x, --line-regexp	只显示整行都匹配 pattern 的行	会覆盖--word-regexp
-M, --max-columns <NUM>	不打印长于<NUM>列的匹配行
-m, --max-count <NUM>	限制一个文件最多<NUM>行匹配
--max-depth <NUM>	限制文件夹递归搜索深度	`rg --max-depth 0 dir/`不执行任何搜索
--max-filesize <NUM+SUFFIX?>	忽略大于<NUM>byte 的文件	suffix 可以是 K, M，G, 默认是 byte
--mmap	尽量使用 memory maps, 默认行为	目前它不支持所有选项, 用--no-mmap 来关闭
--no-config	不读取 conf 文件, 忽略 RIPGREP_CONFIG_PATH
--no-filename	不要打印匹配文件名
--no-heading	在每个匹配行前都打印文件名
--no-ignore	取消 ignore 文件，如.gitignore, .ignore	可以用--ignore 关闭
--no-ignore-global	取消对全局的 ignore 文件读取	如$HOME/.config/git/ignore
--no-ignore-messages	取消解析.ignroe, .gitignore 文件相关错误	可通过--ignore-messages 关闭
--no-ignore-parent	不读取父文件夹里的.gitignore, .ignore 文件	可通过 --ignore-parent 关闭
--no-ignore-vcs	只全能.ignore 文件	可通过--ignore-vcs 关闭
-N, --no-line-number	不打印匹配行数
--no-messages	不打印打开和读取文件相关错误
-0, --null	在打印的文件路径后加一个 NUL 字符	对于 xargs 非常有用
-o, --only-matching	只打印匹配的内容，而不是整行
--passthru	打印匹配和不匹配的行
--path-separator <SEPARATOR>	路径分隔符，在 linux 上默认是/
--pre <COMMAND>	用<COMMAND>处理文件，并将结果给 rg	可能有巨大的性能惩罚
	例如
	case "$1" in
	*.pdf)
	exec pdftotext "$1" -
	;;
	*)
	case $(file "$1") in
	_Zstandard_)
	exec pzstd -cdq
	;;
	*)
	exec cat
	;;
	esac
	;;
	esac
-p, --pretty	`--color always --heading --line-number`
-q, --quiet	不打印到 stdout, 如果匹配发现，停止 rg	当 rg 用于 exit 代码时非常有用
--regex-size-limit <NUM+SUFFIX?>	编译 regex 的上限
-e, --regexp <PATTERN>...	使用正则来匹配	可多次使用这个选项，打印匹配任何 pattern 的行
		可以用于搜索-开头的 pattern，如`rg -e -foo`
-r, --replace <REPLACEMENT_TEXT>	用相应文件代替匹配内容打印出来	组序号($5)可以被使用
-z, --search-zip	在 gz,bz2,xz,lzma,lz4 文件类型中搜索	可通过--no-search-zip 关闭
-S, --smart-case	如果全小写，则大小写不敏感，否则敏感	可通过-s/--case-sensitive 和-i/--ignore-case 关闭
--sort-files	根据文件路径排序输出结果	会关闭并行搜索线程
--stats	打印出统计结果
-a, --text	搜索二进制文件	可通过--no-text 关闭
-j, --threads <NUM>	大约使用的线程数
-t, --type <TYPE>...	只搜索某种文件类型	可通过--type-lsit 来列出支持的文件类型
--type-add <TYPE_SPEC>...	添加文件类型	如`rg --type-add 'foo:*.foo' -tfoo PATTERN`
	也可以用来创建某种包含多种文件类型的规则	--type-add 'src:include:cpp,py,md'
--type-clear <TYPE>...	清除默认的文件类型
--type-list	列出所有内置文件类型
-T, --type-not <TYPE>...	不要搜索某种文件类型
-u, --unrestricted	-u 搜索.gitignore 里的文件, -uu 搜索隐藏文件	-uuu 搜索二进制文件
-V, --version	打印版本信息
--vimgrep	每一次匹配打印一行	一行有多次匹配会打印多行
-H, --with-filename	打印匹配的文件路径，默认	可通过--no-filename 关闭
-w, --word-regexp	把 pattern 作为单独单词匹配，与< >等价