awk文本处理总结(中级篇)

2024-09-10 14:18:57

字体：大中小

来源：转载

供稿：网友

awk中级篇
这里顺便介绍一下vi的一个替换命令，现在我们要把example1.txt文本里的空格都替换为“:”冒号这里在vi里使用的命令就是：

%s/ /:/g

这个命令对于使用vi的人来说是用得最多的。我们现在做了一个新的文件example2.txt。

user1:password1:username1:unit1:10
user2:password2:username2:unit2:20
user3:password3:username3:unit3:30

现在我们来做一个awk脚本，之前都是在命令行操作，实际上所有的操作在命令行上是可以都实现的，已我们最经常使用的批量添加用户来开始！

script1.awk

#!/bin/awk -f   # 当文件有可执行权限的时候你可以直接执行
                # ./script1.awk example2.txt
                # 如果没有以上这行可能会出现错误，或者
                # awk -f script1.awk example2.txt 参数f指脚本文件

BEGIN { # “BEGIN{”是awk脚本开始的地方
FS=":"; # FS 是在awk里指分割符的意思
}

{                                # 接下来的“{” 是内容部分
      print "add {";             # 以下都是使用了一个awk函数print
      print "uid=" $1;
      print "userPassword=" $2;
      print "domain=eyou.com" ;
      print "bookmark=1";
      print "voicemail=1";
      print "securemail=1"
      print "storage=" $5;
      print "}";
      print ".";
}                               # “}”    内容部分结束
END {                           # “END{” 结束部分
    print "exit";
}

执行结果
[root@mail awk]# awk -f script1.awk example2.txt
add {
uid=user1
userPassword=password1
domain=eyou.com
bookmark=1
voicemail=1
securemail=1
storage=10
}
.
.
.
.
.
.
exit

文本操作就是更方便一些。

下面给两个返回效果一样的例子
[root@mail awk]# awk -F: '{print $1"@"$2}' example2.txt
[root@mail awk]# awk -F: '{printf "%s@%s",$1,$2}' example2.txt

user1@password1

这里的区别是使用print 和printf的区别，printf格式更自由一些，我们可以更加自由的指定要输出的数据，print会自动在行尾给出空格，而printf是需要给定" "的，如果感兴趣你可以把“”去掉看一下结果。%s代表字符串%d 代表数字，基本上%s都可以处理了因为在文本里一切都可以看成是字符串，不像C语言等其他语言还要区分数字、字符、字符串等。

awk还有一些很好的函数细细研究一下还是很好用的。

这次碰到了一个问题客户有一个用户列表,大概有2w用户，他有一个有趣的工作要做，就是把每个账户目录放到特定的目录下，例如13910011234这个目录要放到139/10/这个目录下，从这里可以看出规律是手机号码的前三位是二级目录名，手机的第3、4为是三级目录名，我们有的就只有一个用户列表，规律找到了我们现在开始想办法处理吧。

example3.txt

13910011234
15920312343
13922342134
15922334422
......

第一步是要找到一个方法来吧，就是要把每一个手机号分开，最初可能你就会想到这个也没有任何间隔，我们怎么用awk分开他们呢？说实话最初我也考虑了20多分钟，后来想起原来学习python的时候有split函数可以分就想找找awk里是不是有类似的函数，man awk 发现substr 这个函数子串,

[root@mail awk]# awk '{print substr($1,1,3)}' example3.txt

[root@mail awk]# awk '{printf "%s/%s",substr($1,1,3),substr($1,4,2)}' example3.txt

[root@mail awk]# awk '{printf "mv %s %s/%s",$1,substr($1,1,3),substr($1,4,2)}' example3.txt

以上的两步的返回自己做一下，最后我们就得到了我们想要的结果。

mv 13910011234 139/10
mv 15920312343 159/20
mv 13922342134 139/22
mv 15922334422 159/22

把这部分输出拷贝到一个shell脚本里，在数据当前目录下执行就可以了！

substr(s, i [, n])      Returns the at most n-character substring of s
                               starting at i. If n is omitted, the rest of s
                               is used.

substr函数解释，s代表我们要处理的字符串，i 是我们从这个字符串的第几个位置上开始，n 是我们从开始的位置取多少个字符。多看看man英文也会有所提高的。

awk有很多有趣的函数如果感兴趣可以自己去查查看，
man awk
String Functions 字符串函数，举几个觉得常用的函数
    length([s])             Returns the length of the string s, or the
                               length of $0 if s is not supplied.
    length 你可以得到字符串的长度，这个是比较常用的一个函数
    split(s, a [, r])       Splits the string s into the array a on the
                               regular expression r, and returns the number of
                               fields. If r is omitted, FS is used instead.
                               The   array a is cleared first.   Splitting
                               behaves   identically   to   field   splitting,
                               described above.

        tolower(str)            Returns a copy of the string str, with all the
                               upper-case characters in str translated to
                               their corresponding lower-case counterparts.
                               Non-alphabetic characters are left unchanged.

        toupper(str)            Returns a copy of the string str, with all the
                               lower-case characters in str translated to
                               their corresponding upper-case counterparts.
                               Non-alphabetic characters are left unchanged.
                                                                                    Time Functions 时间函数，我们最最常用到的是时间戳转换函数

strftime([format [, timestamp]])
                 Formats timestamp according to the specification in format.
                 The timestamp should be of the same form as returned by sys-
                 time().   If timestamp is missing, the current time of day is
                 used. If format is missing, a default format equivalent to
                 the output of date(1) is used. See the specification for the
                 strftime() function in ANSI C for the format conversions that
                 are guaranteed to be available. A public-domain version of
                 strftime(3) and a man page for it come with gawk; if that
                 version was used to build gawk, then all of the conversions
                 described in that man page are available to gawk.

这里举例说明时间戳函数是如何使用的

[root@ent root]# date +%s | awk '{print strftime("%F %T",$0)}'
2008-02-19 15:59:19

我们先使用date命令做一个时间戳，然后再把他转换为时间
还有一些我们现在可能不经常用到的函数，详细内容man awk 自己可以看一下。
Bit Manipulations Functions   二进制函数
Internationalization Functions 国际标准化函数

USER-DEFINED FUNCTIONS      用户也可以自己定义自己的函数，感兴趣自己可以再深入研究一下。

For example:

              function f(p, q,     a, b)   # a and b are local
              {
                   ...
              }

/abc/ { ... ; f(1, 2) ; ... }
DYNAMICALLY LOADING NEW FUNCTIONS 动态加载新函数，这个可能就更高级一些了！

上一篇：awk文本处理总结（入门篇）

下一篇：awk文本处理总结(高级篇)