Introduction

Single-dimensional Array

An array is a table of values called elements. The elements of an array are distinguished by their indices. Indices may be either numbers or strings. –Page172

The awk language provides one-dimensional arrays for storing groups of related strings or numbers. Every awk array must have a name. Array names have the same syntax as variable names; any valid variable name would also be a valid array name. But one name cannot be used in both ways (as an array and as a variable) in the same awk program. –Page173

Arrays in awk superficially resemble arrays in other programming languages, but there are fundamental differences. In awk, it isn’t necessary to specify the size of an array before starting to use it. Additionally, any number or string, not just consecutive integers, may be used as an array index. –Page173

In awk, the length of an array does not need to be declared in advance, and array indices may be numbers or strings.
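The two points above can be tried out with a small sketch. The function name array_demo and the array name fruit are made up for illustration; the output is piped through sort only because the traversal order of for (k in array) is unspecified.

```shell
# Hypothetical demo: no size declaration, and mixed string/numeric indices.
array_demo() {
    awk 'BEGIN {
        fruit["apple"] = 3       # string index
        fruit[10] = "ten"        # numeric index; elements 0..9 need not exist
        fruit[2016] = "year"     # indices do not have to be consecutive
        for (k in fruit)         # traversal order is unspecified
            print k, fruit[k]
    }' | LC_ALL=C sort
}
array_demo
# prints:
# 10 ten
# 2016 year
# apple 3
```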

A reference to an element that does not exist automatically creates that array element, with the null string as its value. –page176
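A minimal sketch of this side effect, using a hypothetical array arr and function name autocreate_demo: the in operator does not create the element, but merely reading arr["x"] does.

```shell
# Hypothetical demo: a plain reference creates the element with a null value.
autocreate_demo() {
    awk 'BEGIN {
        if ("x" in arr) print "before: present"; else print "before: absent"
        if (arr["x"] == "")                  # this reference creates arr["x"]
            print "value is the null string"
        if ("x" in arr) print "after: present"; else print "after: absent"
    }'
}
autocreate_demo
# prints:
# before: absent
# value is the null string
# after: present
```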

Multidimensional Array

A multidimensional array is an array in which an element is identified by a sequence of indices instead of a single index. For example, a two-dimensional array requires two indices. The usual way (in many languages, including awk) to refer to an element of a two-dimensional array named grid is with grid[x,y]. –page185

Multidimensional arrays are supported in awk through concatenation of indices into one string. awk converts the indices into strings and concatenates them together, with a separator between them. This creates a single string that describes the values of the separate indices. The combined string is used as a single index into an ordinary, one-dimensional array. The separator used is the value of the built-in variable SUBSEP. –page185
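The concatenation can be observed directly. In this sketch (function name subsep_demo and variable names are made up), the key is built by hand with SUBSEP and refers to the same element as grid[1, 2].

```shell
# Hypothetical demo: grid[1, 2] and grid[1 SUBSEP 2] are the same element.
subsep_demo() {
    awk 'BEGIN {
        grid[1, 2] = "hello"
        key = 1 SUBSEP 2                 # the combined string, built by hand
        print grid[key]
        if (key in grid) print "same element"
        n = split(key, parts, SUBSEP)    # recover the separate indices
        print n, parts[1], parts[2]
    }'
}
subsep_demo
# prints:
# hello
# same element
# 2 1 2
```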

The default value of SUBSEP is the string \034, which contains a nonprinting character that is unlikely to appear in an awk program or in most input data. The usefulness of choosing an unlikely character comes from the fact that index values that contain a string matching SUBSEP can lead to combined strings that are ambiguous. –page186
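To see the ambiguity, the sketch below deliberately sets SUBSEP to ":" (a poor choice, picked only for illustration). Both assignments then build the combined string a:b:c, so the second overwrites the first. The names ambiguity_demo and t are hypothetical.

```shell
# Hypothetical demo: a SUBSEP that occurs inside index values causes collisions.
ambiguity_demo() {
    awk 'BEGIN {
        SUBSEP = ":"
        t["a:b", "c"] = "first"
        t["a", "b:c"] = "second"     # same combined string "a:b:c"
        n = 0
        for (k in t) n++             # count the elements actually stored
        print n, t["a:b", "c"]
    }'
}
ambiguity_demo
# prints:
# 1 second
```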

There is no special for statement for scanning a “multidimensional” array. There cannot be one, because, in truth, awk does not have multidimensional arrays or elements—there is only a multidimensional way of accessing an array. –page187

However, if your program has an array that is always accessed as multidimensional, you can get the effect of scanning it by combining the scanning for statement with the built-in split() function. It works in the following manner: –page187

for (combined in array) {
    split(combined, separate, SUBSEP)
    ...
}


This sets the variable combined to each concatenated combined index in the array, and splits it into the individual indices by breaking it apart where the value of SUBSEP appears. The individual indices then become the elements of the array separate. –page187
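A complete, runnable version of that loop might look like the following sketch. The array contents and the function name scan_demo are invented for the example, and the output is sorted only to make the order predictable.

```shell
# Hypothetical demo: scan a "two-dimensional" array with for + split().
scan_demo() {
    awk 'BEGIN {
        grid[1, "a"] = "x"
        grid[2, "b"] = "y"
        for (combined in grid) {
            split(combined, separate, SUBSEP)    # separate[1], separate[2]
            print separate[1], separate[2], grid[combined]
        }
    }' | LC_ALL=C sort
}
scan_demo
# prints:
# 1 a x
# 2 b y
```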

Notes

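Testing whether an element exists in awk: assuming the array is named arr and the index being looked up is lemp, use the expression ("lemp" in arr), which is true only if that element exists.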
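A short sketch of the existence test with the names arr and lemp from the note above (the second index lamp and the function name exists_demo are made up). It also shows why comparing the value with "" is not a reliable test: an element can exist with an empty value.

```shell
# Hypothetical demo: use the in operator rather than comparing the value.
exists_demo() {
    awk 'BEGIN {
        arr["lemp"] = ""     # the element exists even though its value is empty
        if ("lemp" in arr) print "lemp: exists"; else print "lemp: missing"
        if ("lamp" in arr) print "lamp: exists"; else print "lamp: missing"
        # Testing arr["lamp"] == "" instead would create arr["lamp"] as a side effect.
    }'
}
exists_demo
# prints:
# lemp: exists
# lamp: missing
```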

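A multidimensional array in awk is really a one-dimensional array: the indices are joined into a single string with a separator and split apart again when they are needed. The separator is the value of the built-in variable SUBSEP, which defaults to \034.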
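The in operator also accepts a parenthesized index list, which awk joins with SUBSEP before the lookup. The sketch below borrows the user array and the count/sum subscripts from the example further down; the function name multi_in_demo is hypothetical.

```shell
# Hypothetical demo: existence test with a multidimensional subscript.
multi_in_demo() {
    awk 'BEGIN {
        user["root", "count"] = 154
        if (("root", "count") in user) print "found"
        if (!(("root", "sum") in user)) print "no sum yet"
        n = 0
        for (k in user) n++      # the failed test above created nothing
        print n
    }'
}
multi_in_demo
# prints:
# found
# no sum yet
# 1
```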

Example

 username | count | sum
---------------------------
flying     77       960724
rtkit      1        716
colord     1        1860
postfix    2        25721
root       154      450382
apache     10       29567
...
...
...
zabbix     33       105059
dbus       1        732
============================


Test1

[flying@lemp ~]$ ps -ef | awk -v FS=' ' '$1~/^[^UID]/{user[$1,"count"]+=1;user[$1,"sum"]+=$2}END{for(i in user){split(i,j,SUBSEP);print j[1],user[j[1],"count"],user[j[1],"sum"]}}' | sort
apache 10 29567
apache 10 29567
avahi 2 1418
avahi 2 1418
chrony 1 741
chrony 1 741
colord 1 1860
colord 1 1860
dbus 1 732
dbus 1 732
flying 78 988545
flying 78 988545
libstor+ 1 698
libstor+ 1 698
mysql 1 1516
mysql 1 1516
nobody 1 1787
nobody 1 1787
polkitd 1 773
polkitd 1 773
postfix 2 25721
postfix 2 25721
root 156 505086
root 156 505086
rtkit 1 716
rtkit 1 716
zabbix 33 105059
zabbix 33 105059
[flying@lemp ~]$


Test2

[flying@lemp ~]$ ps -ef | awk '/^[^UID]/{user[$1,"count"]+=1;user[$1,"sum"]+=$2}END{for(i in user){split(i,j,SUBSEP);print j[1],j[2]}}' | sort
apache count
apache sum
avahi count
avahi sum
chrony count
chrony sum
colord count
colord sum
dbus count
dbus sum
flying count
flying sum
libstor+ count
libstor+ sum
mysql count
mysql sum
nobody count
nobody sum
polkitd count
polkitd sum
postfix count
postfix sum
root count
root sum
rtkit count
rtkit sum
zabbix count
zabbix sum
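[flying@lemp ~]$

Here j[2] takes the values count and sum in turn, so each UID appears twice. awk builds the combined string from the indices; the same cannot be done with the values. I have not yet come up with an effective solution, so for now I remove the duplicates with uniq.

Test3

[flying@lemp ~]$ ps -ef | awk -v FS=' ' '$1~/^[^UID]/{user[$1,"count"]+=1;user[$1,"sum"]+=$2}END{for(i in user){split(i,j,SUBSEP);print j[1],user[j[1],"count"],user[j[1],"sum"]}}' | sort | uniq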
apache 10 29567
avahi 2 1418
chrony 1 741
colord 1 1860
dbus 1 732
flying 79 1016953
libstor+ 1 698
mysql 1 1516
nobody 1 1787
polkitd 1 773
postfix 2 25721
root 154 450433
rtkit 1 716
zabbix 33 105059
[flying@lemp ~]$

But this approach leaves no way to add a table header and footer.

Test4

Using !a[$0]++

[flying@lemp ~]$ ps -ef | awk -v FS=' ' '$1~/^[^UID]/{user[$1,"count"]+=1;user[$1,"sum"]+=$2}BEGIN{print " username | count | sum\n---------------------------"}END{for(i in user){split(i,j,SUBSEP);printf "%-10s %-8d %-8d\n",j[1],user[j[1],"count"],user[j[1],"sum"]}}END{print "============================"}' | awk '!a[$0]++'
 username | count | sum
---------------------------
flying     81       1076787
rtkit      1        716
colord     1        1860
postfix    2        25721
root       155      482698
apache     10       29567
chrony     1        741
libstor+   1        698
avahi      2        1418
zabbix     33       105059
polkitd    1        773
dbus       1        732
nobody     1        1787
mysql      1        1516
============================
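[flying@lemp ~]$

Deduplication works, but it takes two awk commands, which does not feel very elegant, and sorting by a chosen field is still not implemented. This will do for now.

As for what !a[$0]++ means, I will write up a separate blog post on it later.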
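One possible way to get the same table from a single awk process is sketched below. It is not the approach used in the tests above: it assumes two ordinary arrays (count and sum) keyed by user name, so each user is visited once and no deduplication is needed. The function name summarize and the sample input are made up. NR > 1 is used to skip the header line, because /^[^UID]/ is a bracket expression that would also skip any user whose name starts with U, I, or D. Rows are piped to sort from inside awk, with fflush() and close() keeping the header, rows, and footer in order.

```shell
# Hypothetical sketch: count and sum per user in one awk process, sorted by name.
summarize() {
    awk '
    NR > 1 {                      # skip the header line of the input
        count[$1]++               # number of lines seen for this user
        sum[$1] += $2             # running total of the second column
    }
    END {
        print " username | count | sum"
        print "---------------------------"
        fflush()                  # emit the header before sort writes to stdout
        cmd = "LC_ALL=C sort"
        for (u in count)
            printf "%-10s %-8d %d\n", u, count[u], sum[u] | cmd
        close(cmd)                # wait for sort to finish before the footer
        print "============================"
    }'
}
# Made-up sample input standing in for the first two columns of ps -ef:
printf 'UID PID\nroot 1\nroot 2\napache 10\n' | summarize
```

With real data the call would be ps -ef | summarize. Sorting on another column should only need a different command string in cmd, for example sort -k2,2n to order by count.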

Change Logs

• 2016.03.06 22:55 Sun Asia/Beijing
• First draft completed

• Note Time: 2016.03.06 22:55 Sun
• Note Location: Asia/Beijing
• Writer: lempstacker