AWK
language
#!/usr/bin/awk -f
END { print "Hello, World!"; }
- Authors
- Aho Alfred Vaino
- Weinberger Peter Jay
- Kernighan Brian Wilson
- 2023
- --csv, supports fields surrounded by double quotes
- https://www.gnu.org/software/gawk/manual/html_node/Conversion gawk always uses the period (.) as the decimal point, unless told explicitly to use the locale's LC_NUMERIC with --posix or --use-lc-numeric (-N)
Not forcing variables to be local can cause weird issues. In the function below "middle" is a global, so every recursive call overwrites it. The early return should happen as soon as possible, otherwise this function keeps looping. It works just fine if the if/return 0 is moved to the top, OR if "middle" is made a local variable.
function binarySearch(target, left, right) {
    middle = int((left+right)/2)    # global: clobbered by the recursive calls
    print "l:", left, "r:", right, "m:", middle, "n[m]="numbers[middle]
    if (left >= right) { return 0 }
    if (numbers[middle] > target) binarySearch(target, left, middle-1)
    if (numbers[middle] < target) binarySearch(target, middle+1, right)
    return numbers[middle] == target
}
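A minimal sketch of the second fix mentioned above, making "middle" local by listing it as an extra parameter. Passing the array as a parameter, returning the result of each recursive call, and the sample data are assumptions added for illustration; they are not in the original function.

```shell
out=$(awk '
# middle is local because it is listed as an extra (unused) parameter.
function binarySearch(numbers, target, left, right,    middle) {
    if (left > right)                 # empty range: return before anything else
        return 0
    middle = int((left + right) / 2)
    if (numbers[middle] > target)
        return binarySearch(numbers, target, left, middle - 1)
    if (numbers[middle] < target)
        return binarySearch(numbers, target, middle + 1, right)
    return 1                          # numbers[middle] == target
}
BEGIN {
    n = split("1 3 5 7 9 11", numbers, " ")   # sorted sample data, indices 1..n
    print binarySearch(numbers, 7, 1, n)      # 7 is present
    print binarySearch(numbers, 4, 1, n)      # 4 is absent
}')
printf '%s\n' "$out"
```

Because each call has its own "middle", the value a caller computed is still intact after the recursive call returns.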
- length($0) == 0 { print "this is an empty record" }
  END { if (NR == 0) print "means that we didn't process any record" }
- || and && return booleans, i.e. 0 or 1. They do NOT return the truthy value.
- NF can be reset to 0 in END, then new fields appended with $(++NF) = value, so that a plain print outputs the rebuilt record
- falsey: "", 0, undefined variables
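The truthiness rules above can be checked with a short sketch. The variable names are hypothetical; the last line adds one related fact, that a non-empty string constant such as "0" counts as true.

```shell
out=$(awk 'BEGIN {
    a = "" || "fallback"    # 1, not "fallback"
    b = 5 && 7              # 1, not 7
    c = 0 || ""             # 0: both operands are falsey
    print a, b, c
    if (!neverAssigned) print "undefined is falsey"
    if ("0") print "the string constant 0 is truthy"
}')
printf '%s\n' "$out"
```

A consequence is that the shell-style idiom of picking a default with || does not work in awk; the ternary operator is the usual replacement.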
{ print "expression" > "filename" }
{ print "expression" | "command" }
function add_tree(number) {    # local variables can be declared here too (as extra parameters), like &aux
    return number + 3
}
{ print add_tree(36) }
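As a hedged illustration of piping print into a command and of declaring a local variable as an extra parameter, the sketch below sends each record to sort and closes the pipe in END. The function name add_three, the local bonus, and the sample input are made up for the example.

```shell
out=$(printf 'pear\napple\nfig\n' | awk '
# bonus is a local: it appears after the real parameter, separated by extra spaces.
function add_three(number,    bonus) {
    bonus = 3
    return number + bonus
}
{ print $0 | "sort" }          # every record goes to one shared sort process
END {
    close("sort")              # wait for sort to finish and emit its output
    print add_three(36)
}')
printf '%s\n' "$out"
```

Closing the pipe before the final print is what keeps the sorted lines ahead of the last line of output.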
- Unique at the time due to
  - being a scripting language
  - having associative arrays
An assignment can be used as the condition of an if:
if (disjoint = r[2] <= m1 || m2 <= r[1]) continue
network
rosetta - web server
https://rosettacode.org/wiki/Hello_world/Web_server
#!/usr/bin/gawk -f
BEGIN {
    RS = ORS = "\r\n"
    HttpService = "/inet/tcp/8080/0/0"
    Hello = "<HTML><HEAD>" \
            "<TITLE>A Famous Greeting</TITLE></HEAD>" \
            "<BODY><H1>Hello, world</H1></BODY></HTML>"
    Len = length(Hello) + length(ORS)
    print "HTTP/1.0 200 OK" |& HttpService
    print "Content-Length: " Len ORS |& HttpService
    print Hello |& HttpService
    while ((HttpService |& getline) > 0)
        continue;
    close(HttpService)
}
style
- guide https://github.com/soimort/translate-shell/wiki/AWK-Style-Guide
- CamelCase, variables
- lowercase, functions
- 4 spaces or 4 #### before the local variables in a function's parameter list
- case statements should have a default branch
- use "IF" and not "ELSE IF" if the previous branch ends with a "RETURN"
- wrap the expression in () before piping it with |
Types (automatically coerced when needed)
https://www.gnu.org/software/gawk/manual/html_node/Variable-Typing.html
- Strings
  - indices start at 1
- Numbers
- Arrays
  - no fixed first index; split() and asort() number the elements from 1
  - 1D
  - for strings or numbers
  - no need to be declared
  - ALWAYS associative (aka hashtables)
  - for (variable in array)
  - delete array[subscript]
  - to coerce a parameter into an array inside a function body, use: "" in arr
- https://www.gnu.org/software/gawk/manual/html_node/Controlling-Array-Traversal.html
- comp_func(i1, v1, i2, v2) < 0 Index i1 comes before index i2
- comp_func(i1, v1, i2, v2) == 0 Indices i1 and i2 come together
- comp_func(i1, v1, i2, v2) > 0 Index i1 comes after i2
- Sets the order in which an already created array is traversed by for (index in array)
- PROCINFO["sorted_in"] = "afunctionname", a function like comp_func(index1, value1, index2, value2)
- PROCINFO["sorted_in"] = "@val_num_asc"
- PROCINFO["sorted_in"] = "@val_num_desc"
- PROCINFO["sorted_in"] = "@val_str_asc"
- PROCINFO["sorted_in"] = "@val_str_desc"
- PROCINFO["sorted_in"] = "@ind_num_asc"
- PROCINFO["sorted_in"] = "@ind_num_desc"
- PROCINFO["sorted_in"] = "@ind_str_asc"
- PROCINFO["sorted_in"] = "@ind_str_desc"
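The basic string and array operations from the Types list can be sketched in portable awk as follows (PROCINFO["sorted_in"] is gawk-only and is left out here). The array names and sample strings are hypothetical.

```shell
out=$(awk 'BEGIN {
    n = split("a b c", parts, " ")     # split() numbers the pieces from 1
    first = substr("hello", 1, 2)      # string positions also start at 1
    print n, parts[1], first

    count["x"]++                       # no declaration needed, keys can be strings
    count["y"] += 2
    delete count["x"]
    hasX = ("x" in count)              # membership test does not create the key
    hasY = ("y" in count)
    print hasX, hasY, count["y"]
}')
printf '%s\n' "$out"
```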
built-in variables
- RS="^$" reads the whole file as a single record
- FPAT https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
- For csv, FPAT = "([^,]+)|(\"[^\"]+\")"
- Instead of using FS to describe what separates the fields (what the fields are not),
- FPAT describes what the fields themselves look like, in the form of a regular expression.
variable | meaning | default |
---|---|---|
FPAT | regex of what each field contains | - |
NF | number of fields in the current record | - |
NR | number of records (aka lines) read so far | - |
FNR | number of records read so far, in the current file | - |
FS | controls the input field separator | " " |
RS | controls the input record separator | "\n" |
OFS | output field separator | " " |
ORS | output record separator | "\n" |
OFMT | output format for numbers | "%.6g" |
ARGC | number of cli arguments | - |
ARGV | array of cli arguments | - |
ENVIRON | array of environment variables | - |
RLENGTH | length of string matched by match function | - |
RSTART | start of string matched by match function | - |
FILENAME | name of current input file | - |
SUBSEP | subscript separator | "\034" |
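A small sketch of how FS, OFS, NR and NF interact, using made-up colon-separated input. Assigning to a field makes awk rebuild $0 with OFS between the fields.

```shell
out=$(printf 'a:b:c\nd:e\n' | awk '
BEGIN { FS = ":"; OFS = "-" }   # split input on ":", join output with "-"
{
    $1 = $1                     # touching a field rebuilds $0 using OFS
    print NR, NF, $0            # the commas in print also insert OFS
}')
printf '%s\n' "$out"
```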
built-in functions
TIME
https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html
function | arguments | returns |
---|---|---|
mktime | DATESTR, UTC? | given DATESTR, timestamp in seconds since epoch |
strftime | FMT, TIMESTAMP, UTC? | TIMESTAMP formatted according to FMT |
systime | - | now, timestamp in seconds since epoch |
BITWISE
https://www.gnu.org/software/gawk/manual/html_node/Bitwise-Functions.html
function | returns |
---|---|
and(v1,v2,…) | bitwise AND of the arguments |
xor(v1,v2,…) | bitwise XOR of the arguments |
or(v1,v2,…) | bitwise OR of the arguments |
compl(val) | bitwise complement of val |
lshift(val, count) | val left shifted by count bits |
rshift(val, count) | val right shifted by count bits |
ARRAY
function | returns | does |
---|---|---|
asort(SRC,DST) | number of elements in SRC | sort by value, DST has idx=numeric val=old_value |
asorti(SRC,DST) | number of elements in SRC | sort by index, DST has idx=numeric val=old_index |
isarray(arr) | boolean | tests whether arr is an array |
MATH
https://www.gnu.org/software/gawk/manual/html_node/Numeric-Functions.html
function | returns |
---|---|
atan2(y,x) | arctangent of y/x, in the -π to π range |
cos(x) | cosine of x, with x in radians |
sin(x) | sine of x, with x in radians |
exp(x) | e raised to the power x |
log(x) | natural (base e) logarithm of x |
sqrt(x) | square root of x |
int(x) | integer part of x, truncated |
rand() | random number r, 0 <= r < 1 |
srand(x) | the previous seed; x becomes the new seed for rand() |
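A few of the numeric functions in action, as a sketch. Deriving π from atan2(0, -1) is a common idiom rather than something the table states, and the seed 42 is arbitrary.

```shell
out=$(awk 'BEGIN {
    pi = atan2(0, -1)                 # angle of the point (-1, 0), i.e. pi
    printf "%.4f\n", pi
    print int(3.9), int(-3.9)         # int() truncates toward zero
    print exp(0), sqrt(16), log(1)
    srand(42)                         # fixed seed, so runs are repeatable
    r = rand()
    inRange = (r >= 0 && r < 1)
    print inRange
}')
printf '%s\n' "$out"
```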
STRING
https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html
function | returns | does |
---|---|---|
sub(r,s) | number of subst made | replace the first match of r with s in $0 |
sub(r,s,t) | number of subst made | replace the first match of r with s in t |
gsub(r,s) | number of subst made | replace all matches of r with s in $0 |
gsub(r,s,t) | number of subst made | replace all matches of r with s in t |
gensub(r,s,h) | modified copy of $0 | replace the h'th match of r (all of them if h is "g") with s; $0 itself is unchanged |
gensub(r,s,h,t) | modified copy of t | same, on t; t itself is unchanged |
substr(s,start) | substring of s | from start to the end of s |
substr(s,start,len) | substring of s | len characters from start |
split(s,a) | number of fields | stores the pieces in array a, splitting on FS |
split(s,a,fs) | number of fields | stores the pieces in array a, splitting on fs |
length() | number of chars in $0 | |
length(s) | number of chars in s | |
index(s,t) | position of t in s, or 0 if absent | |
match(s,r) | index or 0 | test if s contains r, sets RSTART and RLENGTH |
match(s,r,a) | index or 0 | same, and sets a to the portions of s that match r |
 | | a[0] = whole matched part of s |
 | | a[N, "start"] = starting index of match N |
 | | a[N, "length"] = length of match N |
sprintf(fmt, …) | formatted string | |
strtonum(s) | numeric value of s | also understands 0x hexadecimal and leading-0 octal |
tolower(s) | lowercased s | |
toupper(s) | uppercased s | |
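The portable subset of the string functions (no gensub, strtonum or three-argument match, which are gawk extensions) can be exercised like this. The sample strings are arbitrary.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "0", s)                 # modifies s in place, returns the count
    print n, s
    print index(s, "w"), substr(s, 1, 5), toupper(substr(s, 7))
    if (match(s, /w[0-9a-z]+/))           # sets RSTART and RLENGTH on success
        print RSTART, RLENGTH
    k = split("2023-11-05", d, "-")       # pieces land in d[1]..d[k]
    print k, d[1]
}')
printf '%s\n' "$out"
```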
operators
= += -= *= /= %= ^= | Assignments |
?: | Ternary operator |
in | Array membership |
~ !~ | Matching |
control flow
- exit
- on a normal rule, still runs END, but not ENDFILE
- on BEGIN , still runs END
- on END , stops
exit | goes immediately to the END action |
exit expression | same, with expression as awk's exit status |
next | skips to the next line of input |
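A sketch of next and exit on a made-up four-line input: comment lines are skipped with next, and exit inside a normal rule stops reading but still runs END.

```shell
out=$(printf '# comment\n1\nstop\n2\n' | awk '
/^#/          { next }                 # skip this record, later rules do not run
$1 == "stop"  { exit }                 # stop reading input, jump to END
              { print "row", $1 }
END           { print "END ran, NR=" NR }')
printf '%s\n' "$out"
```

The record after "stop" is never read, which is why NR is 3 in END.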
output statement
close(filename) | break connection between print and filename |
close(command) | break connection between print and command |
system(command) | execute command |
getline
https://www.gnu.org/software/gawk/manual/html_node/Getline.html
getline | reads next input record | NF, NR, FNR, RT, $0 |
getline var | reads n.i.r. into var | NR, FNR, RT |
getline < file | reads n.i.r. from file | NF, RT, $0 |
getline var < file | reads n.i.r. from file into var | - |
"cmd" ¦ getline | reads a single line of cmd into awk | NF, RT, $0 |
"cmd" ¦ getline var | reads a single line of cmd into var | RT |
"cmd" ¦& getline | reads from a two-way pipe | NF, RT, $0 |
"cmd" ¦& getline var | reads from a two-way pipe into var | RT |
NOTE: call close("cmd") on the non two-way pipes
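Reading a command's output line by line with getline, as a minimal sketch. The command string is a placeholder; the "> 0" test matters because getline returns -1 on error, which would otherwise loop forever.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"          # placeholder command, run through sh
    while ((cmd | getline line) > 0)    # 1 = got a line, 0 = end, -1 = error
        print "got", line
    close(cmd)                          # lets the same command be run again later
}')
printf '%s\n' "$out"
```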
format strings
- https://www.gnu.org/software/gawk/manual/html_node/Control-Letters.html
- https://www.gnu.org/software/gawk/manual/html_node/Format-Modifiers.html
- %+-width.prec(?)
format | description |
---|---|
%f, %F | float |
%a, %A | hexadecimal float |
%g, %G | float or scientific notation |
%d, %i | decimal integer |
%e, %E | scientific notation |
%o | unsigned octal |
%u | unsigned decimal integer |
%x, %X | unsigned hexadecimal integer |
%c | numbers as character |
%s | string |
%% | literal "%" |
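A short printf sketch combining several control letters with width, precision and flag modifiers. The values are arbitrary.

```shell
out=$(awk 'BEGIN {
    # %5.2f  width 5, 2 decimals     %-5s  left-justified in 5 columns
    # %05d   zero-padded to 5        %x    hexadecimal
    # %c     number as a character   %%    literal percent sign
    printf "%5.2f|%-5s|%05d|%x|%c|%%\n", 3.14159, "ab", 42, 255, 65
}')
printf '%s\n' "$out"
```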
extensions
- @include "join"
- @include "assert" assert(BOOLEAN, "Reason of failure HERE")
- @include "ord" OR @load "ordchr" https://www.gnu.org/software/gawk/manual/html_node/Extension-Sample-Ord.html
- ord(STRING) -> NUMBER
- chr(NUMBER) -> STRING
codebases
project | url |
---|---|
graphics demo | https://github.com/patsie75/awk-demo |
graphics libs | https://github.com/patsie75/awk-glib |
CHIP-8 | https://github.com/patsie75/awk-chip8 |
system logs parsing | https://github.com/kaworu/hawk |
game tetris | https://github.com/mikkun/AWKTC |
git | https://github.com/djanderson/aho |
json | https://github.com/step-/JSON.awk |
webserver | https://github.com/crossbowerbt/awk-webserver |
static site gen | https://github.com/nuex/zodiac |
svg from git | https://github.com/deuill/grawkit |
jvm | https://github.com/rethab/awk-jvm |
toy lang compiler | https://cowlark.com/mercat/com.awk.txt |
plot.awk (to svg) | https://gist.github.com/katef/fb4cb6d47decd8052bd0e8d88c03a102 |
gemini client | http://git.vgx.fr/gem.awk/file/gem.awk.html |
gopher client | https://git.sr.ht/~akarle/gc/tree/main/item/gc |
libs | https://github.com/e36freak/awk-libs |
libs | https://github.com/dubiousjim/awkenough |
exercises | https://github.com/exercism/awk |
snippets
wEiRd - removes leading space
$ awk '{ $1=$1 }1' file.txt
$ awk '{ $1=$1 }; { print }' file.txt
$ awk '/.*/ { $1=$1 }; /.*/ { print $0 }' file.txt
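Why this works, as a small sketch on a made-up line: assigning to any field makes awk rebuild $0 from the fields joined by OFS, so leading, trailing and repeated blanks disappear.

```shell
out=$(printf '   a   b  \n' | awk '{ $1 = $1 } 1')   # the bare 1 is an always-true pattern, so it prints
printf '[%s]\n' "$out"
```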
array
# requires gawk (arrays of arrays)
function format_matrix(arr,    row, col, res) {
    for (row in arr) {
        for (col in arr[row])
            res = res arr[row][col]    # plain concatenation, so a "%" in the data is safe
        res = res "\n"
    }
    return res
}
function print_matrix_dimensions(arr) {
    printf "%dx%d\n", length(arr), length(arr[1])
}
math
function max(x, y) { return (x>y)?x:y }
function min(x, y) { return (x<y)?x:y }
function abs(x) { return (x<0)?-x:x }
untested stack?
function isEmpty() { return idx == 0 }
function peek() { return stack[idx] }
function push(el) { print el; stack[++idx] = el }
function pop(    tmp) { tmp = stack[idx]; delete stack[idx--]; return tmp }
tested stack?
function push(a, x) {
    "" in a    # coerce into array
    a[length(a) + 1] = x
}
function pop(a,    __x, __i) {
    __x = a[1]    # takes the FIRST element, then shifts the rest down
    for (__i = 1; __i < length(a); __i++)
        a[__i] = a[__i + 1]
    delete a[__i]
    return __x
}
PGM - grayscale 1-D array of a 2-D matrix
function array2PGM(arr,    out, idx) {
    out = "P2\n"                # format id
    out = out NF " " NR "\n"    # dimensions
    out = out "9\n"             # max value
    for (idx in arr)            # note: for-in order is unspecified
        out = out arr[idx] " "
    return out "\n"
}
implementations
$ readelf -d /usr/bin/gawk | grep Shared # 689K
 0x0000000000000001 (NEEDED) Shared library: [libsigsegv.so.2]
 0x0000000000000001 (NEEDED) Shared library: [libreadline.so.8]
 0x0000000000000001 (NEEDED) Shared library: [libmpfr.so.6]
 0x0000000000000001 (NEEDED) Shared library: [libgmp.so.10]
 0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
$ readelf -d /usr/bin/mawk | grep Shared # 155K
 0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
- buffering
  - gawk: unbuffered by default
  - mawk: buffers by default, needs -W interactive to disable