awk.txt

Basic syntax is

\e[32mawk '/\e[1mpattern\e[22m/ { \e[1maction\e[22m }' \e[1mfile\e[0m

awk operates on each line of the file or STDIN, and runs action if pattern matches.
# Pattern:
- if no pattern is given, the action runs for every line.
- pattern can also be a \e[1mconditional expression\e[0m like \e[38;5;208m$2 > 20\e[0m meaning the value in column two must be over 20.
- you can ALSO combine patterns and conditional expressions with && and ||
\e[32mawk '/pattern/ && $1 > 100 { print $2 }' filename\e[0m
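A minimal sketch of a combined pattern. The sample input (a number in column one, a word in column two) is made up for illustration:

```shell
# Invented sample data: only the first line both contains "apple" and has $1 > 100.
result=$(printf '150 apple\n50 apple\n200 pear\n' |
  awk '/apple/ && $1 > 100 { print $2, $1 }')
echo "$result"   # apple 150
```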
# Action:
- action is optional and defaults to 'print'. Default argument for print is the whole line. {..} surround the action(s).
- action can contain if-clauses and var assignments etc too
- multiple actions are separated by ';'

So these are all basically the same:
\e[32mawk '{if (length($0) > 50 ) print $0}' file
awk '{if (length($0) > 50 ) print }' file
awk 'length($0) > 50 { print }' file \e[36m# note conditional expression instead of if-clause in the action\e[0m
\e[32mawk 'length($0) > 50' file\e[0m
and will print all lines of file longer than 50 characters.

Multiple pattern/action pairs are possible; each pair is evaluated for every line.
If a script becomes unwieldy, store it in a file:
foo | awk -f script.awk
While it doesn't look it, awk is a full programming language with function declarations, system calls, and more.
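A sketch of a small script kept in a file, with two pattern/action pairs and a function declaration. The function name twice, the temp file and the sample numbers are all hypothetical:

```shell
# Write a throwaway script file (a real one would simply be saved as script.awk).
script=$(mktemp)
cat > "$script" <<'EOF'
function twice(x) { return x * 2 }   # user-defined function
$1 > 5 { big++ }                     # pair 1: count values over 5
{ print twice($1) }                  # pair 2: no pattern, runs for every line
END { print "big:", big }            # runs once after the last line
EOF
result=$(printf '3\n10\n7\n' | awk -f "$script")
rm -f "$script"
echo "$result"
```

Both pairs are tried on every input line, so a line with a value over 5 is counted and also printed doubled.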

# Vars
$0 is the whole line, $1, $2 etc are columns/words where the line is split on any amount of whitespace.
NF is number of fields (columns). NR is the line number (R for record).
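FS and OFS are the input and output field separators. Both are a single space by default (for FS that means splitting on any run of whitespace) and can be reset: \e[32mawk 'BEGIN { FS="," } ...\e[0m
RS and ORS are the corresponding record (line) separators, newline by default.
There are others for handling files, arguments and environment vars (e.g. FILENAME, ARGV, ENVIRON).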
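A short sketch of the variables above on invented comma-separated input. $NF (the last field) is not mentioned above but follows from NF being the field count:

```shell
# FS splits input on commas, OFS joins the printed values with "-".
result=$(printf 'a,b,c\nd,e\n' |
  awk 'BEGIN { FS = ","; OFS = "-" } { print NR, NF, $1, $NF }')
echo "$result"
# 1-3-a-c
# 2-2-d-e
```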

# Arrays!
foo[$1]++ \e[36m# will create an assoc array with first words as key and number of occurrences as value.\e[0m
# print how often the words in the second column appeared, and the word itself
awk '{count[$2]++} END {for (c in count) print count[c], c}' data.txt
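The same counting idiom on made-up inline data instead of data.txt. The order of a for (key in array) loop is unspecified, so this sketch pipes through sort to get a stable result:

```shell
# Count second-column words; sort -nr puts the most frequent first.
result=$(printf 'x red\ny blue\nz red\n' |
  awk '{ count[$2]++ } END { for (c in count) print count[c], c }' |
  sort -nr)
echo "$result"
# 2 red
# 1 blue
```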

# Comments
Start with '#' like in bash

# Some flags
-F,  \e[36m# use ',' as Field separator instead of whitespace\e[0m
-f <file> \e[36m# read script from file\e[0m
-v var=value \e[36m# assign value to variable\e[0m
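A sketch combining -F and -v. The variable name min and the name,age sample rows are assumptions for illustration:

```shell
# -F, splits on commas; -v sets the awk variable min before any input is read.
result=$(printf 'alice,30\nbob,17\n' |
  awk -F, -v min=18 '$2 >= min { print $1 }')
echo "$result"   # alice
```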

BEGIN and END are special patterns that can have their own actions. BEGIN runs before the first line is read, END after the last one has been processed:
# add up all numbers from the first column:
command | awk '{sum += $1} END {print sum}'
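A sketch using BEGIN and END together on invented numbers. In END, NR still holds the number of lines read, which gives an average for free:

```shell
# BEGIN prints a header, the middle rule sums, END reports.
result=$(printf '4\n6\n5\n' |
  awk 'BEGIN { print "start" } { sum += $1 } END { print "sum:", sum, "avg:", sum / NR }')
echo "$result"
# start
# sum: 15 avg: 5
```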

# Find the 10 longest filenames in a directory:

\e[32mls somedir | awk '{print length($0), $0}' | sort -nr | head\e[0m
optionally add \e[32m| cut -d' ' -f2-\e[0m to only print the filenames, or even \e[32m| awk '{print $2}'\e[0m if they don't contain spaces.
printf is similar to print, but its first argument is a format string, as in \e[32mprintf "foo: %s", 123\e[0m (awk strings need double quotes). Unlike print, it does not add
  a linebreak by default (use "\\n")
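A small printf sketch with a few common format specifiers (left-aligned string, fixed-width float, zero-padded integer). The input line is made up:

```shell
# %-8s pads the string to 8 columns, %6.2f rounds to two decimals, %03d zero-pads.
result=$(printf 'widget 3.14159\n' |
  awk '{ printf "%-8s|%6.2f|%03d\n", $1, $2, NR }')
echo "$result"   # widget  |  3.14|001
```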

# loops
for (i=1; i<=NF; i++) print stuff
for (key in assocArray) stuff with assocArray[key]
while (a < 10) {
    # multiple lines in a block are surrounded by {..}
    foo
}
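A runnable sketch of the for loop over fields, summing every number on each line. The input and the variable name total are invented:

```shell
result=$(printf '1 2 3\n10 20\n' | awk '{
  total = 0
  for (i = 1; i <= NF; i++)   # visit every field on the line
    total += $i
  print NR ": " total
}')
echo "$result"
# 1: 6
# 2: 30
```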

# ternary
{ print ($1 > 10 ? ($1 > 30 ? "huge" : "medium") : "small") }
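The nested ternary above, run on three made-up values to show each branch:

```shell
result=$(printf '5\n20\n40\n' |
  awk '{ print $1, ($1 > 10 ? ($1 > 30 ? "huge" : "medium") : "small") }')
echo "$result"
# 5 small
# 20 medium
# 40 huge
```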

# functions:
tolower(str)
toupper(str)
substr(string, start, length)
index(string, substring)
gsub(regex, replacement, target) \e[36m# global substitute, sub(..) for just first match\e[0m
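A sketch of the string functions on one invented line. Two facts worth knowing here: string positions start at 1, and gsub without a target works on $0 and returns the number of replacements:

```shell
result=$(echo 'Hello World' | awk '{
  print toupper($1), tolower($2)   # HELLO world
  print substr($0, 1, 5)           # Hello
  print index($0, "World")         # 7
  n = gsub(/o/, "0")               # replaces in $0, returns the count
  print n, $0                      # 2 Hell0 W0rld
}')
echo "$result"
```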
# number functions
int() truncates toward zero (rounds down for positive numbers)
sqrt() square root
rand() \e[36m# random number between 0 and 1, multiply and int() as needed.\e[0m
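A sketch of the number functions, including a 1-to-6 dice roll built from rand(). srand() is not listed above; it seeds the generator, and some awk implementations repeat the same rand() sequence on every run without it. The variable name roll is made up:

```shell
result=$(awk 'BEGIN {
  print int(3.9), int(-3.9)     # 3 -3: truncation toward zero
  print sqrt(16)                # 4
  srand()                       # seed from the current time
  roll = int(rand() * 6) + 1    # integer from 1 to 6
  print (roll >= 1 && roll <= 6 ? "ok" : "out of range")
}')
echo "$result"
```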