This guide shows you how to use the awk command in Linux, with plenty of useful everyday examples.
AWK is a tool and programming language for searching and manipulating text, available on Linux and other Unix-like operating systems.
The awk command and the associated scripting language search files for text defined by a pattern and perform a specific action on the text which matches the pattern.
awk is a useful tool for extracting data and building reports from large text files or large numbers of text files – for example processing logs, or the output of data-recording devices like temperature probes, which have collected a lot of data over a period of time. It can also be used on the output from database queries.
There’s no need to install awk; it should already be available on your Linux system.
awk Syntax
The syntax for using the awk command in the terminal is as follows:
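awk [OPTIONS] 'PROGRAM' [INPUT FILES]
Note that:
- [OPTIONS] are optional flags that change how awk behaves, such as the field separator; they are listed below
- PROGRAM is the search pattern and actions to take – it’s the program you want awk to run on the supplied files. Enclose it in single quotes so that the shell does not interpret characters such as $ inside it
- The program can also be read from a text file instead of being written inline, by using the -f option
- [INPUT FILES] are the files you want awk to work on – one or more file names separated by spaces. Wildcard patterns such as *.log are expanded into a list of matching files by the shell before awk runs; passing a directory does not make awk read the files inside it
- If no input files are specified, awk reads from standard input, for example the piped output of another command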
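To make the three ways of supplying a program and its input concrete, here is a small sketch. The file names demo.txt and first-field.awk are made up for this example; each of the three awk commands should print the first field of every line.

```shell
# Create a two-line sample file (demo.txt is just an illustrative name)
printf '%s\n' 'red rose' 'blue iris' > demo.txt

# 1. Program written inline; single quotes stop the shell from expanding $1
awk '{ print $1 }' demo.txt

# 2. The same program read from a file with -f
printf '%s\n' '{ print $1 }' > first-field.awk
awk -f first-field.awk demo.txt

# 3. No input files given: awk reads standard input, here piped from cat
cat demo.txt | awk '{ print $1 }'
```

All three commands should print red and blue on separate lines.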
awk Options
The following options can be supplied to the awk command:
Option | Description |
---|---|
-f program-file | Program text is read from file instead of from the command line. Multiple -f options are accepted. |
-F value | Sets the field separator, FS, to value. |
-v var=value | Assigns value to program variable var. |
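A quick sketch of -F and -v in action, using a made-up colon-separated file called stock.txt (flower name, colour, quantity). The variable name min is likewise just an example, not anything built into awk.

```shell
# Colon-separated sample data: name:colour:quantity (invented for this example)
printf '%s\n' 'rose:red:12' 'iris:blue:7' > stock.txt

# -F ':' makes the colon the field separator, so $1 is the flower name
awk -F ':' '{ print $1 }' stock.txt

# -v min=10 sets the awk variable min before the program runs;
# adding 0 makes min a plain number, so the comparison is numeric
awk -F ':' -v min=10 '$3 >= min + 0 { print $1, $3 }' stock.txt
```

The second command should print rose 12, since only the rose line has a quantity of at least 10.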
For options specific to the awk implementation on your system (for example gawk or mawk), check the manual by running:
man awk
Program Actions & Variables
The program you supply to awk determines what it does with the input files. An awk program takes the following format:
CONDITION { ACTION } CONDITION { ACTION } ...
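Where CONDITION is the pattern to match and ACTION is the action to take on each matching record. You can have as many condition-action pairs as you like. Either part can be left out: without a CONDITION, the ACTION runs on every record, and without an ACTION, awk prints each matching record unchanged.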
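The sketch below shows several CONDITION { ACTION } pairs in one program, plus the two shorthand forms where one half is left out. It uses a small made-up file, demo.txt.

```shell
printf '%s\n' 'red rose' 'blue iris' 'white lily' > demo.txt

# Two condition-action pairs; every record is tested against both
awk '/rose/ { print "found rose:", $0 } /lily/ { print "found lily:", $0 }' demo.txt

# No condition: the action runs on every record (prints the second field)
awk '{ print $2 }' demo.txt

# No action: records matching the condition are printed unchanged
awk '/iris/' demo.txt
```

The first command should print found rose: red rose and found lily: white lily; the last one prints only blue iris.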
Actions
Actions are statements that can include calculations, variables, and function calls. Some built-in functions are implementation-specific, so it’s best to check your manual for these.
Records
awk treats each line of input as a record unless the record separator (the RS variable) is changed.
Fields
awk uses whitespace (spaces and tabs) to separate the fields in a record unless the field separator is changed, for example with the -F option or by setting the FS variable.
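Here is a sketch of how the default splitting behaves and how changing FS and RS alters it. The input is generated with printf, so nothing here depends on an existing file.

```shell
# Default splitting: runs of spaces and tabs separate fields, and leading
# blanks are ignored, so this record has 2 fields and $2 is "rose"
printf '  red \t rose\n' | awk '{ print NF, $2 }'

# -F ',' changes the field separator to a comma
printf 'red,rose\n' | awk -F ',' '{ print $2 }'

# Setting RS to ';' makes each semicolon-separated chunk its own record
printf 'red rose;blue iris;white lily' | awk -v RS=';' '{ print NR ": " $2 }'
```

The last command should print 1: rose, 2: iris and 3: lily, one per line.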
Variables
awk has many built-in variables, covering common scenarios, that you can use without having to define them yourself:
Variable | Meaning |
---|---|
$0 | Represents the entire record |
$1, $2, $3 … | Field variables – hold the text/values for the individual text fields in a record |
NR / Number of Records | Current count of the number of input records read so far from all files |
FNR / File Number of Records | Current count of the number of input records read so far in the current file – Automatically reset to zero each time a new file is started |
NF / Number of Fields | Number of fields in the current input record – The last field in a record can be referenced using $NF, the 2nd to last field using $(NF-1) and so on |
FILENAME | Name of the current input file |
FS / Field Separator | The character(s) used to separate the fields in a record. The default is a single space, which is treated specially: any run of spaces and tabs separates fields, and leading and trailing whitespace is ignored |
RS / Record Separator | The character(s) used to separate the records in a file. New line by default |
OFS / Output Field Separator | Character(s) used to separate fields in awk output. The default is a single space |
ORS / Output Record Separator | Character(s) used to separate records in awk output. The default is a new line |
OFMT / Output Format | Format for numeric output – Default format is “%.6g” |
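To see several of these variables side by side, the sketch below runs one program over two small made-up files, a.txt and b.txt. Note how NR keeps counting across both files while FNR starts again at 1 for the second file.

```shell
printf '%s\n' 'red rose' 'blue iris' > a.txt
printf '%s\n' 'white lily' > b.txt

# FILENAME, running record number, per-file record number,
# field count and last field for every record
awk '{ print FILENAME, NR, FNR, NF, $NF }' a.txt b.txt

# OFS is written between the comma-separated arguments of print
awk -v OFS='|' '{ print $1, $2 }' a.txt
```

The first command should print a.txt 1 1 2 rose, a.txt 2 2 2 iris and b.txt 3 1 2 lily; the second prints red|rose and blue|iris.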
awk Usage Examples
For these examples, we will work on a single text file called flowers.txt, which contains the following text:
red rose
yellow daffodil
pink flamingo
white rose
blue iris
white lily
red peony
yellow orchid
purple foxglove
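If you want to follow along, one way to create flowers.txt with exactly these nine lines is with printf, which repeats its '%s\n' format once for each argument:

```shell
# Write one flower per line to flowers.txt (overwrites any existing file)
printf '%s\n' 'red rose' 'yellow daffodil' 'pink flamingo' 'white rose' 'blue iris' \
  'white lily' 'red peony' 'yellow orchid' 'purple foxglove' > flowers.txt

# Show the result
cat flowers.txt
```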
Print File Contents
The following awk command will output the contents of a file to the terminal using the awk print function:
awk '{print}' flowers.txt
Print Number of Records (Lines) in File
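awk 'END { print NR }' flowers.txt
The END condition matches once, after all input has been read, so at that point NR holds the total number of records. This example will output the number of lines in the file: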
9
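The same END idea can count only the records that match a pattern. In this sketch the counter variable n is an arbitrary name, and the first lines just recreate flowers.txt so the block runs on its own.

```shell
# Recreate the sample file used throughout this guide
printf '%s\n' 'red rose' 'yellow daffodil' 'pink flamingo' 'white rose' 'blue iris' \
  'white lily' 'red peony' 'yellow orchid' 'purple foxglove' > flowers.txt

# Add 1 to n for each line containing "rose", then print the total once at the end;
# "n + 0" prints 0 instead of an empty line when nothing matches
awk '/rose/ { n++ } END { print n + 0 }' flowers.txt
```

With the sample file this should print 2.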
Search for Text in File Using Regular Expressions
The following command will output only the lines that contain the text rose:
awk '/rose/' flowers.txt
Note that the text between the slashes is a regular expression (REGEX); awk uses POSIX extended regular expression syntax to define the text to search for.
This command will output:
red rose
white rose
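A pattern does not have to be tested against the whole line. This sketch matches a regular expression against the first field only with the ~ operator, and uses ! to invert a pattern. The first lines recreate flowers.txt so the block is self-contained.

```shell
printf '%s\n' 'red rose' 'yellow daffodil' 'pink flamingo' 'white rose' 'blue iris' \
  'white lily' 'red peony' 'yellow orchid' 'purple foxglove' > flowers.txt

# Records whose first field is exactly "red" or "white"
awk '$1 ~ /^(red|white)$/' flowers.txt

# Records that do NOT contain "rose"
awk '!/rose/' flowers.txt
```

The first command should print red rose, white rose, white lily and red peony; the second prints the seven lines that do not mention rose.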
More Regex
awk '/^p/' flowers.txt
The ^ anchors the match to the start of the record, so this command will only output records starting with p:
pink flamingo
purple foxglove
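Two more regex features, sketched against the same file: $ anchors a match to the end of the record, and a bracket expression such as [pw] matches any single listed character.

```shell
printf '%s\n' 'red rose' 'yellow daffodil' 'pink flamingo' 'white rose' 'blue iris' \
  'white lily' 'red peony' 'yellow orchid' 'purple foxglove' > flowers.txt

# Lines ending in "y": white lily, red peony
awk '/y$/' flowers.txt

# Lines starting with p or w: pink flamingo, white rose, white lily, purple foxglove
awk '/^[pw]/' flowers.txt
```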
Using Field Variables
By using field variables, you can output only the first field of records starting with p:
awk '/^p/ {print $1;}' flowers.txt
Which will output:
pink
purple
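Fields can be printed in any order, and printf gives finer control over the layout. A sketch, again recreating flowers.txt first:

```shell
printf '%s\n' 'red rose' 'yellow daffodil' 'pink flamingo' 'white rose' 'blue iris' \
  'white lily' 'red peony' 'yellow orchid' 'purple foxglove' > flowers.txt

# The comma between $2 and $1 inserts OFS (a single space by default)
awk '/^p/ { print $2, $1 }' flowers.txt

# printf left-aligns the flower name in 10 characters; unlike print, it adds no newline itself
awk '/^p/ { printf "%-10s %s\n", $2, $1 }' flowers.txt
```

The first command should print flamingo pink and foxglove purple.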
Processing Output from other Programs
You can pipe the output of other Linux shell programs into awk for processing. This example takes the output of the ls -l command, which lists the contents of the current directory, and prints the 5th field of each line (the file size in bytes):
ls -l | awk '{print $5}'
Which will output something like:
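3104
3072
224
256
…(depending on how many files are in the current directory and how big they are). The first line of output will usually be blank, because ls -l starts with a "total" line that has no 5th field.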
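Because ls -l starts its listing with a "total" line, the sketch below skips it with NR > 1 and adds up the 5th field, giving the combined size in bytes of everything listed (subdirectory entries included). It assumes the usual ls -l layout in which the size is the 5th column.

```shell
# NR > 1 skips the "total ..." line that ls -l prints before the listing;
# total += $5 accumulates the size column, and END prints the sum once
ls -l | awk 'NR > 1 { total += $5 } END { print total + 0 }'
```

In a directory holding only a 3-byte file and a 5-byte file, this should print 8.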
Using Built-In Variables
awk '{print NR "-" $2 }' flowers.txt
This command will print the current record number (file line number), a hyphen, and then the second field – the name of the flower:
1-rose
2-daffodil
3-flamingo
4-rose
5-iris
6-lily
7-peony
8-orchid
9-foxglove
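To number whole lines instead of a single field, print $0 (the entire record) after NR. And because NR is just a number, it can also serve as a condition, for example to pick out one line by its position. A sketch:

```shell
printf '%s\n' 'red rose' 'yellow daffodil' 'pink flamingo' 'white rose' 'blue iris' \
  'white lily' 'red peony' 'yellow orchid' 'purple foxglove' > flowers.txt

# Line number, a hyphen, then the whole record: 1-red rose, 2-yellow daffodil, ...
awk '{ print NR "-" $0 }' flowers.txt

# Only the third record: pink flamingo
awk 'NR == 3' flowers.txt
```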
Combining Actions
Conditions can be combined using && (logical AND). This command will print all records where the first field contains the text red and the second field has fewer than 5 characters:
awk '$1 ~ /red/ && length($NF) < 5 { print }' flowers.txt
Note:
- The use of $NF to get the second field as an alternative to $2 – this works because every record in the file has exactly two fields, so NF is 2 and $NF is the same as $2
- The length() function is used to calculate the length of the field
So it returns a single matching record from the example file:
red rose
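&& is not the only way to combine conditions: || (OR) works the same way, and !~ tests that a field does not match a regular expression. A short sketch against the sample file:

```shell
printf '%s\n' 'red rose' 'yellow daffodil' 'pink flamingo' 'white rose' 'blue iris' \
  'white lily' 'red peony' 'yellow orchid' 'purple foxglove' > flowers.txt

# ||: the colour is white OR the flower is an iris -> white rose, blue iris, white lily
awk '$1 == "white" || $2 == "iris"' flowers.txt

# && with !~ (does not match): yellow records that are not orchids -> yellow daffodil
awk '$1 == "yellow" && $2 !~ /orchid/' flowers.txt
```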
Conclusion
awk is included pretty much universally with Linux for a reason – it’s a staple tool for searching and processing text, which you can use for quickly finding log entries if something goes wrong on your system or for processing captured data for research use.
If you’ve ever tried to do anything more than a simple find/replace on a large collection of text files, you’ll know the value of being able to make targeted replacements or updates to all of your text programmatically, without having to run individual find/replace commands.