Linux provides a wide array of text processing tools that are essential for handling and manipulating text files. These tools are frequently used in scripting, data processing, and system administration tasks.
This page will cover several common text processing tools, including awk, sed, head, tail, wc, cut, and paste.
awk is a powerful text processing tool used for pattern scanning and processing. It reads input line by line and splits each line into fields, making it ideal for working with structured, column-oriented data.
- Print Specific Columns: awk '{print $1, $3}' filename
  This command prints the first and third columns from each line of filename.
- Filter and Process Data: awk '$3 > 50 {print $1, $3}' filename
  This command prints the first and third columns only for rows where the third column is greater than 50.
- Field Separator: awk -F ',' '{print $1, $2}' filename
  This uses a comma as the field separator (useful for CSV files).
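As a quick sketch of how these awk commands behave, assume a small, invented space-separated file (scores.txt, people.csv, and their contents are hypothetical, not part of the examples above):

```bash
# Create a hypothetical sample file: name, city, score
cat > scores.txt <<'EOF'
alice london 72
bob paris 45
carol berlin 88
EOF

awk '{print $1, $3}' scores.txt          # prints: alice 72, bob 45, carol 88 (one pair per line)
awk '$3 > 50 {print $1, $3}' scores.txt  # prints only rows where the score exceeds 50: alice 72, carol 88

# Comma-separated input works the same way once -F sets the separator
printf 'alice,london\nbob,paris\n' > people.csv
awk -F ',' '{print $1, $2}' people.csv   # prints: alice london, bob paris
```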
While sed is also covered under filters, it is one of the most versatile text processing tools, making it worthy of a deeper look.
- Substitute Text: sed 's/oldword/newword/' filename
  This substitutes the first occurrence of oldword with newword in each line.
- Delete Lines: sed '/pattern/d' filename
  This deletes lines that match the pattern.
- Insert Text: sed '2i\This is a new line' filename
  This inserts a new line of text before the second line, so the inserted text becomes the new line 2.
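Here is a minimal sketch of the same sed commands run on an invented three-line file (notes.txt is hypothetical); note that the 2i\ insert form shown above is GNU sed syntax:

```bash
# Create a hypothetical three-line file
printf 'first line\nsecond line\nthird line\n' > notes.txt

sed 's/line/entry/' notes.txt           # the first "line" on each line becomes "entry"
sed '/second/d' notes.txt               # removes the line containing "second"
sed '2i\This is a new line' notes.txt   # the inserted text becomes line 2 (GNU sed syntax)
```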
The head and tail commands are used to display the beginning or end of files, respectively.
- Display the First 10 Lines: head filename
- Display the First N Lines: head -n 20 filename
  This displays the first 20 lines of filename.
- Display the Last 10 Lines: tail filename
- Display the Last N Lines: tail -n 15 filename
  This displays the last 15 lines of filename.
- Monitor File Changes: tail -f filename
  This keeps the last few lines visible and updates them as the file changes, which is useful for monitoring log files.
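A short sketch, assuming an invented 100-line file (numbers.txt) and an illustrative log path that varies by distribution:

```bash
seq 1 100 > numbers.txt    # hypothetical file containing the numbers 1 to 100

head numbers.txt           # prints lines 1-10
head -n 20 numbers.txt     # prints lines 1-20
tail numbers.txt           # prints lines 91-100
tail -n 15 numbers.txt     # prints lines 86-100
tail -f /var/log/syslog    # follows a growing log; press Ctrl-C to stop (path varies by distro)
```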
The wc command counts lines, words, and characters in files, making it a handy tool for text analysis.
- Count Lines: wc -l filename
- Count Words: wc -w filename
- Count Characters: wc -m filename
- Full Count: wc filename
  This provides the line, word, and byte counts for the file.
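For example, with a small hypothetical file created on the spot (sample.txt):

```bash
printf 'one two\nthree\n' > sample.txt   # 2 lines, 3 words, 14 characters

wc -l sample.txt   # 2 sample.txt
wc -w sample.txt   # 3 sample.txt
wc -m sample.txt   # 14 sample.txt
wc sample.txt      # 2  3 14 sample.txt   (lines, words, bytes)
```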
cut is used to remove or "cut out" sections from each line of text, often based on a delimiter.
- Extract Specific Fields: cut -d ',' -f 2,4 filename
  This extracts the second and fourth fields (columns) from a CSV file.
- Cut by Character Position: cut -c 1-5 filename
  This extracts the first five characters of each line.
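As an illustration, assume a hypothetical CSV file (staff.csv, with an invented name,dept,city,salary layout):

```bash
# Create the hypothetical CSV file
printf 'alice,eng,london,70000\nbob,sales,paris,65000\n' > staff.csv

cut -d ',' -f 2,4 staff.csv   # prints: eng,70000 and sales,65000
cut -c 1-5 staff.csv          # prints: alice and bob,s (first five characters of each line)
```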
The paste command merges corresponding lines from multiple files.
- Merge Two Files Line by Line: paste file1 file2
  This outputs the contents of file1 and file2 side by side, separated by a tab.
- Specify a Delimiter: paste -d ',' file1 file2
  This merges the files with a comma as the delimiter.
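A minimal sketch with two invented input files (names.txt and cities.txt):

```bash
printf 'alice\nbob\n'    > names.txt
printf 'london\nparis\n' > cities.txt

paste names.txt cities.txt           # alice<TAB>london and bob<TAB>paris
paste -d ',' names.txt cities.txt    # alice,london and bob,paris
```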
These common text processing tools are fundamental to effective Linux command-line work. By combining them in pipelines and scripts, you can manipulate text data efficiently, automate repetitive tasks, and handle a wide range of text processing jobs in a Linux environment.