Skip to content

Latest commit

 

History

History
195 lines (117 loc) · 4.27 KB

2. Common Text Processing Tools.md

File metadata and controls

195 lines (117 loc) · 4.27 KB

Common Text Processing Tools

Linux provides a wide array of text processing tools that are essential for handling and manipulating text files. These tools are frequently used in scripting, data processing, and system administration tasks.

This page will cover several common text processing tools including awk, sed, head, tail, wc, cut, and paste.

awk: Pattern Scanning and Processing Language

awk is a powerful text processing tool used for pattern scanning and processing. It operates on rows of text, making it ideal for working with structured data.

Basic Usage

  • Print Specific Columns:

    awk '{print $1, $3}' filename

    This command prints the first and third columns from each line of filename.

  • Filter and Process Data:

    awk '$3 > 50 {print $1, $3}' filename

    This command prints the first and third columns only for rows where the third column is greater than 50.

  • Field Separator:

    awk -F ',' '{print $1, $2}' filename

    This uses a comma as the field separator (useful for CSV files).

sed: Stream Editor for Transforming Text

While sed is also covered under filters, it’s one of the most versatile text processing tools, making it worthy of a deeper look.

Basic Usage

  • Substitute Text:

    sed 's/oldword/newword/' filename

    This substitutes the first occurrence of oldword with newword in each line.

  • Delete Lines:

    sed '/pattern/d' filename

    This deletes lines that match the pattern.

  • Insert Text:

    sed '2i\This is a new line' filename

    This inserts a new line of text after the second line.

head and tail: Display the Beginning or End of Files

The head and tail commands are used to display the beginning or end of files, respectively.

head Command

  • Display the First 10 Lines:

    head filename
  • Display the First N Lines:

    head -n 20 filename

    This displays the first 20 lines of filename.

tail Command

  • Display the Last 10 Lines:

    tail filename
  • Display the Last N Lines:

    tail -n 15 filename

    This displays the last 15 lines of filename.

  • Monitor File Changes:

    tail -f filename

    This keeps the last few lines visible and updates them as the file changes, useful for monitoring log files.

wc: Word, Line, and Character Count

The wc command counts lines, words, and characters in files, making it a handy tool for text analysis.

Basic Usage

  • Count Lines:

    wc -l filename
  • Count Words:

    wc -w filename
  • Count Characters:

    wc -m filename
  • Full Count:

    wc filename

    This provides the line, word, and character counts for the file.

cut: Extracting Sections from Lines

cut is used to remove or "cut out" sections from each line of text, often based on a delimiter.

Basic Usage

  • Extract Specific Fields:

    cut -d ',' -f 2,4 filename

    This extracts the second and fourth fields (columns) from a CSV file.

  • Cut by Character Position:

    cut -c 1-5 filename

    This extracts the first five characters of each line.

paste: Merge Lines of Files

The paste command merges corresponding lines from multiple files.

Basic Usage

  • Merge Two Files Line by Line:

    paste file1 file2

    This outputs the contents of file1 and file2 side by side, separated by a tab.

  • Specify a Delimiter:

    paste -d ',' file1 file2

    This merges the files with a comma as the delimiter.

Conclusion

These common text processing tools are fundamental to mastering Linux command-line operations. By leveraging these tools, you can efficiently manipulate text data, automate tasks, and enhance your productivity in a Linux environment. Mastering these commands will significantly improve your ability to handle a wide range of text processing tasks.


Next: Redirections

Previous: Filters