Text Processing

Process text files

Count lines

To count the number of lines in a file, use wc -l:

        $ wc -l dogs.txt

Counts the number of lines in dogs.txt
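
As a minimal sketch of how the output looks in practice, the snippet below uses a hypothetical temporary file in place of dogs.txt. It shows that wc -l prints the file name next to the count when given a file argument, and the count alone when reading standard input. The sample names are made up for illustration.

```shell
# Hypothetical sample data standing in for dogs.txt
tmp=$(mktemp)
printf 'teddy\nrex\nbella\n' > "$tmp"

# With a file argument, wc prints the count followed by the file name
with_name=$(wc -l "$tmp")

# Reading from standard input prints the count alone,
# which is easier to reuse in a script
count=$(wc -l < "$tmp")

# wc -l counts newline characters, so a final line that lacks
# a trailing newline is not included in the total
unterminated=$(printf 'teddy\nrex' | wc -l)

echo "with file name: $with_name"
echo "count only: $count"
echo "unterminated input: $unterminated"
rm -f "$tmp"
```

The redirect form is usually the more convenient one when the number is fed into another command or a comparison.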

wc -l is most useful at the end of a pipeline. For example, pipe the output of grep into it to count the lines that match a specific pattern:

        $ grep 'teddy' dogs.txt | wc -l

Counts the lines in dogs.txt that contain teddy
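
The pipeline can be tried on throwaway data. This is a sketch that assumes a hypothetical four-line file; it also compares the pipeline with grep's own -c option, which reports the same count without a second command.

```shell
# Hypothetical sample data standing in for dogs.txt
tmp=$(mktemp)
printf '%s\n' 'teddy' 'rex' 'teddy bear' 'Teddy' > "$tmp"

# Pipeline from the text: counts lines that contain "teddy"
piped=$(grep 'teddy' "$tmp" | wc -l)

# grep -c reports the number of matching lines directly
direct=$(grep -c 'teddy' "$tmp")

# -i ignores case, so the line "Teddy" is counted as well
nocase=$(grep -ci 'teddy' "$tmp")

echo "piped=$piped direct=$direct nocase=$nocase"
rm -f "$tmp"
```

Both forms count matching lines, not matches: a line that contains the pattern twice still counts once.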

You can also count the entries in a directory with ls | wc -l.
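
A small sketch of the directory count, using a hypothetical scratch directory created just for the example. It assumes the default behavior of ls for a regular user, where names starting with a dot are hidden unless -A is given.

```shell
# Hypothetical scratch directory with two regular files and one dotfile
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/.hidden"

# ls omits dotfiles by default, so only a.txt and b.txt are counted
visible=$(ls "$dir" | wc -l)

# -A includes dotfiles but leaves out the . and .. entries
all=$(ls -A "$dir" | wc -l)

echo "visible=$visible all=$all"
rm -rf "$dir"
```

The count is one per output line, so a file name that itself contains a newline would be counted more than once.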

Sort

To sort the lines of a file, or the output of another command, use the sort command:

        $ sort dogs.txt

Prints the lines of dogs.txt in alphabetical order (the file itself is not changed)

To sort numbers, use the -n flag:

        $ sort -n numbers.txt

Sorts the numbers in ascending order

Add -r to any sort command to reverse the order.
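
To illustrate why -n matters, the sketch below sorts the same hypothetical list of numbers three ways. The data and variable names are invented for the example; paste is used only to join the sorted lines onto one line for display.

```shell
# Hypothetical sample data standing in for numbers.txt
tmp=$(mktemp)
printf '%s\n' 10 9 100 2 > "$tmp"

# Without -n, lines are compared as text, so "10" and "100" come before "2"
lexical=$(sort "$tmp" | paste -sd ' ' -)

# With -n, lines are compared by numeric value
numeric=$(sort -n "$tmp" | paste -sd ' ' -)

# -r reverses whichever ordering is in effect
descending=$(sort -nr "$tmp" | paste -sd ' ' -)

echo "lexical:    $lexical"     # 10 100 2 9
echo "numeric:    $numeric"     # 2 9 10 100
echo "descending: $descending"  # 100 10 9 2
rm -f "$tmp"
```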

To sort human-readable sizes such as 4K, 20M, or 1G, use -h:

        $ du -h . | sort -h

Sorts the directories under the current directory by their disk usage, smallest first. (Without -a, du reports directories only, not individual files.)

You can learn more about the du command in the disk space guide.
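
Because du output depends on the machine, the following sketch feeds sort a fixed, hypothetical list of sizes in the tab-separated format du -h prints. It assumes a sort implementation that supports -h, such as GNU coreutils; the option is not part of POSIX.

```shell
# Hypothetical size/path pairs in the format that du -h prints
sizes=$(printf '%s\t%s\n' 1.5G ./videos 4.0K ./notes 20M ./photos 512K ./cache)

# -n compares only the leading number and ignores the unit suffix,
# so 1.5G is wrongly placed before 4.0K
by_number=$(printf '%s\n' "$sizes" | sort -n)

# -h takes the K/M/G suffix into account: 4.0K, 512K, 20M, 1.5G
by_size=$(printf '%s\n' "$sizes" | sort -h)

echo "sort -n:"
printf '%s\n' "$by_number"
echo "sort -h:"
printf '%s\n' "$by_size"
```

Adding -r to the second command lists the largest entries first, which is often what you want when looking for what is using disk space.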

Extract columns

cut extracts specific columns from each line of a file. It is almost always used with the -d and -f flags: -d sets the delimiter that splits a line into columns (a tab by default), and -f selects the columns to extract.

This is very useful when you're working with structured data. For example, to extract a column from a CSV (Comma-Separated Values) file:

        $ cut -d ',' -f 2 dogs.csv

Extracts the second column from each record in dogs.csv

The -f flag also supports ranges of columns:

        $ cut -d ',' -f 2- dogs.csv

Extracts from the second column to the end

        $ cut -d ',' -f -3 dogs.csv

Extracts columns 1, 2, and 3

        $ cut -d ',' -f 2-4 dogs.csv

Extracts columns 2, 3, and 4
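
The field selections above can be checked against a single hypothetical record. The column layout (name, breed, age, owner) and the values are assumptions made for this sketch, not the contents of a real dogs.csv. The last command shows a limitation: cut splits on every delimiter and does not understand CSV quoting.

```shell
# Hypothetical record standing in for one line of dogs.csv
row='teddy,poodle,3,alice'

second=$(printf '%s\n' "$row" | cut -d ',' -f 2)         # poodle
from_second=$(printf '%s\n' "$row" | cut -d ',' -f 2-)   # poodle,3,alice
up_to_third=$(printf '%s\n' "$row" | cut -d ',' -f -3)   # teddy,poodle,3
two_to_four=$(printf '%s\n' "$row" | cut -d ',' -f 2-4)  # poodle,3,alice

# A quoted field that contains a comma is split in the middle,
# so the first "column" here is only the text before that comma
quoted=$(printf '%s\n' '"Smith, Jo",poodle' | cut -d ',' -f 1)

printf '%s\n' "$second" "$from_second" "$up_to_third" "$two_to_four" "$quoted"
```

For files with quoted fields, a tool that parses CSV properly is a safer choice than cut.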