How I Remove Duplicate Lines From a File With awk

One of the repositories I maintain is a beginner’s GitHub repo. New developers can make their first pull request by adding their GitHub handle to a simple text file.

When pull requests get merged into the master branch, they often contain duplicates. The file has more than 7,000 lines. Names are not sorted alphabetically.

I needed a simple way to remove all duplicate lines from the file without sorting the lines.

I’m using awk, a text-processing utility that ships with Unix-like systems. I’m not proficient with awk, but I’ve found useful one-liners that do what I want.

For reference, this is what my file should look like:

# CONTRIBUTORS

- [@RupamG](https://github.com/RupamG)

- [@hariharen9](https://github.com/hariharen9)

- [@clevermiraz](https://github.com/clevermiraz)

- [@smeubank](https://github.com/smeubank)

- [@LJones95](https://github.com/LJones95)

- [@shannon-nz](https://github.com/shannon-nz)

- [@sammiepls](https://github.com/sammiepls)

Here’s what it often looks like:

# CONTRIBUTORS

- [@RupamG](https://github.com/RupamG)

- [@hariharen9](https://github.com/hariharen9)

- [@clevermiraz](https://github.com/clevermiraz)

- [@smeubank](https://github.com/smeubank)

- [@LJones95](https://github.com/LJones95)
- [@hariharen9](https://github.com/hariharen9)

- [@shannon-nz](https://github.com/shannon-nz)

- [@sammiepls](https://github.com/sammiepls)
- [@shannon-nz](https://github.com/shannon-nz)

1. Remove all empty lines

awk 'NF > 0' file.txt

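NF is a built-in awk variable that holds the number of fields in the current line. Empty lines (and lines containing only whitespace) have zero fields, so the condition NF > 0 prints only the lines that contain text.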
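As a quick check of this filter, here is a minimal sketch that runs it on a few made-up lines. The `sample` and `non_empty` variables and the sample text are assumptions for illustration, not part of my real file.

```shell
# Hypothetical sample input: three entries separated by an empty line
# and a whitespace-only line.
sample=$(printf '%s\n' '- a' '' '- b' '   ' '- c')

# Keep only lines that have at least one field.
non_empty=$(printf '%s\n' "$sample" | awk 'NF > 0')

printf '%s\n' "$non_empty"
```

Both the empty line and the whitespace-only line are dropped, because neither contains a field.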

2. Remove duplicates

awk '!seen[$0]++' file.txt

I stole this command from opensource.com, where you can find an explanation of how it works.
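In short, as far as I understand it: `seen` is an array indexed by the whole line (`$0`). The first time a line appears its counter is still zero, so `!seen[$0]` is true and awk prints the line; the `++` then bumps the counter so later copies are skipped. The sketch below compares the one-liner with a spelled-out version on made-up input. The `sample`, `short`, and `long` names are assumptions for illustration.

```shell
# Hypothetical sample input with repeated lines.
sample=$(printf '%s\n' 'a' 'b' 'a' 'c' 'b')

# The one-liner: print a line only when its counter is still zero.
short=$(printf '%s\n' "$sample" | awk '!seen[$0]++')

# The same logic spelled out step by step.
long=$(printf '%s\n' "$sample" | awk '{
  if (!($0 in seen)) print   # first time this exact line appears
  seen[$0]++                 # remember it so later copies are skipped
}')

printf '%s\n' "$short"
```

Both versions keep the first occurrence of each line and preserve the original order, which is why no sorting is needed.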

3. Add empty lines again

awk '{print; print "";}' file.txt

See Stack Exchange.
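The three commands only print to the terminal, so to actually update the file they can be chained and written back. Below is a rough sketch under some assumptions: it works in a scratch directory created with `mktemp -d`, the names `workdir`, `file.tmp`, `original.txt`, and `single_pass` are hypothetical, and the sample content is a shortened version of the file shown earlier. The single-pass variant at the end is my own combination of the three one-liners, not something taken from the linked sources.

```shell
# Build a small sample file in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' \
  '# CONTRIBUTORS' \
  '' \
  '- [@RupamG](https://github.com/RupamG)' \
  '' \
  '- [@hariharen9](https://github.com/hariharen9)' \
  '- [@RupamG](https://github.com/RupamG)' > "$workdir/file.txt"
cp "$workdir/file.txt" "$workdir/original.txt"

# Chain the three steps. Write to a temporary file first: redirecting
# straight onto the input file would truncate it before awk reads it.
awk 'NF > 0' "$workdir/file.txt" \
  | awk '!seen[$0]++' \
  | awk '{print; print "";}' > "$workdir/file.tmp" \
  && mv "$workdir/file.tmp" "$workdir/file.txt"

# A possible single-pass variant that combines all three conditions.
single_pass=$(awk 'NF && !seen[$0]++ { print; print "" }' "$workdir/original.txt")

cat "$workdir/file.txt"
```

Note that step 3 adds an empty line after every line, including the last one, so the result ends with a blank line.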
