BASH: Count identical lines

Question

I have a file that contains:

VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoiceMailConfig60CharsTest
VoicemailDefaultTypeTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest

How do I replace the duplicate lines with counts:

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

I am placing the pairs into an associative array. I tried using 'read' inside a 'while' loop, but the array gets lost. Here's my attempt:

unset line
tests=$(cat file.log)
echo "$tests" | 
    while read l; do 
        if [ "$l" == "${line}" ]; then
            let cnt++;
        else
            echo "${line} (${cnt})"
            line=${l}
            cnt=1
        fi
        export run_suites
    done

You're WAY off. See unix.stackexchange.com/questions/169716/… and google UUOC. Also, never use the letter l as a variable name, as it looks far too much like the number 1 and so obfuscates your code.
– Ed Morton, Commented Oct 31, 2017 at 18:50
It's pretty rude not to select an answer; please do so, or state why the answers are not good enough.
– emilBeBri, Commented Sep 1, 2022 at 16:35

MattT · Accepted Answer · 2017-10-31 18:44:30Z

12

Assuming the formatting of the output doesn't exactly have to match

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

you can just use

sort <input_file> | uniq -c

If you need the output to exactly match what you posted, you can use

awk '{duplicates[$1]++} END{for (ind in duplicates) {print ind,"("duplicates[ind]")"}}' <input_file>

Edit: Posted just after anubhava's answer... but leaving (unless people suggest I delete) because of the addition of the sort command.

I'd leave it; I had the same thought about my own answer, you having beat me by 12 seconds.
– chepner, Commented over a year ago

chepner · Accepted Answer · 2017-10-31 18:44:42Z

If you don't care about that exact output format, just use sort and uniq:

$ sort file.log | uniq -c
5 VoicemailButtonTest
1 VoiceMailConfig60CharsTest
1 VoicemailDefaultTypeTest
5 VoiceMailIconSelectableTest
2 VoicemailSettingsFromMessageModeScreenTest
7 VoicemailSettingsTest

sort, of course, is unnecessary if the file is already sorted as in your question. If it isn't sorted, uniq -c will still work, but it only considers a line to be a duplicate if it is identical to the immediately preceding line:

$ printf 'a\nb\na' | uniq -c
1 a
1 b
1 a
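
If the exact "Name (N)" format from the question is needed, the uniq -c output can be reshaped afterwards. The following is only a sketch: count_lines is a hypothetical helper name, and it assumes the lines contain no whitespace, as in the sample data.

```shell
# Hypothetical helper: count identical lines read from stdin and print
# them as "line (count)". Assumes each line is a single word.
count_lines() {
    sort | uniq -c | awk '{ print $2, "(" $1 ")" }'
}

printf 'b\na\nb\n' | count_lines
# a (1)
# b (2)
```

Here the sort step groups the duplicates, so the input does not need to be sorted beforehand.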

anubhava · Accepted Answer · 2017-10-31 19:07:00Z

You can use this simple awk script to get counts:

awk '{freq[$1]++} END{for (i in freq) print i, "(" freq[i] ")"}' file

VoiceMailConfig60CharsTest (1)
VoicemailSettingsFromMessageModeScreenTest (2)
VoiceMailIconSelectableTest (5)
VoicemailButtonTest (5)
VoicemailDefaultTypeTest (1)
VoicemailSettingsTest (7)

If you want to maintain the order of appearance in input then use:

awk '!freq[$1]++{order[++k]=$1} END{
    for (i=1; i<=k; i++) print order[i], "(" freq[order[i]] ")"}' file

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)
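
The first output above looks shuffled because awk does not specify the traversal order of for (i in freq). A small sketch of the order-preserving variant on input that is neither sorted nor grouped (count_in_order is a hypothetical wrapper name, not part of the answer):

```shell
# Hypothetical wrapper around the order-preserving awk script above.
# order[] records each key the first time it is seen; freq[] counts it.
count_in_order() {
    awk '!freq[$1]++ { order[++k] = $1 }
         END { for (i = 1; i <= k; i++) print order[i], "(" freq[order[i]] ")" }'
}

printf 'b\na\nb\n' | count_in_order
# b (2)
# a (1)
```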

Thanks for the good tip Ed. I forgot it is a builtin function in gnu-awk

karakfa · Accepted Answer · 2017-10-31 19:36:44Z

4

Without awk, keeping the keys in order of first appearance. This doesn't require sorted or grouped input.

cat -n file     |    # add line numbers for order
sort -k2 -k1,1n |    # sort by key; break ties by line no
uniq -f1 -c     |    # count keys, ignoring line no
sort -k2,2n     |    # sort by line no to recover initial order
sed -r 's/^\s*(\S+)\s+(\S+)\s+(\S+)/\3 (\1)/'     # strip padding, format output

Ed Morton · Accepted Answer · 2017-10-31 18:58:24Z

$ awk '$1 != prev{if (NR>1) print prev, "("cnt")"; prev=$1; cnt=0} {cnt++} END{print prev, "("cnt")"}' file
VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

The above retains your input order and stores almost nothing in memory. It doesn't care whether your input is sorted; it only relies on all duplicate keys occurring contiguously in your input file, as in your example.
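
To illustrate the contiguity requirement, here is a sketch that wraps the same awk command in a function (count_runs is a hypothetical name) and feeds it input where a key reappears after a different key. Each run is counted separately:

```shell
# Hypothetical wrapper around the awk command above: counts runs of
# identical consecutive keys, printing each run as "key (count)".
count_runs() {
    awk '$1 != prev { if (NR > 1) print prev, "(" cnt ")"; prev = $1; cnt = 0 }
         { cnt++ }
         END { print prev, "(" cnt ")" }'
}

printf 'a\na\nb\na\n' | count_runs
# a (2)
# b (1)
# a (1)
```

If the duplicates may be scattered, sort the input first or use an array-based approach.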

ctac_ · Accepted Answer · 2017-10-31 23:45:32Z

0

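With a bash associative array:

unset tab
declare -A tab                        # associative array: line -> count
while IFS= read -r line; do
  [[ -n $line ]] || continue          # an empty string is not a valid key
  tab["$line"]=$(( ${tab["$line"]:-0} + 1 ))
done < infile
for i in "${!tab[@]}"; do
  echo "$i (${tab[$i]})"
done | sort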
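
As for why the array in the question "gets lost": in bash, by default, each part of a pipeline runs in a subshell, so anything assigned inside echo ... | while read ... disappears when the loop ends. Feeding the loop by redirection, as in done < infile, keeps it in the current shell. A minimal sketch of the difference, using a plain counter and a here-document in place of a real file:

```shell
# Loop fed by a pipe: in bash (by default) it runs in a subshell,
# so the assignment is not visible after the loop.
piped=0
printf 'a\nb\n' | while IFS= read -r line; do
    piped=$((piped + 1))
done
echo "piped=$piped"            # piped=0 in bash by default

# Loop fed by redirection: runs in the current shell, so the value survives.
redirected=0
while IFS= read -r line; do
    redirected=$((redirected + 1))
done <<'EOF'
a
b
EOF
echo "redirected=$redirected"  # redirected=2
```

The same applies to arrays: fill them in a loop that reads from a redirection rather than from a pipe.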
