7

I have a file that contains:

VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoiceMailConfig60CharsTest
VoicemailDefaultTypeTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest

How do I replace the duplicate lines with counts, like this?

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

I am placing the pairs into an associative array. I tried using 'read' inside a 'while' loop, but the array gets lost. Here's my attempt:

unset line
tests=$(cat file.log)
echo "$tests" | 
    while read l; do 
        if [ "$l" == "${line}" ]; then
            let cnt++;
        else
            echo "${line} (${cnt})"
            line=${l}
            cnt=1
        fi
        export run_suites
    done
3
  • 1
    You're WAY off. See unix.stackexchange.com/questions/169716/… and google UUOC. Also, never use the letter l as a variable name, as it looks far too much like the number 1 and so obfuscates your code. Commented Oct 31, 2017 at 18:50
  • it's pretty rude not to select an answer, please do so or state a reason the answers are not good enough Commented Sep 1, 2022 at 16:35
  • I do not know how to select an answer. Commented Mar 25, 2024 at 19:41

6 Answers

12

Assuming the formatting of the output doesn't exactly have to match

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

you can just use

sort <input_file> | uniq -c

If you need the output to exactly match what you posted, you can use

awk '{duplicates[$1]++} END{for (ind in duplicates) {print ind,"("duplicates[ind]")"}}' <input_file>

Edit: Posted just after anubhava's answer... but leaving (unless people suggest I delete) because of the addition of the sort command.


1 Comment

I'd leave it; I had the same thought about my own answer, you having beat me by 12 seconds.
6

If you don't care about that exact output format, just use sort and uniq:

$ sort file.log | uniq -c
5 VoicemailButtonTest
1 VoiceMailConfig60CharsTest
1 VoicemailDefaultTypeTest
5 VoiceMailIconSelectableTest
2 VoicemailSettingsFromMessageModeScreenTest
7 VoicemailSettingsTest

sort, of course, is unnecessary if the file is already sorted as in your question. If it isn't sorted, uniq -c will still work, but it only considers a line to be a duplicate if it is identical to the immediately preceding line:

$ printf 'a\nb\na' | uniq -c
1 a
1 b
1 a
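If the exact format from the question is needed, the uniq -c output can be reshaped afterwards. The following is a minimal sketch, assuming every line is a single word with no embedded whitespace (as in the question's data); count_lines is a hypothetical helper name, not something from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count duplicate lines and print them as "name (count)".
# Assumes each input line is one whitespace-free word.
count_lines() {
    # sort groups identical lines; uniq -c prefixes each group with its count
    sort "$1" | uniq -c |
        # move the count (field 1) behind the name (field 2)
        awk '{ print $2, "(" $1 ")" }'
}

# Example run on a small sample file
sample=$(mktemp)
printf '%s\n' b a b b > "$sample"
count_lines "$sample"
rm -f "$sample"
```

For the sample above this prints "a (1)" and "b (3)". The result is in sorted order, not in order of first appearance.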


4

You can use this simple awk script to get counts:

awk '{freq[$1]++} END{for (i in freq) print i, "(" freq[i] ")"}' file

VoiceMailConfig60CharsTest (1)
VoicemailSettingsFromMessageModeScreenTest (2)
VoiceMailIconSelectableTest (5)
VoicemailButtonTest (5)
VoicemailDefaultTypeTest (1)
VoicemailSettingsTest (7)

If you want to maintain the order of appearance in input then use:

awk '!freq[$1]++{order[++k]=$1} END{
    for (i=1; i<=k; i++) print order[i], "(" freq[order[i]] ")"}' file

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

1 Comment

Thanks for the good tip Ed. I forgot it is a builtin function in gnu-awk
4

Without awk, keeping the keys in order of first appearance. This doesn't require sorted or grouped input.

cat -n file    |     # add line numbers for order
sort -k2       |     # sort based on keys, ignoring line no
uniq -f1 -c    |     # count keys, ignoring line no
sort -k2,2n    |     # sort by line no to recover initial order
sed -r 's/^\s*(\S+)\s+(\S+)\s+(\S+)/\3 (\1)/'     # format output, dropping uniq's leading padding


1
$ awk '$1 != prev{if (NR>1) print prev, "("cnt")"; prev=$1; cnt=0} {cnt++} END{print prev, "("cnt")"}' file
VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

The above retains your input order and stores almost nothing in memory. It doesn't care whether your input is sorted or not; it only relies on all duplicate keys occurring contiguously in your input file, as in your example.
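The same contiguous-run idea can also be sketched in plain bash, which is close to what the question attempted. This is only an illustration under the same assumption (duplicates are adjacent); count_runs is a hypothetical function name. Compared with the attempt in the question, it reads from a redirect instead of a pipe and prints the final group after the loop ends.

```shell
#!/usr/bin/env bash
# count_runs FILE: print each run of identical consecutive lines as "line (count)".
# Hypothetical helper; assumes duplicates are adjacent in FILE.
count_runs() {
    local line prev cnt=0
    # Redirecting into the loop (rather than piping) keeps it in the current shell.
    # The "|| [ -n ... ]" part also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$cnt" -gt 0 ] && [ "$line" != "$prev" ]; then
            # the run just ended, so report it and start a new one
            printf '%s (%d)\n' "$prev" "$cnt"
            cnt=0
        fi
        prev=$line
        cnt=$((cnt + 1))
    done < "$1"
    # flush the final run, which the loop body never prints
    if [ "$cnt" -gt 0 ]; then
        printf '%s (%d)\n' "$prev" "$cnt"
    fi
}
```

Calling count_runs file.log on the question's file should give the same six lines as the awk one-liner, though awk will generally be faster on large inputs.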


0

With bash array

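unset tab
declare -A tab
# IFS= and -r keep each line intact (no trimming, no backslash processing)
while IFS= read -r line; do
  tab[$line]=$(( ${tab[$line]:-0} + 1 ))
done < infile
# quote the key expansion so keys are never split or globbed
for i in "${!tab[@]}"; do
  echo "$i (${tab[$i]})"
done | sort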
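Note that the loop above reads from a redirect, not a pipe. That is most likely why the array "gets lost" in the question: by default bash runs each command of a pipeline in a subshell, so anything assigned inside `echo ... | while read ...` disappears when the loop ends. A small sketch of the difference, assuming bash 4 or later for associative arrays (the array names piped and direct are made up for the demonstration):

```shell
#!/usr/bin/env bash
declare -A piped direct

# Piping into while runs the loop in a subshell: the assignments vanish afterwards
printf '%s\n' a b a | while IFS= read -r line; do
    piped[$line]=$(( ${piped[$line]:-0} + 1 ))
done
echo "after pipe: ${#piped[@]} keys"

# Process substitution keeps the loop in the current shell, so the array survives
while IFS= read -r line; do
    direct[$line]=$(( ${direct[$line]:-0} + 1 ))
done < <(printf '%s\n' a b a)
echo "after redirect: ${#direct[@]} keys, a=${direct[a]}"
```

With default shell options this reports 0 keys after the pipe and 2 keys (with a=2) after the redirect.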
