I have input data like this:
chr17 41243232 41243373 BRCA1_ex11
chr17 41243232 41243373 BRCA1_ex12
chr17 41243471 41243644 BRCA1_ex11
chr17 41243639 41243811 BRCA1_ex11
chr13 32954112 32954208 BRCA2_ex23
chr13 32954112 32954208 BRCA2_ex24
I need to find rows where columns $2 and $3 are duplicated. When they are, the rows should be merged into one line, with the $4 values printed as a comma-separated list.
Desired output:
chr17 41243232 41243373 BRCA1_ex11,BRCA1_ex12
chr17 41243471 41243644 BRCA1_ex11
chr17 41243639 41243811 BRCA1_ex11
chr13 32954112 32954208 BRCA2_ex23,BRCA2_ex24
Is there an AWK solution to easily process this kind of data? I would appreciate a solution with an explanation. Both input and output are tab-separated. NOTE: In duplicated rows, the first, second and third fields are always equal.
My attempt was:
awk -v OFS="\t" '{i=$2 FS $1 FS $3 FS $4} {a[i]=!a[i]?$4:a[i] "," $4} END {for (l in a) {print l,a[l]}}' infile
Thank you for any ideas.
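For reference, the attempt above builds its array key from $1, $2, $3 and $4 together, so two rows that differ only in $4 never share a key and nothing gets merged. In addition, `for (l in a)` visits keys in an unspecified order, so the input order is not guaranteed to survive. A minimal sketch of one way around both points, assuming tab-separated input and that the first appearance of each $1/$2/$3 combination should set the output order, could look like the following. The function name `merge_dups` and the array names `names` and `order` are made up for illustration.

```shell
# merge_dups: hypothetical helper name, wraps the awk program so it can be reused.
merge_dups() {
    awk 'BEGIN { FS = OFS = "\t" }        # read and write tab-separated fields
    {
        key = $1 FS $2 FS $3              # key excludes $4, so duplicates collide
        if (!(key in names)) {
            order[++n] = key              # remember first-seen order of each key
            names[key] = $4               # first name for this key
        } else {
            names[key] = names[key] "," $4   # append further names with a comma
        }
    }
    END {
        # walk the keys in first-seen order instead of using "for (k in names)"
        for (i = 1; i <= n; i++)
            print order[i], names[order[i]]
    }' "$@"
}
```

It would be called as `merge_dups infile > outfile`. Because the key already contains tabs between $1, $2 and $3, printing it followed by the joined names yields four tab-separated columns.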