Python Code to Grab Two Letter State Codes
I needed to have just a list of two letter state codes for a Collibra Data Quality custom rule I was creating for a customer. I found a states.csv file online that had two columns, State Name and Code (Abbreviation). I didn't need the header, so I deleted the first line. I now had this:
states.csv file content
Then I write some Python code to read the file and write out just the list of codes, like this:
# Input: ~/Downloads/states.csv
import csv
path ='/Users/<Your_User_Name>'
with open(path + '/Downloads/states.csv', 'r') as file:
csv_reader = csv.reader(file)
with open(path + '/Downloads/state_codes.txt', 'a') as file:
for row in csv_reader:
code = '\'' + row[1] + '\','
file.write(code)
When I went to look for my results, I found this file as expected state_codes.txt:
'AL','AK','AZ','AR','CA','CO','CT','DE','DC','FL','GA','HI','ID','IL','IN','IA','KS','KY','LA','ME','MT','NE','NV','NH','NJ','NM','NY','NC','ND','OH','OK','OR','MD','MA','MI','MN','MS','MO','PA','RI','SC','SD','TN','TX','UT','VT','VA','WA','WV','WI','WY',
I copied that into my DQ custom rule, omitting the final comma :)
Voila!
For more information:
Laurent Weichberger (Big Data Bear)
Principal Customer Engineer
laurent (dot) weichberger (at) collibra (dot) com
Another option. Since it looks like you downloaded a file. Simply add the file to CDQ and have the rule be “where state_cd IN (states from state_codes.txt)