Sep-21-2016, 09:26 PM
(Sep-21-2016, 09:21 PM)nilamo Wrote:
I used Pandas to extract the text, so it is a DataFrame and not a string. So I cannot use .split(), or can I?
(Sep-21-2016, 09:11 PM)Jaynorth Wrote:
Counter() just returns Counter() in the console when the script is run. country_codes is the name of the csv file that is read into the script, and codes is just a variable that I assigned to the relevant column in the csv file: country_codes['English short name lower case']
I am not matching on the Alpha-2 or Alpha-3 columns in the csv file, which use the two- and three-letter representations of the country, like "CAN" XD
But, why use any sort of Counter() function at all? len() would do the exact same thing, wouldn't it?
>>> text = '''
... Once upon a time, there was the great country of Mexico. Then there... blah blah blah'''
>>> [word for word in text.split()]
['Once', 'upon', 'a', 'time,', 'there', 'was', 'the', 'great', 'country', 'of', 'Mexico.', 'Then', 'there...', 'blah', 'blah', 'blah']
>>> import re
>>> [word for word in text.split() if re.sub(r'\W', '', word) in codes]
['Mexico.']
>>> len([word for word in text.split() if re.sub(r'\W', '', word) in codes])
1
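For comparison, here is a minimal sketch of len() next to Counter() on the same kind of filter. The codes set and the sample text are made up for illustration; they stand in for the country-name column from the csv file and for the real input. It also assumes single-word country names, since splitting on whitespace will not match something like "United States".

```python
import re
from collections import Counter

# Hypothetical stand-in for the country-name column from the csv file.
codes = {"Mexico", "Canada", "France"}

# Made-up sample text, not the original data.
text = ("Once upon a time, there was the great country of Mexico. "
        "Then Canada and Mexico traded.")

# Strip non-word characters first so that 'Mexico.' becomes 'Mexico',
# then keep only the words that appear in codes.
cleaned = [re.sub(r"\W", "", word) for word in text.split()]
matches = [word for word in cleaned if word in codes]

# len() gives the total number of mentions.
total = len(matches)

# Counter() gives the number of mentions per country.
per_country = Counter(matches)

print(total)
print(per_country)
```

So len() is enough if only the total matters, while Counter() helps when the count per country is needed.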