Posts: 3,458
Threads: 101
Joined: Sep 2016
(Sep-21-2016, 09:11 PM)Jaynorth Wrote: Counter() just returns Counter() in the console when the script is run. country_codes is the names of the csv file which is read into the script and codes is just a variable that I used to assign the relevant column in the csv file - country_codes['English short name lower case']
I am not matching on the Alpha-2 or Alpha-3 columns in the csv file which uses the 3 letter representation of the country like "CAN" XD
But, why use any sort of Counter() function at all? len() would do the exact same thing, wouldn't it?
>>> text = '''
... Once upon a time, there was the great country of Mexico. Then there... blah blah blah'''
>>> [word for word in text.split()]
['Once', 'upon', 'a', 'time,', 'there', 'was', 'the', 'great', 'country', 'of', 'Mexico.', 'Then', 'there...', 'blah', 'blah', 'blah']
>>> import re
>>> [word for word in text.split() if re.sub(r'\W', '', word) in codes]
['Mexico.']
>>> len([word for word in text.split() if re.sub(r'\W', '', word) in codes])
1
Posts: 7
Threads: 1
Joined: Sep 2016
(Sep-21-2016, 09:21 PM)nilamo Wrote: (Sep-21-2016, 09:11 PM)Jaynorth Wrote: Counter() just returns Counter() in the console when the script is run. country_codes is the names of the csv file which is read into the script and codes is just a variable that I used to assign the relevant column in the csv file - country_codes['English short name lower case']
I am not matching on the Alpha-2 or Alpha-3 columns in the csv file which uses the 3 letter representation of the country like "CAN" XD
But, why use any sort of Counter() function at all? len() would do the exact same thing, wouldn't it?
>>> text = '''
... Once upon a time, there was the great country of Mexico. Then there... blah blah blah'''
>>> [word for word in text.split()]
['Once', 'upon', 'a', 'time,', 'there', 'was', 'the', 'great', 'country', 'of', 'Mexico.', 'Then', 'there...', 'blah', 'blah', 'blah']
>>> import re
>>> [word for word in text.split() if re.sub(r'\W', '', word) in codes]
['Mexico.']
>>> len([word for word in text.split() if re.sub(r'\W', '', word) in codes])
1
I used Pandas to extract the text, so it is a DataFrame and not a string. Does that mean I cannot use .split(), or can I?
Posts: 2,953
Threads: 48
Joined: Sep 2016
(Sep-21-2016, 09:12 PM)nilamo Wrote: (Sep-21-2016, 09:05 PM)wavic Wrote: from collections import Counter
counter = Counter(iterable)
print(counter['item'])
How does this syntax highlighting work? :huh:
Use the python syntax highlighter, not the generic code one. (they're still working out the plugins)
Also, wouldn't your code just always give "0"?
>>> from collections import Counter
>>> cnt = Counter('Green eggs and spam')
>>> cnt['g']
2
>>> cnt['gg']
0
>>> cnt['eggs']
0
It gets the iterable from a CSV file. So if a row is "one,two,three,one,two,three", counter.keys() will contain 'one', 'two' and 'three', as it is supposed to. The csv module splits each row into a list.
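To make that concrete, here is a minimal sketch of counting the fields of a CSV row with Counter. The one-row data and the io.StringIO stand-in for an open file are assumptions for illustration, not the actual file from this thread.

```python
import csv
import io
from collections import Counter

# Hypothetical one-row CSV; io.StringIO stands in for an open file object.
data = io.StringIO("one,two,three,one,two,three\n")

counter = Counter()
for row in csv.reader(data):
    # csv.reader yields each row as a list of strings,
    # so update() counts every field in the row.
    counter.update(row)

print(sorted(counter.keys()))
print(counter["one"])
```

Because the Counter is fed whole fields rather than single characters, a lookup such as counter["one"] finds the word itself.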
Posts: 7,315
Threads: 123
Joined: Sep 2016
Sep-21-2016, 09:38 PM
(This post was last modified: Sep-21-2016, 09:48 PM by snippsat.)
Quote:But, why use any sort of Counter() function at all? len() would do the exact same thing, wouldn't it?
Because Counter is a better solution and more readable than a list comprehension with a regex inside.
Quote:I used Pandas to extract the text so it is a dataframe and not a string so I cannot use .split() or can I?
You can use ordinary Python with Pandas.
Split the text first if you want to count whole words.
>>> from collections import Counter
>>> cnt = Counter('Green eggs and spam spam spam spam'.split())
>>> print(cnt)
Counter({'spam': 4, 'eggs': 1, 'Green': 1, 'and': 1})
>>> print(cnt.most_common(1))
[('spam', 4)]
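As a rough sketch of how this could apply to the country question: the names in codes and the sample text below are made up for illustration, so substitute the column you actually read from your CSV. Counter gives a count per country, and the total that len() would give is still available by summing the values.

```python
import string
from collections import Counter

# Hypothetical stand-in for the country-name column of the CSV.
codes = {"Mexico", "Canada", "France"}

text = "Mexico borders the USA. Canada does too, but Mexico is further south."

# Strip leading/trailing punctuation so "Mexico." matches "Mexico".
words = [word.strip(string.punctuation) for word in text.split()]

# Count only the words that are known country names.
country_counts = Counter(word for word in words if word in codes)

# The single number that len() on a filtered list would give.
total_mentions = sum(country_counts.values())

print(country_counts.most_common())
print(total_mentions)
```

Using a set for codes keeps the membership test cheap, but a list works the same way for small inputs.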
Posts: 2,953
Threads: 48
Joined: Sep 2016
>>> import string
>>> [word.strip(string.punctuation) for word in text.split()]
Posts: 7
Threads: 1
Joined: Sep 2016
(Sep-21-2016, 09:45 PM)wavic Wrote: >>>import string
>>>[word.strip(string.punctuation) for word in text.split()]
This gives the following error: AttributeError: 'Series' object has no attribute 'split'
Posts: 2,953
Threads: 48
Joined: Sep 2016
(Sep-21-2016, 10:11 PM)Jaynorth Wrote: (Sep-21-2016, 09:45 PM)wavic Wrote: >>>import string
>>>[word.strip(string.punctuation) for word in text.split()]
This gives the following error: AttributeError: 'Series' object has no attribute 'split'
Hmm! It just splits regular text and removes the punctuation, so you get only the words. No Pandas here.
In [1]: import string
In [2]: text = "This gives the following error: AttributeError: 'Series' object has no attribute 'split'"
In [3]: [word.strip(string.punctuation) for word in text.split()]
Out[3]:
['This',
'gives',
'the',
'following',
'error',
'AttributeError',
'Series',
'object',
'has',
'no',
'attribute',
'split']
In [4]:
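One thing worth knowing when choosing between this and the re.sub(r'\W', '', word) version from earlier in the thread: str.strip only removes characters from the two ends of a word, while the regex removes non-word characters everywhere. A small sketch, with sample words picked just for illustration:

```python
import re
import string

# Sample words chosen only to show the difference between the two approaches.
words = ["Mexico.", "'Series'", "there...", "don't", "U.S.A."]

# strip() trims punctuation from the start and end of each word only.
stripped = [w.strip(string.punctuation) for w in words]

# re.sub() with \W deletes every non-word character, including inner ones.
regexed = [re.sub(r"\W", "", w) for w in words]

for original, s, r in zip(words, stripped, regexed):
    print(original, s, r)
```

Which one fits depends on the names being matched: stripping keeps inner punctuation intact, while the regex flattens it.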
Posts: 7
Threads: 1
Joined: Sep 2016
(Sep-21-2016, 10:32 PM)wavic Wrote: (Sep-21-2016, 10:11 PM)Jaynorth Wrote: (Sep-21-2016, 09:45 PM)wavic Wrote: >>>import string
>>>[word.strip(string.punctuation) for word in text.split()]
This gives the following error: AttributeError: 'Series' object has no attribute 'split'
Hmm! It just splits the regular text and remove the punctuation. So you get only the words. No Pandas here
In [1]: import string
In [2]: text = "This gives the following error: AttributeError: 'Series' object
...: has no attribute 'split'"
In [3]: [word.strip(string.punctuation) for word in text.split()]
Out[3]:
['This',
'gives',
'the',
'following',
'error',
'AttributeError',
'Series',
'object',
'has',
'no',
'attribute',
'split']
In [4]:
I just checked the Pandas documentation: to split a Pandas Series, use text.str.split(). This converts it to an object, but now I get a TypeError: unhashable type: 'list'. It is raised for the codes variable from the country_codes csv, because it is a list.
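One possible cause, offered as a guess rather than a diagnosis of the actual script: text.str.split() gives one list of words per row, and a list cannot be hashed, so passing those lists straight to Counter (or testing them against a set or dict) raises this error. The sketch below uses a plain Python list of strings as a stand-in for the Series, and rows and codes are made-up sample data. Flattening the per-row lists into single words first avoids the problem.

```python
from collections import Counter
from itertools import chain

# Plain-Python stand-in for the Series of text rows (hypothetical data).
rows = ["Mexico and Canada", "Canada again"]

# Hypothetical stand-in for the country-name column of the CSV.
codes = {"Mexico", "Canada"}

# One list of words per row, similar in shape to what text.str.split() returns.
split_rows = [row.split() for row in rows]

# Counting the lists themselves fails, because lists are unhashable.
try:
    Counter(split_rows)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Flatten the per-row lists into one stream of words, then count matches.
words = chain.from_iterable(split_rows)
country_counts = Counter(word for word in words if word in codes)
print(country_counts)
```

In Pandas itself, Series.explode() does a similar flattening if your version provides it.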