Python Forum
Identifying items in a csv file that also appear in a Text extract
#1
I am new to Python. I have a text extract from a database and a CSV list of all countries taken from Wikipedia, and I would like to check whether each country is mentioned in the text and how many times it is mentioned. This is what I have done so far:


<code>
text = pd.read_sql(select_string, con)


#clean up
text = text.replace({'\n': ' '}, regex=True)
text = text.replace({'-': ' '}, regex=True)

text = text['ProductText']

print(text) #making sure it looks ok


country_codes = pd.read_csv('country-codes.csv')

codes = country_codes['English short name lower case']

count_occurrences=Counter(country for country in text if country in codes)
    
print(count_occurrences)
</code>
The problem is that the last piece of code is not picking up any countries at all, so the output is Counter()

I suspect that the problem is with the loop, but I am not sure how to fix it. Any help would really be appreciated :)
Reply
#2
What does the printed Counter() look like?  What do country_codes and, more specifically, codes look like?
Also, you should probably rename some of your variables to match what they actually are.  Like this:
count_occurrences = Counter(word for word in text if word in codes)
Since we're assuming that not *every* word in the article is a country code.
Also, are you putting the whole text into lower/uppercase anywhere?  If you're looking for "can", it wouldn't match "CAN" at all, for those poor Canadians :(
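To illustrate the case point, here is a minimal sketch that normalises both the text and the country names before comparing. The sample `text` and `codes` values are made up for the example (they are not the poster's data), and it only handles single-word names:

```python
from collections import Counter

# Hypothetical sample data standing in for the text extract and the CSV column.
text = "Canada exports to FRANCE, and france imports from canada."
codes = {"Canada", "France", "Peru"}

# Put both sides into the same case before comparing.
lowered_codes = {name.casefold() for name in codes}

# Split on whitespace, strip surrounding punctuation, and normalise the case.
words = [word.strip(".,;:!?").casefold() for word in text.split()]

count_occurrences = Counter(word for word in words if word in lowered_codes)
print(count_occurrences)
```

With this sample input, "Canada" and "France" are each counted twice regardless of how they were capitalised in the text.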
Reply
#3
from collections import Counter

counter = Counter(iterable)

print(counter['item'])
How does this syntax highlighting work?  :huh:
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#4
When the script is run, the console just prints Counter(). country_codes is the csv file that is read into the script, and codes is just a variable that I used to hold the relevant column of that file: country_codes['English short name lower case']

I am not matching on the Alpha-2 or Alpha-3 columns in the csv file, which use the 2- and 3-letter representations of the country like "CAN" XD
Reply
#5
(Sep-21-2016, 09:05 PM)wavic Wrote:
from collections import Counter

counter = Counter(iterable)

print(counter['item'])
How does this syntax highlighting work?  :huh:

Use the python syntax highlighter, not the generic code one.  (they're still working out the plugins)
Also, wouldn't your code just always give "0"?
>>> from collections import Counter
>>> cnt = Counter('Green eggs and spam')
>>> cnt['g']
2
>>> cnt['gg']
0
>>> cnt['eggs']
0
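The reason for the zeros is that Counter iterates over a string one character at a time. A small sketch of the difference, reusing the same sentence and splitting it into words first:

```python
from collections import Counter

sentence = 'Green eggs and spam'

# A string is iterated character by character, so only single letters are counted.
char_counts = Counter(sentence)

# Splitting first gives a list of words, so whole words become the keys.
word_counts = Counter(sentence.split())

print(char_counts['g'])     # 2
print(word_counts['eggs'])  # 1
print(word_counts['gg'])    # 0
```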
Reply
#6
You may have to supply a bit more code, such as where and what have you defined "Counter". If you are getting an error, please include the Traceback. Never mind, still getting used to new forum  :angel:  :P
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition
Reply
#7
(Sep-21-2016, 09:05 PM)wavic Wrote:
from collections import Counter

counter = Counter(iterable)

print(counter['item'])
How does this syntax highlighting work?  :huh:
Use the Python icon in the editor (SCEditor); I have upgraded its name and color a little ;)
Reply
#8
(Sep-21-2016, 09:12 PM)sparkz_alot Wrote: You may have to supply a bit more code, such as where and what have you defined "Counter". If you are getting an error, please include the Traceback. Never mind, still getting used to new forum  :angel:  :P

from collections import Counter
Reply
#9
(Sep-21-2016, 09:16 PM)Jaynorth Wrote:
(Sep-21-2016, 09:12 PM)sparkz_alot Wrote: You may have to supply a bit more code, such as where and what have you defined "Counter". If you are getting an error, please include the Traceback. Never mind, still getting used to new forum  :angel:  :P

from collections import Counter

Thanks  :)
Reply
#10
Counter(country for country in text if country in codes)
(Sep-21-2016, 09:12 PM)nilamo Wrote:
(Sep-21-2016, 09:05 PM)wavic Wrote:
from collections import Counter

counter = Counter(iterable)

print(counter['item'])
How does this syntax highlighting work?  :huh:

Use the python syntax highlighter, not the generic code one.  (they're still working out the plugins)
Also, wouldn't your code just always give "0"?
>>> from collections import Counter
>>> cnt = Counter('Green eggs and spam')
>>> cnt['g']
2
>>> cnt['gg']
0
>>> cnt['eggs']
0

My interpretation of this is: count every word in the text if that word is also in the country code csv file. Is this correct?
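For what it is worth, two things probably keep that reading from holding with pandas objects. Iterating over a Series yields each cell as a whole (here the full ProductText string, not individual words), and `in` applied to a Series tests its index labels rather than its values. The sketch below avoids both by working on plain Python strings and searching each row for each country name. The `rows` and `country_names` values are hypothetical stand-ins for the real data, and the regex approach is just one possible way to handle multi-word names:

```python
import re
from collections import Counter

# Hypothetical stand-ins: with pandas these could be obtained via
# text.tolist() and set(codes) respectively.
rows = [
    "Shipped from Canada to France. Canada again.",
    "Made in the United Kingdom, sold in France.",
]
country_names = {"Canada", "France", "United Kingdom", "Peru"}

count_occurrences = Counter()
for row in rows:
    for name in country_names:
        # \b stops a short name from matching inside a longer word, and
        # re.escape protects names containing regex special characters.
        pattern = r"\b" + re.escape(name) + r"\b"
        hits = len(re.findall(pattern, row, flags=re.IGNORECASE))
        if hits:
            count_occurrences[name] += hits

print(count_occurrences)
```

Searching for each name in the text, instead of testing each word against the list, means multi-word names such as "United Kingdom" are counted too.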
Reply

