Python Forum
Location Named Entity Recognition Problem - Printable Version



Location Named Entity Recognition Problem - owais - Mar-22-2017

I'm working with Twitter data. I have user records in JSON format, and I'm trying to extract the location from the location field of each record.

Sample data:
"location": "Georgia, USA",

"location": "El Centro, CA",

"location": "Barnaul",
"location": "heaven on earth",

The Problem:
The text in the location field does not follow a consistent format or any standard. If it used ISO country codes, for example, it would be easy to separate city, state, and country, but nothing in the field clearly indicates what kind of location the text refers to.
 
For example, the text in the location field follows patterns like these:
 
1) Country (ex. Canada)
This is a country, but the text on its own could be anything. One can match it against a list of countries, but what if it is actually a city?
 
2) City (ex. Toronto)
Or it can be a city on its own.
 
3) City, Country (ex. Toronto, Canada)
City and country separated by a comma or a space.
 
4) City, State (ex. Toronto, Ontario)
City and state separated by a comma or a space.
 
5) Meaningless text (ex. Worldwide)
Text which is not a city, country or state
 
6) Different Language (ex. 广州)
Same patterns as listed above but in a language other than English, for example, Chinese.
 
7) Abbreviations and ISO codes
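  • Sometimes countries are represented by ISO codes such as CA or CAN for Canada,
  • states by codes such as FL for Florida (a U.S. state),
  • and cities by a code such as US-MN for Minneapolis (strictly, US-MN is the ISO 3166-2 code for the state of Minnesota, where Minneapolis is located).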
Kindly guide me on how to solve this problem. There are many libraries to choose from.
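
To make the patterns above concrete, here is a minimal sketch of a first-pass triage in Python. It is only an illustration under stated assumptions: the lookup sets (COUNTRIES, ISO_COUNTRY_CODES, REGIONS, REGION_CODES, NOISE) and the function classify_location are hypothetical and deliberately tiny, standing in for a real gazetteer or NER library. It does not resolve ambiguous codes such as CA (Canada or California), and it does not translate non-English names.

```python
# Hypothetical, deliberately tiny lookup tables; a real gazetteer would be far larger.
COUNTRIES = {"canada", "usa", "united states"}
ISO_COUNTRY_CODES = {"CA", "CAN", "US", "USA"}
REGIONS = {"ontario", "florida", "california", "minnesota"}
REGION_CODES = {"FL", "CA", "MN", "US-MN"}
NOISE = {"worldwide", "heaven on earth"}


def _kind(token):
    """Label a single token as country, region, ambiguous, or unknown."""
    is_country = token.lower() in COUNTRIES or token.upper() in ISO_COUNTRY_CODES
    is_region = token.lower() in REGIONS or token.upper() in REGION_CODES
    if is_country and is_region:
        return "ambiguous"  # e.g. CA: Canada or California
    if is_country:
        return "country"
    if is_region:
        return "region"
    return "unknown"  # possibly a city; needs a real gazetteer lookup


def classify_location(text):
    """Return a rough label for a free-text location string."""
    cleaned = text.strip()
    if not cleaned or cleaned.lower() in NOISE:
        return "noise"
    # Non-ASCII characters suggest a non-English name that needs separate handling.
    if any(ord(ch) > 127 for ch in cleaned):
        return "non_english"
    parts = [p.strip() for p in cleaned.split(",") if p.strip()]
    if not parts:
        return "noise"
    kind = _kind(parts[-1])
    # With a comma, the last part is treated as the enclosing country or region.
    return kind if len(parts) == 1 else "place, " + kind


for sample in ["Georgia, USA", "El Centro, CA", "Barnaul", "heaven on earth", "广州"]:
    print(repr(sample), "->", classify_location(sample))
```

Anything labelled unknown or ambiguous would still need a proper lookup (a gazetteer or geocoding service) before it can be trusted.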


RE: Location Named Entity Recognition Problem - Larz60+ - Mar-22-2017

Where is this file located?

What you've presented is not of any value.

If there is no structure to the data, how do you intend to make order out of chaos?

I believe that if someone went to the effort of creating a json file, that there must be structure to the data.


RE: Location Named Entity Recognition Problem - owais - Mar-22-2017

(Mar-22-2017, 05:44 AM)Larz60+ Wrote: Where is this file located? What you've presented is not of any value. If there is no structure to the data, how do you intend to make order out of chaos? I believe that if someone went to the effort of creating a json file, that there must be structure to the data.

Here is the link to the Twitter followers dump I have collected (it has 3000 followers).
You can use this excellent tool to view the file in a tree view:
http://jsonviewer.stack.hu/
There is a "Text" -> "load json data" option which can load json from url.

In that JSON file I'm interested only in the location property (field/variable) for now; that is the data I was referring to in the question.


RE: Location Named Entity Recognition Problem - Larz60+ - Mar-22-2017

That's just a json viewer. Where is the data file?
Quote:In that json file I'm interested in location property(field/variable) for now
Where is that json file??


RE: Location Named Entity Recognition Problem - owais - Mar-22-2017

(Mar-22-2017, 01:05 PM)Larz60+ Wrote: That's just a json viewer. Where is the data file?
Quote: In that json file I'm interested in location property(field/variable) for now
Where is that json file??

Sorry, my bad:
https://gist.githubusercontent.com/OwaisQureshi/f316440bd49eaaef0e6dfa0b197a8371/raw/5042f34ba2cef5fec67fea22834070dedc4dffdb/donalTrump3000Followers.json


RE: Location Named Entity Recognition Problem - Larz60+ - Mar-22-2017

Let me examine the file and see what I can figure out.


RE: Location Named Entity Recognition Problem - Larz60+ - Mar-22-2017

This looks like a standard JSON file,
so you should be able to read it in with json.load.
You will then have a list of dictionary entries.
To get the individual elements, loop over the list and, for each entry, use for key, value in entry.items().
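
A minimal sketch of that approach, assuming the file's top level is a JSON array of user objects that each may carry a "location" string. SAMPLE is a hypothetical stand-in for the downloaded file's contents, and extract_locations is an illustrative helper name, not something from the thread:

```python
import json

# Hypothetical stand-in for the file's contents. The real dump is assumed to be
# a JSON array of user objects, each with an optional "location" string.
SAMPLE = """
[
  {"screen_name": "user_a", "location": "Georgia, USA"},
  {"screen_name": "user_b", "location": "El Centro, CA"},
  {"screen_name": "user_c", "location": ""},
  {"screen_name": "user_d"}
]
"""


def extract_locations(users):
    """Collect the non-empty location strings from a list of user dictionaries."""
    locations = []
    for user in users:
        # .get avoids a KeyError when the field is missing; "or ''" covers null values.
        value = (user.get("location") or "").strip()
        if value:
            locations.append(value)
    return locations


users = json.loads(SAMPLE)  # for a file on disk: json.load(file_object)
locations = extract_locations(users)
print(locations)
```

With the real file, open it with encoding="utf-8" and pass the file object to json.load instead of calling json.loads on a string.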


RE: Location Named Entity Recognition Problem - lukecage - Aug-08-2017

This tool might help: JSON Formatter