Dec-13-2016, 03:33 PM
Quote:Could you please show, as an example, how you would build and fill a dictionary?
Quote:Thanks a lot for your proposition!
??? didn't know I had one
Error:Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python35\lib\site-packages\nltk\data.py", line 810, in load
    resource_val = json.load(opened_resource)
  File "C:\Python35\lib\json\__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "C:\Python35\lib\json\__init__.py", line 312, in loads
    s.__class__.__name__))
The command I used was:
>>> import codecs
>>> reader = codecs.getreader("utf-8")
>>> nltk.data.load("https://code.google.com/archive/p/relation-extraction-corpus/downloads/20131104-place_of_death.json")
I also tried it without the codecs, and again with escaped slashes.
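For what it's worth, the truncated traceback ends inside `json.loads`, which in Python 3.5 only accepts `str`, while the resource is opened in binary mode, so `json.load` receives bytes. A minimal sketch of fetching and decoding the file yourself instead of going through `nltk.data.load` (note: the Google Code archive page URL serves an HTML listing, not the raw JSON file, so you would need the direct download link for the fetch to work):

```python
import json
import urllib.request

def parse_json_bytes(raw, encoding="utf-8"):
    """json.loads in Python 3.5 rejects bytes, so decode to str first."""
    return json.loads(raw.decode(encoding))

def load_remote_json(url):
    """Fetch a resource over HTTP and parse it as JSON.
    Caveat: pass the *direct* file URL, not the archive's HTML page."""
    with urllib.request.urlopen(url) as resp:
        return parse_json_bytes(resp.read())
```

This sidesteps the codecs reader entirely; the decode step is the part the original call chain was missing.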
(Dec-14-2016, 01:21 AM)Larz60+ Wrote: [ -> ]I wish that you had posted the entire task at the start.
Since you sent it in a private e-mail, I ignored it until a few minutes ago.
Quote:I would have just responded "post that in the forum".
(Dec-19-2016, 12:33 AM)MattaFX Wrote: [ -> ]So, here is our to-do list.
We're struggling on steps 2 and 3.
Quote:We will perform the classification for two kinds of relations: place of birth and institution. Your task is to design and extract a set of features, to give these features as input to the logistic regression and to evaluate the performance.
Data and tools:
Both for training and testing, we will use the Google relation extraction corpus (https://code.google.com/archive/p/relati.../downloads). This corpus consists of snippets of text from Wikipedia with annotated relations. You can find more information on this blog (https://research.googleblog.com/2013/04/...ation.html).
The entities in the annotated relations are encoded with IDs. This means that you need to search the Knowledge Graph in order to see which entities are in relation. For querying the database, follow the Google Knowledge Graph Search API documentation (https://developers.google.com/knowledge-graph/).
Steps
1.1 Find positive and negative examples for the two relations
- Preprocessing:
Each relation in the corpus is annotated by several (up to seven) annotators, who gave different responses. As a result, we are not certain whether a text snippet is a positive or a negative example of a relation. Your first task is to find a way to divide the snippets into positive and negative examples based on the distribution of annotators' responses. Note that the proportion of positive vs. negative examples might influence your results.
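For step 1.1, one simple approach is a majority vote over the annotator responses, discarding ties. A minimal sketch, assuming each snippet comes with a list of "yes"/"no" judgments (adapt the response values to whatever the corpus actually uses):

```python
from collections import Counter

def label_snippet(judgments, threshold=0.5):
    """Map a list of annotator judgments to a binary label by the
    share of positive ("yes") responses; return None on an exact tie
    so uncertain snippets can be filtered out."""
    counts = Counter(judgments)
    frac = counts.get("yes", 0) / len(judgments)
    if frac > threshold:
        return 1
    if frac < threshold:
        return 0
    return None
```

Raising the threshold (e.g. to 0.7) gives cleaner positives at the cost of fewer examples, which ties into the class-proportion caveat above.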
1.2 Resolve IDs. Once you resolve the IDs, identify the strings in the text snippet. Note that there could be errors here, which can then propagate to the subsequent steps. If you cannot find the entities in some snippets, you may remove them.
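For resolving the IDs in step 1.2, the Knowledge Graph Search API takes an `ids` parameter holding the MID. A rough sketch (you need your own API key; error handling and rate limiting omitted, and the response structure is per the API docs):

```python
import json
import urllib.parse
import urllib.request

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"

def kg_lookup_url(mid, api_key):
    """Build a Knowledge Graph Search API query URL for a Freebase
    MID such as '/m/02mjmr'."""
    params = {"ids": mid, "key": api_key, "limit": 1}
    return KG_ENDPOINT + "?" + urllib.parse.urlencode(params)

def resolve_mid(mid, api_key):
    """Return the entity's name for a MID, or None if nothing matches."""
    with urllib.request.urlopen(kg_lookup_url(mid, api_key)) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    items = data.get("itemListElement", [])
    if items:
        return items[0]["result"].get("name")
    return None
```

Caching the MID-to-name mapping locally is worthwhile, since the same entities recur across snippets.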
1.3 Prepare a development data set to use for the analysis and developing the set of features. These items must not be used for testing.
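For step 1.3, a fixed, seeded random hold-out keeps the development items reproducibly separate from anything used later for evaluation. A sketch:

```python
import random

def split_dev(examples, dev_frac=0.1, seed=42):
    """Hold out a fixed fraction of the examples as a development
    set; the remainder is reserved for the later cross-validation.
    The fixed seed makes the split reproducible across runs."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    cut = int(len(examples) * dev_frac)
    dev = [examples[i] for i in idx[:cut]]
    rest = [examples[i] for i in idx[cut:]]
    return dev, rest
```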
1.4 Consider some ideas for the features. Decide whether you need any additional tools (e.g. PoS tagger, parsers) to extract the features. If yes, install them and make sure that you can run them on your data.
2 Feature extraction
2.1 Based on the literature and your own intuition, design a set of features to be used for classification. Describe your features and the intuition about what they are supposed to capture.
2.2 Write a script that extracts the features and prepares the data set for a classifier. Include in this step any formatting required by the classifier.
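As a starting point for 2.2, features can be collected as plain dicts, which scikit-learn's `DictVectorizer` can later turn into a matrix. A toy sketch with a few surface features; the argument names and the keyword list are illustrative, not part of the task:

```python
def extract_features(snippet, subj, obj):
    """Surface features for one snippet, given the resolved entity
    strings. Entirely a sketch -- real features should come from the
    design in step 2.1."""
    feats = {}
    low = snippet.lower()
    i = low.find(subj.lower())
    j = low.find(obj.lower())
    feats["both_found"] = int(i != -1 and j != -1)
    if feats["both_found"]:
        between = low[min(i, j):max(i, j)]
        feats["entity_distance"] = len(between.split())
        # cue words that often signal the target relations
        for kw in ("born", "died", "studied", "graduated"):
            feats["kw_" + kw] = int(kw in between)
    return feats
```

Keeping features as dicts also makes it easy to inspect individual examples while developing on the held-out set from 1.3.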
3 Running a classifier and evaluation
Perform a binary classification using your prepared data set and the Logistic Regression classifier from the Python scikit-learn library. Make sure you understand the output well.
Perform the evaluation as a 10-fold cross-validation. Report the result of each fold. Your final result is the average of the 10 folds.
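The steps above can be wired together roughly like this with scikit-learn (`DictVectorizer` + `LogisticRegression` + `StratifiedKFold`); a sketch assuming the feature dicts and binary labels produced in step 2:

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

def run_cv(feature_dicts, labels, n_splits=10):
    """K-fold cross-validation with logistic regression.
    Returns the per-fold accuracy scores, as the task asks for the
    result of each fold."""
    X = DictVectorizer().fit_transform(feature_dicts)
    y = np.asarray(labels)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf = LogisticRegression()
        clf.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    return scores
```

The final number to report is then `sum(scores) / len(scores)`. Stratified folds keep the positive/negative proportion from step 1.1 roughly constant across folds.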