Dec-13-2016, 03:33 PM
Quote:Could you please show, as an example, how you would build and fill a dictionary?
Quote:Thanks a lot for your proposition!
??? didn't know I had one
Error:Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python35\lib\site-packages\nltk\data.py", line 810, in load
    resource_val = json.load(opened_resource)
  File "C:\Python35\lib\json\__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "C:\Python35\lib\json\__init__.py", line 312, in loads
    s.__class__.__name__))
The command I used was:
>>> import codecs
>>> reader = codecs.getreader("utf-8")
>>> nltk.data.load("https://code.google.com/archive/p/relation-extraction-corpus/downloads/20131104-place_of_death.json")
I also tried it without the codecs, and again with escaped slashes.
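For what it's worth, the truncated traceback ends inside `json.loads`, which in Python 3.5 only accepts `str`, while the resource is opened in binary mode, so `json.load` receives bytes. A minimal sketch of fetching and decoding the file yourself instead of going through `nltk.data.load` (note: the Google Code archive page URL serves an HTML listing, not the raw JSON file, so you would need the direct download link for the fetch to work):

```python
import json
import urllib.request

def parse_json_bytes(raw, encoding="utf-8"):
    """json.loads in Python 3.5 rejects bytes, so decode to str first."""
    return json.loads(raw.decode(encoding))

def load_remote_json(url):
    """Fetch a resource over HTTP and parse it as JSON.
    Caveat: pass the *direct* file URL, not the archive's HTML page."""
    with urllib.request.urlopen(url) as resp:
        return parse_json_bytes(resp.read())
```

This sidesteps the codecs reader entirely; the decode step is the part the original call chain was missing.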
(Dec-14-2016, 01:21 AM)Larz60+ Wrote: [ -> ]I wish that you had posted the entire task at the start.
Since you sent it in a private e-mail, I ignored it until a few minutes ago.
Quote:I would have just responded "post that in the forum".
(Dec-19-2016, 12:33 AM)MattaFX Wrote: [ -> ]So, here is our to-do list.
We're struggling on steps 2 and 3.
Quote:We will perform the classification for two kinds of relations: place of birth and institution. Your task is to design and extract a set of features, to give these features as input to the logistic regression and to evaluate the performance.
Data and tools:
Both for training and testing, we will use the Google relation extraction corpus (https://code.google.com/archive/p/relati.../downloads). This corpus consists of snippets of text from Wikipedia with annotated relations. You can find more information on this blog (https://research.googleblog.com/2013/04/...ation.html).
The entities in the annotated relations are encoded with IDs. This means that you need to search the Knowledge Graph in order to see which entities are in relation. For querying the database, follow the Google Knowledge Graph Search API documentation (https://developers.google.com/knowledge-graph/).
Steps
1.1 Find positive and negative examples for the two relations
- Preprocessing:
Each relation in the corpus is annotated by several (up to seven) annotators, who gave different responses. As a result, we are not certain whether a text snippet is a positive or a negative example of a relation. Your first task is to find a way to divide the snippets into positive and negative examples based on the distribution of annotators' responses. Note that the proportion of positive vs. negative examples might influence your results.
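For step 1.1, one simple approach is a majority vote over the annotator responses, discarding ties. A minimal sketch, assuming each snippet comes with a list of "yes"/"no" judgments (adapt the response values to whatever the corpus actually uses):

```python
from collections import Counter

def label_snippet(judgments, threshold=0.5):
    """Map a list of annotator judgments to a binary label by the
    share of positive ("yes") responses; return None on an exact tie
    so uncertain snippets can be filtered out."""
    counts = Counter(judgments)
    frac = counts.get("yes", 0) / len(judgments)
    if frac > threshold:
        return 1
    if frac < threshold:
        return 0
    return None
```

Raising the threshold (e.g. to 0.7) gives cleaner positives at the cost of fewer examples, which ties into the class-proportion caveat above.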
1.2 Resolve IDs. Once you resolve the IDs, identify the strings in the text snippet. Note that there could be errors here, which can then propagate to the subsequent steps. If you cannot find the entities in some snippets, you may remove them.
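For resolving the IDs in step 1.2, the Knowledge Graph Search API takes an `ids` parameter holding the MID. A rough sketch (you need your own API key; error handling and rate limiting omitted, and the response structure is per the API docs):

```python
import json
import urllib.parse
import urllib.request

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"

def kg_lookup_url(mid, api_key):
    """Build a Knowledge Graph Search API query URL for a Freebase
    MID such as '/m/02mjmr'."""
    params = {"ids": mid, "key": api_key, "limit": 1}
    return KG_ENDPOINT + "?" + urllib.parse.urlencode(params)

def resolve_mid(mid, api_key):
    """Return the entity's name for a MID, or None if nothing matches."""
    with urllib.request.urlopen(kg_lookup_url(mid, api_key)) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    items = data.get("itemListElement", [])
    if items:
        return items[0]["result"].get("name")
    return None
```

Caching the MID-to-name mapping locally is worthwhile, since the same entities recur across snippets.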
1.3 Prepare a development data set to use for the analysis and developing the set of features. These items must not be used for testing.
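For step 1.3, a fixed, seeded random hold-out keeps the development items reproducibly separate from anything used later for evaluation. A sketch:

```python
import random

def split_dev(examples, dev_frac=0.1, seed=42):
    """Hold out a fixed fraction of the examples as a development
    set; the remainder is reserved for the later cross-validation.
    The fixed seed makes the split reproducible across runs."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    cut = int(len(examples) * dev_frac)
    dev = [examples[i] for i in idx[:cut]]
    rest = [examples[i] for i in idx[cut:]]
    return dev, rest
```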
1.4 Consider some ideas for the features. Decide whether you need any additional tools (e.g. PoS tagger, parsers) to extract the features. If yes, install them and make sure that you can run them on your data.
2 Feature extraction
2.1 Based on the literature and your own intuition, design a set of features to be used for classification. Describe your features and the intuition about what they are supposed to capture.
2.2 Write a script that extracts the features and prepares the data set for a classifier. Include in this step any formatting required by the classifier.
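As a starting point for 2.2, features can be collected as plain dicts, which scikit-learn's `DictVectorizer` can later turn into a matrix. A toy sketch with a few surface features; the argument names and the keyword list are illustrative, not part of the task:

```python
def extract_features(snippet, subj, obj):
    """Surface features for one snippet, given the resolved entity
    strings. Entirely a sketch -- real features should come from the
    design in step 2.1."""
    feats = {}
    low = snippet.lower()
    i = low.find(subj.lower())
    j = low.find(obj.lower())
    feats["both_found"] = int(i != -1 and j != -1)
    if feats["both_found"]:
        between = low[min(i, j):max(i, j)]
        feats["entity_distance"] = len(between.split())
        # cue words that often signal the target relations
        for kw in ("born", "died", "studied", "graduated"):
            feats["kw_" + kw] = int(kw in between)
    return feats
```

Keeping features as dicts also makes it easy to inspect individual examples while developing on the held-out set from 1.3.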
3 Running a classifier and evaluation
Perform a binary classification using your prepared data set and the Logistic Regression classifier from the Python scikit-learn library. Make sure you understand the output well.
Perform the evaluation as a 10-fold cross-validation. Report the result of each fold. Your final result is the average of the 10 folds.
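The steps above can be wired together roughly like this with scikit-learn (`DictVectorizer` + `LogisticRegression` + `StratifiedKFold`); a sketch assuming the feature dicts and binary labels produced in step 2:

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

def run_cv(feature_dicts, labels, n_splits=10):
    """K-fold cross-validation with logistic regression.
    Returns the per-fold accuracy scores, as the task asks for the
    result of each fold."""
    X = DictVectorizer().fit_transform(feature_dicts)
    y = np.asarray(labels)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf = LogisticRegression()
        clf.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    return scores
```

The final number to report is then `sum(scores) / len(scores)`. Stratified folds keep the positive/negative proportion from step 1.1 roughly constant across folds.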