Python Forum
Get data out of CoNLL file
#1
Currently, I'm working with CoNLL files that look like the one in the attachment (saved as a .txt file, since .bz2 files cannot be uploaded).

I'd like to extract every genitive noun together with its head. My current code looks like this:

import csv, bz2
from collections import Counter, deque, defaultdict

names = ('1', '2', '3', '4', '5', '6', '7', '8', '9', '10')

filename = "test.conll.bz2"

nouns = Counter()
d = defaultdict(list)

with open(filename) as f:
   f = bz2.BZ2File(filename, "rb")
   reader = csv.DictReader(f, fieldnames=names, delimiter='\t')

   act_index = 0
   last = deque(maxlen=50)
   for tok in reader:
       last.append(tok)
       if (tok['5'] == 'NN' or tok['5'] == 'NE') and 'Gen' in tok['6']:
           dep = tok['7']
           act_index = list(last).index(tok) 
           act = tok['3']
           while last[act_index-1]['7'] != dep:
               act_index = act_index-1
           while last[act_index]['5'] != 'NN':
               act_index = act_index+1
           else:
               nouns.update(last[act_index]['3'].split())
               d[last[act_index]['3']].append(act)



if __name__ == '__main__':

   for el in sorted(nouns, key=nouns.get, reverse=True):
       Gen = ""
       for e in d[el]:
           if e not in Gen:
               Gen += e + ","
           
       Gen = Gen[:-1] 
       print el + "\t" + Gen
My code has some weaknesses. For example, I sometimes get the error "deque index out of range", and I don't know why. I have also noticed that when two similar sentences follow one another in the CoNLL file, my code does not work properly (it only adds the first occurrence). Finally, I think the code could be written more efficiently.
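
Both symptoms may come from how the position inside the deque is determined. The following is only an illustrative sketch in Python 3 with made-up rows (row_a, row_b, window and scan_forward_for_nn are hypothetical names, not part of the code above). It shows that list.index() compares rows by value, so two identical rows resolve to the first one, and that a while loop without a bounds check can step past the end of the deque.

```python
from collections import deque

# Two made-up rows with identical content, as a repeated sentence would produce.
row_a = {"3": "Kind", "5": "NN", "7": "2"}
row_b = {"3": "Kind", "5": "NN", "7": "2"}

window = deque([row_a, row_b], maxlen=50)

# list.index() uses ==, so it returns the first equal row, not the newest one.
first_match = list(window).index(row_b)

# The row that was just appended always sits at the last position.
newest = len(window) - 1

# Indexing one step past the end raises IndexError, which is what an
# unbounded "while ...: act_index = act_index + 1" loop eventually does.
try:
    window[len(window)]
    overran = False
except IndexError:
    overran = True


def scan_forward_for_nn(rows, start):
    """Return the index of the next 'NN' row at or after start, or None."""
    i = start
    while i < len(rows) and rows[i]["5"] != "NN":
        i += 1
    return i if i < len(rows) else None
```

Here first_match is 0 while newest is 1, which would explain why only the first of two identical rows is used. The bounded scan returns None instead of raising when no 'NN' row is left in the window.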

My main goal is to get every head together with all of its genitive forms. The desired output would be:
Quote:
Geburt \t Kind
Henker \t Geschichte
Kind \t Herr,Adam
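
One way to produce output of this shape, sketched under assumptions that are not confirmed by the attachment: the file follows the CoNLL-X layout (column 1 = token ID, 3 = lemma, 5 = POS tag, 6 = morphological features, 7 = ID of the head token) and sentences are separated by blank lines. The names read_sentences and genitive_heads and the sample rows are hypothetical, and the code is Python 3. Instead of searching a sliding window, it looks the head up directly by its ID inside the same sentence.

```python
from collections import defaultdict

# Assumed 0-based column positions (CoNLL-X layout, not verified against the file).
ID, LEMMA, POS, FEATS, HEAD = 0, 2, 4, 5, 6


def read_sentences(lines):
    """Yield one list of token rows per sentence; blank lines end a sentence."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def genitive_heads(lines):
    """Map each head lemma to the lemmas of the genitive nouns attached to it."""
    heads = defaultdict(list)
    for sentence in read_sentences(lines):
        by_id = {row[ID]: row for row in sentence}
        for row in sentence:
            if row[POS] in ("NN", "NE") and "Gen" in row[FEATS]:
                head = by_id.get(row[HEAD])  # None if the head is the root (0)
                if head is not None and row[LEMMA] not in heads[head[LEMMA]]:
                    heads[head[LEMMA]].append(row[LEMMA])
    return heads


# Made-up sample sentence ("die Geburt des Kindes"), for illustration only.
sample = [
    "1\tdie\tder\tART\tART\tNom|Sg|Fem\t2\tNK\t_\t_",
    "2\tGeburt\tGeburt\tN\tNN\tNom|Sg|Fem\t0\tROOT\t_\t_",
    "3\tdes\tder\tART\tART\tGen|Sg|Neut\t4\tNK\t_\t_",
    "4\tKindes\tKind\tN\tNN\tGen|Sg|Neut\t2\tAG\t_\t_",
    "",
]

for head, genitives in genitive_heads(sample).items():
    print(head + "\t" + ",".join(genitives))
```

For the real data, the lines could come from bz2.open(filename, "rt", encoding="utf-8"), where the encoding is an assumption. Because each sentence is processed on its own, repeated sentences and window limits should no longer matter.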

Thanks a lot for any advice or hint.

Attached Files

.txt   test.txt (Size: 12.43 KB / Downloads: 578)

Messages In This Thread
Get data out of CoNLL file - by MattaFX - Jun-19-2017, 09:35 AM
RE: Get data out of CoNLL file - by ichabod801 - Jun-19-2017, 09:50 PM
