Python Forum

Full Version: extract relation
Can someone help, please? (I'm new to programming.) I am trying to chunk sentences into sub-sentences. I have a list of sentences such as:
my_list = [‘ word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10’, ‘word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9’]

I want to chunk each sentence into the substrings before the first entity, between consecutive entities, and after the last entity. This is my attempt to split the list.

here

[ lis = [(word1,(startindex,endindex)),(Entity5,(startindex,endindex))…….]
entity_list = [[ Entity5,Entity8] ,[Entity2,Entity5,Entity8],[…,….,…,…,,…]]
start_index = [ ..,..,..,]
end_index = [..,..,..,]

For l in my_list:
For I in range (len(start_index)):
For j in range (len(end_index)):
Before = l [0:start_index[i]]
Between = l [endindex [j]:start_index[i+1]
After = l[end_index[j+1]:-1]

But that does not work.
Use code tags and fix the indentation.

your list
my_list = [‘ word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10’, ‘word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9’]
has only two items. To split them into two lists, do the following.
First, the string delimiters are invalid (typographic quotes); use a plain ' instead,
like:
my_list = ['word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10', 'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9']
Then split with:
my_list = ['word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10', 'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9']
print(my_list[0])
list1 = my_list[0].split()
list2 = my_list[1].split()
print('list1: {}\nlist2: {}'.format(list1, list2))
Output:
word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10
list1: ['word1', 'word2', 'word3', 'word4', 'Entity5', 'word6', 'word7', 'Entity8', 'word9', 'word10']
list2: ['word1', 'Entity2', 'word3', 'word4', 'Entity5', 'word6', 'word7', 'Entity8', 'word9']
You can now use join to concatenate the words you want back into 'chunks'.
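A minimal sketch of the split-then-join idea, assuming an entity is any token that starts with "Entity". The helper names is_entity and chunk_sentence are made up for illustration and do not come from the thread:

```python
def is_entity(token):
    # Assumption: entities are the tokens that start with "Entity".
    return token.startswith('Entity')


def chunk_sentence(sentence):
    """Return the text chunks before, between and after the entities."""
    chunks = []
    current = []
    for token in sentence.split():
        if is_entity(token):
            # Close the chunk that precedes this entity.
            chunks.append(' '.join(current))
            current = []
        else:
            current.append(token)
    # Whatever is left follows the last entity.
    chunks.append(' '.join(current))
    return chunks


my_list = [
    'word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10',
    'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9',
]
for sentence in my_list:
    print(chunk_sentence(sentence))
# ['word1 word2 word3 word4', 'word6 word7', 'word9 word10']
# ['word1', 'word3 word4', 'word6 word7', 'word9']
```

Each sentence is handled on its own, so the first chunk is the "before" part, the last chunk is the "after" part, and everything else is "between".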
What a stupid task.

However, the first task is the easiest. Putting all the words together in one flat list is a one-liner.
After this you can iterate over the list and look for entities. If the word is not an entity, append it to a list of words.
If it is an entity, append [entity_before, words] to the result list
and start a new word list. Finally, the function returns the result list.
Trying to append entity_after in the right place at the same time made my head explode.

So you can write a second function that iterates over the first result list (the one with entity_before).
For each entry, take the name of the previous entity (index - 1), replace '_before' with '_after', and pair it with the current words.

Here is my example:
my_list = [
    'word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10', 
    'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9',
    ]

# putting all words in one flat list
words = [word for chunk in my_list for word in chunk.split()]


# is like
#words = []
#for element in my_list:
#    for word in element.split():
#        words.append(word)


def get_words_before(wordlist):
   entity = []
   words_before = []
   for word in wordlist:
       if 'entity' in word.lower():
           entity.append([word + '_before', words_before])
           words_before = []
       else:
           words_before.append(word)
   return entity


def get_words(wordlist):
   entities = get_words_before(wordlist)
   result = []
   for index, entity in enumerate(entities):
       if index == 0:
           result.append(entity)
           continue
       entity_after = entities[index-1][0].replace('_before', '_after')
       result.append([entity_after, entity[1]])
       result.append(entity)
   return result


import pprint
pprint.pprint(get_words(words))
Another solution uses a generator, but without entity_after:

def get_words_before(wordlist):
   words = []
   for word in wordlist:
       if 'entity' in word.lower():
           yield word + '_before', words
           words = []
       else:
           words.append(word)


import pprint
result = list(get_words_before(words))
pprint.pprint(result)
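For the index-based approach from the original question, one option is to let re.finditer supply the start and end positions instead of maintaining start_index and end_index by hand. This is only a sketch: it assumes entities match the pattern Entity followed by digits, and the name split_around_entities is hypothetical.

```python
import re

# Assumption: an entity is the word "Entity" followed by digits.
ENTITY_PATTERN = re.compile(r'Entity\d+')


def split_around_entities(sentence):
    """Return (entities, chunks).

    chunks[i] is the text before entities[i];
    the last chunk is the text after the final entity.
    """
    entities = []
    chunks = []
    position = 0
    for match in ENTITY_PATTERN.finditer(sentence):
        # Text between the previous entity (or the start) and this one.
        chunks.append(sentence[position:match.start()].strip())
        entities.append(match.group())
        position = match.end()
    # Slice to the end of the string; [position:-1] would drop the last character.
    chunks.append(sentence[position:].strip())
    return entities, chunks


entities, chunks = split_around_entities(
    'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9')
print(entities)  # ['Entity2', 'Entity5', 'Entity8']
print(chunks)    # ['word1', 'word3 word4', 'word6 word7', 'word9']
```

Unlike the flat word list above, this keeps each sentence separate and also returns the words after the last entity.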
I hope it helps a little bit. I think there are better solutions for it.
Please don't PM moderators.
The forum is for all to benefit from
I also got a PM. I told him to post it publicly, because others may have the same question.

Sending a PM to members to get a problem solved is a kind of ego trip. I don't like this behavior, and it's not the purpose of a forum.
I apologise for the inconvenience. Unfortunately, I'm not familiar with forums. I did not see that there is a button to reply in public until you mentioned it. I just want to say thank you, I solved the problem.
No problem, we just want to keep it public.