Python Forum
extract relation
#1
Can someone help, please? (I'm new to programming.) I am trying to chunk sentences into sub-sentences. I have a list of sentences such as:
my_list = [‘ word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10’, ‘word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9’]

I want to chunk each sentence into the substrings before, between, and after the entities. This is my attempt to split the list.

here

[ lis = [(word1,(startindex,endindex)),(Entity5,(startindex,endindex))…….]
entity_list = [[ Entity5,Entity8] ,[Entity2,Entity5,Entity8],[…,….,…,…,,…]]
start_index = [ ..,..,..,]
end_index = [..,..,..,]

For l in my_list:
For I in range (len(start_index)):
For j in range (len(end_index)):
Before = l [0:start_index[i]]
Between = l [endindex [j]:start_index[i+1]
After = l[end_index[j+1]:-1]

But that does not work.
#2
Please use code tags and fix the indentation.

Your list
my_list = [‘ word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10’, ‘word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9’]
has only two items. To split them into two lists of words, do the following.
First, your string literals are invalid because they use typographic quotes (‘ ’); use plain ' quotes,
like:
my_list = ['word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10', 'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9']
Then split with:
my_list = ['word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10', 'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9']
print(my_list[0])
list1 = my_list[0].split()
list2 = my_list[1].split()
print('list1: {}\nlist2: {}'.format(list1, list2))
Output:
word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10
list1: ['word1', 'word2', 'word3', 'word4', 'Entity5', 'word6', 'word7', 'Entity8', 'word9', 'word10']
list2: ['word1', 'Entity2', 'word3', 'word4', 'Entity5', 'word6', 'word7', 'Entity8', 'word9']
You can now use join to concatenate the words you want back into 'chunks'.
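As a rough sketch of that last step (not a definitive solution): the helper names is_entity and chunk_sentence are made up for illustration, and the sketch assumes an entity is any word containing "entity", as in the placeholder data of this thread. It splits one sentence, collects the words between entities, and joins each group back into a string.

```python
def is_entity(word):
    # Assumption: entities are recognised by the substring "entity",
    # which only fits the placeholder data used in this thread.
    return 'entity' in word.lower()


def chunk_sentence(sentence):
    """Return the text chunks before, between and after the entities."""
    chunks = []
    current = []
    for word in sentence.split():
        if is_entity(word):
            # close the chunk collected so far and start a new one
            chunks.append(' '.join(current))
            current = []
        else:
            current.append(word)
    # whatever is left are the words after the last entity
    chunks.append(' '.join(current))
    return chunks


my_list = [
    'word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10',
    'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9',
]

for sentence in my_list:
    print(chunk_sentence(sentence))
```

In each result the first item is the text before the first entity, the last item is the text after the last entity, and everything else lies between two entities. A sentence that starts or ends with an entity gives an empty string at that position.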
#3
What a stupid task.

However, the first step is the easiest: putting all the words together in one flat list is a one-liner.
After that, you can iterate over the list and look for entities. If a word is not an entity, append it to a list of words.
If it is an entity, append [entity_before, words] to the result list
and start a new word list. Finally, the function returns the result list.
Trying to add entity_after in the same pass made my head explode.

So you can write a second function that iterates over the first result list (the one with the entity_before entries).
For each entry, take the name of the previous entity (index - 1), replace '_before' with '_after', and pair it with the current words.

Here is my example:
my_list = [
    'word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10', 
    'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9',
    ]

# putting all words in one flat list
words = [word for chunk in my_list for word in chunk.split()]


# is like
#words = []
#for element in my_list:
#    for word in element.split():
#        words.append(word)


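def get_words_before(wordlist):
    """Collect [entity + '_before', words preceding that entity] pairs."""
    entity = []
    words_before = []
    for word in wordlist:
        if 'entity' in word.lower():
            # entity found: store it with the words collected so far
            entity.append([word + '_before', words_before])
            words_before = []
        else:
            words_before.append(word)
    # note: words after the last entity are not returned
    return entity


def get_words(wordlist):
    """Add an '_after' entry for every entity that has a successor."""
    entities = get_words_before(wordlist)
    result = []
    for index, entity in enumerate(entities):
        if index == 0:
            # no entity precedes the first one, so there is no '_after' entry
            result.append(entity)
            continue
        # the words before this entity are also the words after the previous one
        entity_after = entities[index-1][0].replace('_before', '_after')
        result.append([entity_after, entity[1]])
        result.append(entity)
    return result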


import pprint
pprint.pprint(get_words(words))
Another solution uses a generator, but without entity_after:

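def get_words_before(wordlist):
    """Yield (entity + '_before', words preceding that entity) tuples."""
    words = []
    for word in wordlist:
        if 'entity' in word.lower():
            # hand out the words collected so far, then start over
            yield word + '_before', words
            words = []
        else:
            words.append(word)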


import pprint
result = list(get_words_before(words))
pprint.pprint(result)
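Both versions above work on the flat list, so words from different sentences end up in the same chunk, and the words after the last entity are dropped. Here is a minimal sketch of a per-sentence variant that also keeps the trailing words. The function name split_around_entities and the labels ('before_', '_to_', 'after_') are made up for illustration, and it still assumes that an entity is any word containing "entity".

```python
import pprint


def split_around_entities(sentence):
    """Return (label, words) pairs for a single sentence."""
    result = []
    words = []
    previous = None  # name of the last entity seen so far
    for word in sentence.split():
        if 'entity' in word.lower():
            if previous is None:
                label = 'before_' + word
            else:
                label = previous + '_to_' + word
            result.append((label, words))
            words = []
            previous = word
        else:
            words.append(word)
    if previous is not None:
        # words that follow the last entity of this sentence
        result.append(('after_' + previous, words))
    # a sentence without any entity yields an empty list
    return result


my_list = [
    'word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10',
    'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9',
]

for sentence in my_list:
    pprint.pprint(split_around_entities(sentence))
```

Processing each sentence on its own keeps the chunks from running across sentence boundaries.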
I hope this helps a little. I think there are better solutions for it.
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#4
Please don't PM moderators.
The forum is for everyone to benefit from.
#5
I also got a PM. I wrote back that he should post it publicly, because there are others who may have the same question.

Sending a PM to members to get a problem solved is a kind of ego trip. I don't like this behavior, and it's not the point of a forum.
#6
I apologise for the inconvenience. Unfortunately, I'm not familiar with forums. I did not see that there is a button to reply in public until you mentioned it. I just want to say thank you; I solved the problem.
#7
No problem, we just want to keep it public.