Posts: 4
Threads: 2
Joined: Jul 2017
Can someone help, please? (I'm new to programming.) I am trying to chunk sentences into sub-sentences. I have a list of sentences such as:
my_list = [‘ word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10’, ‘word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9’]
I want to chunk them into the substrings before, between, and after the entities. This is my attempt to split the list.
here
[ lis = [(word1,(startindex,endindex)),(Entity5,(startindex,endindex))…….]
entity_list = [[ Entity5,Entity8] ,[Entity2,Entity5,Entity8],[…,….,…,…,,…]]
start_index = [ ..,..,..,]
end_index = [..,..,..,]
For l in my_list:
For I in range (len(start_index)):
For j in range (len(end_index)):
Before = l [0:start_index[i]]
Between = l [endindex [j]:start_index[i+1]
After = l[end_index[j+1]:-1]
But that does not work.
Posts: 12,031
Threads: 485
Joined: Sep 2016
Jul-09-2017, 12:51 AM
(This post was last modified: Jul-09-2017, 12:51 AM by Larz60+.)
Use code tags and fix the indentation.
Your list
my_list = [‘ word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10’, ‘word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9’]
only has two items. To split these into two lists, do the following.
First, you also have invalid string delimiters (funky quotes; use ' instead), like:
my_list = ['word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10', 'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9']
Then split with:
my_list = ['word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10', 'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9']
print(my_list[0])
list1 = my_list[0].split()
list2 = my_list[1].split()
print('list1: {}\nlist2: {}'.format(list1, list2))
Output:
word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10
list1: ['word1', 'word2', 'word3', 'word4', 'Entity5', 'word6', 'word7', 'Entity8', 'word9', 'word10']
list2: ['word1', 'Entity2', 'word3', 'word4', 'Entity5', 'word6', 'word7', 'Entity8', 'word9']
You can now use join to concatenate what you want back into 'chunks'.
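As a rough illustration of the split-then-join idea, here is a minimal sketch. It assumes that an entity is any word containing "entity" (case-insensitive), which only holds for the placeholder data in this thread; the function name chunk_sentence is made up for this example.

```python
def chunk_sentence(sentence):
    """Return the text before, between, and after the entities in one sentence.

    Assumption: a word counts as an entity if it contains 'entity'
    (case-insensitive). Replace this test with your real entity check.
    """
    chunks = []
    current = []  # words collected since the last entity
    for word in sentence.split():
        if 'entity' in word.lower():
            # An entity closes the current chunk; join glues the words back together.
            chunks.append(' '.join(current))
            current = []
        else:
            current.append(word)
    # Whatever is left over is the chunk after the last entity.
    chunks.append(' '.join(current))
    return chunks


my_list = ['word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10',
           'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9']
for sentence in my_list:
    print(chunk_sentence(sentence))
```

Each sentence is handled on its own, so the chunk after the last entity of one sentence is not mixed with the start of the next one. If a sentence starts or ends with an entity, the corresponding chunk is an empty string.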
Posts: 2,126
Threads: 11
Joined: May 2017
Jul-10-2017, 09:46 AM
(This post was last modified: Jul-10-2017, 09:46 AM by DeaD_EyE.)
What a stupid task
However, the first task is the easiest. Putting all the words together in one flat list is a one-liner.
After this you can iterate over the list and look for entities. If a word is not an entity, append it to a list of words.
If it is an entity, append [entity_before, words] to the result list.
Then start a new list of words. Finally, your function returns the result list.
Trying to append the entity_after in the right way made my head explode.
So you can write a second function which iterates over the first result list with the entity_before entries.
Take the entity at index - 1, replace the suffix in its name, and put it together with the current words.
Here is my example:
my_list = [
    'word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10',
    'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9',
]
# putting all words in one flat list
words = [word for chunk in my_list for word in chunk.split()]
# is like
#words = []
#for element in my_list:
#    for word in element.split():
#        words.append(word)
def get_words_before(wordlist):
    entity = []
    words_before = []
    for word in wordlist:
        if 'entity' in word.lower():
            entity.append([word + '_before', words_before])
            words_before = []
        else:
            words_before.append(word)
    return entity
def get_words(wordlist):
    entities = get_words_before(wordlist)
    result = []
    for index, entity in enumerate(entities):
        if index == 0:
            result.append(entity)
            continue
        entity_after = entities[index-1][0].replace('_before', '_after')
        result.append([entity_after, entity[1]])
        result.append(entity)
    return result
import pprint
pprint.pprint(get_words(words))
Another solution uses a generator, but without entity_after:
def get_words_before(wordlist):
    words = []
    for word in wordlist:
        if 'entity' in word.lower():
            yield word + '_before', words
            words = []
        else:
            words.append(word)
import pprint
result = list(get_words_before(words))
pprint.pprint(result)
I hope it helps a little bit. I think there are better solutions for it.
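One possible way to get the "after" part without a second pass is to remember the previous entity inside the generator. The following is only a sketch under the same assumption as the code in this post (a word is an entity if it contains "entity"); the name label_chunks and the label strings are invented for this example. It works on one sentence at a time instead of the flat word list, so words after the last entity are not lost.

```python
def label_chunks(sentence):
    """Yield (label, text) pairs for the words around each entity.

    Assumption: a word is an entity if it contains 'entity'
    (case-insensitive), as in the placeholder data of this thread.
    """
    previous = None  # name of the last entity seen, if any
    current = []     # words collected since that entity
    for word in sentence.split():
        if 'entity' in word.lower():
            if previous is None:
                label = 'before ' + word
            else:
                label = 'between {} and {}'.format(previous, word)
            yield label, ' '.join(current)
            previous, current = word, []
        else:
            current.append(word)
    # Words after the last entity, or the whole sentence if it has no entity.
    if previous is not None:
        yield 'after ' + previous, ' '.join(current)
    elif current:
        yield 'no entity', ' '.join(current)


my_list = [
    'word1 word2 word3 word4 Entity5 word6 word7 Entity8 word9 word10',
    'word1 Entity2 word3 word4 Entity5 word6 word7 Entity8 word9',
]
for sentence in my_list:
    for label, text in label_chunks(sentence):
        print('{}: {}'.format(label, text))
```

Because it is a generator, you can wrap the call in list() if you need all pairs of a sentence at once.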
Posts: 12,031
Threads: 485
Joined: Sep 2016
Please don't PM moderators.
The forum is for everyone to benefit from.
Posts: 2,126
Threads: 11
Joined: May 2017
Jul-10-2017, 11:30 AM
(This post was last modified: Jul-10-2017, 11:30 AM by DeaD_EyE.)
I also got a PM. I told him that he should post it publicly, because there are others who may have the same question.
Sending a PM to members to get a problem solved is a kind of ego trip. I don't like this behavior, and it's also not the point of a forum.
Posts: 4
Threads: 2
Joined: Jul 2017
I apologise for the inconvenience. Unfortunately, I'm not familiar with forums. Actually, I did not see that there is a button to reply in public until you mentioned it. I just want to say thank you, I solved the problem.
Posts: 12,031
Threads: 485
Joined: Sep 2016
Jul-10-2017, 04:30 PM
(This post was last modified: Jul-10-2017, 04:30 PM by Larz60+.)
No problem, just want to keep it public