Apr-26-2021, 05:37 AM
I need to write a stemming program for an assignment, but I can't figure out how to loop the conditions so that every word is stemmed as much as possible. I'm new to coding, so my code is a mess. I thought about turning it into a while loop, but then the input of the for loop would have to keep changing to account for the stemmed or half-stemmed output of the loop itself, and I don't know how to do that.
My code will work for the main cases but not the ones they've hidden. I think it's because the code doesn't repeat itself so some words aren't completely stemmed.
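The "repeat until fully stemmed" idea could be sketched roughly like this: keep applying one pass of the rules to a word until a pass no longer changes it. The names `stem_fully` and `strip_ly_or_ing` are made up for illustration, and the single-pass function only covers the 'ly' and 'ing' rules, so this is a sketch of the loop and not the full assignment.

```python
def stem_fully(word, stem_once):
    """Apply stem_once repeatedly until the word stops changing."""
    previous = None
    while word != previous:
        previous = word
        word = stem_once(word)
    return word


def strip_ly_or_ing(word):
    """Hypothetical single pass: handles only the 'ly' and 'ing' rules."""
    if word.endswith("ly"):
        return word[:-2]
    # Keep 'ing' when fewer than 3 letters would remain (e.g. 'bring').
    if word.endswith("ing") and len(word) - 3 >= 3:
        return word[:-3]
    return word


print(stem_fully("singingly", strip_ly_or_ing))  # sing
print(stem_fully("bring", strip_ly_or_ing))      # bring
```

Because each pass either shortens the word or leaves it alone, the loop always stops, and the for loop over the sentence never needs to change its input.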
These are the conditions of the stem and we can't use NLTK or any imports because the pylint checker will refuse it:
Remove all ownerships
Singular ownerships (e.g., it's -> it)
Plural ownerships (e.g., theirs' -> their)
Remove plurals
Any words ending with 's' should remove it (e.g., 'gaps' -> 'gap', 'runs' -> 'run')
Any words ending with a vowel after removing 's' are excepted (e.g., 'gas', 'this', 'has')
Any words that have no vowels are excepted (e.g., 'CMPS', 'BTS')
Any words ending with 'us' or 'ss' are excepted (e.g., 'census', 'chess')
Any words ending with 'sses' should be converted to 'ss' (e.g., 'presses' -> 'press')
Any words ending with 'ies' should be converted to 'i' unless the length of the stemmed word is less than or equal to 2 (e.g., 'cries' -> 'cri', 'ties' -> 'tie')
Remove past tense
Any words ending with 'ed' should remove it (e.g., 'burned' -> 'burn', 'owned' -> 'own')
Any words ending with 'ied' should be converted to 'i' unless the length of the stemmed word is less than or equal to 2 (e.g., 'cried' -> 'cri', 'tied' -> 'tie')
Remove adjectives
Any words ending with 'er' should remove it (e.g., 'fuller' -> 'full', 'hanger' -> 'hang')
Remove verbs
Any words ending with 'ing' should remove it (e.g., 'singing' -> 'sing', 'cutting' -> 'cutt')
If the stemmed word length is less than 3 after removing 'ing', it should retain it (e.g., 'bring')
Remove adverbs
Any words ending with 'ly' should remove it (e.g., 'greatly' -> 'great', 'fully' -> 'ful')
Note 1: vowels contain the letter 'y'.
Note 2: As you can see, not all stemming rules would actually result in proper root word (e.g., cutt, supposedly -> suppos etc.), but this will do for this assignment.
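As a rough illustration of how one group of these rules might be written, here is a sketch of the plural rules only, using `str.endswith` and slicing. The function name `remove_plural` is hypothetical, and the reading of the 'ies' exception (drop only the 's' when the stem would have 2 letters or fewer) is an assumption based on the 'ties' -> 'tie' example.

```python
VOWELS = "aeiouy"  # Note 1: 'y' counts as a vowel


def remove_plural(word):
    """Sketch of the plural rules only (hypothetical helper)."""
    if word.endswith("sses"):
        return word[:-2]                      # presses -> press
    if word.endswith("ies"):
        stem = word[:-3] + "i"
        # Assumption: if the stem is too short, drop just the 's'.
        return stem if len(stem) > 2 else word[:-1]
    if not word.endswith("s") or word.endswith(("us", "ss")):
        return word                           # census, chess
    if not any(ch in VOWELS for ch in word.lower()):
        return word                           # words with no vowels
    stem = word[:-1]
    if stem and stem[-1].lower() in VOWELS:
        return word                           # gas, this, has
    return stem                               # gaps -> gap


for example in ["gaps", "gas", "CMPS", "census", "presses", "cries", "ties"]:
    print(example, "->", remove_plural(example))
```

The more specific endings ('sses', 'ies') are checked before the general 's' rule so that they are not swallowed by it.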
This is my code:
def pos(sentence):
    """ returns a string that stemmed all words from the given sentence."""
    list1 = []
    for words in sentence.split():
        word = words.replace("s\'", "")
        word2 = word.replace("\'s","")
        word3 = words.rstrip("s")
        for elem in word2:
            for vowel in ['a', 'e', 'i', 'o', 'u', 'y']:
                vowel = 1*vowel
            if elem not in ['a', 'e', 'i', 'o', 'u', 'y']:
                word4 = 1*word2
        if word2[-2:] in ['ss', 'us']:
            word4 = 1*word2
        elif word2[-4:] == 'sses':
            word4 = word2.replace("sses", "ss")
        elif word2[-3:] == "ies":
            if len(word2.replace("ies", "i")) > 2:
                word4 = word2.replace("ies", "i")
            else:
                word4 = word2.replace("ies", "ie")
        elif word3[-1] in ['a', 'e', 'i', 'o', 'u', 'y']:
            word4 = 1*word2
        else:
            word4 = word2.rstrip("s")
        if word4[-3:] == 'ied':
            if len(word4.replace("ied", "i")) > 2:
                word5 = word4.replace("ied", "i")
            else:
                word5 = word4.replace("ied", "ie")
        elif word4[-2:] == "ed":
            word5 = word4.replace("ed", "")
        else:
            word5 = 1*word4
        if word5[-2:] == "er":
            word6 = word5.replace("er", "")
        else:
            word6 = 1*word5
        if word6[-3:] == "ing":
            if len(word6.replace("ing", "")) >= 3:
                word7 = word6.replace("ing", "")
                if word7[-2:] == "ly":
                    word8 = word7.replace("ly", "")
                elif word7[-2:] == "er":
                    word8 = word7.replace("er", "")
                else:
                    word8 = 1*word7
        elif word6[-2:] == "ly":
            word7 = word6.replace("ly", "")
            if word7[-3:] == "ing":
                if len(word6.replace("ing", "")) >= 3:
                    word8 = word7.replace("ing", "")
            elif word7[-2:] == "er":
                word8 = word7.replace("er", "")
            else:
                word8 = 1*word7
        else:
            word8 = 1*word6
        list1.append(word8)
    final = " ".join(list1)
    return final

Here are examples to try:
stemmed = pos("The consensus chopped off last Friday while the voters were tied down flying without counting properly so wasn't a great day after all")
print(stemmed)
Output: The consensus chopp off last Friday while the vot were tie down f without count prop so wasn't a great day aft al
stemmed = pos("today is a lovely day while my CMPS was not the best and it's days are over but oh well what could be done about it right?")
print(stemmed)
Output: today is a love day while my CMPS was not the best and it days are ov but oh well what could be done about it right?
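A side note on how the suffixes are removed in the code above: `str.replace` removes every occurrence of the substring, not only the one at the end, and `str.rstrip("s")` strips all trailing 's' characters. A small sketch of cutting off only the ending with slicing, using a hypothetical helper named `strip_suffix`:

```python
def strip_suffix(word, suffix):
    """Return word without suffix if it ends with it, else word unchanged."""
    if suffix and word.endswith(suffix):
        return word[:-len(suffix)]
    return word


print("singing".replace("ing", ""))     # s    (both 'ing' parts removed)
print(strip_suffix("singing", "ing"))   # sing (only the ending removed)
print("chess".rstrip("s"))              # che  (both trailing 's' removed)
print(strip_suffix("gaps", "s"))        # gap
```

This may be one reason the 'singing' -> 'sing' example from the rules does not come out as expected.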