Python Forum
Stemming Problem
#1
I need to write a stemming program for an assignment, but I can't figure out how to loop over the conditions so that every word is stemmed as much as possible. I'm new to coding, so my code is a mess. I thought about turning it into a while loop, but then the input to the for loop would have to keep changing to account for the stemmed or half-stemmed output of the loop itself, and I don't know how to do that.

My code works for the visible test cases but not for the hidden ones. I think that's because the code doesn't repeat itself, so some words aren't completely stemmed.

These are the stemming rules. We can't use NLTK or any imports because the pylint checker will reject them:

Remove all ownerships
  • Singular ownerships (e.g., it's -> it)
  • Plural ownerships (e.g., theirs' -> their)
Remove plurals
  • Any word ending with 's' should have it removed (e.g., 'gaps' -> 'gap', 'runs' -> 'run')
  • Any word that ends with a vowel after removing 's' is excepted (e.g., 'gas', 'this', 'has')
  • Any word that has no vowels is excepted (e.g., 'CMPS', 'BTS')
  • Any word ending with 'us' or 'ss' is excepted (e.g., 'census', 'chess')
  • Any word ending with 'sses' should be converted to 'ss' (e.g., 'presses' -> 'press')
  • Any word ending with 'ies' should be converted to 'i' unless the length of the stemmed word is less than or equal to 2 (e.g., 'cries' -> 'cri', 'ties' -> 'tie')
Remove past tense
  • Any word ending with 'ed' should have it removed (e.g., 'burned' -> 'burn', 'owned' -> 'own')
  • Any word ending with 'ied' should be converted to 'i' unless the length of the stemmed word is less than or equal to 2 (e.g., 'cried' -> 'cri', 'tied' -> 'tie')
Remove adjectives
  • Any word ending with 'er' should have it removed (e.g., 'fuller' -> 'full', 'hanger' -> 'hang')
Remove verbs
  • Any word ending with 'ing' should have it removed (e.g., 'singing' -> 'sing', 'cutting' -> 'cutt')
  • If the stemmed word would be shorter than 3 characters after removing 'ing', the word keeps it (e.g., 'bring')
Remove adverbs
  • Any word ending with 'ly' should have it removed (e.g., 'greatly' -> 'great', 'fully' -> 'ful')

Note 1: the letter 'y' counts as a vowel.

Note 2: As you can see, not all stemming rules produce a proper root word (e.g., cutt, supposedly -> suppos, etc.), but this will do for this assignment.
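
For reference, the plural rules above can be read as a chain of suffix checks on a single word. The following is only a minimal sketch of one possible reading, not the assignment's reference solution: the name strip_plural is hypothetical, and comparing against a lower-cased copy of the word (so that 'CMPS' counts as having no vowels) is an assumption the rules do not spell out.

```python
VOWELS = "aeiouy"  # Note 1: 'y' counts as a vowel


def strip_plural(word):
    """Apply the 'Remove plurals' rules to one word (hypothetical helper)."""
    lower = word.lower()  # assumption: rules are case-insensitive
    if not any(ch in VOWELS for ch in lower):
        return word  # no vowels at all: 'CMPS', 'BTS'
    if lower.endswith("sses"):
        return word[:-2]  # 'presses' -> 'press'
    if lower.endswith("ies"):
        stem = word[:-2]  # 'cries' -> 'cri'
        # Too short after stemming: only drop the 's' ('ties' -> 'tie').
        return stem if len(stem) > 2 else word[:-1]
    if lower.endswith(("us", "ss")):
        return word  # 'census', 'chess'
    if lower.endswith("s"):
        stem = word[:-1]
        if stem and stem[-1].lower() in VOWELS:
            return word  # 'gas', 'this', 'has'
        return stem  # 'gaps' -> 'gap'
    return word


print(strip_plural("presses"))  # press
print(strip_plural("ties"))     # tie
```

Slicing the suffix off with word[:-n] only touches the end of the word, whereas str.replace would also rewrite any matching text earlier in the word.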



This is my code:

def pos(sentence):
    """  returns a string that stemmed all words from the given sentence."""
    list1 = []

    for words in sentence.split():
        word = words.replace("s\'", "")
        word2 = word.replace("\'s","")
        word3 = words.rstrip("s")
        
        for elem in word2:
            for vowel in ['a', 'e', 'i', 'o', 'u', 'y']:
                vowel = 1*vowel
                if elem not in ['a', 'e', 'i', 'o', 'u', 'y']:
                    word4 =  1*word2
        if word2[-2:] in ['ss', 'us']:
            word4 = 1*word2
        elif word2[-4:] == 'sses':
            word4 = word2.replace("sses", "ss")
        elif word2[-3:] == "ies":
            if len(word2.replace("ies", "i")) > 2:
                word4 = word2.replace("ies", "i") 
            else: 
                word4 = word2.replace("ies", "ie")              
        elif word3[-1] in ['a', 'e', 'i', 'o', 'u', 'y']:
            word4 = 1*word2
        else: 
            word4 = word2.rstrip("s") 

        if word4[-3:] == 'ied':
            if len(word4.replace("ied", "i")) > 2:
                word5 = word4.replace("ied", "i")
            else: 
                word5 = word4.replace("ied", "ie")
        elif word4[-2:] == "ed":
            word5 = word4.replace("ed", "")
        else:
            word5 = 1*word4

        if word5[-2:] == "er":
            word6 = word5.replace("er", "")
        else:
            word6 = 1*word5 

        if word6[-3:] == "ing":
            if len(word6.replace("ing", "")) >= 3:
                word7 = word6.replace("ing", "")
                if word7[-2:] == "ly":
                    word8 = word7.replace("ly", "")
                elif word7[-2:] == "er":
                    word8 = word7.replace("er", "")
                else:
                    word8 = 1*word7
        elif word6[-2:] == "ly":
            word7 = word6.replace("ly", "")
            if word7[-3:] == "ing":
                if len(word6.replace("ing", "")) >= 3:
                    word8 = word7.replace("ing", "")
            elif word7[-2:] == "er":
                word8 = word7.replace("er", "")
            else: 
                word8 = 1*word7            
        else:
            word8 = 1*word6

        list1.append(word8)

    final = " ".join(list1)
    return final
Here are examples to try:

stemmed = pos("The consensus chopped off last Friday while the voters were tied down flying without counting properly so wasn't a great day after all")
print(stemmed)
Output:
The consensus chopp off last Friday while the vot were tie down f without count prop so wasn't a great day aft all
stemmed = pos("today is a lovely day while my CMPS was not the best and it's days are over but oh well what could be done about it right?")
print(stemmed)
Output:
today is a love day while my CMPS was not the best and it days are ov but oh well what could be done about it right?
#2
I would probably write a function that takes a single word, stems it as much as possible, and returns the stem.
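
As a rough sketch of that split, the sentence-level function only has to break the sentence into words and hand each one to a per-word function. The names stem_sentence and strip_ownership are hypothetical, and the per-word function here handles only the ownership rule as a stand-in for the full set of rules.

```python
def strip_ownership(word):
    """Drop a trailing possessive marker (hypothetical helper)."""
    # "it's" -> "it" and "theirs'" -> "their": both markers are 2 characters.
    if word.endswith("'s") or word.endswith("s'"):
        return word[:-2]
    return word


def stem_sentence(sentence, stem_word=strip_ownership):
    """Stem every whitespace-separated word and join the results."""
    return " ".join(stem_word(word) for word in sentence.split())


print(stem_sentence("it's theirs' best day"))  # it their best day
```

With this shape, all of the looping logic lives in the per-word function, and the sentence-level code never has to change.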

Inside that function, you can then keep applying the rules until either:
  • You hit an "excepted" condition
  • The output stops changing
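
A minimal sketch of the "keep going until the output stops changing" idea, under some assumptions: each rule is a small function that returns the word unchanged when it does not apply, all names here are hypothetical, and only the past tense, 'er', 'ing' and 'ly' rules are included (the ownership and plural rules, with their "excepted" cases, would be added as further rule functions).

```python
def strip_past(word):
    if word.endswith("ied"):
        stem = word[:-2]  # 'cried' -> 'cri'
        # Too short after stemming: only drop the 'd' ('tied' -> 'tie').
        return stem if len(stem) > 2 else word[:-1]
    if word.endswith("ed"):
        return word[:-2]  # 'burned' -> 'burn'
    return word


def strip_er(word):
    return word[:-2] if word.endswith("er") else word


def strip_ing(word):
    # Keep 'ing' when fewer than 3 characters would remain ('bring').
    if word.endswith("ing") and len(word) - 3 >= 3:
        return word[:-3]
    return word


def strip_ly(word):
    return word[:-2] if word.endswith("ly") else word


RULES = (strip_past, strip_er, strip_ing, strip_ly)


def stem_word(word, rules=RULES):
    """Run every rule in order, repeating until a full pass changes nothing."""
    while True:
        stemmed = word
        for rule in rules:
            stemmed = rule(stemmed)
        if stemmed == word:  # nothing changed, so we are done
            return word
        word = stemmed


print(stem_word("supposedly"))  # suppos
print(stem_word("singing"))     # sing
```

The loop always terminates because every rule either shortens the word or leaves it alone. Note that the rules as stated have no length guard for 'ed', 'er' or 'ly', so short words such as 'her' are stemmed as well.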