Python Forum

Word matching with specific parameters

Hello all,


I am quite new to using Python so please bear with me.

I am looking to create a script that will find combinations of words that meet specific criteria.

Is this possible with Python?


I have three unknown words in a puzzle that I'm trying to solve. I know the length of each word and a few similarities between the words.

I have a 5 letter word, followed by an 8 letter word, and ending with another 5 letter word.

- Letter 1 of word 1 is identical to letter 2 of word 3.

- Letter 6 of word 2 is identical to letter 1 of word 3.

- Letters 7 and 8 of word 2 are the same.

- No other characters are identical.


Similarities shown below using random symbols:

$????

?????#%%

#$???


I have 2 txt files, one with a list of 5 letter words and the other with a list of 8 letter words. (I can make another file for 5 letter words or combine both into one file if necessary)
I can edit the 8 letter word file to only contain double letter ending words if that helps!

Is this something I can utilize Python to help solve by printing a list of possible solutions? And would you kind folks help me write the script?
You should provide a sample text. I can't easily think of three-word combinations that fit your conditions.

To solve this exercise you can first find all combinations of 5-8-5 letter words in your text. Probably there are not many. Then you can look at each group you found to see if any fit your conditions.

path2text = '/home/pedro/temp/franky_part.txt'
# get the text
with open(path2text) as t:
    text = t.read()
    words_list = text.split()

len(words_list) # returns 201 in this case

possibilities = []
# now loop through the words list looking for combinations of 5 letters followed by 8 letters followed by 5 letters
for i in range(len(words_list) - 2):
    if len(words_list[i]) == 5 and len(words_list[i+1]) == 8 and len(words_list[i+2]) == 5:
        print(f'Found a possible combination of words: {words_list[i], words_list[i+1], words_list[i+2]}')
        tup = (words_list[i], words_list[i+1], words_list[i+2])
        possibilities.append(tup)

# look at the tuples in possibilities to see if they fit the criteria
for tup in possibilities:
    word1, word2, word3 = tup
    if word1[0] == word3[1]:
        print('Condition 1 is true.')
    else:
        print('Condition 1 is not met. Aborting.')
        continue
    if word2[5] == word3[0]:
        print('Condition 2 is true.')
    else:
        print('Condition 2 is not met. Aborting.')
        continue
    if word2[6] == word2[7]:
        print('Condition 3 is true! We have found a match for 3 conditions! Yay!')
        print(f'Matching words are: {tup}')
    else:
        print('Condition 3 is not met. Aborting.')
        continue
If I were doing this for real, I would use the Python module re, or the third-party module regex.

The output from searching my words list:

Output:
Found a possible combination of words: ('Peter', 'possible', 'party')
The output from searching the list possibilities:

Output:
Condition 1 is not met. Aborting.
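The re idea mentioned above could look roughly like this. It is only a sketch: it assumes the three words are lowercase a-z, joins them with single spaces, and uses backreferences for the three known similarities. The names SHAPE, matches_shape and is_solution are made up for this example, and the sample words (inlay, compress, eight) are my own test data, not from the original poster's files.

```python
import re

# Group 1 = letter 1 of word 1, group 2 = letter 6 of word 2,
# group 3 = letter 7 of word 2 (which must be repeated as letter 8).
# Word 3 must then start with group 2 followed by group 1.
SHAPE = re.compile(r"([a-z])[a-z]{4} [a-z]{5}([a-z])([a-z])\3 \2\1[a-z]{3}")


def matches_shape(word1, word2, word3):
    """True if the 5-8-5 lengths and the three known similarities hold."""
    joined = f"{word1} {word2} {word3}".lower()
    return SHAPE.fullmatch(joined) is not None


def is_solution(word1, word2, word3):
    """Shape check plus 'no other characters are identical'.

    Assumption: that rule applies across all three words. The 18 letters
    contain exactly 3 forced repeats, so 15 distinct letters must remain.
    """
    if not matches_shape(word1, word2, word3):
        return False
    letters = (word1 + word2 + word3).lower()
    return len(set(letters)) == len(letters) - 3


print(is_solution("inlay", "compress", "eight"))
print(is_solution("Peter", "possible", "party"))
```

The regex alone cannot easily express "all other letters differ", so that part is left to a plain set comparison after the match.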
Here is my solution (untested code, please post lists of words for validation)
from collections import defaultdict
from itertools import product
from pathlib import Path


def is_injective(word):
    """Indicates that all the letters are different in a word"""
    return len(word) == len(set(word))


def has_end_doubled(word):
    """Indicates that the two last letters of a word are identical"""
    return word[-2] == word[-1]


def generate_candidates(path5, path8):
    word5 = Path(path5).read_text().strip().split()
    word5 = [w for w in word5 if is_injective(w)]

    word8 = Path(path8).read_text().strip().split()
    word8 = [w for w in word8 if has_end_doubled(w) and is_injective(w[:-1])]

    dic = [defaultdict(set) for i in range(3)]

    for w in word5:
        dic[0][w[0]].add(w)
        dic[2][w[:2]].add(w)
    for w in word8:
        dic[1][w[5]].add(w)

    dic = [dict(d) for d in dic]

    for (a, b), words in dic[2].items():
        for u, v, w in product(dic[0].get(b, ()), dic[1].get(a, ()), words):
            # ensure that no other letters are identical
            x = u + v + w
            if len(x) - 3 == len(set(x)):
                yield u, v, w


def main(path5, path8):
    for u, v, w in generate_candidates(path5, path8):
        print(u, v, w)


if __name__ == "__main__":
    main("word5.txt", "word8.txt")
(Jan-28-2025, 01:36 AM) CascadeDiver wrote: - No other characters are identical

How is this applied? To word #2 only? To all words?
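The answer changes the filter, so here is a small sketch of both readings side by side. Which one the puzzle intends is not stated in the thread; the function names and the sample words are invented for illustration only.

```python
def required_pairs_hold(w1, w2, w3):
    """The three similarities given in the puzzle (1-based letters 1/2, 6/1, 7/8)."""
    return w1[0] == w3[1] and w2[5] == w3[0] and w2[6] == w2[7]


def reading_per_word(w1, w2, w3):
    """Reading A: letters may not repeat inside any single word,
    apart from the doubled ending of word 2."""
    return (
        required_pairs_hold(w1, w2, w3)
        and len(set(w1)) == len(w1)
        and len(set(w2[:-1])) == len(w2) - 1
        and len(set(w3)) == len(w3)
    )


def reading_all_words(w1, w2, w3):
    """Reading B: across all three words, the only repeats are the
    three required ones, so 18 letters give 15 distinct characters."""
    letters = w1 + w2 + w3
    return required_pairs_hold(w1, w2, w3) and len(set(letters)) == len(letters) - 3


# 'inlet' shares e and t with 'eight', so it passes A but fails B.
print(reading_per_word("inlet", "compress", "eight"),
      reading_all_words("inlet", "compress", "eight"))
print(reading_per_word("inlay", "compress", "eight"),
      reading_all_words("inlay", "compress", "eight"))
```

Reading B is the stricter one: anything it accepts is also accepted by reading A.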