Posts: 6
Threads: 1
Joined: Jan 2025
Greetings,
I am part the way through my Online Python course. I started learning Python for one very specific use-case and I thought maybe it would be smart to ask the forums about my use-case so I can focus my Python attentions more strategically.
My use case is this...
I have a scrambled code made up of 300 English letters. It takes the form of a Word Search puzzle, so it's format is 20 columns by 15 rows. However, it is more than just a simple wordsearch. Hidden in this 300 letters are presumed to be various anagrams that produce an English message.
Now, given the number of characters we're working with, it is very easy to find LOTS of words in each row of letters or column of letters.
However, I believe the words I am looking for will be specific . So, I have spent time building myself a 'Word-List'. Right now it has approximately 1,000 words on it.
Here is what I want to code... I want to be able to tell my code that [it] can arrange each row of letters (15 rows total) of 20 letters each(20 columns total) in any order that it wants - in an attempt to match a word on my Word-List. Check horizontal rows for the word, then check vertical columns for the word. IF a match is found, make a note. ELSE, no match is found, move on to the next word on the word list.
So it would ultimately check the horizontal lines h1-h15, then check the vertical lines v1-v20. Once all rows and all columns have been checked for that word, move on to the next word on the list.
I believe my list will continue to grow, but shouldn't exceed more than a few thousand words on my dictionary word-list. Again, these are specific words for specific topics that I believe are relevant to the puzzle I'm solving.
So since I am learning Python, how should I go about coding this, and what 'Python-coding-topics' should I focus my attentions on.
Appreciate you all for reading this far and look forward to your Yoda guidance!
Thank You.
Posts: 6
Threads: 1
Joined: Jan 2025
Jan-07-2025, 10:48 PM
(This post was last modified: Jan-07-2025, 10:50 PM by Anorak.)
![[Image: 1Qb1300]](https://ibb.co/1Qb1300)
Here is an example of my scrambled word search. As an example, one of the words on my word-list is QUARTER.
So while examining horizontal line 13, it would find a match for that word, make a note, and continue on.
Hopefully this task makes sense as I could really use some guidance about coding this.
Posts: 1,093
Threads: 143
Joined: Jul 2017
You are probably not the first person to ever have wanted to do this. There are, as I see it, 2 stages:
1. Making
2. Solving
If you grasp the making, the solving will come easier, a kind of reverse-engineering of the creation. That said, resolving the words you found into sensible English may be tricky. I don't think word-search puzzles usually contain a readable text, unless you have a sensible criterion for the word order.
A very quick search found this page about how to create such puzzles. There are probably many more examples out there.
Posts: 1,950
Threads: 8
Joined: Jun 2018
(Jan-07-2025, 10:38 PM)Anorak Wrote: Hidden in this 300 letters are presumed to be various anagrams that produce an English message.
However, I believe the words I am looking for will be specific . So, I have spent time building myself a 'Word-List'. Right now it has approximately 1,000 words on it.
Here is what I want to code... I want to be able to tell my code that [it] can arrange each row of letters (15 rows total) of 20 letters each(20 columns total) in any order that it wants - in an attempt to match a word on my Word-List. Check horizontal rows for the word, then check vertical columns for the word. IF a match is found, make a note. ELSE, no match is found, move on to the next word on the word list.
Too much ambiguity. If you can't describe problem it's very seldom get solved. Conditions like "Presumed to be various anagrams", "I believe the words I am looking for will be specific" are very hard to put into any code.
Take your time and write what you want to achieve. If it's defined only then it's time to start to think how to do it. Based on information provided I can assume that one example could be:
- I want to find anagrams of words in my words dictionary
- I want find them from rows and columns of matrix
- Anagrams of words can be overlapping
- No check for anagrams in diagonals
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 2,125
Threads: 11
Joined: May 2017
Maybe this helps. I'm unsure if is_anagram is right. Use sorted to sort the words by length. Then use itertools.groupby to group words with equal length. Then make the product of it and with repeat=2 you get two items of the product. Of course this can take very long, if the wordlist is too long. I think the complexity is quadratic.
from pathlib import Path
from itertools import groupby, product
def is_anagram(word_a, word_b):
"""
Should return True, if the word_a is an anagram of word_b.
"""
# strip word_a and word_b and convert to lowercase
word_a = word_a.strip().lower()
word_b = word_b.strip().lower()
# compare word length and if all characters are the same set
# if this is not the case, it isn't an anagram
if len(word_a) != len(word_b) or set(word_a) != set(word_b):
return False
# distance must be 2, if it's an anagram
distance = 0
# checking char by char and if they are different, then add 1 to difference
for char_a, char_b in zip(word_a, word_b):
if char_a != char_b:
distance += 1
# early exit to save time
if distance > 2:
return False
return distance == 2
def get_words(file):
with open(file) as fd:
# stripping whitespaces and newline
# iterating line by line over the file
# sort the words by length
by_len = sorted(map(str.strip, fd), key=len)
# this part takes very long, if you have a big dictionary
for group, items in groupby(by_len, key=len):
for word_a, word_b in product(items, repeat=2):
if is_anagram(word_a, word_b):
yield (word_a, word_b)
# german collection of words
file = Path.home().joinpath("Desktop", "wörter.txt")
for anagram in get_words(file):
print(anagram)
Posts: 6
Threads: 1
Joined: Jan 2025
I'm going to dig into this and see if I can find useful pieces from the code your provided.
Right now my best idea is to convert each row or each column of characters into a length count. So horizontal line 1 (h1) might be read as h1= 2a, 1b, 1c, 4e, etc.. Counting each letter and removing the nulls. WOULD I SAVE THESE AS LISTS?
Then I need to utilize the same function on each next word on the Word-List, so the word QUARTER might be read as word_1 = 1a, 1e, 1q, 2r, 1t, 1u.
Then I need this to be compared to each line (FOR EACH?) h1-h15, and v1-v20. Looking for any lines that can accommodate the word_1 character count in full.
If my word_1 has 2 R's, and the v3 line for example, has 3 R's, I obviously want to consider that a match, So I'm assuming I need to tell it 'equal to or greater than' when comparing word_1 letter counts to each subsequent line.
Sorry to the replies that said I'm being too vague. Hopefully this additional explanation is more clear what my initial game plan is looking like.
Posts: 6,789
Threads: 20
Joined: Feb 2020
As a dictionary {'a': 1, 'e': 1, 'q': 1, 'r': 2, 't': 1, 'u': 1}
You could have additional information:
{'a': 1, 'e': 1, 'q': 1, 'r': 2, 't': 1, 'u': 1, 'len': 7, 'word': 'quarter'}
Posts: 1,950
Threads: 8
Joined: Jun 2018
Jan-09-2025, 06:09 PM
(This post was last modified: Jan-09-2025, 06:09 PM by perfringo.)
(Jan-08-2025, 05:02 PM)Anorak Wrote: Looking for any lines that can accommodate the word_1 character count in full.
/.../
Sorry to the replies that said I'm being too vague. Hopefully this additional explanation is more clear what my initial game plan is looking like.
Does it mean that "anagrams" are not from consecutive characters and you look for them in the characters of the whole row/column?
I decided to find "classic anagrams" (i.e. sequence of continuous characters) with brute force. For that:
- I grabbed initial data
- downloaded list of english words from Github (containing 370 104 words)
- made indices starting from length of two (i.e ignoring one letter words) for row length of 20 characters for making slices
- made slices using indices and made sorted strings out of them (for comparison later)
- read data from file to dictionary where key is string with sorted characters and values are words which can be made from key
- iterated over strings made from slices and and added values from dictionary if matched with key
Actual code has probably less characters than the description of it. Didn't do any tests so no guarantees that result is correct. Just to demonstrate one possible brute force approach (only for rows):
data = ("qvucalilenineteenbma", "cretehtreaoucsifevle", "oroundhnimihstoliaro",
"uhtronroeocehsrardwo", "noubtdeloeltlntretaw", "tetatseonyneoaolineo",
"pmuehttsewodahreeret", "eonskaeptoooilpasiec", "lesdbnooraooevlewtsr",
"entufivetntncoiujtso", "vrniemaklawescodtrid", "eadtteyossquareeerht",
"nmtopleceohetquarryo", "nsiosmixzfdqbccasliy", "apsorezngqzpsapphire")
indices = [(start, end) for start in range(20) for end in range(start+2, 21)]
slices = [''.join(sorted(row[begin:stop])) for row in data for (begin, stop) in indices]
mapping = dict()
with open("words_alpha.txt", "r") as f:
for row in f:
word = row.strip()
value = ''.join(sorted(word))
mapping.setdefault(value, set()).add(word)
texts = [mapping[item] for item in slices if item in mapping]
anagrams = set(word for text in texts for word in text)
# found 1407 words which can be made from consecutive characters in rows
# took around 1 sec on my machine
# longest word is 14 characters - heterochronous - meaning occurring at different times or stages
# words longer than 7 characters below, BTW meaning of first word is "mineral consisting of a vermiculit" :-);
lennilite
chaetiferous
haliotis
outreach
unbolted
nineteen
doubleton
complete
outraces
reheater
cordites
etherate
junction
papisher
sapphire
alkamine
horsecar
cosharer
feracious
heterochronous
lotharios
rheotron
decorist
isolator
outreaches
drawhorse
historial
doblones
rechoose
racehorse
bradoons
homilist
sectroid
othinism
ostiolar
trawlnet
chresard
ostentate
cotunnite
doubletone
leninite
outbleed
leucaniline
shoreward
readhere
isoapiole
contemple
shadower
outsearch
heterodera
towheads
andorobo
emplecton
bandolero
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 6
Threads: 1
Joined: Jan 2025
(Jan-09-2025, 06:09 PM)perfringo Wrote: (Jan-08-2025, 05:02 PM)Anorak Wrote: Looking for any lines that can accommodate the word_1 character count in full.
/.../
Sorry to the replies that said I'm being too vague. Hopefully this additional explanation is more clear what my initial game plan is looking like.
Does it mean that "anagrams" are not from consecutive characters and you look for them in the characters of the whole row/column?
I decided to find "classic anagrams" (i.e. sequence of continuous characters) with brute force. For that:
- I grabbed initial data
- downloaded list of english words from Github (containing 370 104 words)
- made indices starting from length of two (i.e ignoring one letter words) for row length of 20 characters for making slices
- made slices using indices and made sorted strings out of them (for comparison later)
- read data from file to dictionary where key is string with sorted characters and values are words which can be made from key
- iterated over strings made from slices and and added values from dictionary if matched with key
Actual code has probably less characters than the description of it. Didn't do any tests so no guarantees that result is correct. Just to demonstrate one possible brute force approach (only for rows):
data = ("qvucalilenineteenbma", "cretehtreaoucsifevle", "oroundhnimihstoliaro",
"uhtronroeocehsrardwo", "noubtdeloeltlntretaw", "tetatseonyneoaolineo",
"pmuehttsewodahreeret", "eonskaeptoooilpasiec", "lesdbnooraooevlewtsr",
"entufivetntncoiujtso", "vrniemaklawescodtrid", "eadtteyossquareeerht",
"nmtopleceohetquarryo", "nsiosmixzfdqbccasliy", "apsorezngqzpsapphire")
indices = [(start, end) for start in range(20) for end in range(start+2, 21)]
slices = [''.join(sorted(row[begin:stop])) for row in data for (begin, stop) in indices]
mapping = dict()
with open("words_alpha.txt", "r") as f:
for row in f:
word = row.strip()
value = ''.join(sorted(word))
mapping.setdefault(value, set()).add(word)
texts = [mapping[item] for item in slices if item in mapping]
anagrams = set(word for text in texts for word in text)
# found 1407 words which can be made from consecutive characters in rows
# took around 1 sec on my machine
# longest word is 14 characters - heterochronous - meaning occurring at different times or stages
# words longer than 7 characters below, BTW meaning of first word is "mineral consisting of a vermiculit" :-);
lennilite
chaetiferous
haliotis
outreach
unbolted
nineteen
doubleton
complete
outraces
reheater
cordites
etherate
junction
papisher
sapphire
alkamine
horsecar
cosharer
feracious
heterochronous
lotharios
rheotron
decorist
isolator
outreaches
drawhorse
historial
doblones
rechoose
racehorse
bradoons
homilist
sectroid
othinism
ostiolar
trawlnet
chresard
ostentate
cotunnite
doubletone
leninite
outbleed
leucaniline
shoreward
readhere
isoapiole
contemple
shadower
outsearch
heterodera
towheads
andorobo
emplecton
bandolero
Where are the matches being output? I ran this code on pycharm, but I can't find a list of matches like those you've provided. I'm sure I'm missing something... When I run it Pycham simply returns "Process finished with exit code 0" which I think just means it completed its task.
Posts: 1,950
Threads: 8
Joined: Jun 2018
(Jan-09-2025, 09:26 PM)Anorak Wrote: Where are the matches being output?
There is no print in this code so nothing is outputted. However, anagrams found by this code are stored in helpfully named anagrams . You can print them out or do whatever you want with them.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
|