A deciphering problem

ibreeden · Aug-17-2021, 10:22 AM

Hi @Yoriz,
I am impressed by your solution and by how quick you implemented it. And by the beauty of simplicity.
But I must admit I also have troubles in understanding the code. I now see I have to study more on the subject of decorators in general and dataclasses in particular.
One drawback of your solution is that you guessed I would use the same list of shift-keys, which was right.

Thank you for your encoded answer, which deals all about good (Python)code. I am a bit reluctant to show my solution because I am afraid it is not as nice as yours. Perhaps it is because I am Dutch (as stated in your encoded message "Although that way may not be obvious at first unless you're Dutch") Smile

@Truman, we should not forget you are the one who asked the question. Are you making progress? I plan to publish and explain my solution tomorrow but I have to beautify it before I dare to show it. And perhaps someone else has a better solution.

jamesaarr · Aug-18-2021, 08:12 AM

(Aug-17-2021, 06:19 AM)DPaul Wrote: The keys are indeed (32, 22, ...) or (34, 44,...) depending on adding or subtracting.

The reason I asked if the solution was part of the initial data, is that i see limited use
in decoding with translation or spelling tables etc...

With a nice little poem it's feasible, but what if your message is essentially GPS codes,
longitudes, latitudes, distances,.... You'll end up in the wrong place.

I was looking for a generic solution, i'm very curious what that might be.

In the movie, Enigma was broken because messages always ended with the same 2 words.
Modern codemakers may have learned from that.

Paul

The movie based on real life Alan Turing :') Big Up Britain! Haha! Would highly recommending looking into how he broke the enigma cypher, I've seen people make their own and it's quite interesting!

ibreeden · (This post was last modified: Aug-18-2021, 02:11 PM by ibreeden.)

Hi @Truman,
Did you make progress? I know the Homework section should not contain complete programs, but examples already have been posted, so I feel free to do the same. I promised to explain my ideas so here it comes. As emphasized by DPaul my algorithm is a bit fuzzy. But I like fuzzy logic. Smile

I left the program as you posted it almost un-altered. You did a good job. But instead of printing each permutation of a line to the output, each permutation is stored in an object (eat_line()). In this object the line is split to words and each word gets a score for being recognized as an English word. The average score of the line is then used as a key to store the line in a dictionary.
When your program has produced all variations of a line, the line(s) with the best score is (are) printed (best_result()).

As a means to find if a word is good English I installed pyspellchecker.

pip install pyspellchecker

I don't know if this is the best choice. It hinders me that method "word_usage_frequency()" is not well explained. By trial end error I found that not recognized words return 0.0 and others return a (small) float. So I used this method.

>>> spell = SpellChecker()
>>> spell.word_usage_frequency("asked")
0.0003089325752071077
>>> spell.word_usage_frequency("askad")
0.0
>>> spell.word_usage_frequency("thus")
1.54534315113533e-05
>>> spell.word_usage_frequency("while")
0.0003726819106697944
>>> spell.word_usage_frequency("he")
0.007768271796283737
>>> spell.word_usage_frequency("it")
0.012121554730657262

Now my code.

from spellchecker import SpellChecker


class Probability:
    """This stores all probable solutions of a decoded line.

    Each line is tested for the occurrence of good English words.
    eat_line(encryption_key, line) stores a line and gives it a
    score for the existence of English words. 0 means: nothing
    recognized.
    best_result() returns a list of lines with the best score.
    all_results() returns a dictionary with all lines.  Intended
    for debugging.
    """
    def __init__(self) -> None:
        """Initializes the structures to contain all probable solutions.

        all_sol is a dictionary. The key is the average score denoting
        if the words in the line are good English. 0.0 means: no English
        recognized. Higher scores means more words recognized.  The values
        in the dictionary is a list. Each item of this list is again a list
        with two items: item[0] is the encryption_key, item[1] is the line.
        line_score_list is a list of scores. The same as the keys of
        dictionary all_sol.
        spell is the SpellChecker object.
        """
        self.all_sol = {}
        self.line_score_list = []
        self.spell = SpellChecker()

    def _avg(self, iterable) -> float:
        """_avg computes the average of a list (or any iterable) of numbers. """
        return sum(iterable) / len(iterable)

    def eat_line(self, enc_key: int, line: str) -> None:
        """Consume a line of (decrypted) text.

        enc_key: the encryption key used.
        line: the line of text.
        The probability of the line being good English text is computed
        as a score. This score is used as a key to store the key and line in
        a dictionary.
        """
        # word_score_list is the score for each word in the line.
        word_score_list = []
        wordlist = self.spell.split_words(line)
        for word in wordlist:
            # Ignore words consisting of 1 letter.
            if len(word) > 1:
                word_score_list.append(self.spell.word_usage_frequency(word.lower()))
        avg_score = self._avg(word_score_list)
        self.line_score_list.append(avg_score)
        if avg_score in self.all_sol:
            self.all_sol[avg_score].append([enc_key, line])
        else:
            # Note: add a list of lists.
            self.all_sol[avg_score] = [[enc_key, line]]

    def best_result(self) -> list:
        """When all permutations are "eaten", the best fit is returned.

        More than one line may have the same score, so a list is
        returned (hopefully a list with one item).  Each item is
        a pair of encryption_key and line.
        """
        best_score = max(self.line_score_list)
        return self.all_sol[best_score]

    def all_result(self) -> dict:
        """For debugging purposes: returns the complete dictionary.  """
        return self.all_sol


yoriz_list = [
    "Svr! zw!3Nz0Nsv  v9N yr5N!x3BQ",
    "XC952v2?P2!Pux??x P?1t7P26952v2?S",
    "HXbeaTzXhzQTiiTgziWPczRdbeaTm3",
    "tRPSOHanLVnEHWWHUnWKDQnFRPSOLFDWHGq",
    "AgVo6dn6WZooZm6ocVi6iZnoZY9",
    "xG6IJ0b.Jb70KK0IbK?6Eb90EJ0e",
    "LYUXUVcfcns5Wiohnm8",
    "uD75 3.Y53G7GY3F7B'HYGD75 3.Y7BCI90YHCY4F73?YH07YFI.7Gb",
    "mJRFMSEFiNP?ARGA?JGRWi.C?RQiNSPGRWl",
    "cBB?BCUC6?E02U!3F3BU.yCCUC703!D0IX",
    "o86y??QyD063w3.6EQ?36y8wyxT",
    "qHeNB?e.9 ?eI.e9G0CAOCNS,eL?.OM?eNB?eN?GJN9NCIHeNIeAO?MMh",
    "DRObOucRYeVNuLOuYXO--uKXNuZbOPObKLViuYXViuYXOu--YLfSYecugKiudYuNYuSdx",
    "3ZhVciUVyhVOhykOmyaOmybchyPSycPjWcigyOhyTWfghyibZSggymci'fSy6ihQV2",
    "i0DR4.RwzAAz?RA3v9R9zCz?U",
    "Z E7.F67V?4G4CV8DV.5E4?V14EE4CVE7z?V*C867E*V?.HY",
    "d1RA3zR48 7z8z9AvA409R4.R3v?yRA0RzE 7v49,R4A'.RvRwvyR4yzvU",
    "7PudROuSWZVOWOXdKdSYXuScuOKciudYuOhZVKSX,uSduWKiuLOuKuQYYNuSNOKx",
    "Zmyq52moq5Im4qI1zqIt1zwuzsIs4qm6IupqmI--Ixq6'5Ip1Iy14qI1rI6t15qJ",
]

SYMBOLS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890 !?.'

print("KEY  TEXT")
for i in yoriz_list:
    prob = Probability()
    for key in range(len(SYMBOLS)):
        translated = ''
        for symbol in i:
            if symbol in SYMBOLS:
                symbolIndex = SYMBOLS.find(symbol)
                translatedIndex = symbolIndex - key

                if translatedIndex < 0:
                    translatedIndex = translatedIndex + len(SYMBOLS)
                translated = translated + SYMBOLS[translatedIndex]
            else:
                translated = translated + symbol
        # Store the result in the prob object.
        prob.eat_line(key, translated)
    # Print all lines with the highest score.
    for res in prob.best_result():
        print(f"{res[0]:3d}  {res[1]}")

... and here are @Yoriz 's rules:

Output:KEY  TEXT
 17  Beautiful is better than ugly.
 19  Explicit is better than implicit.
 55  Simple is better than complex.
 43  Complex is better than complicated.
 61  Flat is better than nested.
 31  Sparse is better than dense.
 60  Readability counts.
 28  Special cases aren't special enough to break the rules.
 38  Although practicality beats purity.
 24  Errors should never pass silently.
 20  Unless explicitly silenced.
 34  In the face of ambiguity, refuse the temptation to guess.
 50  There should be one-- and preferably only one --obvious way to do it.
 54  Although that way may not be obvious at first unless you're Dutch.
 21  Now is better than never.
 25  Although never is often better than *right* now.
 21  If the implementation is hard to explain, it's a bad idea.
 50  If the implementation is easy to explain, it may be a good idea.
 12  Namespaces are one honking great idea -- let's do more of those!

I hope you find this useful, Truman. As stated before: it is a bit fuzzy and some tweaking may be necessary. Like that I don't count one-letter words because I found they result in relative high scores.

***Yoriz*** · Aug-18-2021, 05:12 PM

@ibreeden that's it i should have included

Quote:The Zen of Python, by Tim Peters

from

import this

DPaul · Aug-19-2021, 02:47 PM

I recognize that some very clever code has been proposed, but it is hard to see how this fits in with a starter course.
I hope TS will eventually disclose what he was supposed to do, when the 'teacher?' reveals the solution.

I was also challenged to provide a "better" solution, I have none that qualifies as better.
I did however think of a way that the initial "brute force" approach could be limited to much less lines.
This is, knowing that it is normal text, and several different lines, but the language does not matter.

As it is text, there should be a 'space' every so many symbols.
As you have the list of symbols, you can hardcode the index of 'space'.
Then with a system of "answer = input('What do you think the space char is for this line...?")",
you can try from left to right the coded letters, calculate the shift, and very soon, bingo, and you go to the next line.
Works like a charm. It qualifies as "less brute force"

Paul

Truman · (This post was last modified: Aug-21-2021, 03:29 PM by Truman.)

I started working on another problem in the meantime, will get back to this eventually but not sure when. I apologize if disappointed anyone.
And I'm following a book, not the real teacher.

DPaul · (This post was last modified: Aug-22-2021, 09:16 AM by DPaul.)

Going back to the original question by TS, this is an
attempt to reduce the number of possibilities generated by the "brute force" method.
It assumes plain text, so likely there are spaces between the words.
I believe it will solve all Cesars, generated with this key in any language (that uses these symbols).
I have replaced the symbols not found by '?' and changed the indexing system so i could do try-except.
Next person to do a Cesar should use the alternate symbols list, that I was able to format correctly
with the help of other posters here. Smile

Paul

# import string

thelist = ["qeFIP?eGSeECNNS,", "5coOMXXcoPSZIWoQI,",
           "avnl1olyD4l'ylDohww6DhzDjhuDil,",
           "z.GM?.cEQc. 70c.7KcKMKHA9AGFK,", "?MFYp2pPJJUpZSIJWpRdpMFY,",
           "ZqH8sl5HtqHTH4s3lyvH5zH5spH4t pHzqHlH3l5K",
           "Zfbi,!tif!xpvme!qspcbcmz!fbu!nfA"]

# SYMBOLS = f"{string.ascii_letters}{string.digits}{string.punctuation} " # add space
SYMBOLS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890 !?.'

def newindex(letter, shft):
    try:
        idxnew =  SYMBOLS.index(letter) + shft
        if idxnew >= len(SYMBOLS):
            idxnew = idxnew  - len(SYMBOLS) 
        return SYMBOLS[idxnew]
    except:
        return '?'
    
idxspace = SYMBOLS.index(' ')
textline = ''

for line in thelist:
    letterIndex = 0
    IcanReadIt = False
    while not IcanReadIt:
        try:
            shft = idxspace - SYMBOLS.index(line[letterIndex])
            for letter in line:
                textline += newindex(letter, shft)
            print(f'> > > > > > > > > > > > >  {textline}')
            answ = input('Can you read this ? (y/n) ')
            if answ.lower() == 'y':
                IcanReadIt = True
            textline = ''
        except:
            textline += '?'
        letterIndex += 1

***Yoriz*** · (This post was last modified: Aug-23-2021, 06:02 PM by Yoriz.)

I added a solver that creates a score based on the number of spaces and lower case vowels, I don't know if it works in all cases, it works on the 3 ciphers so far.

import random
from dataclasses import dataclass, field
from typing import Generator, List

SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890 !?."


thelist = [
    "qeFIP?eGSeECNNS,",
    "5coOMXXcoPSZIWoQI,",
    "avnl1olyD4l'ylDohww6DhzDjhuDil,",
    "z.GM?.cEQc. 70c.7KcKMKHA9AGFK,",
    "?MFYp2pPJJUpZSIJWpRdpMFY,",
    "ZqH8sl5HtqHTH4s3lyvH5zH5spH4t pHzqHlH3l5K",
    "Zfbi,!tif!xpvme!qspcbcmz!fbu!nfA",
]


@dataclass(order=True)
class CipherData:
    shift: str = field(compare=False)
    result: str = field(compare=False)
    score: float = field(default=0.0, compare=True)


@dataclass
class CipherLine:
    line: str
    cipher_data_possibilites: List[CipherData]

    def max_score_line(self):
        return max(self.cipher_data_possibilites)

    def top_3_score_lines(self):
        return sorted(self.cipher_data_possibilites, reverse=True)[:3]


@dataclass
class CeaserCipher:
    symbols: str

    def _shift_symbols_right(self, shift: int) -> str:
        return self.symbols[shift:] + self.symbols[:shift]

    def _shift_symbols_left(self, shift: int) -> str:
        shift = len(self.symbols) - shift
        return self.symbols[shift:] + self.symbols[:shift]

    def cipher(self, line: str, shift: int):
        shifted_sybols = self._shift_symbols_right(shift)
        result = line.translate(line.maketrans(self.symbols, shifted_sybols))
        return CipherData(shift, result)

    def decipher(self, line: str, shift: int) -> str:
        shifted_sybols = self._shift_symbols_left(shift)
        result = line.translate(line.maketrans(self.symbols, shifted_sybols))
        return CipherData(shift, result)

    def decipher_all_possibilties(self, line: str) -> Generator[CipherData, None, None]:
        for shift, _ in enumerate(self.symbols):
            yield self.decipher(line, shift)

    def cipher_random_shift(self, line: str) -> str:
        return self.cipher(line, random.randint(0, len(self.symbols)))


def ceaser_cipher_solver(line: str) -> CipherLine:
    cipherer = CeaserCipher(SYMBOLS)
    cipher_lines = []
    for cipher_data in cipherer.decipher_all_possibilties(line):
        cipher_data.score = float(cipher_data.result.count(" ")) or -10.0
        cipher_data.score += sum(cipher_data.result.count(item) for item in ("aeiou"))
        cipher_lines.append(cipher_data)
    return CipherLine(line, cipher_lines)


for line in thelist:
    cipher_line = ceaser_cipher_solver(line)
    print(cipher_line.max_score_line())

Output:CipherData(shift=34, result='I love my kitty,', score=6.0)
CipherData(shift=44, result='My kitty loves me,', score=7.0)
CipherData(shift=7, result="Together we're happy as can be,", score=14.0)
CipherData(shift=32, result='Though my head has suspicions,', score=13.0)
CipherData(shift=45, result='That I keep under my hat,', score=11.0)
CipherData(shift=11, result='Of what if I shrank to the size of a rat.', score=20.0)
CipherData(shift=1, result='Yeah, she would probably eat me.', score=15.0)

Edit: added from typing import List so it works with pre 3.9 python versions

ibreeden · (This post was last modified: Aug-23-2021, 09:30 AM by ibreeden.)

Hi @DPaul ,
I like your solution. It took me a while to understand the algorithm but it is simple and efficient!
You just assume the first character is a space, and from that assumption you decypher the whole line. If the user rejects the result, you assume the second character to be a space and so on. This way you find the solution after all letters of the first word have been tried. Nice! I learned from that.

Now I am trying to understand the solution of Yoriz. His program looks very neat but some comment about the algorithm would have been helpful. He believes the code is self-explanatory as good code should be. The names of the classes (CipherData, CipherLine, CeaserCipher) indeed give a good impression of what it is all about, but it does not immediately make clear to me how the whole algorithm works. But never mind, I want to learn and hope to have it understood by tomorrow.

DPaul · Aug-23-2021, 04:00 PM

Hi ibreeden,

I would like to study Yoriz' solution too,
but both version 1 and 2 give a "TypeError: 'type' object is not subscriptable".
Do you get it working?

Paul

A deciphering problem

User Panel Messages

Announcements