May-14-2023, 06:54 AM
Hi,
It happens that some prayer cards are in poor condition or have poor print quality.
Amongst the zillions processed, they do not always stand out.
When examining the OCR text result, a word is sometimes returned as "gibberish",
but you cannot explain to Python what is gibberish and what is not.
(Some people had "strange" names or lived in "strange" places)
Except when a returned word has 5 or more of the same letter in a row. Like: "ENTNTENSNNNMNNNINNNSNINNE".
In some (European) languages 4 identical letters in a row are possible, but I don't think 5 are. Even with 4, I only know of one example.
Hence my question: how can I efficiently detect that a word has 5 identical letters (capitals A-Z) in a row?
Yes, I can do: lstLetters = ['AAAAA','BBBBB' ...] for letters in lstLetters: ... if letters in word ... etc.
Is there something faster and cleverer, maybe?
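To make the question concrete, here is a minimal sketch of the list-based check described above, next to one possible alternative using a regular expression with a backreference. The names RUN_LENGTH, has_run_bruteforce, has_run_regex and RUN_PATTERN are made up for illustration, and the threshold of 5 is the assumption stated in this post, not a rule that holds for every language.

```python
import re
import string

# Assumed threshold: 5 identical capitals in a row marks a word as suspect.
RUN_LENGTH = 5

# List-based approach from the post: 'AAAAA', 'BBBBB', ..., 'ZZZZZ'.
lstLetters = [letter * RUN_LENGTH for letter in string.ascii_uppercase]


def has_run_bruteforce(word):
    """Return True if any of the 26 five-letter runs occurs in word."""
    return any(letters in word for letters in lstLetters)


# Regex approach: ([A-Z]) captures one capital letter, and \1{4} requires
# that same letter four more times directly after it (5 in total).
RUN_PATTERN = re.compile(r"([A-Z])\1{%d}" % (RUN_LENGTH - 1))


def has_run_regex(word):
    """Return True if word contains RUN_LENGTH identical capitals in a row."""
    return RUN_PATTERN.search(word) is not None
```

Both functions should agree on any input, for example on hypothetical words such as "XNNNNNY" (a run of five N) and "BALLOON" (no run). Which one is faster on real data would have to be measured, for instance with timeit.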
thx,
Paul
It is more important to do the right thing than to do the thing right. (P. Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.