Python Forum
Unidecode issue
#1
Hi,
In some PDFs I encounter references to the original parish register, like so: ref = ' RP 477; p. 148 r° '
I run unidecode on all strings in the document: fieldUni = unidecode.unidecode(field).upper()

This has never caused any problems, except in the above case, where I get this: ' RP 477; P. 148 RDEG '

The " ° " has been "translated" into DEG. That is not what is meant here.

How do I avoid this translation in Python (other than a manual Ctrl-H replace of '°' with ... etc. in the text document)?
thx,
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
#2
(Sep-02-2023, 06:42 AM)DPaul Wrote: How do I avoid this translation in Python (other than a manual Ctrl-H replace of '°' with ... etc. in the text document)?
Which translation do you want instead of replacing '°' with 'deg'?
#3
(Sep-02-2023, 08:45 AM)Gribouillis Wrote: Which translation do you want instead
Fair question.
Let me do some research, because I have to find out whether the 'degrees' symbol
was meant to be there and has some genealogical meaning,
or whether it is a faulty conversion of something earlier, if the original text was e.g. in Access or Lotus 123.
Paul
#4
(Sep-02-2023, 08:45 AM)Gribouillis Wrote: Which translation do you want instead
OK, there is a hidden meaning, known only to genealogists I suppose.
148 is the folio number.
r° is recto, and...
v° means verso.
So recto and verso would be the right translations.
I have checked the document, and indeed, some records are r°, others v°.
Paul
#5
Use re.sub(), for example:
>>> import re
>>> dic = {'r°': 'recto', 'v°': 'verso'}
>>> def repl(match):
...     return dic[match.group(0)]
... 
>>> s = ' RP 477; p. 148 r° '
>>> 
>>> re.sub('[rv]°', repl, s)
' RP 477; p. 148 recto '
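
One thing to watch: the substitution has to run before unidecode, otherwise the '°' is already gone by the time the regex sees the string. A minimal sketch of how the two steps could be chained is below. The names FOLIO_SIDES, expand_folio_sides and normalize are made up for illustration, and the transliteration step is passed in as a parameter so the snippet runs without any third-party install; in the original script that argument would presumably be unidecode.unidecode.

```python
import re

# Hypothetical helper names; only the r°/v° mapping comes from this thread.
FOLIO_SIDES = {"r°": "recto", "v°": "verso"}

# \b keeps the pattern from firing in the middle of a word.
FOLIO_SIDE_RE = re.compile(r"\b[rv]°")


def expand_folio_sides(text):
    """Replace the genealogical abbreviations r° / v° with recto / verso."""
    return FOLIO_SIDE_RE.sub(lambda m: FOLIO_SIDES[m.group(0)], text)


def normalize(field, transliterate):
    """Expand r°/v° first, then transliterate and upper-case.

    `transliterate` is any str -> str callable. Assumption: in the
    original script this would be unidecode.unidecode.
    """
    return transliterate(expand_folio_sides(field)).upper()
```

With unidecode installed, the call would look like normalize(ref, unidecode.unidecode). A '°' that does not directly follow a standalone r or v (a temperature, say) is left alone by the regex and still goes through the normal transliteration.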
#6
(Sep-03-2023, 06:20 PM)Gribouillis Wrote: Use re.sub() for example
I thought I had to fiddle around with unidecode parameters,
but this is nice and concise.
Thanks again,
Paul