Python Forum
Collating Ancient Greek
#1
Hi everybody

Is there a way to collate Ancient Greek characters without having to transcode each of them to a single character? I mean, to compute a string from which the diacritics are removed, so that for instance ά, ἁ, ἀ, ᾳ, … are all replaced by α.

Arbiel
#2
You could perhaps use str.translate()
>>> table = str.maketrans("άἁἀᾳ", "α" * 4)
>>> s = "ά, ἁ, ἀ, ᾳ"
>>> s.translate(table)
'α, α, α, α'
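If listing every accented letter by hand is impractical, the table passed to str.translate() could perhaps be generated from the Unicode database instead. The following is only a sketch, assuming that canonical decomposition (NFD) followed by dropping the combining marks gives the base letter you want; the helper name base_letter and the two code point ranges (Greek and Coptic, Greek Extended) are choices made for this illustration.

```python
import unicodedata

def base_letter(ch):
    """Return ch with its combining marks removed (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped or ch

# Assumed ranges: Greek and Coptic (U+0370..U+03FF), Greek Extended (U+1F00..U+1FFF).
GREEK_RANGES = [(0x0370, 0x0400), (0x1F00, 0x2000)]

# Map every code point that changes when its diacritics are dropped.
table = {}
for start, stop in GREEK_RANGES:
    for cp in range(start, stop):
        ch = chr(cp)
        base = base_letter(ch)
        if base != ch:
            table[cp] = base

s = "ά, ἁ, ἀ, ᾳ"
print(s.translate(table))
```

Unlike a transliteration to ASCII, this keeps the text in Greek, so ο and ω (or ε and η) remain different letters.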
#3
Unidecode is good, and could work for this.
Sample Ancient Greek texts:
>>> from unidecode import unidecode
>>> 
>>> s = 'Μῆνιν ἄειδε, θεά, Πηληϊάδεω Ἀχιλῆο'
>>> unidecode(s)
'Menin aeide, thea, Peleiadeo Akhileo'

>>> s1 = 'Ἀτρεΐδης τε ἄναξ ἀνδρῶν καὶ δῖος Ἀχιλλεύς'
>>> unidecode(s1)
'AtreIdes te anax andron kai dios Akhilleus'
#4
Hi snippsat

Thank you for your input. However, it is not quite what I am looking for. As you can see, «ο» and «ω» both become «o», and «ε» and «η» both become «e». Sorting the result of «unidecode(string)» won't give the order I want.

I'll read the doc concerning unicode, to see if there is something which corresponds to my need.

Otherwise, I'll have to proceed as Gribouillis suggests, with the drawback of having to script all possible translations.

Arbiel
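For the sorting problem described in this post, a sort key built with the standard unicodedata module might be enough. This is a minimal sketch, not a full implementation of Unicode collation: it assumes that ordering by the code points of the bare letters is acceptable, and the names strip_diacritics and collation_key are made up for the example.

```python
import unicodedata

def strip_diacritics(text):
    """Decompose the text, drop nonspacing marks (category Mn), recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        c for c in decomposed if unicodedata.category(c) != "Mn"
    )
    return unicodedata.normalize("NFC", without_marks)

def collation_key(word):
    """Sort on the bare letters first; the original word only breaks ties."""
    # casefold() ignores case and also maps final sigma to regular sigma.
    return (strip_diacritics(word).casefold(), word)

words = ["ὦ", "ἄναξ", "Ἀχιλλεύς", "ἀνδρῶν", "θεά", "δῖος"]
for word in sorted(words, key=collation_key):
    print(word, "->", strip_diacritics(word))
```

Because the letters stay Greek, ο and ω, or ε and η, are never merged, which is the behaviour that ruled out unidecode here.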
#5
Hi

I did not manage to install unicode
Quote:
python -m pip install unicode
Defaulting to user installation because normal site-packages is not writeable
Collecting unicode
Downloading unicode-2.7-py2.py3-none-any.whl (14 kB)
Installing collected packages: unicode
Successfully installed unicode-2.7
python
Python 3.8.0 (default, Oct 28 2019, 16:14:01) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import unicode
Error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'unicode'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'unicode'
Quote:
pip install Unidecode
Defaulting to user installation because normal site-packages is not writeable
Collecting Unidecode
Downloading Unidecode-1.1.1-py2.py3-none-any.whl (238 kB)
|████████████████████████████████| 238 kB 510 kB/s
Installing collected packages: Unidecode
Successfully installed Unidecode-1.1.1
python
Python 3.8.0 (default, Oct 28 2019, 16:14:01) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicode
Error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'unicode'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'unicode'
So what should I do?

Arbiel
#6
There are two errors in your approach:
  1. The name of the module is not unicode but unidecode.
  2. If you are on Linux and your default python is Python 2.7, don't use it! Use the python3 command instead.
So, install unidecode with
Output:
python3 -m pip install --user unidecode
Then use it in python3
from unidecode import unidecode
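To see which interpreter is actually running and whether it can find a given module, something along these lines may help. It is just a sketch using importlib.util.find_spec from the standard library; the helper name is_importable is invented for the example.

```python
import importlib.util
import sys

def is_importable(module_name):
    """Return True if this interpreter can locate a top-level module of that name."""
    return importlib.util.find_spec(module_name) is not None

# Show which Python is running, then check both spellings of the module name.
print("Interpreter:", sys.executable)
print("Version:", sys.version.split()[0])
for name in ("unidecode", "unicode"):
    status = "importable" if is_importable(name) else "not importable"
    print(f"{name}: {status}")
```

Run it with the same command you use to start Python (python or python3), since each interpreter has its own set of installed packages.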
#7
Hi Gribouillis

Thank you for your help, which has been very valuable.
Quote:
python3 -m pip install --user unidecode
Collecting unidecode
Cache entry deserialization failed, entry ignored
Cache entry deserialization failed, entry ignored
Downloading https://files.pythonhosted.org/packages/...ne-any.whl (238kB)
100% |████████████████████████████████| 245kB 1.3MB/s
Installing collected packages: unidecode
Successfully installed unidecode-1.1.1
python
Python 3.8.0 (default, Oct 28 2019, 16:14:01) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unidecode
>>> 
Arbiel
#8
Hi

Forget about this post. I was wrong, as I wrote
import unidecode
instead of
from unidecode import unidecode
or, with "import unidecode", I should have written
unidecode.unidecode("Κνωσός").encode("ascii")
Apparently, I made a mistake somewhere
python
Python 3.8.0 (default, Oct 28 2019, 16:14:01) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import unidecode
>>> print(unidecode.__doc__)
Transliterate Unicode text into plain 7-bit ASCII.

Example usage:
>>> from unidecode import unidecode
>>> unidecode(u"北亰")
"Bei Jing "

The transliteration uses a straightforward map, and doesn't have alternatives
for the same character based on language, position, or anything else.

In Python 3, a standard string object will be returned. If you need bytes, use:
>>> unidecode("Κνωσός").encode("ascii")
b'Knosos'

>>> unidecode("Κνωσός").encode("ascii")
Error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable
I suspected the module unicode to be responsible for this failure. So I did
Quote:
pip uninstall unicode
Found existing installation: unicode 2.7
Uninstalling unicode-2.7:
Would remove:
/home/remi/.local/bin/paracode
/home/remi/.local/bin/unicode
/home/remi/.local/lib/python3.8/site-packages/unicode-2.7.dist-info/*
Proceed (y/n)? n

Are those files Python files?
Would it be a blunder to remove them?
Should I uninstall unidecode and reinstall it?

Arbiel
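The TypeError in the session above is what Python raises whenever a module object is called as if it were a function. The same thing can be reproduced with any module that contains a function of the same name; the sketch below uses copy from the standard library as a stand-in for unidecode, so it runs without installing anything.

```python
import copy  # a module that, like unidecode, holds a function with its own name

# Calling the module itself fails, just as unidecode("...") does
# after a plain "import unidecode".
try:
    copy([1, 2, 3])
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)

# Either qualify the function with the module name...
duplicate = copy.copy([1, 2, 3])
print(duplicate)

# ...or import the function directly (aliased here only to avoid
# shadowing the module name used above).
from copy import copy as copy_function
print(copy_function([4, 5, 6]))
```

With unidecode the two working forms are unidecode.unidecode("Κνωσός") after "import unidecode", or unidecode("Κνωσός") after "from unidecode import unidecode".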
#9
(Mar-29-2020, 05:38 PM)arbiel Wrote: Are those files Python files?
Would it be a blunder to remove them?
Should I uninstall unidecode and reinstall it?
Your installation is okay.
import unicode fails because unicode is not an importable module, so it will just give an error message.
You can try reinstalling python3-apt to get rid of the message No module named 'apt_pkg':
sudo apt-get install --reinstall python3-apt