Posts: 74
Threads: 28
Joined: Feb 2020
Hi everybody
Is there a way to collate Ancient Greek characters without having to transcode each of them to a single character by hand? I mean to compute a string where the diacritics are removed, so as to replace, for instance, ά, ἁ, ἀ, ᾳ, … with α.
Arbiel
Posts: 4,786
Threads: 76
Joined: Jan 2018
Feb-28-2020, 06:53 PM
(This post was last modified: Feb-28-2020, 07:49 PM by Gribouillis.)
You could perhaps use str.translate()
>>> table = str.maketrans("άἁἀᾳ", "α" * 4)
>>> s = "ά, ἁ, ἀ, ᾳ"
>>> s.translate(table)
'α, α, α, α'
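If listing every accented character by hand is too tedious, another option is the standard-library unicodedata module. The following is only a minimal sketch: it assumes the diacritics are all canonical combining marks (which holds for the four examples above), and strip_diacritics is a name made up for this illustration. It decomposes the text (NFD), drops the combining marks, and recomposes.

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks (accents, breathings, iota subscript) from text."""
    # NFD splits each precomposed letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for base characters, non-zero for marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

print(strip_diacritics("ά, ἁ, ἀ, ᾳ"))
```

Unlike a transliteration to ASCII, this keeps the Greek letters themselves, so ο and ω, or ε and η, stay distinct.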
Posts: 7,315
Threads: 123
Joined: Sep 2016
Unidecode is good and could work for this.
Sample texts in Ancient Greek:
>>> from unidecode import unidecode
>>>
>>> s = 'Μῆνιν ἄειδε, θεά, Πηληϊάδεω Ἀχιλῆο'
>>> unidecode(s)
'Menin aeide, thea, Peleiadeo Akhileo'
>>> s1 = 'Ἀτρεΐδης τε ἄναξ ἀνδρῶν καὶ δῖος Ἀχιλλεύς'
>>> unidecode(s1)
'AtreIdes te anax andron kai dios Akhilleus'
Posts: 74
Threads: 28
Joined: Feb 2020
Hi snippsat
Thank you for your input. However, it is not quite what I am looking for. As you can see, «ο» and «ω» both become «o», and «ε» and «η» both become «e». Sorting the result of «unidecode(string)» won't give the order I want.
I'll read the documentation concerning unicode to see whether there is something that meets my need.
Otherwise, I'll have to proceed as Gribouillis suggests, with the drawback of having to script all possible translations.
Arbiel
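The translation table does not necessarily have to be written by hand. As a rough sketch (the names build_greek_table and GREEK_TABLE are invented here, and the two code point ranges are an assumption about which blocks matter, namely "Greek and Coptic" and "Greek Extended"), the table can be generated from the Unicode decomposition data and then used as a sort key with str.translate():

```python
import unicodedata

def build_greek_table():
    """Map every decomposable character in the Greek blocks to its base letter."""
    table = {}
    # Assumed ranges: Greek and Coptic (U+0370..U+03FF), Greek Extended (U+1F00..U+1FFF).
    for start, stop in ((0x0370, 0x0400), (0x1F00, 0x2000)):
        for cp in range(start, stop):
            ch = chr(cp)
            decomposed = unicodedata.normalize("NFD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            if base and base != ch:
                table[cp] = base
    return table

GREEK_TABLE = build_greek_table()

words = ["ὥρα", "ὁδός", "ἔργον", "ἦθος"]
# Sort on the stripped form, but keep the original spelling in the result.
print(sorted(words, key=lambda w: w.translate(GREEK_TABLE)))
```

This orders words by code point of the bare letters, which is only an approximation of a real Greek collation, but it keeps ο/ω and ε/η apart.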
Posts: 74
Threads: 28
Joined: Feb 2020
Hi
I did not manage to install unicode
Quote:python -m pip install unicode
Defaulting to user installation because normal site-packages is not writeable
Collecting unicode
Downloading unicode-2.7-py2.py3-none-any.whl (14 kB)
Installing collected packages: unicode
Successfully installed unicode-2.7
python
Python 3.8.0 (default, Oct 28 2019, 16:14:01)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import unicode
Error: Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'unicode'
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
import apport.fileutils
File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
from apport.packaging_impl import impl as packaging
File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
import apt
File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Original exception was:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'unicode'
Quote:pip install Unidecode
Defaulting to user installation because normal site-packages is not writeable
Collecting Unidecode
Downloading Unidecode-1.1.1-py2.py3-none-any.whl (238 kB)
|████████████████████████████████| 238 kB 510 kB/s
Installing collected packages: Unidecode
Successfully installed Unidecode-1.1.1
python
Python 3.8.0 (default, Oct 28 2019, 16:14:01)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicode
Error: Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'unicode'
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
import apport.fileutils
File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
from apport.packaging_impl import impl as packaging
File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
import apt
File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Original exception was:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'unicode'
So what should I do?
Arbiel
Posts: 4,786
Threads: 76
Joined: Jan 2018
Mar-27-2020, 09:07 PM
(This post was last modified: Mar-27-2020, 09:08 PM by Gribouillis.)
There are two errors in your approach:
- The name of the module is not unicode but unidecode.
- If you are on Linux and your default python is Python 2.7, then don't use it! Use the python3 command instead.
So, install unidecode with
Output: python3 -m pip install --user unidecode
Then use it in python3
from unidecode import unidecode
Posts: 74
Threads: 28
Joined: Feb 2020
Hi Gribouillis
Thank you for your help, which has been very valuable.
Quote:python3 -m pip install --user unidecode
Collecting unidecode
Cache entry deserialization failed, entry ignored
Cache entry deserialization failed, entry ignored
Downloading https://files.pythonhosted.org/packages/...ne-any.whl (238kB)
100% |████████████████████████████████| 245kB 1.3MB/s
Installing collected packages: unidecode
Successfully installed unidecode-1.1.1
python
Python 3.8.0 (default, Oct 28 2019, 16:14:01)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unidecode
>>>
Arbiel
Posts: 74
Threads: 28
Joined: Feb 2020
Mar-29-2020, 05:38 PM
(This post was last modified: Mar-29-2020, 05:38 PM by arbiel.)
Hi
Forget about this post. I was wrong: I coded
import unidecode
instead of
from unidecode import unidecode
or, with "import unidecode", I should have coded
unidecode.unidecode("Κνωσός").encode("ascii")
Apparently, I made a mistake somewhere
python
Python 3.8.0 (default, Oct 28 2019, 16:14:01)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import unidecode
>>> print(unidecode.__doc__)
Transliterate Unicode text into plain 7-bit ASCII.
Example usage:
>>> from unidecode import unidecode
>>> unidecode(u"北亰")
"Bei Jing "
The transliteration uses a straightforward map, and doesn't have alternatives
for the same character based on language, position, or anything else.
In Python 3, a standard string object will be returned. If you need bytes, use:
>>> unidecode("Κνωσός").encode("ascii")
b'Knosos'
>>> unidecode("Κνωσός").encode("ascii")
Error: Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
import apport.fileutils
File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
from apport.packaging_impl import impl as packaging
File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
import apt
File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Original exception was:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable
I suspected that the unicode module was responsible for this failure, so I ran
Quote:pip uninstall unicode
Found existing installation: unicode 2.7
Uninstalling unicode-2.7:
Would remove:
/home/remi/.local/bin/paracode
/home/remi/.local/bin/unicode
/home/remi/.local/lib/python3.8/site-packages/unicode-2.7.dist-info/*
Proceed (y/n)? n
Are those files Python files?
Would it be a mistake to remove them?
Should I uninstall unidecode and reinstall it?
Arbiel
Posts: 7,315
Threads: 123
Joined: Sep 2016
(Mar-29-2020, 05:38 PM)arbiel Wrote: Are those files, python files?
Would I make a blunder if I remove them ?
Should I uninstall unidecode and re-install it ?
Your installation is okay.
unicode is not an importable module, so import unicode will just give an error message.
You can try reinstalling python3-apt to get rid of the message No module named 'apt_pkg':
sudo apt-get install --reinstall python3-apt
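To check which names the interpreter can actually import, without triggering a traceback, something like the following may help. It is only a sketch: is_importable is a helper name invented here, and the three names in the loop are simply the ones that came up in this thread.

```python
import importlib.util

def is_importable(name):
    """Return True if "import name" would find a top-level module or package."""
    return importlib.util.find_spec(name) is not None

for name in ("unidecode", "unicode", "apt_pkg"):
    status = "importable" if is_importable(name) else "not importable"
    print(name, "->", status)
```

Run it with the same interpreter (python3) that pip installed into, since each Python installation has its own site-packages.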