Thread: unicode (/thread-6301.html) - Python Forum (https://python-forum.io)
unicode - Skaperen - Nov-15-2017

the range of ASCII characters that are "printable" is 32 to 126, or 33 to 126, inclusive, depending on whether blank spaces are considered "printable" (they are at least safe to try to print). i am looking for a list of ranges of "printable" characters in unicode.

i am making some code to dump binary in a "readable" form, showing both the binary code in hexadecimal and the character if it is a printable one, else a '.', in place of each byte. i have created one for ASCII in C (and there are many others around, in pretty much every language, i'm sure). i want to create one in python3 that includes support for UTF-8. that is, wherever it finds printable byte code combinations, it will output the character in the place where the character goes, depending on the dump style/format. here is an example from the one i made (in 3 different widths) modeled after a common IBM mainframe dump style:
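Separately from the C tool mentioned above, a Python 3 version of the idea could be sketched roughly as follows. This is only a sketch under stated assumptions: the names `utf8_cells` and `dump` are hypothetical, `str.isprintable()` stands in for whatever definition of "printable" is finally chosen, and showing continuation bytes as blanks is just one possible layout, not the mainframe style described in the post.

```python
def utf8_cells(data, filler=".", cont=" "):
    """Return one display cell per input byte.

    A valid, printable UTF-8 sequence puts its character in the cell of
    the lead byte and `cont` in the cells of its continuation bytes.
    Everything else (control codes, invalid or truncated sequences)
    becomes `filler`, one byte at a time.
    """
    cells = []
    i = 0
    while i < len(data):
        b = data[i]
        # Expected sequence length, judged from the lead byte alone.
        if b < 0x80:
            n = 1
        elif 0xC2 <= b <= 0xDF:
            n = 2
        elif 0xE0 <= b <= 0xEF:
            n = 3
        elif 0xF0 <= b <= 0xF4:
            n = 4
        else:
            n = 0  # stray continuation byte or invalid lead byte

        ch = None
        if n and i + n <= len(data):
            try:
                # Strict decoding rejects overlong forms and surrogates.
                ch = data[i:i + n].decode("utf-8")
            except UnicodeDecodeError:
                ch = None

        if ch is not None and ch.isprintable():
            cells.append(ch)
            cells.extend([cont] * (n - 1))
            i += n
        else:
            cells.append(filler)
            i += 1
    return cells


def dump(data, width=16):
    """Format `data` as offset, hex bytes, and a UTF-8 aware text column."""
    # Cells are computed for the whole buffer first, so a sequence that
    # straddles a row boundary is still decoded as one character.
    cells = utf8_cells(data)
    pad = width * 3 - 1
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk).ljust(pad)
        text = "".join(cells[off:off + width])
        lines.append(f"{off:08x}  {hexpart}  |{text}|")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dump("héllo, wörld \u20ac\n".encode("utf-8")))
```

Combining marks and double-wide characters still pass `isprintable()`, so the text column can drift out of alignment with the hex column; the sketch does not try to correct for that.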
RE: unicode - stranac - Nov-15-2017

Don't know if there are exact ranges, but take a look at https://docs.python.org/3/library/unicodedata.html#unicodedata.category and https://en.wikipedia.org/wiki/Template:General_Category_(Unicode)

RE: unicode - sparkz_alot - Nov-15-2017

(Nov-15-2017, 08:49 AM)stranac Wrote: i am looking for a list of ranges of "printable" characters in unicode.

They are all 'printable' to a certain extent. Not all code points are assigned, and there is no font (that I am aware of) that covers all 1,114,112 Unicode code points. That said, these are the code "planes"

Two years ago I wrote this code to see what I could 'print'.

    # A program to print all Unicode glyphs to a file.
    #
    # Surrogate code points U+D800..U+DFFF (55296-57343) are reserved for
    # UTF-16 pairs and cannot be encoded as UTF-8, so they are skipped.
    with open("unicode_symbols_v2.txt", "w", encoding="utf-8") as file:
        for a in range(0x110000):  # 0 through 1114111 (U+10FFFF), inclusive
            if 0xD800 <= a <= 0xDFFF:
                continue
            file.write(f"Decimal: {a} Hex: {hex(a)} Binary: {bin(a)} Character: {chr(a)}\n")

I should update it, since the changes of 3.6, but maybe later. This results in a rather large text file (~85,000 KB). Maybe you can glean something useful from it.

RE: unicode - Skaperen - Nov-16-2017

i did try writing some code to output a unicode table, or at least a short form of it for codes up to U+07FF. the result was a mess on the screen, although quite a few codes did print something legible. i did this with a loop, for code in range(0x0800):, encoded the result of chr(code) to UTF-8, and wrote each character, one at a time, directly to the terminal, with output around it to try to make a table structure. it was not pretty. so i am trying to see what more i can do.
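One way to get fewer unreadable cells in a table like this is to filter by general category first, as stranac's links suggest. What follows is a minimal sketch, assuming that "not printable" means the categories listed in `NON_PRINTABLE`; that set and the function name `printable_ranges` are illustrative choices, not an official definition. It also produces the kind of range list asked for at the top of the thread.

```python
import sys
import unicodedata

# Assumption: these general categories are treated as "not printable":
# control, format, surrogate, private use, unassigned, and the
# line/paragraph separators.
NON_PRINTABLE = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp"}


def printable_ranges(start=0, stop=sys.maxunicode + 1):
    """Yield (first, last) inclusive runs of code points in [start, stop)
    whose general category is not in NON_PRINTABLE."""
    run_start = None
    for cp in range(start, stop):
        ok = unicodedata.category(chr(cp)) not in NON_PRINTABLE
        if ok and run_start is None:
            run_start = cp              # a new run begins here
        elif not ok and run_start is not None:
            yield run_start, cp - 1     # the run ended on the previous code point
            run_start = None
    if run_start is not None:
        yield run_start, stop - 1       # close a run that reaches the end


if __name__ == "__main__":
    # Same span as the table described in the post: U+0000..U+07FF.
    for first, last in printable_ranges(0, 0x0800):
        print(f"U+{first:04X}..U+{last:04X}")
```

Restricted to ASCII, this gives the single range 32 to 126 from the first post. The results for higher code points depend on the Unicode Database version bundled with the Python in use, and a character being "printable" in this sense says nothing about whether the terminal font has a glyph for it.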
i don't know how complete the terminal program or the fonts it uses are, but i have gotten double-wide CJK characters many times. i will have to figure out how best to display many things in a dump output. the python unicode database that stranac referred me to looks like it would be useful.

RE: unicode - sparkz_alot - Nov-16-2017

Seems you may always be one step behind. Python 3.6.3 supports Unicode Database 9, though the current is Unicode Database 10 (Unicode 10). Apparently, Python is expected to upgrade its support to UDB 10 with Python 3.7 (What's new in 3.7), but by then, the UDB will probably be 11 or higher. So if you want to program those new emoji, you're just going to have to wait.

RE: unicode - Skaperen - Nov-17-2017

yeah, the Ubuntu repository tends to be slow. i wonder if they will ever get past 3.5.2. i wish PSF could build i386 and x86_64 packages of current versions in .deb and .rpm formats.

RE: unicode - sparkz_alot - Nov-17-2017

Well, if you look at what was added to v10 (aside from the emoji), it is some pretty obscure stuff, including extinct languages. I doubt there will be any great changes to what already exists, so things like operating systems and programming tools probably don't see it as something that needs immediate attention. Even considering emoji, I think I've used maybe 5 out of all the ones available to us on this site (make that six).

RE: unicode - Skaperen - Nov-18-2017

speaking of extinct languages. shouldn't perl be in that list, soon?
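On the two practical points raised earlier in the thread (double-wide CJK characters in dump output, and which Unicode Database version a given Python carries), the `unicodedata` module can answer both. A rough sketch follows; the helper name `display_width` is hypothetical, and the width rule is a simplification that ignores how any particular terminal actually renders text.

```python
import unicodedata


def display_width(ch):
    """Rough estimate of terminal columns used by one character.

    Assumption: combining marks take 0 columns, East Asian Wide (W) and
    Fullwidth (F) characters take 2, and everything else takes 1.
    """
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


if __name__ == "__main__":
    # Version of the Unicode Database compiled into this interpreter.
    print("Unicode Database version:", unicodedata.unidata_version)
    for ch in ("A", "\u6f22", "\u0301"):
        print(f"U+{ord(ch):04X} width {display_width(ch)}")
```

A dump tool could use such an estimate to pad or skip a column after a double-wide character so that the text column stays aligned.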