Python Forum
unicode symbol processing
#1
Hello,

While scraping a web page, I ran into a problem with Unicode symbols that are not recognized correctly.
Here is the original string:
Output:
978-1-4419-5905-8
Here is how it appears on the scraped page:
Output:
----
And here is the output when I execute text[ind0:ind1]:
Output:
\uf641\uf63f\uf640-\uf6dc-\uf63c\uf63c\uf6dc\uf641-\uf63d\uf641\uf639\uf63d-\uf640
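The escaped output above can be inspected directly. The sketch below assumes Python 3.7+ for str.isascii(). It lists every non-ASCII character with its code point and its unicodedata category. All of these characters fall in Unicode's Private Use Area (U+E000 to U+F8FF, category "Co"), which has no standard meaning. Most likely the page uses a custom font that draws digits for these code points. The helper name non_ascii_chars is hypothetical.

```python
import unicodedata

# The scraped fragment, copied from the escaped output above.
scraped = ("\uf641\uf63f\uf640-\uf6dc-\uf63c\uf63c\uf6dc\uf641-"
           "\uf63d\uf641\uf639\uf63d-\uf640")

def non_ascii_chars(text):
    """Return (index, code point, Unicode category) for every non-ASCII character."""
    return [(i, f"U+{ord(ch):04X}", unicodedata.category(ch))
            for i, ch in enumerate(text) if ord(ch) > 127]

print(scraped.isascii())  # False
for entry in non_ascii_chars(scraped)[:3]:
    print(entry)
# (0, 'U+F641', 'Co')
# (1, 'U+F63F', 'Co')
# (2, 'U+F640', 'Co')
```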
So I have couple of questions:
  1. How can I detect that a particular fragment of text is not ASCII-encoded?
  2. How can I convert it to ASCII?
Thanks.
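Private-use code points carry no standard meaning, so neither unicodedata.normalize nor encoding to ASCII will recover the digits. Encoding with errors="ignore" just drops them. The mapping has to come from the page itself. The sketch below assumes the glyph-to-digit correspondence is fixed across the page. The table was inferred only by lining up the escaped output with 978-1-4419-5905-8, so the names PUA_TO_ASCII and to_ascii are hypothetical and the table is incomplete: the digits 2, 3 and 6 never appear in the sample. The regular expression [^\x00-\x7f]+ also addresses question 1, because it returns each non-ASCII fragment of a string.

```python
import re

# Hypothetical mapping inferred from the single sample 978-1-4419-5905-8.
# Digits 2, 3 and 6 are unknown and would have to be read off other samples.
PUA_TO_ASCII = str.maketrans({
    "\uf639": "0",
    "\uf6dc": "1",
    "\uf63c": "4",
    "\uf63d": "5",
    "\uf63f": "7",
    "\uf640": "8",
    "\uf641": "9",
})

# Matches each run of characters outside the ASCII range.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")

def to_ascii(text):
    """Translate known private-use glyphs to ASCII; raise if anything is left over."""
    converted = text.translate(PUA_TO_ASCII)
    leftover = NON_ASCII_RUN.findall(converted)
    if leftover:
        raise ValueError(f"unmapped non-ASCII fragments: {leftover!r}")
    return converted

scraped = ("\uf641\uf63f\uf640-\uf6dc-\uf63c\uf63c\uf6dc\uf641-"
           "\uf63d\uf641\uf639\uf63d-\uf640")
print(to_ascii(scraped))  # 978-1-4419-5905-8
```

If the page shows a code point that is not in the table, to_ascii raises ValueError and lists the leftover fragments. That signals the mapping needs extending, rather than letting characters disappear silently.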


Messages In This Thread
unicode symbol processing - by Pavel_47 - Dec-03-2019, 01:38 PM
RE: unicode symbol processing - by snippsat - Dec-03-2019, 03:28 PM
RE: unicode symbol processing - by Pavel_47 - Dec-03-2019, 07:33 PM
RE: unicode symbol processing - by DeaD_EyE - Dec-04-2019, 01:31 AM
RE: unicode symbol processing - by Pavel_47 - Dec-04-2019, 09:43 AM
