encoding issue using requests - Printable Version

Python Forum (https://python-forum.io) > Forum: Python Coding (https://python-forum.io/forum-7.html) > Forum: General Coding Help (https://python-forum.io/forum-8.html) > Thread: encoding issue using requests (/thread-7695.html)
encoding issue using requests - dmbest - Jan-21-2018

Hi all, I'm trying to get info from a site with the following code. I'm getting this:

    ['Gesellschafter/in', u'Vorsitzende/r der Gesch\xe4ftsf\xfchrung']

and expect it to be:

    ['Gesellschafter/in', u'Vorsitzende/r der Geschäftsführung']

The code:

    import requests
    from lxml import html

    CHKURL = "http://www.monetas.ch/htm/653/de/Aktuelles-Management.htm?subj=2519858"
    XPATH = ".//*[@id='content']/table/tbody/tr/td[2]//text()"

    def urlparse(url):
        url = url.strip()
        response = requests.get(url)
        parsed = html.fromstring(response.text)
        return parsed

    xp = urlparse(CHKURL).xpath(XPATH)
    print xp

Where am I wrong? Thanks in advance.

RE: encoding issue using requests - snippsat - Jan-21-2018

Nothing is wrong; this is just how Python 2 displays Unicode strings inside a list. Printing the list shows the repr of each item, with non-ASCII characters escaped. Print a single item and it works:

    >>> lst = ['Gesellschafter/in', u'Vorsitzende/r der Gesch\xe4ftsf\xfchrung']
    >>> print(lst[1])
    Vorsitzende/r der Geschäftsführung

Python 3 made big changes to Unicode handling, and you should use Python 3, not 2. In Python 3 the list output shows the characters directly.
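
As a rough illustration of the Python 3 behaviour mentioned above, here is a minimal sketch (Python 3 only; the variable names `list_repr`, `item_text` and `escaped` are made up for this example). It compares the text that printing the whole list produces with the text that printing one item produces, and uses the built-in `ascii()` to reproduce the escaped form that Python 2 displayed.

```python
# Python 3: str is Unicode, and the repr of a list keeps printable
# non-ASCII characters instead of escaping them.
lst = ['Gesellschafter/in', 'Vorsitzende/r der Geschäftsführung']

list_repr = repr(lst)     # the text that print(lst) displays
item_text = str(lst[1])   # the text that print(lst[1]) displays
escaped = ascii(lst[1])   # escaped form, similar to what Python 2 showed

print(list_repr)
print(item_text)
print(escaped)
```

The string itself is the same in every case; only its display form differs, which is why the original list was never actually "wrong".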
RE: encoding issue using requests - dmbest - Jan-21-2018

When I print it works, but when I write the result to a CSV file it's a mess again.

PS: Thanks for the quick reply! I'll switch to Python 3 in the near future, but for the current project I need to use Python 2.

RE: encoding issue using requests - snippsat - Jan-21-2018

(Jan-21-2018, 04:39 PM)dmbest Wrote: When I print it works, but when I write result to CSV file it's a mess again.

Always try to use UTF-8 in and out when working with files. Example for Python 2.7:

    # -*- coding: utf-8 -*-
    import io

    lst = ['Gesellschafter/in', u'Vorsitzende/r der Gesch\xe4ftsf\xfchrung']

    # io.open accepts an encoding and writes unicode text, like open() in Python 3
    with io.open('out.csv', 'w', encoding='utf-8') as f:
        f.write(', '.join(lst))
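
Building on the `io.open` example above, here is a minimal sketch that writes the same row with the standard `csv` module and reads it back to check the round trip. It assumes Python 3 (in Python 2 the `csv` module expects byte strings, so this would not run there unchanged), and the temporary output path is only a placeholder for illustration.

```python
import csv
import io
import os
import tempfile

row = ['Gesellschafter/in', 'Vorsitzende/r der Geschäftsführung']

# Hypothetical output location for this sketch; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'out.csv')

# newline='' lets the csv module control the line endings itself.
with io.open(path, 'w', encoding='utf-8', newline='') as f:
    csv.writer(f).writerow(row)

# Read the file back with the same encoding to confirm the round trip.
with io.open(path, 'r', encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))

# The raw bytes on disk: each umlaut is stored as a two-byte UTF-8 sequence.
with io.open(path, 'rb') as f:
    raw = f.read()
```

If the file only looks garbled when opened in a spreadsheet program, the program may be guessing a different encoding; writing with `encoding='utf-8-sig'` (UTF-8 with a byte order mark) is one thing that might help in that case.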
RE: encoding issue using requests - DeaD_EyE - Jan-22-2018

Don't use Python 2.7. Use Python 3.6+.

Also, with Python 3 the source code is decoded as UTF-8 by default:

    lst = ['Gesellschafter/in', 'Vorsitzende/r der Geschäftsführung']

Side effect: you can even use German variable names:

    verhör = 42
    # or Chinese?
    谢谢 = 'Danke'

If you open files in text mode, the default encoding is usually UTF-8, but it is taken from the system locale, so it can differ (for example on Windows). As described before, you can specify the encoding explicitly. Other encodings are sometimes used, such as latin1, cp850, etc. You'll very often find CSV files with encodings other than UTF-8. If you don't know an encoding and hate guessing, you should look at this module: https://ftfy.readthedocs.io/en/latest/

RE: encoding issue using requests - snippsat - Jan-22-2018

(Jan-22-2018, 12:04 AM)DeaD_EyE Wrote: Don't use Python 2.7

Read what he posted.

(Jan-21-2018, 04:39 PM)dmbest Wrote: But for current project need to use P2.
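
To illustrate the point about files in encodings other than UTF-8, here is a small standard-library sketch (Python 3; the variable names are invented for this example, and it does not use ftfy). It shows what happens when bytes are decoded with the wrong codec, and how a simple latin1/UTF-8 mix-up can be reversed when you know which two encodings were involved.

```python
text = 'Geschäftsführung'

# UTF-8 bytes read as latin1 decode without error but give mojibake.
utf8_bytes = text.encode('utf-8')
mojibake = utf8_bytes.decode('latin1')

# Reversing the wrong step recovers the original text.
repaired = mojibake.encode('latin1').decode('utf-8')

# latin1 bytes read as UTF-8 usually fail outright instead.
latin1_bytes = text.encode('latin1')
try:
    latin1_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(mojibake)
print(repaired)
print(decode_failed)
```

Passing the correct `encoding=` when opening the file avoids both problems; the manual repair only works when the mix-up is known.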