Python Forum
encoding issue using requests



encoding issue using requests - dmbest - Jan-21-2018

Hi all,
I'm trying to get info from a site with the following code. I'm getting this:
['Gesellschafter/in', u'Vorsitzende/r der Gesch\xe4ftsf\xfchrung']
and expect it to be:
['Gesellschafter/in', u'Vorsitzende/r der Geschäftsführung']

import requests
from lxml import html

CHKURL = "http://www.monetas.ch/htm/653/de/Aktuelles-Management.htm?subj=2519858"
XPATH = ".//*[@id='content']/table/tbody/tr/td[2]//text()"

def urlparse(url):
    # Fetch the page and parse the decoded text into an lxml element tree.
    url = url.strip()
    response = requests.get(url)
    parsed = html.fromstring(response.text)
    return parsed

# Collect every text node in the second column of the content table.
xp = urlparse(CHKURL).xpath(XPATH)
print xp
Where am I going wrong?

Thanks in advance.


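RE: encoding issue using requests - snippsat - Jan-21-2018

Nothing is wrong; that is just how Python 2 displays Unicode strings inside a list. Printing a list shows the repr() of each item, so non-ASCII characters appear as escapes like \xe4.
Print a single item and it magically works.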
>>> lst = ['Gesellschafter/in', u'Vorsitzende/r der Gesch\xe4ftsf\xfchrung']
>>> print(lst[1])
Vorsitzende/r der Geschäftsführung
Python 3 made big changes to Unicode handling, and you should use Python 3, not 2.
In Python 3 the output looks like this.
Output:
['Gesellschafter/in', 'Vorsitzende/r der Geschäftsführung']
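To make the point above concrete, here is a minimal sketch showing that the escaped form and the readable form are the same string, and that joining the items into one string avoids the list repr. The variable names are only illustrative.

# -*- coding: utf-8 -*-
# The \xe4 and \xfc escapes are just another spelling of the same characters.
escaped = u'Vorsitzende/r der Gesch\xe4ftsf\xfchrung'
literal = u'Vorsitzende/r der Geschäftsführung'

# Both literals produce the identical Unicode string.
same = (escaped == literal)
print(same)

# Join the items into one string instead of printing the list itself,
# so each item is shown as text rather than as its repr().
joined = u', '.join([u'Gesellschafter/in', escaped])
print(joined)

On Python 2, whether the last print displays the umlauts correctly still depends on the terminal's encoding.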



RE: encoding issue using requests - dmbest - Jan-21-2018

When I print it, it works, but when I write the result to a CSV file it's a mess again.

PS: Thanks for the quick reply! I'll switch to Python 3 in the near future, but for the current project I need to use Python 2.


RE: encoding issue using requests - snippsat - Jan-21-2018

(Jan-21-2018, 04:39 PM)dmbest Wrote: When I print it works, but when I write result to CSV file it's a mess again.
Always try to use UTF-8 for both input and output when working with files.
Example Python 2.7:
# -*- coding: utf-8 -*-
import io

lst = ['Gesellschafter/in', u'Vorsitzende/r der Gesch\xe4ftsf\xfchrung']
# io.open in text mode expects unicode, so join with a unicode separator.
with io.open('out.csv', 'w', encoding='utf-8') as f:
    f.write(u', '.join(lst))
Output:
Gesellschafter/in, Vorsitzende/r der Geschäftsführung
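
If the file should be a real CSV (with quoting handled for you), the csv module is an option. The following is only a sketch: write_csv_row, PY2 and out_path are hypothetical names, and the temporary directory is used just for the demonstration. On Python 2 the csv module works with byte strings, so the cells are encoded first. On Python 3 the file object does the encoding.

# -*- coding: utf-8 -*-
import csv
import io
import os
import sys
import tempfile

PY2 = sys.version_info[0] == 2


def write_csv_row(path, row):
    """Write one row of text cells to `path` as UTF-8 encoded CSV."""
    if PY2:
        # Python 2's csv module only handles byte strings, so encode each
        # unicode cell to UTF-8 and open the file in binary mode.
        encoded = [c.encode('utf-8') if isinstance(c, unicode) else c
                   for c in row]
        with open(path, 'wb') as f:
            csv.writer(f).writerow(encoded)
    else:
        # Python 3's csv module handles text; the file does the encoding.
        with io.open(path, 'w', encoding='utf-8', newline='') as f:
            csv.writer(f).writerow(row)


row = [u'Gesellschafter/in', u'Vorsitzende/r der Gesch\xe4ftsf\xfchrung']
out_path = os.path.join(tempfile.mkdtemp(), 'out.csv')  # hypothetical location
write_csv_row(out_path, row)

# Read the file back as text to check the umlauts survived the round trip.
with io.open(out_path, encoding='utf-8') as f:
    content = f.read()
print(content.strip() == u','.join(row))

If the file still looks like a mess after this, check which encoding the program you open it with assumes, since that program may not be reading it as UTF-8.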



RE: encoding issue using requests - DeaD_EyE - Jan-22-2018

Don't use Python 2.7
Use Python 3.6+

Also, source code is decoded as UTF-8 by default in Python 3:
lst = ['Gesellschafter/in', 'Vorsitzende/r der Geschäftsführung']

Side effect: you can even use German variable names:
verhör = 42
# or Chinese?
谢谢 = 'Danke'
If you open files in text mode without an encoding argument, Python 3.6 uses the platform's locale encoding, which is often UTF-8 on Linux and macOS but usually not on Windows. As described before, you can set the encoding explicitly.
Other encodings such as latin1 or cp850 are also in use.
You'll very often find CSV files with encodings other than UTF-8.
If you don't know the encoding and hate guessing, have a look at this module: https://ftfy.readthedocs.io/en/latest/
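
As a rough illustration of what "other encodings" means in practice, here is a small sketch that tries a list of candidate encodings in order. decode_with_fallback is a hypothetical helper, not part of any library, and the candidate list is an assumption. Note that latin-1 can decode any byte sequence, so it only makes sense as the last resort and may still give the wrong characters.

# Bytes as a Latin-1 encoded file would store the word.
raw = u'Gesch\xe4ftsf\xfchrung'.encode('latin-1')


def decode_with_fallback(data, encodings=('utf-8', 'latin-1')):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            # Not valid in this encoding, try the next candidate.
            continue
    raise ValueError('none of the candidate encodings matched')


# These bytes are not valid UTF-8, so the helper falls back to latin-1.
text, used = decode_with_fallback(raw)
print(used)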


RE: encoding issue using requests - snippsat - Jan-22-2018

(Jan-22-2018, 12:04 AM)DeaD_EyE Wrote: Don't use Python 2.7
Read what he posted.
(Jan-21-2018, 04:39 PM)dmbest Wrote: But for current project need to use P2.