Using BS4 (Beautiful Soup 4) is better in most respects, and its Unicode support is very good.
I never use the parser in the standard library; same with urllib, I use Requests instead.
The standard library has strong modules that sit on a more stable platform and do not need much changing,
but for parsing and HTTP stuff it is better to use modules that keep up with the rapid changes of the web.
doc Wrote:
Beautiful Soup uses a sub-library called Unicode, Dammit to detect a document’s encoding and convert it to Unicode.
When you write out a document from Beautiful Soup, you get a UTF-8 document, even if the document wasn’t in UTF-8 to begin with.

Since I just posted an answer here, we can use that code and do some Unicode stuff.
from bs4 import BeautifulSoup

xml = '''\
<provider>
<identity>chess king♟♜♞</identity>
<endpoint>some point.com</endpoint>
</provider>'''

soup = BeautifulSoup(xml, 'xml')

>>> result = soup.find('identity')
>>> result
<identity>chess king♟♜♞</identity>
>>> result.string.replace_with("testç")
'chess king♟♜♞'
>>> soup
<?xml version="1.0" encoding="utf-8"?>
<provider>
<identity>testç</identity>
<endpoint>some point.com</endpoint>
</provider>
>>> result = soup.find('identity')
>>> result
<identity>testç</identity>
>>> result.string = '♟♜♞'
>>> soup
<?xml version="1.0" encoding="utf-8"?>
<provider>
<identity>♟♜♞</identity>
<endpoint>some point.com</endpoint>
</provider>
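The two points from the docs (detect and convert to Unicode, then write out as UTF-8) can also be tried directly. This is only a rough sketch: the sample bytes and variable names are made up for illustration, the encoding is passed in as a hint rather than left to guessing, and it uses html.parser so lxml is not needed.

```python
from bs4 import BeautifulSoup, UnicodeDammit

# Made-up sample: the text stored as Latin-1 bytes, not UTF-8.
latin1_bytes = "<p>Sacré bleu!</p>".encode("latin-1")

# Unicode, Dammit converts the bytes to a Unicode string.
# The list of candidate encodings is a hint we supply ourselves;
# without it the detected encoding may differ.
dammit = UnicodeDammit(latin1_bytes, ["latin-1"])
text = dammit.unicode_markup
detected = dammit.original_encoding

# Parse the same bytes, then write the document back out.
soup = BeautifulSoup(latin1_bytes, "html.parser", from_encoding="latin-1")
utf8_out = soup.encode()             # bytes, UTF-8 by default
latin1_out = soup.encode("latin-1")  # another encoding must be asked for

print(text, detected)
print(utf8_out)
print(latin1_out)
```

So the input was Latin-1, but the default output is UTF-8 bytes, and you only get something else if you pass an encoding to encode().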