Hello,
I can't get Python to replace a substring in a string: if I read the HTML file as binary, it fails; if I open it as text, it fails too :-/
    #UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 14167: character maps to <undefined>
    #with open(INPUTFILE, 'r') as f:
    with open(INPUTFILE, 'rb') as f:
        content_text = f.read()

    #TypeError: a bytes-like object is required, not 'str'
    #content_text.replace("<i>","[i]")
    content_text = str(content_text)
    content_text.replace("<i>","[i]")

    #for some reason, no matter the input (string or bytes), BS will just output first line
    content_text = content_text.encode(encoding='UTF-8')
    soup = BeautifulSoup(content_text, 'xml')
    #<?xml version="1.0" encoding="utf-8"?>, and then stops
    print(soup.prettify())

What's the right way to 1) replace a string in a string, and then 2) have Beautiful Soup parse the input successfully?
Thank you.
---
Edit: Since it's not possible to delete a thread, I'll just add the answer… which was simple enough: read the file as text and tell Python which encoding to use (on Windows, open() defaults to the locale code page, typically cp1252, rather than UTF-8). BeautifulSoup reads the resulting string fine (the input doesn't need to be bytes).
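    from bs4 import BeautifulSoup as BS

    # Tell Python the file's encoding explicitly instead of relying on the platform default
    with open(INPUTFILE, 'r', encoding='utf-8') as f:
        content_text = f.read()

    # str.replace() returns a new string, so the result must be assigned back
    content_text = content_text.replace("<i>", "[i]")

    soup = BS(content_text, 'xml')
    print(soup.prettify())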
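
For anyone landing here with the same errors, here is a small sketch of why each attempt in the question likely failed. The sample bytes are made up for illustration (they are not the original file), and I'm assuming the file was UTF-8 containing a curly closing quote, whose last byte is 0x9d. It only needs the standard library:

```python
# Made-up sample standing in for the file's contents on disk (assumed UTF-8).
raw = "<p><i>“quoted”</i></p>".encode("utf-8")

# 1) Decoding UTF-8 bytes as cp1252 fails: byte 0x9d (last byte of ”) is undefined there.
try:
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# 2) bytes.replace() needs bytes arguments; passing str raises TypeError.
try:
    raw.replace("<i>", "[i]")
    bytes_replace_failed = False
except TypeError:
    bytes_replace_failed = True

# 3) str(bytes) does not decode anything: it returns the repr, i.e. text starting with b'
as_repr = str(raw)

# 4) str.replace() returns a new string; without reassignment the original is unchanged.
text = raw.decode("utf-8")
text.replace("<i>", "[i]")          # result discarded
unchanged = "<i>" in text           # still True here
text = text.replace("<i>", "[i]")   # reassign to keep the result
```

Point 3 is probably why Beautiful Soup stopped after the XML declaration: it was handed the repr of a bytes object rather than markup. If you do want to stay in bytes, `raw.replace(b"<i>", b"[i]")` with bytes literals also works.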