Apr-27-2018, 05:15 PM
I am very new to Python and this forum. I apologize for any newbie mistakes.
I am trying to convert HTML to text. When I do this, everything after the first tab is moved to a new line.
For example, instead of:
****************************
November 5, 2008
****************************
it saves:
****************************
November 5,
2008
****************************
The current code is:
soup = BeautifulSoup(download_target.text, 'html.parser')
f_text = soup.get_text()
text_file = open(file_loc+"\\"+url_rename[2]+"\\"+url_rename[3]+"\\"+url_rename[1]+".txt","w")
text_file.write(str(f_text.encode('ascii', errors='ignore')).replace("\n", "\r\n").replace("\\n", "\r\n").replace("\\t", ""))
I think the fix has to do with:
.join(f_text.splitlines())
But when I use "f_text" in the splitlines command, I get an error.
text_file.write(str(f_text.encode('ascii', errors='ignore'))).replace("\\n", "\r\n").replace("\n", "\r\n").replace("\\t", "\t").join(f_text.splitlines())
AttributeError: 'int' object has no attribute 'replace'
Any help is much appreciated. Thanks in advance.
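A note on the error, plus a possible way around the wrapped date. In Python 3, write() returns the number of characters written, so in the second attempt the closing parenthesis after the encode(...) call ends the write() call early and the chained .replace() runs on that int. Also, calling str() on the bytes from encode() produces the b'...' representation with literal backslash-n sequences, which is why the "\\n" replacements were needed in the first place. The sketch below avoids both by keeping the text as str. It assumes the unwanted break is whitespace inside a single HTML text node (for example "November 5," and "2008" inside one tag), which may not match the real page. The helper names collapse_whitespace, blocks_to_text, html_to_text and save_text are made up for illustration.

```python
def collapse_whitespace(text):
    """Replace every run of spaces, tabs, and newlines with a single space."""
    return " ".join(text.split())


def blocks_to_text(strings):
    """Put each non-empty text fragment on its own line, unwrapped."""
    cleaned = (collapse_whitespace(s) for s in strings)
    return "\n".join(line for line in cleaned if line)


def html_to_text(html):
    # Imported here so the helpers above can be tried without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    # stripped_strings yields each text node with its outer whitespace removed;
    # whitespace inside a node is collapsed by blocks_to_text.
    return blocks_to_text(soup.stripped_strings)


def save_text(path, text):
    # Text mode already translates "\n" to the platform line ending on write,
    # so no manual replace("\n", "\r\n") is needed on Windows.
    # errors="ignore" silently drops characters that are not ASCII,
    # similar to encode('ascii', errors='ignore') in the original code.
    with open(path, "w", encoding="ascii", errors="ignore") as text_file:
        text_file.write(text)
```

With the names from the post, usage would look roughly like save_text(path, html_to_text(download_target.text)), where path is the same file path that was passed to open(). If the break actually sits between two separate tags, this sketch will still put the two pieces on separate lines and a different joining rule would be needed.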