Python Forum
Python BeautifulSoup gives unusable text? - Printable Version




Python BeautifulSoup gives unusable text? - dggo666 - Oct-29-2021

Hello, everyone.
I apologize for my English.
I have a Python script that extracts the complete text from a domain and every single subdomain, so I end up with practically the entire text of the website. It works without any problems, but every time I get strange characters and emojis in the output. Does anyone know how to filter them out? I have tried several times to make BeautifulSoup ignore this text, but it didn't work.
For example, I tried:

bytes(text, 'utf-8').decode('utf-8','ignore')
My full script is attached. [attachment=1350]
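
A note on the line above: encoding a `str` to UTF-8 and decoding it straight back returns the same string, so the `'ignore'` handler never has anything to drop. The following is only a minimal sketch of two possible filters, not the approach used in the attached script (which is not shown here). The function names `strip_symbols` and `ascii_only` are hypothetical, and which characters count as "strange" is an assumption that may need adjusting for the site in question.

```python
import unicodedata

# Unicode general categories treated as unwanted in this sketch (an assumption):
# So = other symbols (covers most emoji), Cf = format characters such as the
# zero-width joiner, Cs = surrogates, Co = private use, Cn = unassigned.
UNWANTED_CATEGORIES = {"So", "Cf", "Cs", "Co", "Cn"}


def strip_symbols(text: str) -> str:
    """Drop emoji-like symbols and control characters, keep letters such as ä or é."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if category in UNWANTED_CATEGORIES:
            continue
        # Cc = control characters; keep newlines and tabs so the layout survives.
        if category == "Cc" and ch not in "\n\t":
            continue
        kept.append(ch)
    return "".join(kept)


def ascii_only(text: str) -> str:
    """Blunter alternative: drop everything outside ASCII, including accented letters."""
    return text.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    sample = "Hello 😀 wörld"
    # The round trip from the post leaves the text unchanged.
    print(bytes(sample, "utf-8").decode("utf-8", "ignore") == sample)
    print(strip_symbols(sample))
    print(ascii_only(sample))
```

`strip_symbols` keeps non-English letters, which is usually what you want for real website text; `ascii_only` is simpler but also removes characters like `ö`. If the strange characters are mojibake (for example `Ã¤` instead of `ä`), the cause is more likely a wrong encoding when the page was downloaded, and filtering alone will not repair that.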