Python Forum

Full Version: Python BeautifulSoup gives unusable text?
Hello, everyone.
I apologize for my English.
I have a Python script that extracts the complete text from a domain and every one of its subdomains, so I end up with practically the entire text of the website. It works without any problems, but every time I get strange characters and emojis in the output. Does anyone know how to filter this text out? I tried several times to make BeautifulSoup ignore it, but it didn't work.
For example:

bytes(text, 'utf-8').decode('utf-8','ignore')
My full script is attached. [attachment=1350]
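A note on why the line above changes nothing, plus a minimal sketch of two possible filters. Encoding a str to UTF-8 and decoding it straight back returns the same string, because every character (emoji included) is valid UTF-8, so 'ignore' never has anything to drop. The helper names strip_non_ascii and strip_symbols and the sample string are made up for illustration; they are not from the attached script, and which characters count as "unusable" is an assumption you would need to adjust.

```python
import unicodedata

# Hypothetical sample: "Café", a coffee-cup symbol, and a grinning-face emoji.
sample = "Caf\u00e9 \u2615 open \U0001F600 today"

# The round trip from the post is a no-op: nothing is invalid UTF-8.
round_trip = bytes(sample, "utf-8").decode("utf-8", "ignore")


def strip_non_ascii(text: str) -> str:
    """Drop every character that cannot be encoded as ASCII (also drops letters like 'é')."""
    return text.encode("ascii", "ignore").decode("ascii")


def strip_symbols(text: str) -> str:
    """Keep letters such as 'é' but drop characters in the Unicode categories
    So (other symbols, which covers most emoji), Cs, Co and Cn."""
    unwanted = {"So", "Cs", "Co", "Cn"}
    return "".join(ch for ch in text if unicodedata.category(ch) not in unwanted)


print(round_trip == sample)
print(repr(strip_non_ascii(sample)))
print(repr(strip_symbols(sample)))
```

The ASCII version is the bluntest option and also removes accented letters. The category-based version keeps them, but it is only an approximation: combining pieces of some emoji sequences (variation selectors, zero-width joiners) fall into other categories and would need to be filtered separately.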