Python BeautifulSoup gives unusable text? - dggo666 - Oct-29-2021

Hello everyone,

I apologize for my English. I have a Python script that extracts the complete text from a domain and every single subdomain, so I end up with practically the entire text of the website. It works without any problems, but every time I get strange characters and emojis in the output. Does anyone know how to filter these out? I tried several times with BeautifulSoup to ignore these characters, but it didn't work. For example:

bytes(text, 'utf-8').decode('utf-8','ignore')

My full script is attached. [attachment=1350]
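A note on why the line above has no effect: emojis are valid UTF-8, so encoding a str to UTF-8 and decoding it again returns the same string, and 'ignore' never has anything to drop. The attached script is not shown here, so the following is only a minimal sketch of one possible filter, assuming the scraped text is already a Python str. The function name strip_unwanted and its keep_non_ascii parameter are hypothetical, not part of BeautifulSoup. It drops characters by Unicode category instead of by encoding.

```python
import unicodedata

# Unicode general categories to drop (an assumption about what counts as
# "strange characters"): So = other symbols (most emoji and pictographs),
# Sk = modifier symbols (emoji skin tones), Cf = format characters
# (e.g. zero-width joiner), Cc = control, Cs = surrogates,
# Co = private use, Cn = unassigned.
DROP_CATEGORIES = {"So", "Sk", "Cf", "Cc", "Cs", "Co", "Cn"}

# Ordinary whitespace is in category Cc/Zs; keep it explicitly.
KEEP_WHITESPACE = {"\n", "\r", "\t", " "}


def strip_unwanted(text, keep_non_ascii=True):
    """Return text without emoji, control and format characters.

    Hypothetical helper for illustration. With keep_non_ascii=False,
    every non-ASCII character is dropped, including accented letters.
    """
    if not keep_non_ascii:
        # Here 'ignore' does something: emoji cannot be encoded as ASCII.
        return text.encode("ascii", "ignore").decode("ascii")

    kept = []
    for ch in text:
        if ch in KEEP_WHITESPACE:
            kept.append(ch)
        elif "\ufe00" <= ch <= "\ufe0f":
            # Variation selectors that often follow emoji (category Mn).
            continue
        elif unicodedata.category(ch) in DROP_CATEGORIES:
            continue
        else:
            kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    sample = "Hello \U0001F600 w\u00f6rld"
    print(strip_unwanted(sample))
    print(strip_unwanted(sample, keep_non_ascii=False))
```

The category filter keeps accented letters and other scripts but also removes symbols such as the copyright sign, which is in category So. If only plain ASCII is wanted, the keep_non_ascii=False branch is the simpler choice. Either way, the function would be applied to the string returned by BeautifulSoup's get_text(), not to the parser itself.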