Python Forum
Python BeautifulSoup gives unusable text?
Hello, everyone.
I apologize for my English.
I have a Python script that extracts the complete text from a domain and every one of its subdomains, so I end up with practically the entire text of the website. It works without any problems, but every time I get strange characters and emojis in the output. Does anyone know how to filter these out? I have tried several times to make BeautifulSoup ignore them, but it didn't work.
For example:

bytes(text, 'utf-8').decode('utf-8','ignore')
My full script is attached.
Attachment: webscraper.py (2.3 KB)
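A note on the snippet above: encoding a `str` to UTF-8 and decoding it straight back returns the same string, because emoji are perfectly valid UTF-8, so `'ignore'` has nothing to drop. One possible way to filter them is to remove characters by Unicode category instead. The following is only a minimal sketch, not the method used in the attached script (which is not shown here). The names `strip_symbols`, `DROP_CATEGORIES` and `DROP_CHARS` are made up for illustration, and the choice of categories is an assumption about what counts as "strange characters".

```python
import unicodedata

# Assumption: "strange characters" means emoji and other pictographs.
# So = "Symbol, other" (most emoji, but also signs such as the copyright sign),
# Cs/Co/Cn = surrogates, private-use and unassigned code points.
DROP_CATEGORIES = {"So", "Cs", "Co", "Cn"}

# Invisible helpers that are left behind once the emoji themselves are gone:
# zero-width joiner and variation selector-16.
DROP_CHARS = {"\u200d", "\ufe0f"}


def strip_symbols(text: str) -> str:
    """Return text without emoji-like symbols and control characters."""
    kept = []
    for ch in text:
        if ch in DROP_CHARS:
            continue
        category = unicodedata.category(ch)
        if category in DROP_CATEGORIES:
            continue
        # Drop control characters, but keep normal line breaks and tabs.
        if category == "Cc" and ch not in "\n\r\t":
            continue
        kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    sample = "Grüße \U0001F600 Welt"
    # The round trip from the question changes nothing:
    print(bytes(sample, "utf-8").decode("utf-8", "ignore") == sample)
    # Filtering by category removes the emoji but keeps the umlauts:
    print(strip_symbols(sample))
```

If only plain ASCII is wanted, `text.encode('ascii', 'ignore').decode('ascii')` would also remove the emoji, but it deletes every accented letter as well, which is why the sketch filters by category. If the strange characters are garbled letters rather than emoji, the cause may instead be a wrongly detected page encoding, and this filter would not help.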