Python Forum
Decoding html to text string
#1
Hi all

I'm trying to convert email content into plain text. I have the following code:

import email
from bs4 import BeautifulSoup

# data: assumed to be the response list from an IMAP fetch (not shown here)
for response_part in data:
    if isinstance(response_part, tuple):
        raw = response_part[1].decode()
        msg = email.message_from_string(raw)
        email_subject = msg['subject']
        email_from = msg['from']
        email_date = msg['date']
        # Parse the whole raw message (headers and body) as HTML
        email_content = BeautifulSoup(raw)
        print('From : ' + email_from + '\n')
        print('Subject : ' + email_subject + '\n')
        print(email_content)
I tried everything, but every time I print the email_content variable I get the following (example):

splay:bl=' tatic.com="" ui="" v1="" width='3D"49"'/></td><td font-family:'open="" sans','arial',sans-s="erif;" style='3D"align:left;' vertical-align:bottom"=""><span style='3D"font-size:small"'>Happy emailing=
,<br/></span><span line-height:1"="" style='3D"font-size:x-large;'>The Gmail Tea=
m</span></td></tr></table></div>

<div 4%="" auto="" auto;="" border-radius:1em;="padding:1em;" font-family:'arial','helvetica',sans-s="erif;" font-size:0.8em;="" margin:0="" style='3D"direction:ltr;color:#777;' text-align:center;"="">=C2=A9 2017 Google Inc. 1600 Amphitheatre Parkway=
, Mountain View, CA 94043<br/></div>
How can I decode this into a simple text string without all the formatting characters?

I tried both BeautifulSoup and html.unescape, but I can't get either of them working.

Kind Regards,
Peter
#2
I can't vouch for it, but you could check out: https://github.com/KeepSafe/ks-email-parser
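
For what it's worth, the `=3D`, `=C2=A9` and line-ending `=` in that output look like quoted-printable transfer encoding rather than HTML entities, which would explain why BeautifulSoup and html.unescape alone don't clean it up. Below is a minimal sketch using only the standard library, assuming `response_part[1]` holds the raw message bytes from the fetch. `TextExtractor`, `html_to_text` and `message_body_text` are hypothetical helper names made up for this example, not part of any library.

```python
import email
from email import policy
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &amp; are unescaped
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Strip tags from an HTML string and return the text joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def message_body_text(raw_bytes):
    """Return the body of a raw email message as plain text (hypothetical helper)."""
    # Parse from bytes so the email package sees the original encoding.
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    # Prefer a text/plain part; fall back to text/html.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    # get_content() undoes quoted-printable/base64 and decodes the charset.
    content = part.get_content()
    if part.get_content_type() == "text/html":
        return html_to_text(content)
    return content
```

In the loop from the first post this would be called as `print(message_body_text(response_part[1]))`, without calling `.decode()` first. If you would rather keep BeautifulSoup, passing the decoded HTML to `BeautifulSoup(content, "html.parser").get_text()` should work in place of `html_to_text`.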