Python Forum

Full Version: Decoding html to text string
Hi all

I'm trying to convert the contents of an email into plain text. I have the following code:

import email
from bs4 import BeautifulSoup

# data is the list returned by the mail fetch call (not shown here)
for response_part in data:
    if isinstance(response_part, tuple):
        msg = email.message_from_string(response_part[1].decode())
        email_subject = msg['subject']
        email_from = msg['from']
        email_date = msg['date']
        # Parses the whole raw message (headers and still-encoded body) as HTML
        email_content = BeautifulSoup(response_part[1].decode())
        print('From : ' + email_from + '\n')
        print('Subject : ' + email_subject + '\n')
        print(email_content)
I tried everything, but every time I print the email_content variable I get the following (example):

splay:bl=' tatic.com="" ui="" v1="" width='3D"49"'/></td><td font-family:'open="" sans','arial',sans-s="erif;" style='3D"align:left;' vertical-align:bottom"=""><span style='3D"font-size:small"'>Happy emailing=
,<br/></span><span line-height:1"="" style='3D"font-size:x-large;'>The Gmail Tea=
m</span></td></tr></table></div>

<div 4%="" auto="" auto;="" border-radius:1em;="padding:1em;" font-family:'arial','helvetica',sans-s="erif;" font-size:0.8em;="" margin:0="" style='3D"direction:ltr;color:#777;' text-align:center;"="">=C2=A9 2017 Google Inc. 1600 Amphitheatre Parkway=
, Mountain View, CA 94043<br/></div>
How can I decode this into a simple text string without all the formatting characters?

I tried both BeautifulSoup and html.unescape, but I can't get either of them working.

Kind Regards,
Peter
I can't vouch for it, but you could check out: https://github.com/KeepSafe/ks-email-parser
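
For what it's worth, the `=3D`, `=C2=A9` and trailing `=` in the output above look like quoted-printable transfer encoding that was never undone before the text reached BeautifulSoup. Below is a minimal sketch using only the standard library. It assumes `response_part[1]` holds the raw message as bytes, and the names `TextExtractor` and `email_to_text`, along with the sample message and its address, are made up for illustration. It lets the `email` package decode the body first and only then strips the tags.

```python
import email
from email import policy
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        # Join the pieces and collapse whitespace left over from the markup.
        # Joining with a space can split a word that an inline tag interrupts.
        return " ".join(" ".join(self.chunks).split())


def email_to_text(raw_bytes):
    """Return (sender, subject, body_text) for a raw message given as bytes."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    # Prefer a text/plain part if there is one, otherwise fall back to HTML.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        return msg["from"], msg["subject"], ""
    # get_content() undoes quoted-printable/base64 and the charset encoding.
    content = body.get_content()
    if body.get_content_type() == "text/html":
        parser = TextExtractor()
        parser.feed(content)
        parser.close()
        content = parser.text()
    return msg["from"], msg["subject"], content


# Hypothetical sample message, shaped like the output in the question.
raw = (
    b"From: sender@example.com\r\n"
    b"Subject: Hello\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<div style=3D\"font-size:small\">Happy emailing=\r\n"
    b",<br/>The Gmail Team</div>\r\n"
    b"<div>=C2=A9 2017 Google Inc.</div>\r\n"
)

sender, subject, text = email_to_text(raw)
print("From : " + sender)
print("Subject : " + subject)
print(text)
```

In the loop from the question this would be called as `email_to_text(response_part[1])`, without the `.decode()`. If you would rather keep BeautifulSoup, passing the already decoded `content` string to it and calling `get_text()` should work in place of `TextExtractor`.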