Python Forum
html to text problem
#5
(Apr-27-2018, 07:55 PM)Kyle Wrote: Any other ideas?
We would need to see the URL, or you have to post the raw HTML and the output you want.

Consider whether you can just parse it the normal way.
For example, if the text you want is inside a <p> tag, then just parse what's inside <p>.
from bs4 import BeautifulSoup

html = '''\
<!DOCTYPE html>
<html>
  <head>
    <title>HTML p Tag</title>
  </head>
  <body>
    <p>This paragraph is defined using the HTML p<br />
       A new line<br />
       Another new line<br />
    </p>
  </body>
</html>'''
soup = BeautifulSoup(html, 'lxml')
Test:
>>> p = soup.find('p')
>>> p
<p>This paragraph is defined using the HTML p<br/>
      A new line<br/>
      Another new line<br/>
</p>

>>> # .text drops the <br/> tags; the \n come from the line breaks in the source
>>> p = soup.find('p').text
>>> p
('This paragraph is defined using the HTML p\n'
 '      A new line\n'
 '      Another new line\n')

>>> print(p)
This paragraph is defined using the HTML p
      A new line
      Another new line

>>> # Can clean a little more
>>> for line in p.split('\n'):
...     print(line.lstrip())
     
This paragraph is defined using the HTML p
A new line
Another new line
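
If you need the cleaned lines as data and not just printed, the loop above can be wrapped in a small helper. This is only a sketch: clean_lines is a made-up name, and the raw string is assumed to be what soup.find('p').text returned in the test above.

```python
def clean_lines(text):
    """Strip each line and drop empty ones (hypothetical helper, not part of bs4)."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Assumed to be the same string that soup.find('p').text gave above
raw = ('This paragraph is defined using the HTML p\n'
       '      A new line\n'
       '      Another new line\n')

lines = clean_lines(raw)
print('\n'.join(lines))
```

BeautifulSoup also has stripped_strings on a tag, which should give much the same result without the manual split.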


Messages In This Thread
html to text problem - by Kyle - Apr-27-2018, 05:15 PM
RE: html to text problem - by snippsat - Apr-27-2018, 06:15 PM
RE: html to text problem - by Kyle - Apr-27-2018, 07:55 PM
RE: html to text problem - by nilamo - Apr-27-2018, 08:27 PM
RE: html to text problem - by snippsat - Apr-27-2018, 09:02 PM
