html to text problem

***snippsat*** · (This post was last modified: Apr-27-2018, 09:02 PM by snippsat.)

(Apr-27-2018, 07:55 PM)Kyle Wrote: Any other ideas?

I'd need to look at the URL address, or you'd have to post the raw HTML and the output you want.

I don't see why you can't just parse it the normal way.
For example, if a <p> tag holds the text you want, just parse what's inside the <p>.

from bs4 import BeautifulSoup

html = '''\
<!DOCTYPE html>
<html>
  <head>
    <title>HTML p Tag</title>
  </head>
  <body>
    <p>This paragraph is defined using the HTML p<br />
       A new line<br />
       Another new line<br />
    </p>
  </body>
</html>'''
soup = BeautifulSoup(html, 'lxml')  # needs lxml installed; 'html.parser' is the built-in alternative

Test:

>>> p = soup.find('p')
>>> p
<p>This paragraph is defined using the HTML p<br/>
      A new line<br/>
      Another new line<br/>
</p>

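>>> # .text gives only the text; the \n characters come from the line breaks in the HTML source, the <br /> tags themselves add nothing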
>>> p = soup.find('p').text
>>> p
('This paragraph is defined using the HTML p\n'
 '      A new line\n'
 '      Another new line\n')

>>> print(p)
This paragraph is defined using the HTML p
      A new line
      Another new line
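A note on the <br /> tags: they contribute no text of their own, so if the HTML is written on one line you get the pieces glued together. A minimal sketch, assuming a hypothetical one-line <p> and the built-in 'html.parser' so lxml isn't required, shows how passing a separator to get_text() puts a newline between the pieces:

```python
from bs4 import BeautifulSoup

# Hypothetical input: the <p> is written on one line, so there are
# no newline characters in the source between the <br /> tags.
html = '<p>First line<br />Second line<br />Third line</p>'
soup = BeautifulSoup(html, 'html.parser')  # built-in parser, no lxml needed
p = soup.find('p')

# .text just joins the text pieces; <br /> adds nothing between them.
joined = p.text
# get_text() with a separator inserts '\n' between every text piece.
lines = p.get_text('\n')

print(repr(joined))  # 'First lineSecond lineThird line'
print(repr(lines))   # 'First line\nSecond line\nThird line'
```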

>>> # Clean it up a little more: strip the leading whitespace from each line
>>> for line in p.split('\n'):
...     print(line.lstrip())
     
This paragraph is defined using the HTML p
A new line
Another new line
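BeautifulSoup can also do that cleanup itself. A minimal sketch (using a shortened copy of the HTML above and the built-in 'html.parser'): get_text() with strip=True strips each text piece and drops the ones that end up empty, and stripped_strings gives the same pieces one at a time:

```python
from bs4 import BeautifulSoup

html = '''\
<p>This paragraph is defined using the HTML p<br />
   A new line<br />
   Another new line<br />
</p>'''
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')

# strip=True removes the indentation and drops the empty trailing piece;
# '\n' is used as the separator between the remaining pieces.
text = p.get_text('\n', strip=True)
print(text)

# stripped_strings yields the same cleaned pieces, handy if you want a list.
lines = list(p.stripped_strings)
print(lines)
```

This avoids the manual split() and lstrip() loop and also gets rid of the empty line at the end.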
