(Apr-27-2018, 07:55 PM)Kyle Wrote: Any other ideas?

We'd have to look at the URL, or you have to post the raw HTML and the output you want.
Think about whether you can just parse it the normal way.
For example, if a <p> tag holds the text you want, then just parse what's inside <p>.
from bs4 import BeautifulSoup

html = '''\
<!DOCTYPE html>
<html>
<head>
<title>HTML p Tag</title>
</head>
<body>
<p>This paragraph is defined using the HTML p<br />
 A new line<br />
 Another new line<br />
</p>
</body>
</html>'''

soup = BeautifulSoup(html, 'lxml')

Test:
>>> p = soup.find('p')
>>> p
<p>This paragraph is defined using the HTML p<br/>
 A new line<br/>
 Another new line<br/>
</p>
>>> # .text drops the <br/> tags; the \n comes from the line breaks in the source
>>> p = soup.find('p').text
>>> p
('This paragraph is defined using the HTML p\n'
 ' A new line\n'
 ' Another new line\n')
>>> print(p)
This paragraph is defined using the HTML p
 A new line
 Another new line
>>> # Can clean a little more
>>> for line in p.split('\n'):
...     print(line.lstrip())
...
This paragraph is defined using the HTML p
A new line
Another new line
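If the goal is just the cleaned lines, the split-and-lstrip step can probably be skipped. Below is a minimal sketch, assuming the same kind of <p> markup as above; it uses the built-in 'html.parser' instead of 'lxml' only so it runs without the extra dependency, and the names `lines` and `text` are just for illustration.

```python
from bs4 import BeautifulSoup

# Same style of markup as in the example above (assumed, shortened to the <p> tag)
html = '''\
<p>This paragraph is defined using the HTML p<br />
 A new line<br />
 Another new line<br />
</p>'''

# 'html.parser' is swapped in here so lxml is not required
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')

# stripped_strings yields each text piece with surrounding whitespace removed
# and skips pieces that are whitespace only
lines = list(p.stripped_strings)
for line in lines:
    print(line)

# get_text() can do the same stripping and join the pieces with a separator
text = p.get_text('\n', strip=True)
```

Both ways work on the tag itself, so this only helps once you have found the right tag to parse.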