BeautifulSoup: how to pick text that's not in HTML tags?

**Larz60+** · (This post was last modified: Oct-08-2018, 11:08 AM by Larz60+.)

If I knew the URL of your site, I would have used it. For this example, I use https://www.weather.gov/

Load the web site into Chrome or Firefox.
Highlight the text you are interested in and right-click it.
Choose Inspect Element, then, in the inspector, move the cursor over the text node:

<strong>Travel date:</strong>&nbsp;2019.10.10<br>
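The value you are after here, 2019.10.10, is not inside any tag: it is a bare text node that follows `</strong>`. In BeautifulSoup that node is the `<strong>` tag's `next_sibling`, and in lxml it is the element's `.tail`. As a dependency-free illustration of the same idea, here is a minimal sketch that swaps in the standard library's `html.parser` for BeautifulSoup. The class and function names are hypothetical, and it assumes each value runs from `</strong>` up to the next `<br>` or `<strong>`.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Map each <strong>label</strong> to the bare text that follows it.

    Hypothetical helper for illustration; assumes a value ends at the next
    <br> or at the next <strong> label.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &nbsp; arrives as '\xa0'
        self.values = {}
        self._in_strong = False
        self._label = None  # label whose value is currently being collected

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._in_strong = True
            self._label = ""
        elif tag == "br" and not self._in_strong:
            self._label = None  # <br> ends the current value

    def handle_endtag(self, tag):
        if tag == "strong":
            self._in_strong = False
            if self._label is not None:
                self._label = self._label.strip()
                self.values[self._label] = ""

    def handle_data(self, data):
        if self._in_strong:
            self._label += data  # text inside <strong> is the label
        elif self._label is not None:
            self.values[self._label] += data  # text outside any tag


def values_after_labels(fragment):
    parser = LabelValueParser()
    parser.feed(fragment)
    parser.close()
    # str.strip() also removes the non-breaking space produced by &nbsp;
    return {label: value.strip() for label, value in parser.values.items()}


snippet = "<strong>Travel date:</strong>&nbsp;2019.10.10<br>"
print(values_after_labels(snippet))  # {'Travel date:': '2019.10.10'}
```

With BeautifulSoup installed, the equivalent would be along the lines of `soup.find('strong', string='Travel date:').next_sibling.strip()`.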

Right-click the node --> Copy --> XPath,
then paste it into code like this (your XPath will be different):

xpath = '/html/body/div[5]/div/div[4]/p/a[2]'
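Each step of a copied XPath names a tag, and a number in brackets is a 1-based position among sibling elements with that same tag, so `div[5]` means the fifth `<div>` child of `<body>`. The standard library's `xml.etree.ElementTree` understands this positional syntax, although it supports only a subset of XPath and will not search from an element with a path that starts with `/`. A minimal sketch under those assumptions; the helper name and the tiny well-formed page are hypothetical, and real pages such as weather.gov are not well-formed XML, which is why the code uses lxml's HTML parser.

```python
import xml.etree.ElementTree as ET

# Tiny well-formed stand-in page (hypothetical markup, not weather.gov's).
PAGE = """<html><body>
<div>header</div>
<div><p>first</p><p>second</p></div>
</body></html>"""


def find_by_copied_xpath(root, xpath):
    """Resolve a browser-copied absolute XPath like '/html/body/div[2]/p[2]'.

    Hypothetical helper: the leading '/html/' step is checked against the
    root tag and dropped, and the remainder is evaluated relative to the root.
    """
    prefix = "/" + root.tag + "/"
    if not xpath.startswith(prefix):
        raise ValueError("expected an XPath starting with " + prefix)
    # 'div[2]' selects the 2nd <div> child, counting from 1 as XPath does
    return root.find(xpath[len(prefix):])


root = ET.fromstring(PAGE)
print(find_by_copied_xpath(root, "/html/body/div[2]/p[2]").text)  # second
print(find_by_copied_xpath(root, "/html/body/div[1]").text)  # header
```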

Now run code like:

from lxml import html
import requests
import sys


def get_stuff():
    response = requests.get('https://www.weather.gov/')
    if response.status_code != 200:
        print("can't load page")
        sys.exit(-1)

    # parse the raw page bytes into an lxml element tree
    tree = html.fromstring(response.content)
    # replace with your xpath
    node = tree.xpath('/html/body/div[4]/div[2]/div[1]/div[2]/div/div[2]/p')
    # .text is the text directly inside the <p>, before its first child tag
    text = node[0].text.strip()
    print(text)


if __name__ == '__main__':
    get_stuff()

results:

A slow moving storm system will bring a continued threat for heavy snow over the Rockies, heavy rain, flooding, and severe weather over the Plains into midweek. Over the Gulf of Mexico, Tropical Storm Michael is expected to strengthen into a hurricane and cause direct impacts to the northeast Gulf Coast by midweek. Heavy rain from Michael could once again impact the Carolinas late week.
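Note that `node[0].text` in the code is only the text inside the `<p>` that comes before its first child tag. lxml, like the standard library's `xml.etree.ElementTree` (which uses the same text/tail model), stores text that follows a child element in that child's `.tail`, and `itertext()` walks every piece in document order; lxml's HTML elements also offer `text_content()`. A small offline sketch with ElementTree on a hypothetical, well-formed paragraph:

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph with text before, between and after inline tags.
p = ET.fromstring("<p>Before <b>bold</b> after<br/> end</p>")

before_first_child = p.text       # only the text preceding <b>
after_bold = p.find("b").tail     # bare text right after </b>
all_text = "".join(p.itertext())  # every text and tail piece, in order

print(repr(before_first_child))  # 'Before '
print(repr(after_bold))          # ' after'
print(all_text)                  # Before bold after end
```

If the value you want is bare text after an inline tag, reading `.tail` (or joining `itertext()`) recovers it, whereas `.text` alone gives only a partial string, or `None` when the element starts with a child tag.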
