Python Forum
BeautifulSoup: how to pick text that's not in HTML tags?
#2
If I knew the URL of your site, I would have used it for the example.
Since I don't, I use https://www.weather.gov/ instead.

Load the web site in Chrome or Firefox.
Highlight the text you are interested in and right-click.
Choose Inspect Element, then move the cursor in the inspector over the text node:
<strong>Travel date:</strong>&nbsp;2019.10.10<br>
Right-click --> Copy --> XPath.
Paste it into your code like this (your XPath will be different):
xpath = '/html/body/div[5]/div/div[4]/p/a[2]'
Now run code like:
from lxml import html
import requests
import sys


def get_stuff():
    response = requests.get('https://www.weather.gov/')
    if response.status_code != 200:
        print("can't load page")
        sys.exit(-1)

    # parse the raw HTML bytes into an lxml element tree
    tree = html.fromstring(response.content)
    # replace with your xpath
    node = tree.xpath('/html/body/div[4]/div[2]/div[1]/div[2]/div/div[2]/p')
    # xpath() returns a list of matches; take the first one's text
    text = node[0].text.strip()
    print(text)


if __name__ == '__main__':
    get_stuff()
Results:
Output:
A slow moving storm system will bring a continued threat for heavy snow over the Rockies, heavy rain, flooding, and severe weather over the Plains into midweek. Over the Gulf of Mexico, Tropical Storm Michael is expected to strengthen into a hurricane and cause direct impacts to the northeast Gulf Coast by midweek. Heavy rain from Michael could once again impact the Carolinas late week.
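
For the node quoted earlier, the date is not inside any tag: it is the bare text that trails the <strong> element. Here is a minimal sketch of picking that up with lxml, assuming your markup looks like that line. The <p> wrapper, the SNIPPET name and the text_after_label helper are made up for illustration and are not from the original page, so adjust them to your own HTML.

```python
from lxml import html

# Hypothetical fragment built around the node quoted in this post;
# the surrounding <p> is an assumption, not the real page structure.
SNIPPET = '<p><strong>Travel date:</strong>&nbsp;2019.10.10<br></p>'


def text_after_label(fragment, label):
    """Return the bare text following the <strong> whose text contains label."""
    tree = html.fromstring(fragment)
    # following-sibling::text()[1] selects the first text node after the tag,
    # which is the text that is "not in any HTML tag".
    matches = tree.xpath(
        './/strong[contains(., $label)]/following-sibling::text()[1]',
        label=label,
    )
    if not matches:
        return None
    # &nbsp; is parsed to U+00A0, which str.strip() treats as whitespace
    return matches[0].strip()


if __name__ == '__main__':
    print(text_after_label(SNIPPET, 'Travel date'))
```

lxml also exposes the same string as the .tail attribute of the <strong> element, so tree.find('.//strong').tail is an alternative if you already hold the element.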


Messages In This Thread
RE: BeautifulSoup: how to pick text that's not in HTML tags? - by Larz60+ - Oct-08-2018, 11:07 AM
