Python Forum

Help extracting text from element
I've tried many different ways but can't seem to extract the product title and price from the following elements:

<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A _2OMMP">Classic Fit Solid Wool Suit</h1>

<span id="current-price-string" class="_1ds4c">$338.00</span>

Thank you in advance for any suggestions.
>>> from bs4 import BeautifulSoup
>>> html_string='''<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A _2OMMP">Classic Fit Solid Wool Suit</h1>
<span id="current-price-string" class="_1ds4c">$338.00</span>'''
>>> soup = BeautifulSoup(html_string, 'html.parser')
>>> row = soup.find('span')
>>> row
<span class="_1ds4c" id="current-price-string">$338.00</span>
>>> print(row.get_text())
$338.00
>>> row = soup.find('h1')
>>> print(row.get_text())
Classic Fit Solid Wool Suit
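A real product page will usually contain many span elements, so soup.find('span') may return something other than the price there. A minimal sketch, assuming the id and itemprop attributes shown in the question are also present on the live page (not verified here), selects each element by attribute instead. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html_string = '''<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A _2OMMP">Classic Fit Solid Wool Suit</h1>
<span id="current-price-string" class="_1ds4c">$338.00</span>'''

soup = BeautifulSoup(html_string, 'html.parser')

# Select by attribute rather than by tag name alone.
title_tag = soup.find('h1', attrs={'itemprop': 'name'})
price_tag = soup.find('span', id='current-price-string')

# find() returns None when nothing matches, so guard before calling get_text().
title = title_tag.get_text(strip=True) if title_tag is not None else None
price_text = price_tag.get_text(strip=True) if price_tag is not None else None

print(title)
print(price_text)
```

If either value comes back as None against the real page, the element is probably not in the HTML that was downloaded, which is worth checking before debugging the parsing code.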
Thank you! I'm still running into trouble, so I thought I'd post my complete script. Essentially, I'm trying to run a script that checks the price of a suit and lets me know when it has dropped below $400.

import requests
from bs4 import BeautifulSoup
import time
import smtplib

URL = "https://shop.nordstrom.com/s/peter-millar-classic-fit-solid-wool-suit/4294847/full?origin=category-personalizedsort&breadcrumb=Home%2FMen%2FClothing%2FSuits%20%26%20Separates&fashioncolor=Black&fashionsize=15%3A46r~~42&color=charcoal"
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0"}
Wanted_Price = 400

def trackprice():
    price = int(float(getprice()))
    if price > Wanted_Price:
        diff = price - Wanted_Price
        print(f"It's still ${diff} too expensive")
    # else:
    #     print("Cheaper!")
    if price < Wanted_Price:
        send_mail()

def getprice():
    # page = requests.get(URL, headers=headers)
    html_string='''<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A_2OMMP">Classic Fit Solid Wool Suit</h1>
<span id="current-price-string" class="_1ds4c">$338.00</span>'''
    soup = BeautifulSoup(html_string, 'html.parser')
    row = soup.find('span')
    row
    print(row.get_text())
    row = soup.find('h1')
    print(row.get_text())

def send_mail():
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()

    server.login('[email protected]', 'password')

    subject = 'Nordstrom price went down'
    body = 'Check link: https://shop.nordstrom.com/s/peter-milla...r=charcoal'

    msg = f"Subject: {subject}\n\n{body}\n\n"

    server.sendmail(
        '[email protected]',
        '[email protected]',
        msg
    )
    print('Email has been sent')

    server.quit()

if __name__ == "__main__":
    while True:
        trackprice()
        time.sleep(60*60)
Can you add python tags?
Sorry, I'm not sure what you mean. I'm fairly new to this, so I apologize. I just posted my complete code so you could see it.
Check the BBCode help for more info on tags.

Are you able to get price when you run against URL? What is the trouble you are facing?
When I run my script, here's the output. The other scripts I've written for sites like Amazon come back with only the desired info, which in this case would be Classic Fit Solid Wool Suit and $338.00.

python petermillar3.py
$338.00
Classic Fit Solid Wool Suit
Traceback (most recent call last):
  File "petermillar3.py", line 55, in <module>
    trackprice()
  File "petermillar3.py", line 11, in trackprice
    price = int(float(getprice()))
TypeError: float() argument must be a string or a number, not 'NoneType'
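The return statement is missing in getprice(). The function only prints the text, so it implicitly returns None, and float(None) raises the TypeError in your traceback. Return the price text with the dollar sign stripped: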

def getprice():
  ...
  row = soup.find('span')
  return row.get_text().strip('$')
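Once getprice() returns the text, the conversion itself can still fail if the element is missing or the text is not a plain number. The following is only a sketch, assuming the price text looks like "$338.00" and may contain thousands separators; parse_price and is_below_target are hypothetical helper names, not part of the original script. It keeps the price as a float instead of truncating it with int().

```python
from typing import Optional


def parse_price(price_text: Optional[str]) -> Optional[float]:
    """Convert text such as '$338.00' or '$1,299.50' to a float.

    Returns None when the text is missing or is not a number.
    """
    if price_text is None:
        return None
    # Drop surrounding whitespace, the leading dollar sign, and commas.
    cleaned = price_text.strip().lstrip('$').replace(',', '')
    try:
        return float(cleaned)
    except ValueError:
        return None


def is_below_target(price_text: Optional[str], wanted_price: float) -> bool:
    """Return True only when a price was parsed and is under the target."""
    price = parse_price(price_text)
    return price is not None and price < wanted_price


print(parse_price('$338.00'))
print(is_below_target('$338.00', 400))
print(is_below_target(None, 400))
```

With helpers like these, trackprice() could call send_mail() only when is_below_target(getprice(), Wanted_Price) is true, and a missing element would skip that run instead of raising a TypeError.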