Python Forum
Help extracting text from element
#1
I've tried many different ways but can't seem to extract the product title and price from the following element:

<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A _2OMMP">Classic Fit Solid Wool Suit</h1>

<span id="current-price-string" class="_1ds4c">$338.00</span>

Thank you in advance for any suggestions.
Reply
#2
>>> from bs4 import BeautifulSoup
>>> html_string='''<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A _2OMMP">Classic Fit Solid Wool Suit</h1>
<span id="current-price-string" class="_1ds4c">$338.00</span>'''
>>> soup = BeautifulSoup(html_string, 'html.parser')
>>> row = soup.find('span')
>>> row
<span class="_1ds4c" id="current-price-string">$338.00</span>
>>> print(row.get_text())
$338.00
>>> row = soup.find('h1')
>>> print(row.get_text())
Classic Fit Solid Wool Suit
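On the full page there will likely be many span and h1 tags, so find('span') may return a different element than the price. A minimal sketch that matches on the attributes from the question instead, assuming the id and itemprop values stay the same on the live page (the class names look auto-generated, so they are not used here):

```python
from bs4 import BeautifulSoup

html_string = '''<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A _2OMMP">Classic Fit Solid Wool Suit</h1>
<span id="current-price-string" class="_1ds4c">$338.00</span>'''

soup = BeautifulSoup(html_string, 'html.parser')

# Match on attributes rather than taking the first tag of a given type.
title_tag = soup.find('h1', attrs={'itemprop': 'name'})
price_tag = soup.find(id='current-price-string')

# find() returns None when nothing matches, so guard before calling get_text().
title = title_tag.get_text(strip=True) if title_tag else None
price = price_tag.get_text(strip=True) if price_tag else None

print(title)  # Classic Fit Solid Wool Suit
print(price)  # $338.00
```

If either value comes back as None against the real URL, the element is probably not in the HTML that requests receives.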
Reply
#3
Thank you! I'm still running into trouble, so I thought I'd post my complete script. I'm trying to write a script that checks the price of a suit and lets me know when it has dropped below $400.

import requests
from bs4 import BeautifulSoup
import time
import smtplib

URL = "https://shop.nordstrom.com/s/peter-millar-classic-fit-solid-wool-suit/4294847/full?origin=category-personalizedsort&breadcrumb=Home%2FMen%2FClothing%2FSuits%20%26%20Separates&fashioncolor=Black&fashionsize=15%3A46r~~42&color=charcoal"
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0"}
Wanted_Price = 400

def trackprice():
    price = int(float(getprice()))
    if price > Wanted_Price:
        diff = price - Wanted_Price
        print(f"It's still ${diff} too expensive")
    # else:
    #     print("Cheaper!")
    if price < Wanted_Price:
        send_mail()

def getprice():
    # page = requests.get(URL, headers=headers)
    html_string='''<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A_2OMMP">Classic Fit Solid Wool Suit</h1>
<span id="current-price-string" class="_1ds4c">$338.00</span>'''
    soup = BeautifulSoup(html_string, 'html.parser')
    row = soup.find('span')
    row
    print(row.get_text())
    row = soup.find('h1')
    print(row.get_text())

def send_mail():
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()

    server.login('[email protected]', 'password')

    subject = 'Nordstrom price went down'
    body = 'Check link: https://shop.nordstrom.com/s/peter-milla...r=charcoal'

    msg = f"Subject: {subject}\n\n{body}\n\n"

    server.sendmail(
        '[email protected]',
        '[email protected]',
        msg
    )
    print('Email has been sent')

    server.quit()

if __name__ == "__main__":
    while True:
        trackprice()
        time.sleep(60*60)
Reply
#4
Can you add python tags?
Reply
#5
Sorry, I'm not sure what you mean. I'm fairly new to this, so I apologize. I just posted my complete code so you could see it.
Reply
#6
Check BBCode for more info on tags.

Are you able to get price when you run against URL? What is the trouble you are facing?
Reply
#7
So when I run my script here's the output. The other scripts I've created on other sites like Amazon would only come back with the desired info - in this case it would be Classic Fit Solid Wool Suit and $338.00.

python petermillar3.py
$338.00
Classic Fit Solid Wool Suit
Traceback (most recent call last):
  File "petermillar3.py", line 55, in <module>
    trackprice()
  File "petermillar3.py", line 11, in trackprice
    price = int(float(getprice()))
TypeError: float() argument must be a string or a number, not 'NoneType'
Reply
#8
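The return is missing in getprice(). It prints the values but returns nothing, so the call evaluates to None and float(None) raises the TypeError. Return the price text instead:

def getprice():
  ...
  row = soup.find('span')
  return row.get_text().strip('$')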
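A slightly fuller sketch of the same fix, where getprice() returns a number that can be compared directly. The html parameter and the price_dropped() helper are made up for illustration, the lookup by id assumes the markup from post #1, and the comma handling assumes US-style prices such as $1,338.00:

```python
from bs4 import BeautifulSoup

WANTED_PRICE = 400

# Sample markup from the thread; a real run would pass in the fetched page text.
HTML_STRING = '''<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A _2OMMP">Classic Fit Solid Wool Suit</h1>
<span id="current-price-string" class="_1ds4c">$338.00</span>'''


def getprice(html):
    """Return the price as a float, or None if the price element is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find(id='current-price-string')
    if tag is None:
        return None
    # "$338.00" -> "338.00"; dropping commas is an assumption about the format.
    text = tag.get_text(strip=True).lstrip('$').replace(',', '')
    return float(text)


def price_dropped(price, wanted=WANTED_PRICE):
    """True when a price was found and it is below the wanted price."""
    return price is not None and price < wanted


price = getprice(HTML_STRING)
print(price)                 # 338.0
print(price_dropped(price))  # True
```

Returning None for a missing element lets the caller skip that check instead of crashing inside float(), which matters once the function runs against the live page in a loop.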
Reply

