Python Forum
Help extracting text from element - Printable Version




Help extracting text from element - jpdallas - Apr-29-2020

I've tried many different ways but can't seem to extract the product title and price from the following element:

<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A _2OMMP">Classic Fit Solid Wool Suit</h1>

<span id="current-price-string" class="_1ds4c">$338.00</span>

Thank you in advance for any suggestions.


RE: Help extracting text from element - anbu23 - Apr-29-2020

>>> from bs4 import BeautifulSoup
>>> html_string='''<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A _2OMMP">Classic Fit Solid Wool Suit</h1>
<span id="current-price-string" class="_1ds4c">$338.00</span>'''
>>> soup = BeautifulSoup(html_string, 'html.parser')
>>> row = soup.find('span')
>>> row
<span class="_1ds4c" id="current-price-string">$338.00</span>
>>> print(row.get_text())
$338.00
>>> row = soup.find('h1')
>>> print(row.get_text())
Classic Fit Solid Wool Suit
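
A slightly more targeted variant of the lookup above, offered as a sketch: a full product page will likely contain other h1 and span elements, so this version selects by the itemprop and id attributes shown in the question. Whether the live page keeps those attributes is an assumption, and the names title_tag and price_tag are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html_string = '''<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A _2OMMP">Classic Fit Solid Wool Suit</h1>
<span id="current-price-string" class="_1ds4c">$338.00</span>'''

soup = BeautifulSoup(html_string, 'html.parser')

# Look up each element by an attribute rather than by tag name alone.
title_tag = soup.find('h1', attrs={'itemprop': 'name'})
price_tag = soup.find(id='current-price-string')

# find() returns None when nothing matches, so guard before calling get_text().
title = title_tag.get_text(strip=True) if title_tag is not None else None
price = price_tag.get_text(strip=True) if price_tag is not None else None

print(title)
print(price)
```

If either attribute is missing from the page, the corresponding variable is None instead of raising an AttributeError.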



RE: Help extracting text from element - jpdallas - Apr-29-2020

Thank you! I'm still running into trouble, so I thought I'd post my complete script. Essentially, I'm trying to run a script that checks the price of a suit and lets me know when it has dropped below $400.

import requests
from bs4 import BeautifulSoup
import time
import smtplib

URL = "https://shop.nordstrom.com/s/peter-millar-classic-fit-solid-wool-suit/4294847/full?origin=category-personalizedsort&breadcrumb=Home%2FMen%2FClothing%2FSuits%20%26%20Separates&fashioncolor=Black&fashionsize=15%3A46r~~42&color=charcoal"
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0"}
Wanted_Price = 400

def trackprice():
    price = int(float(getprice()))
    if price > Wanted_Price:
        diff = price - Wanted_Price
        print(f"It's still ${diff} too expensive")
    # else:
    #     print("Cheaper!")
    if price < Wanted_Price:
        send_mail()

def getprice():
    # page = requests.get(URL, headers=headers)
    html_string='''<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A_2OMMP">Classic Fit Solid Wool Suit</h1>
<span id="current-price-string" class="_1ds4c">$338.00</span>'''
    soup = BeautifulSoup(html_string, 'html.parser')
    row = soup.find('span')
    row
    print(row.get_text())
    row = soup.find('h1')
    print(row.get_text())

def send_mail():
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()

    server.login('[email protected]', 'password')

    subject = 'Nordstrom price went down'
    body = 'Check link: https://shop.nordstrom.com/s/peter-millar-classic-fit-solid-wool-suit/4294847/full?origin=category-personalizedsort&breadcrumb=Home%2FMen%2FClothing%2FSuits%20%26%20Separates&fashioncolor=Black&fashionsize=15%3A46r~~42&color=charcoal'

    msg = f"Subject: {subject}\n\n{body}\n\n"

    server.sendmail(
        '[email protected]',
        '[email protected]',
        msg
    )
    print('Email has been sent')

    server.quit()

if __name__ == "__main__":
    while True:
        trackprice()
        time.sleep(60*60)


RE: Help extracting text from element - anbu23 - Apr-29-2020

Can you add python tags?


RE: Help extracting text from element - jpdallas - Apr-29-2020

Sorry, I'm not sure what you mean. I'm fairly new to this, so I apologize. I just posted my complete code so you could see it.


RE: Help extracting text from element - anbu23 - Apr-29-2020

Check BBCode for more info on tags.

Are you able to get the price when you run against the URL? What trouble are you facing?


RE: Help extracting text from element - jpdallas - Apr-29-2020

So when I run my script here's the output. The other scripts I've created on other sites like Amazon would only come back with the desired info - in this case it would be Classic Fit Solid Wool Suit and $338.00.

python petermillar3.py
$338.00
Classic Fit Solid Wool Suit
Traceback (most recent call last):
  File "petermillar3.py", line 55, in <module>
    trackprice()
  File "petermillar3.py", line 11, in trackprice
    price = int(float(getprice()))
TypeError: float() argument must be a string or a number, not 'NoneType'


RE: Help extracting text from element - anbu23 - Apr-30-2020

The return statement is missing in getprice(). Without it the function returns None, which is why float() raises the TypeError.

def getprice():
    ...
    row = soup.find('span')
    return row.get_text().strip('$')
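
For reference, here is one way the fix could look as a complete, runnable function. This is only a sketch under a few assumptions: getprice() takes the HTML as a parameter so it can be tried against the sample markup from the thread, the price span keeps its current-price-string id, and price_dropped() is a hypothetical helper standing in for the comparison in trackprice().

```python
from bs4 import BeautifulSoup

WANTED_PRICE = 400

# Same sample markup as in the thread; a live run would pass the fetched page instead.
HTML_STRING = '''<h1 itemprop="name" overrideelementwith="div" class=" _6YOLH _1JtW7 _2VF_A _2OMMP">Classic Fit Solid Wool Suit</h1>
<span id="current-price-string" class="_1ds4c">$338.00</span>'''


def getprice(html):
    """Return the price as a float, or None if the price element is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    row = soup.find('span', id='current-price-string')
    if row is None:
        return None
    # Drop the currency symbol and any thousands separators before converting.
    return float(row.get_text(strip=True).lstrip('$').replace(',', ''))


def price_dropped(html, wanted_price=WANTED_PRICE):
    """Hypothetical helper: True only when a price was found and is below the target."""
    price = getprice(html)
    return price is not None and price < wanted_price


print(getprice(HTML_STRING))
print(price_dropped(HTML_STRING))
```

Returning None when the element is missing lets the caller decide what to do, instead of failing inside float() as in the traceback above.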