Python Forum

Hello,
Probably a solution already exists but I have trouble formulating my request correctly.
My task can be summarized in two points:

Open Amazon website and execute "book search" using title or ISBN
On the page that will open extract metadata, e.g. "customer raiting"

Any ideas ?
Thanks.

(Nov-06-2019, 08:50 AM)Pavel_47 Wrote: [ -> ]Any ideas ?

Yes,can give some hints.
1 can be easy,2 can be a lot harder.
2 you should know basic web scraping and may need to use Selenium as Amazon can use JavaScript for a lot of stuff.
A couple of tutorials part-1 and part-2.

Here some hint that pretty much solve task 1.

search = 'Fluent Python'
search = '+'.join(search.split()) # Add + as Amazon query use that for white spaces
url = f'https://www.amazon.com/s?k={search}&ref=nb_sb_noss'
print(url)

Output:
https://www.amazon.com/s?k=Fluent+Python&ref=nb_sb_noss

A single word search(eg car) would just ignore line 2(with +).

Output:
https://www.amazon.com/s?k=car&ref=nb_sb_noss

Thanks,
The snippet you provided just generate request that once executed in broser opens web page that include several books with word "Python" in title.
When using just ISBN number in request (e.g. https://www.amazon.com/s?k=1491946008&ref=nb_sb_noss), the page dispayed only one book.
But what is interesting is "emitate" in python code mouse click on the title in order to be redirected to the page, containing metadata of the book.

It's just the same way with ISBN number as i have showed.
So search for ISBN number or a normal search shall url be given to Requests and preferably in combo with Beautiful Soup .
Then can follow link to main page that has meta data.
If i take a quick look so is eg head > meta title this:

<meta name="title" content="Fluent Python: Clear, Concise, and Effective Programming: Luciano Ramalho: 4708364244547: Amazon.com: Books">

Try put some code together as it's your task and try to get some meta data from a search.
As mention this may fail it depend on how Amazon generate this(eg JavaScript) than have to trow in Selenium to solve it.

Quote:So search for ISBN number or a normal search shall url be given to Requests and preferably in combo with Beautiful Soup .

Well, the request that uses ISBN number In bold)
https://www.amazon.com/s?k=1491946008&ref=nb_sb_noss
opens the page, where the book "Fluent Python: Clear, Concise, and Effective Programming" is pressent, but it is not the page, entirely dedicated to this book.
There is other staff on the page (you can check).
In order to access to the book dedicated page, one must click on book title.
... and the page dedicated to the book opens. Here it is:

https://www.amazon.com/Fluent-Python-Con...979&sr=8-1

So the Python code should explore the original page in order to find the above link, isn't it.
Once the book dedicated page open, we can access to meta data.
So, the most difficult task is to find a link, that gives access to the book, isn't it ?

(Nov-06-2019, 05:00 PM)Pavel_47 Wrote: [ -> ]Well, the request that uses ISBN number In bold)

Well i know this,it's just the way same with f-string as i showed just have to put in the number.

ISBN_number = "1491946008"
url = f"https://www.amazon.com/s?k={ISBN_number}&ref=nb_sb_noss"
# Out
>>> url
'https://www.amazon.com/s?k=1491946008&ref=nb_sb_noss'

Quote:So, the most difficult task is to find a link, that gives access to the book, isn't it ?

No it depend on how Amazon generated link just have to try,to give a little more help.
Need User Agent or think is a bot.

import requests
from bs4 import BeautifulSoup

ISBN_number = "1491946008"
url = f"https://www.amazon.com/s?k={ISBN_number}&ref=nb_sb_noss"

headers = {
    "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)\
     AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, "lxml")
link = soup.find("h2")

To look at output.

>>> link
<h2 class="a-size-mini a-spacing-none a-color-base s-line-clamp-2">
<a class="a-link-normal a-text-normal" href="/Fluent-Python-Concise-Effective-Programming/dp/1491946008/ref=sr_1_1?keywords=1491946008&amp;qid=1573061459&amp;sr=8-1">
<span class="a-size-medium a-color-base a-text-normal" dir="auto">Fluent Python: Clear, Concise, and Effective Programming</span>
</a>
</h2>
>>> 
>>> link.a.get('href')
'/Fluent-Python-Concise-Effective-Programming/dp/1491946008/ref=sr_1_1?keywords=1491946008&qid=1573061459&sr=8-1'

So don't get whole link,but this is not a problem at all just have to put in https://www.amazon.com first(use f-string).
Try to build on this,i guess you new to all this and Amazon is not the easiest site to start learn this Wink

Well, it worked !
This request provides access to title.
What about author name, publisher and publication year.
Also customer review data: average and number of reviewers.

(Nov-06-2019, 06:09 PM)Pavel_47 Wrote: [ -> ]What about author name, publisher and publication year.
Also customer review data: average and number of reviewers.

Give it try and see how it's goes,you have gotten good hints to start with.
It's your task,no point of me doing the whole task as this is not the Jobs section of forum Wink

Pavel_47

snippsat

Pavel_47

snippsat

Pavel_47

snippsat

Pavel_47

snippsat