Python Forum
Need logic on how to scrape 100K URLs
#1
Hi,
Could you explain the logic for how to approach the following requirement?

My requirement is as follows:

I have a website, say www.example.com. Once I log in, I search for a product, and the website returns things like:
  1. The demand for the product
  2. The supply for the product
  3. Medium sales for the product
  4. Maximum sales for the product

On another line, it gives the 'total number of this product'.

On another line, it gives the most important information: 'Other related products'.

These related products have names like 'product name 123', 'product name 236', 'product name 483', etc. Clicking any of these related products opens a similar page with the same type of information:
  1. The demand for the product
  2. The supply for the product
  3. Medium sales for the product
  4. Maximum sales for the product
On another line, it gives the 'total number of this product'.

On another line, it again gives the most important information: 'Other related products', and the same process has to be followed for each of those products.
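Because the data is behind a login, every request has to carry the session cookies the site sets at login. A minimal sketch using only the standard library (`urllib.request` plus `http.cookiejar`); the helper names `make_session`, `login` and `fetch`, the login URL, and the form field names `username`/`password` are all hypothetical and have to be checked against the real login form (many sites also require a hidden CSRF token field).

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session() -> urllib.request.OpenerDirector:
    """Build an opener that keeps cookies, so one login is reused for every request."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login(session: urllib.request.OpenerDirector, login_url: str,
          username: str, password: str) -> int:
    """POST the login form once and return the HTTP status code.

    ASSUMPTION: the form fields are called 'username' and 'password';
    check the real form (and any hidden token field) before using this.
    """
    form = urllib.parse.urlencode({"username": username, "password": password})
    with session.open(login_url, data=form.encode("utf-8"), timeout=30) as resp:
        return resp.status


def fetch(session: urllib.request.OpenerDirector, url: str) -> str:
    """Download one page with the logged-in session and return its HTML as text."""
    with session.open(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be `session = make_session()`, then `login(session, "https://www.example.com/login", "me", "secret")` (the `/login` path is a guess), then `fetch(session, url)` for each product page. This downloads pages directly instead of opening browser tabs. If the product values are filled in by JavaScript and are missing from the downloaded HTML, this approach will not see them, and a browser-automation tool such as Selenium would be needed instead.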

What logic could a Python script use to read one URL and get all the information on that page: demand, supply, medium sales, maximum sales, and the total number of the product?
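Pulling the values out of one page can be sketched with the standard library's `html.parser`. This assumes, hypothetically, that each value sits in an element whose CSS class is `demand`, `supply`, `medium-sales`, `max-sales` or `total`, and that related-product links are `<a class="related">` tags; the real class names have to be read from the page source. `Product`, `ProductPageParser` and `parse_product_page` are illustrative names. Many people use BeautifulSoup for this step instead; the idea is the same.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urljoin

# HYPOTHETICAL class names on the page -> field names we store.
FIELD_CLASSES = {
    "demand": "demand",
    "supply": "supply",
    "medium-sales": "medium_sales",
    "max-sales": "maximum_sales",
    "total": "total",
}


@dataclass
class Product:
    url: str
    data: dict = field(default_factory=dict)     # field name -> text value
    related: list = field(default_factory=list)  # absolute URLs of related products


class ProductPageParser(HTMLParser):
    def __init__(self, base_url: str):
        super().__init__()
        self.product = Product(url=base_url)
        self._current = None  # field whose text we expect next, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        for cls in classes:
            if cls in FIELD_CLASSES:
                self._current = FIELD_CLASSES[cls]
        if tag == "a" and "related" in classes and attrs.get("href"):
            # Resolve relative links such as "/product/236" against the page URL.
            self.product.related.append(urljoin(self.product.url, attrs["href"]))

    def handle_data(self, data):
        # Store the first non-blank text found inside a field element.
        if self._current and data.strip():
            self.product.data[self._current] = data.strip()
            self._current = None


def parse_product_page(url: str, html: str) -> Product:
    parser = ProductPageParser(url)
    parser.feed(html)
    parser.close()
    return parser.product
```

Calling `parse_product_page(url, html)` on one downloaded page returns the five values plus the list of related-product URLs, which is exactly what the next step (following the related products) needs.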

Then it should click on each related product, one by one, and extract the same information. This opens a chain of URLs, since each product has some related products and each related product has its own related products.
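Following the chain of related products is a graph traversal: keep a queue of URLs still to visit and a set of URLs already seen, so a product that appears as "related" on many pages (or links back to an earlier one) is downloaded only once. A minimal breadth-first sketch; `crawl` is an illustrative name, and `get_page` is a hypothetical stand-in for "download and parse one product page" that must return the page's data plus its related-product URLs.

```python
import time
from collections import deque
from typing import Callable, Iterable, Iterator, Tuple


def crawl(start_url: str,
          get_page: Callable[[str], Tuple[dict, Iterable[str]]],
          max_pages: int = 100_000,
          delay: float = 0.0) -> Iterator[Tuple[str, dict]]:
    """Breadth-first walk over related products, visiting each URL at most once.

    get_page(url) is a placeholder: it should download and parse one product
    page and return (data_dict, related_urls).
    """
    seen = {start_url}          # every URL ever queued, to avoid repeats and loops
    queue = deque([start_url])  # URLs waiting to be visited, oldest first
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        data, related = get_page(url)
        visited += 1
        yield url, data
        for link in related:
            if link not in seen:
                seen.add(link)
                queue.append(link)
        if delay:
            time.sleep(delay)   # pause between requests to go easy on the server
```

Because `crawl` is a generator, each result can be saved as it arrives instead of holding 100K records in memory. A delay between requests matters at this scale: with `delay=1.0`, 100K pages spend 100,000 seconds (about 28 hours) waiting alone, so it is worth checking the site's terms and rate limits before running it.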

In this way, one URL will end up opening around 100K URLs in the browser. So, to summarize: how can I structure the logic to extract information from around 100K URLs? The information I want is:

  1. demand
  2. supply
  3. medium sales
  4. maximum sales
  5. total number of products on sale
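With around 100K pages, a run can take many hours and may be interrupted, so it helps to write each product to disk as soon as it is scraped rather than all at the end. A minimal sketch using the standard library's `sqlite3`; the helper names, the `products.db` file name, and the table and column names are hypothetical choices, with one column per value in the list above.

```python
import sqlite3
from typing import Dict

COLUMNS = ["demand", "supply", "medium_sales", "maximum_sales", "total"]


def open_store(path: str = "products.db") -> sqlite3.Connection:
    """Open (or create) a database with one row per product URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               url TEXT PRIMARY KEY,
               demand TEXT, supply TEXT, medium_sales TEXT,
               maximum_sales TEXT, total TEXT)"""
    )
    return conn


def is_done(conn: sqlite3.Connection, url: str) -> bool:
    """True if this URL was already saved, e.g. by an earlier, interrupted run."""
    row = conn.execute("SELECT 1 FROM products WHERE url = ?", (url,)).fetchone()
    return row is not None


def save(conn: sqlite3.Connection, url: str, data: Dict[str, str]) -> None:
    """Insert or update one product; fields missing from data are stored as NULL."""
    conn.execute(
        "INSERT OR REPLACE INTO products "
        "(url, demand, supply, medium_sales, maximum_sales, total) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [url] + [data.get(c) for c in COLUMNS],
    )
    conn.commit()  # commit every row so progress survives a crash (simple, not fastest)
```

Only the results are persisted here, not the queue of pages still to visit, so `is_done` tells a restarted run which products are already stored; resuming the walk itself would additionally need the related-product URLs saved somewhere.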
#2
You can start here; it doesn't take long, and you'll learn a lot about the basics:

web scraping part 1
web scraping part 2
#3
(Jun-29-2020, 08:28 AM) Larz60+ Wrote: you can start here, doesn't take long, and you'll learn a lot about the basics

web scraping part 1
web scraping part 2

Thanks for the reply. I am going through the posts you shared, but my requirement is completely different.

The posts don't cover anything like this logic. I guess I have to learn more advanced material first and then proceed.

What do you suggest?