Python Forum
Need logic on how to scrape 100K URLs
#1
Hi,
Could you explain the logic for approaching the following requirement?

My requirement is:

I have a website, say www.example.com. Once I log in, I have to search for a product. The website then returns information such as:
  1. The demand for the product
  2. The supply for the product
  3. Medium sales for the product
  4. Maximum sales for the product

On another line, it gives the 'total number of this product'.

On another line, it gives the most important information, which is 'Other related products'.

These related products are like 'product name 123', 'product name 236', 'product name 483', etc. Clicking on any of these related products opens a similar page with the same type of information:
  1. The demand for the product
  2. The supply for the product
  3. Medium sales for the product
  4. Maximum sales for the product
On another line, it gives the 'total number of this product'.

On another line, it gives the most important information, which is 'Other related products', and so on. The same process has to be followed for each product.

What would be the logic for a Python script that reads one URL and gets all the information on that page: demand, supply, medium sales, maximum sales, and the total number of the product?
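
For the single-page part, a minimal sketch could look like the following. It is only an illustration under assumptions: the real page markup is not shown in this thread, so the sample HTML, the "Label: value" text layout, the label wording in FIELDS, and the names TextExtractor and parse_product are all hypothetical. It uses only the standard library.

```python
import re
from html.parser import HTMLParser

# Hypothetical labels; the real page may word them differently.
FIELDS = ("demand", "supply", "medium sales", "maximum sales", "total number")


class TextExtractor(HTMLParser):
    """Collects the visible text of a page, one chunk per text node."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def parse_product(html):
    """Return a dict of the fields found in 'Label: value' text chunks."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    result = {}
    for chunk in extractor.chunks:
        for field in FIELDS:
            pattern = rf"{re.escape(field)}\s*:\s*(\d[\d,]*)"
            match = re.match(pattern, chunk, re.IGNORECASE)
            if match:
                # Drop thousands separators before converting to int.
                result[field] = int(match.group(1).replace(",", ""))
    return result


# Made-up HTML standing in for one product page (assumed layout).
sample = """
<ul>
  <li>Demand: 1,200</li>
  <li>Supply: 800</li>
  <li>Medium sales: 35</li>
  <li>Maximum sales: 90</li>
</ul>
<p>Total number: 4500</p>
"""
print(parse_product(sample))
```

On the real site, the patterns would have to be adapted to whatever the page actually contains; a dedicated HTML parsing library may be more convenient than regular expressions there.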

Then it should follow each related product link one by one and extract the same information. This opens a chain of URLs, since each product has some related products and each related product has its own related products.

In this way, a single starting URL leads to around 100K URLs being opened in the browser. To summarize: how can I proceed with the logic to extract information from around 100K URLs? The information I want is:

  1. demand
  2. supply
  3. medium sales
  4. maximum sales
  5. total number of products on sales
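
The "chain of related products" described above is essentially a graph traversal, and one possible sketch of it is a breadth-first crawl with a queue and a set of already-seen URLs, so that each page is fetched once and nothing needs to be open simultaneously. Everything specific here is an assumption: the `class="related"` marker on links, the `/p/...` URL scheme, the names LinkExtractor, crawl and SITE, and the `fetch` parameter, which is passed in so the logic can be tried on an in-memory fake site without touching the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a class="related"> tags (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "related" and attrs.get("href"):
            self.links.append(attrs["href"])


def crawl(start_url, fetch, max_pages=100_000):
    """Breadth-first crawl; returns {url: html}, visiting each URL once."""
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, to avoid loops and repeats
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        extractor.close()
        for href in extractor.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Tiny in-memory "site" standing in for the real one (hypothetical URLs).
SITE = {
    "https://www.example.com/p/123":
        '<a class="related" href="/p/236">236</a>'
        '<a class="related" href="/p/483">483</a>',
    "https://www.example.com/p/236":
        '<a class="related" href="/p/123">123</a>',
    "https://www.example.com/p/483":
        '<a class="related" href="/p/236">236</a>',
}

pages = crawl("https://www.example.com/p/123", SITE.__getitem__)
print(list(pages))
```

Against the real site, `fetch` would be replaced by a function that performs a logged-in HTTP request (for example through a `requests.Session`), ideally with a pause between requests and with the results saved as the crawl goes so that it can be resumed.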
#2
You can start here. It doesn't take long, and you'll learn a lot about the basics:

web scraping part 1
web scraping part 2
#3
(Jun-29-2020, 08:28 AM)Larz60+ Wrote: you can start here, doesn't take long, and you'll learn a lot about the basics

web scraping part 1
web scraping part 2

Thanks for the reply. I am going through the posts you shared, but my requirement is completely different.

The posts do not cover any similar logic. I guess I have to learn more advanced material before I can proceed.

What do you suggest?