
Web scraping multiple pages
#1
Hi All,

Hope you are doing well.

I am very new to Python and in the process of learning it.

Can someone please help me with the code for web scraping multiple pages of the website below?

https://abom.learningbuilder.com/public/...astName&_d=

I have the code below, which fetches only one page. How can I get data from all the pages on the website? Please let me know.
import urllib.request
from bs4 import BeautifulSoup
def make_soup(url):
    thepage=urllib.request.urlopen(url)
    soupdata=BeautifulSoup(thepage,"html.parser")
    return soupdata
mydata_saved=""
soup=make_soup("https://abom.learningbuilder.com/public/membersearch?model.FirstName=&model.LastName=&model.UniqueId=&model.City=&model.State=&performSearch=true&_p=1&_s=20&_o=LastName&_d=")
for record in soup.findAll('tr'):
    mydata = ""
    for data in record.findAll('td'):  # cells of this row only
        mydata = mydata + "," + data.text
    mydata_saved = mydata_saved + "\n" + mydata[1:]
print(mydata_saved)
Thank you.

Br,
Anil
Larz60+ wrote Jun-30-2020, 06:52 PM:
Please post all code, output and errors (in their entirety) between their respective tags. I did it for you this time. Here are instructions on how to do it yourself next time.
#2
First you need to find out how many pages there are; then you can step through the individual pages using the _p URL query parameter.
number_of_pages = 200  # just an example number
url = "https://abom.learningbuilder.com/public/membersearch?model.FirstName=&model.LastName=&model.UniqueId=&model.City=&model.State=&performSearch=true&_p={page_number}&_s=20&_o=LastName&_d="
for page_number in range(1, number_of_pages + 1):  # _p starts at 1 in your original URL
    soup = make_soup(url.format(page_number=page_number))
    for record in soup.findAll('tr'):
        mydata = ""
        for data in record.findAll('td'):  # cells of this row only
            mydata = mydata + "," + data.text
        mydata_saved = mydata_saved + "\n" + mydata[1:]
print(mydata_saved)  # print once at the end, not inside the loops
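For the row-extraction part, each tr should contribute only its own td cells, and each row then becomes exactly one line of output. Here is a minimal self-contained sketch of that pattern; the HTML fragment is made up purely for illustration, standing in for one page of search results:

```python
from bs4 import BeautifulSoup

# Made-up table fragment standing in for one page of results.
html = """
<table>
  <tr><td>Alice</td><td>New York</td></tr>
  <tr><td>Bob</td><td>Boston</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for record in soup.find_all("tr"):
    # Look only at the cells inside this row, not every td on the page.
    cells = [td.get_text(strip=True) for td in record.find_all("td")]
    if cells:  # skips header rows, which use th instead of td
        rows.append(",".join(cells))

print(rows)  # ['Alice,New York', 'Bob,Boston']
```

Printing the accumulated result once, after all loops finish, also keeps the notebook output small.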
#3
Awesome! That works. But I ran into an issue again. When I look at my scraped data for all 199 pages, I get only 600 records out of the 4000 records on the website. Can you please advise why my output does not include all the records? I am using an online Jupyter notebook to run this. I also installed Anaconda and used Jupyter there, and I still face the same issue. Can you please help me with this?

Thank you.

Br,
Anil

I get this error when I run the code. Can you please advise how I can increase the data rate?
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
--NotebookApp.iopub_data_rate_limit.
#4
Honestly, I don't know, as I personally do not use Jupyter, but maybe you can try this answer: https://stackoverflow.com/a/49305034/11274530
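For reference, the linked answer amounts to raising the notebook server's output rate limit. Assuming you run the classic notebook server, one way is to set it in jupyter_notebook_config.py (which you can generate with jupyter notebook --generate-config):

```python
# In jupyter_notebook_config.py; c is the config object Jupyter
# provides inside that file. The default limit is 1e6 bytes/sec.
c.NotebookApp.iopub_data_rate_limit = 1e10
```

The same setting can also be passed as a launch flag: jupyter notebook --NotebookApp.iopub_data_rate_limit=1e10. That said, printing less output (e.g. once after the loops) avoids hitting the limit in the first place.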