Web scrap multiple pages - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Web scrap multiple pages (/thread-27998.html)
Web scrap multiple pages - anilacem_302 - Jun-30-2020

Hi All,

Hope you are doing well. I am very new to Python and am in the process of learning it. Can someone please help me with the code for web scraping multiple pages of the website below?

https://abom.learningbuilder.com/public/membersearch?model.FirstName=&model.LastName=&model.UniqueId=&model.City=&model.State=&performSearch=true&_p=1&_s=20&_o=LastName&_d=

I have the code below, which fetches only one page. How can I get the data from all the pages on the website? Please let me know.

import urllib.request
from bs4 import BeautifulSoup

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

mydata_saved = ""
soup = make_soup("https://abom.learningbuilder.com/public/membersearch?model.FirstName=&model.LastName=&model.UniqueId=&model.City=&model.State=&performSearch=true&_p=1&_s=20&_o=LastName&_d=")
for record in soup.findAll('tr'):
    mydata = ""
    # Iterate over the cells of this row (record), not the whole page;
    # otherwise every row repeats the text of every cell on the page
    for data in record.findAll('td'):
        mydata = mydata + "," + data.text
    mydata_saved = mydata_saved + "\n" + mydata[1:]
print(mydata_saved)

Thank you.
Br,
Anil

RE: Web scrap multiple pages - mlieqo - Jun-30-2020

First you need to find out how many pages there are, and then you can jump through the individual pages using the _p URL query parameter:

number_of_pages = 200  # just an example number
url = "https://abom.learningbuilder.com/public/membersearch?model.FirstName=&model.LastName=&model.UniqueId=&model.City=&model.State=&performSearch=true&_p={page_number}&_s=20&_o=LastName&_d="

mydata_saved = ""
# Page numbering in the URL starts at 1, so range from 1 to number_of_pages
for page_number in range(1, number_of_pages + 1):
    soup = make_soup(url.format(page_number=page_number))
    for record in soup.findAll('tr'):
        mydata = ""
        for data in record.findAll('td'):
            mydata = mydata + "," + data.text
        mydata_saved = mydata_saved + "\n" + mydata[1:]
print(mydata_saved)

RE: Web scrap multiple pages - anilacem_302 - Jul-01-2020

Awesome! That works. But I ran into an issue again.
When I look at my scraped data for all 199 pages, I get only 600 records out of the 4000 records on the website. Can you please advise why my output does not include all the records? I am using an online Jupyter notebook to run this. I also installed Anaconda and used Jupyter there, but I still face the same issue. Can you please help me with this?

Thank you.
Br,
Anil

I am getting this error when I run the code. Can you please advise how to increase the data rate?

IOPub data rate exceeded.
The notebook server will temporarily stop sending output to the client in order to avoid crashing it.
To change this limit, set the config variable --NotebookApp.iopub_data_rate_limit.
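Since the error comes from printing one huge accumulated string in the notebook, an alternative is to stream the rows to a CSV file instead of printing them, which sidesteps the output rate limit entirely. This is only a sketch, not code from the thread: the name scrape_to_csv is hypothetical, and it assumes the same make_soup helper and {page_number} URL template used earlier in the thread.

```python
import csv

def scrape_to_csv(make_soup, url_template, number_of_pages, out_path):
    """Write one CSV row per table row instead of building one huge string.

    make_soup:    a function url -> BeautifulSoup, as defined in the thread
    url_template: a URL containing a {page_number} placeholder
    """
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for page_number in range(1, number_of_pages + 1):
            soup = make_soup(url_template.format(page_number=page_number))
            for record in soup.findAll("tr"):
                # Collect the text of each cell in this row
                cells = [td.get_text(strip=True) for td in record.findAll("td")]
                if cells:  # skip header rows, which have <th> cells but no <td>
                    writer.writerow(cells)
```

The file can then be opened in Excel or loaded with pandas, and nothing large is ever sent to the notebook client.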
RE: Web scrap multiple pages - mlieqo - Jul-01-2020

Honestly I don't know, as I personally do not use Jupyter, but maybe you can try this answer -> https://stackoverflow.com/a/49305034/11274530
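The error message in the previous post names the setting to change. One common way to raise it, assuming the classic Jupyter Notebook server, is via the notebook config file; this is a sketch and the chosen value is arbitrary:

```python
# In ~/.jupyter/jupyter_notebook_config.py
# (generate the file first with: jupyter notebook --generate-config)
c.NotebookApp.iopub_data_rate_limit = 1.0e10  # bytes/sec; raise from the low default
```

The same option can also be passed on the command line when starting the server, e.g. jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10.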