Python Forum
webscrapping links and then enter those links to scrape data
#1
Hi, I am having difficulty with a two-tier scrape: first scraping links from a page, then entering each of those links to scrape its data.

The first tier (scraping the links) is done, but the second tier, entering each collected link and gathering data from it, is where I am stuck.

I have attached my first-tier code, but I can't figure out the second-tier code.

I'd appreciate any help.

Thanks.

from bs4 import BeautifulSoup
import requests
import re
#from googlesearch import search

class Google:
    @classmethod
    def search1(cls, search):
        url_list = []  # store all the extracted URLs in a list

        for start in range(0, 10):
            # each results page holds 10 results; 'start' paginates through them
            page = requests.get(
                'http://www.google.com/search?q=' + search + '&start=' + str(start * 10),
                timeout=5)
            soup = BeautifulSoup(page.content, "lxml")

            # result links are wrapped as /url?q=<real url>&<tracking params>
            for link in soup.find_all("a", href=re.compile(r"(?<=/url\?q=)(htt.*://.*)")):
                a = re.split(r":(?=http)", link["href"].replace("/url?q=", ""))
                a = a[0].split("&")[0]  # drop the tracking parameters
                url_list.append(a)

        return url_list
#2
You would loop your url_list and create a request for each one in there.
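A minimal sketch of that second tier, assuming the first tier returns url_list. The title extraction here is only a placeholder; you would swap in selectors for whatever elements you actually want from each page:

```python
from bs4 import BeautifulSoup
import requests

def scrape_pages(url_list):
    results = []
    for url in url_list:
        try:
            page = requests.get(url, timeout=5)
            page.raise_for_status()  # treat 4xx/5xx responses as failures
        except requests.RequestException:
            continue  # skip links that fail or time out
        soup = BeautifulSoup(page.content, "lxml")
        # placeholder extraction: the page <title>; replace with your own selectors
        title = soup.title.string.strip() if soup.title and soup.title.string else ""
        results.append({"url": url, "title": title})
    return results
```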
#3
Hi metulburr, thanks for your quick reply. I'll work on it, thanks a lot.

