Python Forum
Web scraping links and then entering those links to scrape data
#1
Hi, I am having difficulty scraping a set of links and then entering those links to scrape data.

Scraping the links is done, but entering each of those links from the first pass and then collecting data inside them is another difficulty.

I have attached my first-tier code, but I can't figure out the second-tier code.

I'd appreciate any help.

Thanks.

from bs4 import BeautifulSoup
import requests
import re

class Google:
    @classmethod
    def search1(cls, search):
        url_list = []  # store all the extracted URLs in a list

        # Google often blocks bare requests; a browser-like User-Agent helps.
        headers = {'User-Agent': 'Mozilla/5.0'}

        for start in range(0, 10):
            # Google paginates 10 results per page; &start= selects the offset
            page = requests.get('https://www.google.com/search?q=' + search
                                + '&start=' + str(start * 10),
                                headers=headers, timeout=5)

            soup = BeautifulSoup(page.content, "lxml")

            # result links are wrapped as /url?q=<real url>&...; unwrap them
            for link in soup.find_all("a", href=re.compile(r"(?<=/url\?q=)(htt.*://.*)")):
                a = re.split(":(?=http)", link["href"].replace("/url?q=", ""))
                a = a[0].split("&")[0]
                url_list.append(a)

        return url_list
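
For context, the class would be called like this and returns the collected links (the query string is just a placeholder):

urls = Google.search1('python web scraping tutorial')
print(len(urls), 'links collected')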
#2
You would loop over your url_list and create a request for each URL in it.
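
Something like this (an untested sketch that reuses the Google class from your first post; what you extract from each page depends on the site, so the title/paragraph pulls below are only placeholders):

from bs4 import BeautifulSoup
import requests

results = []  # one dict of scraped fields per visited link
for url in Google.search1('python web scraping tutorial'):
    try:
        page = requests.get(url, timeout=5)
        page.raise_for_status()  # skip links that return an error status
    except requests.RequestException:
        continue  # skip links that fail to load

    soup = BeautifulSoup(page.content, "lxml")
    results.append({
        "url": url,
        # placeholder extraction: page title and visible paragraph text
        "title": soup.title.get_text(strip=True) if soup.title else "",
        "text": " ".join(p.get_text(strip=True) for p in soup.find_all("p")),
    })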
#3
Hi metulburr, thanks for your quick reply. I'll work on it, thanks a lot.

