With Selenium, create a Google Search results list in Incognito mode with a specific location
Hello to all

A bit of context:

I have to do a deadly repetitive task for a colleague, starting Monday: make a list of Google Search results in Incognito (private) mode. 300 results for 17 locations, and for each result the link, title and short description. ARGGHHHHH
I also want to help my colleague not to explode.

What I have done so far:

I tried Selenium first, but weird errors occurred (I will come back to them below), so I switched to fake_useragent and BeautifulSoup.
The code works, but I don't know whether it is possible to add the location and Incognito mode to it.
Here is the code:
import urllib.parse  # for urllib.parse.quote_plus
import csv

import requests
from fake_useragent import UserAgent
from bs4 import BeautifulSoup

import re


csv_list = [["順位", "タイトル", "要約", "リンク", "関連キーワード"]]  # CSV header: rank, title, summary, link, related keywords

query = "'tour eifelle'"
query = urllib.parse.quote_plus(query) # Format into URL encoding
number_result = 20



ua = UserAgent()

google_url = "https://www.google.com/search?q=" + query + "&num=" + str(number_result)
response = requests.get(google_url, headers={"User-Agent": ua.random})  # pass the user agent as a header (not as URL parameters)
soup = BeautifulSoup(response.text, "html.parser")

result_div = soup.find_all('div', attrs = {'class': 'ZINbbc'})


links = []
titles = []
descriptions = []
link2= ""
for r in result_div:
    # Check that each element is present; skip the result otherwise
    try:
        link = r.find('a', href=True)
        title = r.find('div', attrs={'class': 'vvjwJb'}).get_text()
        description = r.find('div', attrs={'class': 's3v9rd'}).get_text()

        # Make sure everything is present before appending
        if link is not None and title != '' and description != '':
            link3 = link['href']
            if link3.startswith('/url?q='):      # strip the Google redirect prefix
                link3 = link3[len('/url?q='):]
            link2 = re.sub(r'&sa.*', "", link3)  # drop the trailing tracking parameters
            links.append(link2)
            titles.append(title)
            descriptions.append(description)
    # Skip to the next result if one element is missing
    except AttributeError:
        continue


#to_remove = []
#clean_links = []
#for i, l in enumerate(links):
#    clean = re.search('\/url\?q\=(.*)\&sa',l)

    # Anything that doesn't fit the above pattern will be removed
#    if clean is None:
#        to_remove.append(i)
#        continue
#    clean_links.append(clean.group(1))

# Remove the corresponding titles & descriptions
#for x in to_remove:
#    del titles[x]
#    del descriptions[x]





for i in range(len(titles)):
    add_list=[i+1,titles[i],descriptions[i],links[i]]
    csv_list.append(add_list)

# Save the result list to a CSV file

with open('Search_word.csv','w',encoding="utf-8_sig") as f:
    writecsv = csv.writer(f, lineterminator='\n')
    writecsv.writerows(csv_list)


#links 
#titles 
#descriptions
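About the Incognito mode and the location with this requests version: since requests keeps no cookies between runs, I think it is already close to Incognito. For the location, from what I read, Google accepts hl (interface language) and gl (country) URL parameters, so I am thinking of extending the request like this. This is only an untested sketch, and the hl/gl values are just example values, not the real locations I need:

import urllib.parse

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()
query = urllib.parse.quote_plus("'tour eifelle'")

# hl = interface language, gl = country; these values are examples only
google_url = ("https://www.google.com/search?q=" + query
              + "&num=20&hl=fr&gl=fr")

# Pass the user agent as a header (the second positional argument of
# requests.get would be query parameters, not headers)
response = requests.get(google_url, headers={"User-Agent": ua.random})
soup = BeautifulSoup(response.text, "html.parser")
print(response.status_code)

I don't know how closely hl/gl reproduce what my colleague sees locally, so that part would still need to be checked by hand.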

    
After that, I went back to Selenium.

Here is the code:

import csv
import time  # needed for time.sleep
from selenium import webdriver  # drives the browser automatically (python -m pip install selenium)
import chromedriver_binary  # adds chromedriver to the PATH on import

def ranking(driver):
    i = 1  # loop counter / page number

    title_list = []  # empty list to store the titles
    link_list = []  # empty list to store the URLs
    summary_list = []  # empty list to store the descriptions
    RelatedKeywords = []  # empty list to store the related keywords

    # Loop until the current page exceeds the maximum number of pages to analyse
    while i <= i_max:
        # Titles and links are inside class="r"
        class_group = driver.find_elements_by_class_name('r')
        class_group1 = driver.find_elements_by_class_name('s')
        class_group2 = driver.find_elements_by_class_name('nVcaUb')
        # Extract the titles and links and append them to the lists
        for elem in class_group:
            title_list.append(elem.find_element_by_class_name('LC20lb').text)  # title (class="LC20lb")
            link_list.append(elem.find_element_by_tag_name('a').get_attribute('href'))  # link (href attribute of the a tag)

        for elem in class_group1:
            summary_list.append(elem.find_element_by_class_name('st').text)  # description (class="st")

        for elem in class_group2:
            RelatedKeywords.append(elem.text)  # related keyword text

        # There is only one "Next" button, but search with elements on purpose:
        # an empty list means we are on the last page.
        if driver.find_elements_by_id('pnnext') == []:
            i = i_max + 1
        else:
            # The URL of the next page is in the href attribute of id="pnnext"
            next_page = driver.find_element_by_id('pnnext').get_attribute('href')
            driver.get(next_page)  # go to the next page
            i = i + 1  # update i
            time.sleep(3)  # wait 3 seconds

    return title_list, link_list, summary_list, RelatedKeywords  # return the lists of titles, links, summaries and keywords


           
#



driver = webdriver.Chrome()  # set up Chrome
# Open Google
driver.get('https://www.google.com/')
i_max = 5  # maximum number of pages to analyse
search = driver.find_element_by_name('q')  # locate the search box (name='q') in the HTML
search.send_keys('Test blender')  # type the search term
search.submit()  # run the search
time.sleep(1.5)  # wait 1.5 seconds

# Run the ranking function to get the lists of titles, URLs, summaries and related keywords
title, link, summary, RelatedKeywords = ranking(driver)


csv_list = [["順位", "タイトル", "要約", "リンク", "関連キーワード"]]  # CSV header: rank, title, summary, link, related keywords

for i in range(len(title)):
    add_list=[i+1,title[i],summary[i],link[i]]
    csv_list.append(add_list)

# Save the result list to a CSV file

with open('Search_word.csv','w',encoding="utf-8_sig") as f:
    writecsv = csv.writer(f, lineterminator='\n')
    writecsv.writerows(csv_list)

driver.quit()
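For the Incognito mode and the location on the Selenium side, my plan is to pass ChromeOptions with --incognito and to open the search URL directly with the same hl/gl parameters as above, instead of typing into the search box. Again an untested sketch, and the parameter values are only examples:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--incognito')  # start Chrome in Incognito mode

driver = webdriver.Chrome(options=options)
# hl = interface language, gl = country; example values only
driver.get('https://www.google.com/search?q=Test+blender&num=20&hl=fr&gl=fr')
# ... then call ranking(driver) as above ...
driver.quit()

But first I need the driver to start at all, and that is where I am stuck.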
I specified the path
Quote:C:\Users\Name\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\chromedriver_binary
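As far as I understand, import chromedriver_binary only appends that folder to the PATH environment variable, so to check whether chromedriver is really found I also wanted to try a quick test like this (untested; shutil.which should return the full path to chromedriver.exe, or None if it is not on the PATH):

import shutil

import chromedriver_binary  # appends its folder to the PATH environment variable on import

print(shutil.which('chromedriver'))  # full path to chromedriver.exe, or None if it is not found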

But I get this error message:

Error:
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 22:45:29) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> = RESTART: C:\Users\name\Desktop\B\Python\Recherche de mots\Cherche de mots.py
Traceback (most recent call last):
  File "C:\Users\name\Desktop\B\Python\Recherche de mots\Cherche de mots.py", line 49, in <module>
    driver = webdriver.Chrome()  # set up Chrome
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
After that, I tried to specify the path directly in the code, changing this line:
driver = webdriver.Chrome()
to this:
driver = webdriver.Chrome(r'C:\Users\Name\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\chromedriver_binary')

But I encountered another problem:
Error:
File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\common\service.py", line 72, in start self.process = subprocess.Popen(cmd, env=self.env, File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in __init__ self._execute_child(args, executable, preexec_fn, close_fds, File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child hp, ht, pid, tid = _winapi.CreateProcess(executable, args, FileNotFoundError: [WinError 2] 指定されたファイルが見つかりません。 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "C:\Users\Name\Desktop\Bilhaud\Python\Recherche de mots\Cherche de mots.py", line 49, in <module> driver = webdriver.Chrome(r'C:\Users\Name\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\chromedriver_binary') File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__ self.service.start() File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start raise WebDriverException( selenium.common.exceptions.WebDriverException: Message: 'chromedriver_binary' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
I tried all the solutions here, but none of them worked.
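My current understanding (please correct me if I am wrong) is that the first error means chromedriver starts but cannot find chrome.exe itself, and the second one happens because I passed the chromedriver_binary folder instead of the chromedriver.exe file inside it. So the next thing I plan to try is something like this; the paths are examples for my machine, and the Incognito option is the one from above:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--incognito')  # Incognito mode, as above
# Only needed if Chrome is installed in a non-standard place (path below is an example):
# options.binary_location = r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'

driver = webdriver.Chrome(
    # full path to the .exe inside the chromedriver_binary folder, not the folder itself
    executable_path=r'C:\Users\Name\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\chromedriver_binary\chromedriver.exe',
    options=options,
)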

If you can find something to help, I will be extremely happy.