With Selenium, create a Google search list in Incognito mode with a specific location

tsurubaso · Jun-13-2020, 10:50 AM

Hello to all

A bit of context:

I have to do a deadly repetitive task for a colleague, beginning Monday:
make a list of Google Search results in Incognito/Secret mode. 300 results for 17 locations, and for each result the link, title, and short description. ARGGHHHHH
I also want to help my colleague not to explode.

What I have done so far:

I first tried Selenium, but weird errors occurred (I will come back to those below), so I switched to fake_useragent and BeautifulSoup.
The code works, but I don't know whether it is possible to implement location and Incognito mode with it.
Here is the code:

import urllib.parse  # quote_plus lives in the urllib.parse submodule
import csv

import requests
from fake_useragent import UserAgent
from bs4 import BeautifulSoup

import re


csv_list = [["Rank", "Title", "Summary", "Link", "Related keywords"]]

query = "'tour eifelle'"
query = urllib.parse.quote_plus(query) # Format into URL encoding
number_result = 20



ua = UserAgent()

google_url = "https://www.google.com/search?q=" + query + "&num=" + str(number_result)
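# Pass the User-Agent by keyword: the second positional argument of requests.get is params, not headers
response = requests.get(google_url, headers={"User-Agent": ua.random})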
soup = BeautifulSoup(response.text, "html.parser")

result_div = soup.find_all('div', attrs = {'class': 'ZINbbc'})


links = []
titles = []
descriptions = []
link2= ""
for r in result_div:
    # Skip any result block that is missing a link, title, or description
    try:
        link = r.find('a', href=True)
        title = r.find('div', attrs={'class': 'vvjwJb'}).get_text()
        description = r.find('div', attrs={'class': 's3v9rd'}).get_text()

        # Check to make sure everything is present before appending
        if link is not None and title != '' and description != '':
            # Remove the "/url?q=" prefix; lstrip() would strip a set of characters, not a prefix
            link3 = re.sub(r'^/url\?q=', '', link['href'])
            link2 = re.sub(r'&sa.*', '', link3)
            links.append(link2)
            titles.append(title)
            descriptions.append(description)
    # find() returned None for the title or description: go to the next result
    except AttributeError:
        continue


#to_remove = []
#clean_links = []
#for i, l in enumerate(links):
#    clean = re.search('\/url\?q\=(.*)\&sa',l)

    # Anything that doesn't fit the above pattern will be removed
#    if clean is None:
#        to_remove.append(i)
#        continue
#    clean_links.append(clean.group(1))

# Remove the corresponding titles & descriptions
#for x in to_remove:
#    del titles[x]
#    del descriptions[x]





for i in range(len(titles)):
    add_list=[i+1,titles[i],descriptions[i],links[i]]
    csv_list.append(add_list)

# Save the result list to CSV

with open('Search_word.csv','w',encoding="utf-8_sig") as f:
    writecsv = csv.writer(f, lineterminator='\n')
    writecsv.writerows(csv_list)


#links 
#titles 
#descriptions
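
On the open question of location with the requests approach, here is a minimal sketch. It assumes that Google's `gl` (country), `hl` (interface language), `num` and `start` (result offset) URL parameters behave as commonly described; Google may ignore or change them, and `gl` selects a country rather than a precise city. The helper name `build_search_urls`, the query and the country codes are hypothetical. A plain `requests.get` call sends no stored cookies or history, which is roughly what Incognito mode provides in a browser.

```python
from urllib.parse import urlencode


def build_search_urls(query, country, total=300, per_page=100, lang="en"):
    """Return one Google search URL per page of results for a country.

    Assumption: `gl`, `hl`, `num` and `start` are honored by Google;
    this is not guaranteed and may change without notice.
    """
    urls = []
    for start in range(0, total, per_page):
        params = {
            "q": query,
            "num": per_page,   # results requested per page
            "start": start,    # offset of the first result on this page
            "gl": country,     # two-letter country code (assumed location hint)
            "hl": lang,        # interface language
        }
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls


if __name__ == "__main__":
    # Hypothetical example: two country codes, 300 results requested for each
    for country in ["fr", "jp"]:
        for url in build_search_urls("tour eiffel", country):
            print(url)
```

Each URL could then be fetched and parsed with the same BeautifulSoup loop as above, with a pause between requests.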

After that, I went back to Selenium.

Here is the code:

import csv
import time  # needed for sleep
from selenium import webdriver  # automates the web browser (python -m pip install selenium)
import chromedriver_binary  # adds chromedriver to the PATH

def ranking(driver):
    i = 1  # loop counter / current page number

    title_list = []  # empty list to hold the titles
    link_list = []  # empty list to hold the URLs
    summary_list = []  # empty list to hold the summaries
    RelatedKeywords = []  # empty list to hold the related keywords

    # Loop until the current page exceeds the maximum number of pages to analyze
    while i <= i_max:
        # Titles and links are inside class="r"
        class_group = driver.find_elements_by_class_name('r')
        class_group1 = driver.find_elements_by_class_name('s')
        class_group2 = driver.find_elements_by_class_name('nVcaUb')
        # Extract the titles and links and add them to the lists
        for elem in class_group:
            title_list.append(elem.find_element_by_class_name('LC20lb').text)  # title (class="LC20lb")
            link_list.append(elem.find_element_by_tag_name('a').get_attribute('href'))  # link (href attribute of the a tag)

        for elem in class_group1:
            summary_list.append(elem.find_element_by_class_name('st').text)  # summary (class="st")

        for elem in class_group2:
            RelatedKeywords.append(elem.text)  # related keyword text

        # There is only one "Next" link, but search with find_elements on purpose: an empty list means this is the last page.
        if driver.find_elements_by_id('pnnext') == []:
            i = i_max + 1
        else:
            # The URL of the next page is the href attribute of id="pnnext"
            next_page = driver.find_element_by_id('pnnext').get_attribute('href')
            driver.get(next_page)  # go to the next page
            i = i + 1  # update i
            time.sleep(3)  # wait 3 seconds

    return title_list, link_list, summary_list, RelatedKeywords  # return the four lists


           
#



driver = webdriver.Chrome()  # start Chrome
# Open the sample HTML page
driver.get('https://www.google.com/')  # open Google
i_max = 5  # maximum number of pages to analyze
search = driver.find_element_by_name('q')  # select the search box (name='q') in the HTML
search.send_keys('Test blender')  # type the search term
search.submit()  # run the search
time.sleep(1.5)  # wait 1.5 seconds

# Run the ranking function to get the lists of titles and URLs
title, link, summary, RelatedKeywords = ranking(driver)


csv_list = [["Rank", "Title", "Summary", "Link", "Related keywords"]]

for i in range(len(title)):
    add_list=[i+1,title[i],summary[i],link[i]]
    csv_list.append(add_list)

# Save the result list to CSV

with open('Search_word.csv','w',encoding="utf-8_sig") as f:
    writecsv = csv.writer(f, lineterminator='\n')
    writecsv.writerows(csv_list)

driver.quit()

I specified the path:

Quote: C:\Users\Name\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\chromedriver_binary

But I get this error message:

Error:
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 22:45:29) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> 
= RESTART: C:\Users\name\Desktop\B\Python\Recherche de mots\Cherche de mots.py
Traceback (most recent call last):
  File "C:\Users\name\Desktop\B\Python\Recherche de mots\Cherche de mots.py", line 49, in <module>
    driver = webdriver.Chrome()  # Chromeを準備
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary

After that, I tried to specify the path directly in the code. I replaced this line:

driver = webdriver.Chrome()

with the following one, but I ran into another problem:

driver = webdriver.Chrome(r'C:\Users\Name\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\chromedriver_binary')

Error:
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\common\service.py", line 72, in start
    self.process = subprocess.Popen(cmd, env=self.env,
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Name\Desktop\Bilhaud\Python\Recherche de mots\Cherche de mots.py", line 49, in <module>
    driver = webdriver.Chrome(r'C:\Users\Name\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\chromedriver_binary')
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
    self.service.start()
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'chromedriver_binary' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

I tried all the solutions here,
but none of them worked.

If you can find something to help I will be extremely happy.

mlieqo · Jun-14-2020, 07:54 AM

I am not familiar with the chromedriver-binary package. Just try downloading chromedriver from here: https://chromedriver.chromium.org/downloads and then pass its path explicitly:

browser = webdriver.Chrome(executable_path=r"C:\path\to\chromedriver.exe")

Yoriz · Jun-14-2020, 08:33 AM

Same error discussed in the thread WebDriverException: 'chromedriver' executable needs to be in PATH

tsurubaso · Jun-15-2020, 12:34 PM

mlieqo, believe me, I tried.
Yoriz, thank you also for the link.
I don't know whether working on a Japanese computer changes some parameters; I am just not finding a solution. I have spent 2 days on this, so I have to take another route.
Thanks.
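
For later readers: the first traceback in this thread says `cannot find Chrome binary`, which means chromedriver did start but could not locate the Chrome browser itself, so the chromedriver path is not the problem in that case. Below is a minimal sketch, assuming Selenium 3 (as the tracebacks suggest) and hypothetical install paths, that points Selenium at Chrome explicitly and opens it in Incognito mode. `--incognito` and `--lang` are Chrome command-line switches; the helper names `chrome_arguments` and `make_driver` are made up for illustration. Newer Selenium releases use a `Service` object instead of `executable_path`.

```python
def chrome_arguments(lang="ja"):
    """Command-line switches passed to Chrome at start-up."""
    return ["--incognito", f"--lang={lang}"]


def make_driver(chrome_binary, driver_path, lang="ja"):
    """Start Chrome in Incognito mode from explicit paths (Selenium 3 style)."""
    # Imported here so that chrome_arguments() can be used without Selenium installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.binary_location = chrome_binary  # path to chrome.exe, not to chromedriver
    for arg in chrome_arguments(lang):
        options.add_argument(arg)
    return webdriver.Chrome(executable_path=driver_path, options=options)


if __name__ == "__main__":
    print(chrome_arguments())
    # Hypothetical paths; adjust both to the actual machine before uncommenting.
    # driver = make_driver(
    #     r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
    #     r"C:\path\to\chromedriver.exe",
    # )
    # driver.get("https://www.google.com/")
    # driver.quit()
```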
