Jun-13-2020, 10:50 AM
Hello to all
A bit of context:
I have to do a deadly repetitive task for a colleague, beginning Monday.
Make a list of search results from Google Search in Incognito/Secret mode: 300 results for 17 locations, and for each result the link, title, and short description. ARGGHHHHH
I also want to help my colleague not to explode.
What I have done so far:
I first tried Selenium, but weird errors occurred (I will come back to that later). I switched to fake_useragent and BeautifulSoup.
The code works, but I don't know whether it is possible to add location and Incognito mode.
Here is the code:
import csv
import re
import urllib.parse

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# Header row: rank, title, summary, link, related keywords
csv_list = [["順位", "タイトル", "要約", "リンク", "関連キーワード"]]

query = "'tour eifelle'"
query = urllib.parse.quote_plus(query)  # Format into URL encoding
number_result = 20

ua = UserAgent()
google_url = "https://www.google.com/search?q=" + query + "&num=" + str(number_result)
# Note: the second positional argument of requests.get() is `params`, so this
# dict is added to the query string and is not sent as an HTTP header.
# Passing headers={"User-Agent": ua.random} would send it as a header, but the
# class names used below may then no longer match the returned HTML.
response = requests.get(google_url, {"User-Agent": ua.random})
soup = BeautifulSoup(response.text, "html.parser")

result_div = soup.find_all('div', attrs={'class': 'ZINbbc'})

links = []
titles = []
descriptions = []
prefix = '/url?q='
for r in result_div:
    # Skip this result if any element is missing
    try:
        link = r.find('a', href=True)
        title = r.find('div', attrs={'class': 'vvjwJb'}).get_text()
        description = r.find('div', attrs={'class': 's3v9rd'}).get_text()
        href = link['href']
    except (AttributeError, TypeError):
        # find() returned None for one of the elements
        continue
    # Check to make sure everything is present before appending
    if href != '' and title != '' and description != '':
        # Remove the redirect prefix. str.lstrip() would strip a set of
        # characters, not the prefix itself, so slice instead.
        if href.startswith(prefix):
            href = href[len(prefix):]
        links.append(re.sub(r'&sa.*', "", href))
        titles.append(title)
        descriptions.append(description)

#to_remove = []
#clean_links = []
#for i, l in enumerate(links):
#    clean = re.search('\/url\?q\=(.*)\&sa',l)
#    # Anything that doesn't fit the above pattern will be removed
#    if clean is None:
#        to_remove.append(i)
#        continue
#    clean_links.append(clean.group(1))
# Remove the corresponding titles & descriptions
#for x in to_remove:
#    del titles[x]
#    del descriptions[x]

for i in range(len(titles)):
    add_list = [i + 1, titles[i], descriptions[i], links[i]]
    csv_list.append(add_list)

# Save the result list to CSV
with open('Search_word.csv', 'w', encoding="utf-8_sig") as f:
    writecsv = csv.writer(f, lineterminator='\n')
    writecsv.writerows(csv_list)

#links
#titles
#descriptions

After that, I tried to go back to Selenium.
Here is the code:
import csv
import time  # needed for sleep

from selenium import webdriver  # automates the web browser (python -m pip install selenium)
import chromedriver_binary  # adds chromedriver to PATH


def ranking(driver):
    i = 1  # loop counter, used as the page number
    title_list = []  # empty list for the titles
    link_list = []  # empty list for the URLs
    summary_list = []
    RelatedKeywords = []

    # Loop until the current page exceeds the maximum number of pages to analyze
    while i <= i_max:
        # Titles and links are inside class="r"
        class_group = driver.find_elements_by_class_name('r')
        class_group1 = driver.find_elements_by_class_name('s')
        class_group2 = driver.find_elements_by_class_name('nVcaUb')

        # Extract the titles and links and append them to the lists
        for elem in class_group:
            title_list.append(elem.find_element_by_class_name('LC20lb').text)  # title (class="LC20lb")
            link_list.append(elem.find_element_by_tag_name('a').get_attribute('href'))  # link (href attribute of the a tag)

        for elem in class_group1:
            summary_list.append(elem.find_element_by_class_name('st').text)  # summary (class="st")

        for elem in class_group2:
            RelatedKeywords.append(elem.text)  # related keyword

        # There is only one "Next" link, but search with find_elements on purpose:
        # an empty list means this is the last page.
        if driver.find_elements_by_id('pnnext') == []:
            i = i_max + 1
        else:
            # The URL of the next page is the href attribute of id="pnnext"
            next_page = driver.find_element_by_id('pnnext').get_attribute('href')
            driver.get(next_page)  # go to the next page
            i = i + 1  # update i
            time.sleep(3)  # wait 3 seconds
    return title_list, link_list, summary_list, RelatedKeywords  # return the lists


driver = webdriver.Chrome()  # prepare Chrome
driver.get('https://www.google.com/')  # open Google
i_max = 5  # maximum number of pages to analyze
search = driver.find_element_by_name('q')  # the search box in the HTML (name='q')
search.send_keys('Test blender')  # send the search words
search.submit()  # run the search
time.sleep(1.5)  # wait 1.5 seconds

# Run ranking() to get the title, link, summary, and related keyword lists
title, link, summary, RelatedKeywords = ranking(driver)

# Header row: rank, title, summary, link, related keywords
csv_list = [["順位", "タイトル", "要約", "リンク", "関連キーワード"]]
for i in range(len(title)):
    add_list = [i + 1, title[i], summary[i], link[i]]
    csv_list.append(add_list)

# Save the result list to CSV
with open('Search_word.csv', 'w', encoding="utf-8_sig") as f:
    writecsv = csv.writer(f, lineterminator='\n')
    writecsv.writerows(csv_list)

driver.quit()

I specified the path:
C:\Users\Name\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\chromedriver_binary
But I get this Error Message.
Error:
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 22:45:29) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>
= RESTART: C:\Users\name\Desktop\B\Python\Recherche de mots\Cherche de mots.py
Traceback (most recent call last):
  File "C:\Users\name\Desktop\B\Python\Recherche de mots\Cherche de mots.py", line 49, in <module>
    driver = webdriver.Chrome() # prepare Chrome
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
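A note on the message above: `cannot find Chrome binary` is reported by ChromeDriver once it has started but cannot locate the Chrome browser itself, so the chromedriver path is probably not the problem at this point. A minimal sketch, assuming Chrome is installed in one of the usual Windows locations, could point Selenium at the browser explicitly and request Incognito mode at the same time. The two paths in `CANDIDATE_CHROME_PATHS` are assumptions, and `find_chrome_binary` and `make_incognito_driver` are hypothetical helper names, not part of Selenium.

```python
import os
from typing import Iterable, Optional

# Assumed install locations; adjust them to the machine in question.
CANDIDATE_CHROME_PATHS = [
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
]


def find_chrome_binary(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate path that exists as a file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def make_incognito_driver(chrome_path: str):
    """Start Chrome in Incognito mode with an explicit browser location."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    import chromedriver_binary  # noqa: F401  (adds chromedriver to PATH)

    options = webdriver.ChromeOptions()
    options.binary_location = chrome_path  # where chrome.exe lives
    options.add_argument("--incognito")    # open an Incognito window
    return webdriver.Chrome(options=options)
```

If `find_chrome_binary(CANDIDATE_CHROME_PATHS)` returns `None`, Chrome is likely not installed in those locations (or not installed at all), which would be consistent with the error.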
After that, I tried to specify the path directly in the code. I replaced this line:
driver = webdriver.Chrome()
with:
driver = webdriver.Chrome(r'C:\Users\Name\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\chromedriver_binary')
But I ran into another problem:
Error:
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\common\service.py", line 72, in start
    self.process = subprocess.Popen(cmd, env=self.env,
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Name\Desktop\Bilhaud\Python\Recherche de mots\Cherche de mots.py", line 49, in <module>
    driver = webdriver.Chrome(r'C:\Users\Name\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\chromedriver_binary')
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
    self.service.start()
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'chromedriver_binary' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
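A note on this second traceback: the path given to `webdriver.Chrome(...)` is the `chromedriver_binary` package directory, not an executable file inside it, which would explain the `FileNotFoundError`. A small sketch of one way around it follows. It assumes that the package keeps an executable named `chromedriver.exe` (or `chromedriver`) directly in that directory and that the Selenium 3 `executable_path` argument is available; `find_chromedriver` and `make_driver` are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def find_chromedriver(package_dir: str) -> Optional[str]:
    """Return the chromedriver executable inside package_dir, or None."""
    # Assumed file names: Windows first, then Linux/macOS.
    for name in ("chromedriver.exe", "chromedriver"):
        candidate = Path(package_dir) / name
        if candidate.is_file():
            return str(candidate)
    return None


def make_driver(package_dir: str):
    """Start Chrome using the executable found in package_dir."""
    # Imported here so find_chromedriver() works without Selenium installed.
    from selenium import webdriver

    driver_path = find_chromedriver(package_dir)
    if driver_path is None:
        raise FileNotFoundError("no chromedriver executable in " + package_dir)
    # Selenium 3 signature (assumption): pass the file, not the directory.
    return webdriver.Chrome(executable_path=driver_path)
```

Since `import chromedriver_binary` already adds the driver to PATH, passing the path by hand may not be needed at all; the earlier `cannot find Chrome binary` message points at the browser rather than the driver.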
I tried all the solutions here; none of them worked.
If you can find something to help I will be extremely happy.
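On the location and Incognito question for the requests version: a plain `requests.get()` call sends no stored cookies, so it already behaves roughly like a fresh Incognito window. For location, one possible direction is to add query parameters to the search URL. The sketch below assumes that Google honors `gl` (country code), `hl` (interface language), `num`, and `start` on `https://www.google.com/search`; `gl` is country-level only, so it may not be precise enough if the 17 locations are cities. `build_search_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_search_urls(query, total=300, per_page=100, country=None, language=None):
    """Build one search URL per result page, covering `total` results."""
    urls = []
    for start in range(0, total, per_page):
        params = {
            "q": query,
            "num": min(per_page, total - start),  # results on this page
            "start": start,                       # offset of the first result
        }
        if country:
            params["gl"] = country    # assumed: country code, e.g. "fr"
        if language:
            params["hl"] = language   # assumed: interface language, e.g. "fr"
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls
```

Each returned URL could then be fetched and parsed the same way as `google_url` in the BeautifulSoup script, with a pause between requests.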