Python Forum
How to use BeautifulSoup to parse google search results
#11
Google uses JavaScript heavily. I wrote a translation script back in the day and wondered why it only worked with a downloaded copy of the page.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#12
(Dec-21-2017, 09:10 PM)DevinGP Wrote:
(Dec-21-2017, 07:33 PM)metulburr Wrote: Then it is probably using JavaScript, and you are only left with Selenium as an option.

I didn't know the results might be JavaScript, though.

Do you mind telling me how I would implement Selenium into my current code or at least pointing me to a tutorial on someone using it to scrape the titles and summaries? Thank you!

from selenium import webdriver
import time
from bs4 import BeautifulSoup

DRIVERPATH = '/home/metulburr/chromedriver' 

class Data:
    def __init__(self, search):
        self.url = 'https://www.google.com/'
        self.setup_driver(self.url)
        #self.browser.delete_all_cookies()
        self.search = search
        self.handle_search()
        self.get_data()
        time.sleep(1010000000)  # debugging leftover: keeps the browser window open; remove for real use
        
    def get_data(self):
        soup = BeautifulSoup(self.browser.page_source, 'html.parser')
        divs = soup.find_all('div', {'class':'g'})
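        for div in divs:
            # Not every div.g block has a link or a span.st summary, so guard both.
            if div.a is None:
                continue
            print(div.a.text)
            print(div.a.get('href'))
            desc = div.find('span', {'class':'st'})
            if desc is not None:
                print(desc.text)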
        
        
    def handle_search(self):
        self.browser.find_element_by_xpath('//*[@id="lst-ib"]').click()
        self.browser.find_element_by_id("lst-ib").send_keys(self.search)
        time.sleep(1)
        self.browser.find_element_by_xpath('//*[@id="sbtc"]/div[2]/div[2]/div[1]/div/ul/li[7]/div/span[1]/span/input').click()
        time.sleep(1)
        
    def setup_driver(self, url):
        self.browser = webdriver.Chrome(DRIVERPATH)
        self.browser.set_window_position(0,0)
        self.browser.get(self.url)

data = Data('python forum')
data.browser.quit()
#13
The search results are generated with JavaScript and bs4 can't render JavaScript.
#14
BeautifulSoup doesn't render anything. It parses the file and creates a tree. And gives you methods to search that tree.
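To make that concrete, here is a minimal sketch on a made-up page (the HTML string is hypothetical, not Google's markup). The script that would create the h3 is stored in the tree as plain text and never runs, so searching for the h3 finds nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the visible result would be added by JavaScript at load time.
html = """
<html><body>
  <div id="results"></div>
  <script>
    document.getElementById("results").innerHTML = "<h3>Python Forum</h3>";
  </script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The script body is kept as text inside the tree; nothing executes it.
script_text = soup.script.string

# So the element the script would have created is not in the tree.
headings = soup.find_all("h3")
results_text = soup.find("div", id="results").get_text(strip=True)

print(headings)      # empty result set
print(results_text)  # empty string
```

A browser (or Selenium driving one) runs the script first, which is why its page_source contains the h3 and the raw download does not.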
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#15
(Dec-22-2017, 06:19 AM)RickyWilson Wrote: The search results are generated with JavaScript and bs4 can't render JavaScript.
That's why you first use Selenium with PhantomJS,
then give the page source to BeautifulSoup for parsing, as shown by metulburr.

time.sleep(1010000000) Sleepy Sleepy
#16
Quote:time.sleep(1010000000) Sleepy Sleepy
Oh yeah, I forgot to take that out before posting. I use it to keep things simple when answering general questions; otherwise I use WebDriverWait/EC/NoSuchElementException etc.
#17
Another one: I dropped doing the search through the search box and load the search?q= URL directly with Chrome/PhantomJS.
I have also added next-page search: Google_Search('python forum', page=1).
from selenium import webdriver
from bs4 import BeautifulSoup

class Google_Search:
    def __init__(self, search, page=0):
        self.search = search
        self.page = page
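        # Adjacent f-strings avoid the run of spaces that a backslash
        # continuation inside the string literal would put into the URL.
        self.url = (f'https://www.google.com/search?q={self.search}'
                    f'&ei=m3w9WuyXNJHMwALelovwAQ&start={self.page * 10}&sa=N&biw=848&bih=972')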
        self.result()

    def result(self):
        browser = webdriver.PhantomJS()
        browser.get(self.url)
        soup = BeautifulSoup(browser.page_source, 'lxml')
        browser.quit()  # page source is captured, so the PhantomJS process can be closed
        name_link = soup.find_all('h3', class_='r')
        link = soup.find_all('cite')
        for n_link, l in zip(name_link,link):
            print(f'{n_link.text}\n{l.text}')
            print('---------')

if __name__ == '__main__':
    Google_Search('python forum')
    #Google_Search('python forum', page=1)
Output:
Forums | Python.org
https://www.python.org/community/forums/
---------
Python Forum
https://python-forum.io/
---------
What are the best Python forums to hang out in? : Python - Reddit
https://www.reddit.com/.../Python/.../what_are_the_best_python_forums_to_ hang_out_in/
---------
Python Forum | Dream.In.Code
www.dreamincode.net/forums/forum/29-python/
---------
Python For Beginners Forum | Codecademy
https://www.codecademy.com/en/forums/python-for-beginners
---------
Python Syntax Forum | Codecademy
https://www.codecademy.com/.../forums/introduction-to-python-6WeG3
---------
Nytt Norsk Python Forum - Scriptingspråk (Python, Perl, Ruby o.l ...
https://www.diskusjon.no/index.php?showtopic=828151
---------
Python Programming - Dev Shed Forums
forums.devshed.com/python-programming-11/
---------
Python - Raspberry Pi Forums
https://www.raspberrypi.org/forums/viewforum.php?f=32
---------
Python - thenewboston Forum
https://thenewboston.com/forum/category.php?id=15
---------
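A side note on the search?q= URL: the query and the result offset can also be assembled with urllib.parse.urlencode, which escapes spaces and special characters for you. This is only a sketch. build_search_url and per_page are names made up here, and it keeps just the q and start parameters from the URL in the post.

```python
from urllib.parse import urlencode


def build_search_url(query, page=0, per_page=10):
    """Build a search?q= URL for the given zero-based results page.

    Assumes 10 results per page, matching the start offsets used in the post.
    """
    params = {"q": query, "start": page * per_page}
    return "https://www.google.com/search?" + urlencode(params)


print(build_search_url("python forum"))          # first page
print(build_search_url("python forum", page=1))  # second page
```

The returned string can be passed to browser.get() the same way as self.url in the class.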