How to use BeautifulSoup to parse google search results

How to use BeautifulSoup to parse google search results - DevinGP - Dec-21-2017

I am trying to parse the first page of Google search results, specifically the title and the short summary shown for each result. Here is what I have so far:

from urllib.request import urlretrieve
import urllib.parse
from urllib.parse import urlencode, urlparse, parse_qs
import webbrowser
from bs4 import BeautifulSoup
import requests

address = 'https://google.com/#q='
# Default Google search address start
file = open( "OCR.txt", "rt" )
# Open text document that contains the question
word = file.read()
file.close()

myList = [item for item in word.split('\n')]
newString = ' '.join(myList)
# The question is on multiple lines so this joins them together with proper spacing

print(newString)

qstr = urllib.parse.quote_plus(newString)
# Encode the string

newWord = address + qstr
# Combine the base and the encoded query

print(newWord)

source = requests.get(newWord)

soup = BeautifulSoup(source.text, 'lxml')

The part I am stuck on now is going down the HTML path to parse the specific data that I want. Everything I have tried so far has just thrown an error saying that it has no attribute or it just gives back "[]".

I am new to Python and BeautifulSoup, so I am not sure of the syntax for getting to where I want. I have found that these are the individual search results in the page:

https://ibb.co/jfRakR

Any help on what to add to parse the Title and Summary of each search result would be MASSIVELY appreciated.

Thank you!

RE: How to use BeautifulSoup to parse google search results - wavic - Dec-21-2017

The URL is like this: https://google.com/search?q=python+hello+world+tutorial

You may add some other options

See this
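
A minimal sketch of why the URL form matters (the helper name build_search_url is made up for illustration). With 'https://google.com/#q=', the query sits in the URL fragment, which an HTTP client such as requests does not send to the server; with '/search?q=', it sits in the query string, which is sent.

```python
from urllib.parse import urlencode, urlparse


def build_search_url(query):
    # Hypothetical helper: urlencode escapes spaces and special characters
    return 'https://google.com/search?' + urlencode({'q': query})


url = build_search_url('python hello world tutorial')
print(url)

# With '#q=...' the search terms end up in the fragment, not the query string
hash_style = urlparse('https://google.com/#q=python')
print(repr(hash_style.query), repr(hash_style.fragment))

# With '/search?q=...' the search terms are part of the request itself
print(repr(urlparse(url).query))
```

The joined text read from OCR.txt could be passed to build_search_url in place of the literal string.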

RE: How to use BeautifulSoup to parse google search results - DevinGP - Dec-21-2017

(Dec-21-2017, 05:04 PM)wavic Wrote: The URL is like this: https://google.com/search?q=python+hello+world+tutorial

You may add some other options

See this

Hello, the issue is not with making the URL; so far I have that working fine. The issue is using BeautifulSoup to parse data from that URL. I do not know the proper syntax for using .read() or .read_all() to get to the data that I want (the title and summary).

RE: How to use BeautifulSoup to parse google search results - nilamo - Dec-21-2017

Does this help?

soup.find_all("div.g")

RE: How to use BeautifulSoup to parse google search results - DevinGP - Dec-21-2017

(Dec-21-2017, 06:36 PM)nilamo Wrote: Does this help?
soup.find_all("div.g")

Hello, thanks for the reply! When I do this:

source = requests.get(newWord)

soup = BeautifulSoup(source.text, 'lxml')

results = soup.find_all("div.g")



print(results)

All it prints is "None". That was the problem I was having as well.

RE: How to use BeautifulSoup to parse google search results - metulburr - Dec-21-2017

Quote:soup.find_all("div.g")

I'm pretty sure find_all gives a period no special meaning, so it is actually searching for a tag named <div.g>.

I'm assuming you mean this?

soup.find_all('div', {'class':'g'})

or the CSS selector soup.select('.g'), I think. I haven't verified it.
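
A small sketch comparing the three calls on a made-up HTML snippet. The markup, including the h3 and span class="st" elements, is an assumption for illustration and not Google's actual page; 'html.parser' is used only so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a results page; real Google markup differs
sample_html = """
<div class="g"><h3>First title</h3><span class="st">First summary</span></div>
<div class="g"><h3>Second title</h3><span class="st">Second summary</span></div>
<div class="other">Not a result</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

by_tag_name = soup.find_all('div.g')             # looks for a tag literally named "div.g"
by_class = soup.find_all('div', {'class': 'g'})  # <div> tags with class "g"
by_selector = soup.select('div.g')               # same thing as a CSS selector

print(len(by_tag_name), len(by_class), len(by_selector))

# Pull the title and summary text out of each matching block
for result in by_selector:
    title = result.find('h3')
    summary = result.find('span', {'class': 'st'})
    if title is not None and summary is not None:
        print(title.get_text(), '|', summary.get_text())
```

If the last two calls also return an empty list on the real page, the markup being searched for is probably not in the HTML that requests downloaded.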

RE: How to use BeautifulSoup to parse google search results - snippsat - Dec-21-2017

You are really making it difficult for yourself; Google uses a lot of JavaScript.
JavaScript is rendered in the browser,
so when you see div class='g' in the browser, it does not mean it will be in the downloaded source (Requests cannot render JavaScript).
You can try Selenium/PhantomJS; I did a quick test, and even with those tools it is difficult to parse the mess you get back.

So I would avoid parsing results from a Google search
and start practicing with something simpler.

RE: How to use BeautifulSoup to parse google search results - DevinGP - Dec-21-2017

(Dec-21-2017, 07:07 PM)metulburr Wrote:
Quote:soup.find_all("div.g")
I'm pretty sure find_all gives a period no special meaning, so it is actually searching for a tag named <div.g>.

I'm assuming you mean this?
soup.find_all('div', {'class':'g'})
or the CSS selector soup.select('.g'), I think. I haven't verified it.

Hey! Thank you for your reply! When I try this all it returns is:

"[]"

Thanks for the tip though, this has been really racking my brain.

RE: How to use BeautifulSoup to parse google search results - metulburr - Dec-21-2017

Quote:When I try this all it returns is:

"[]"

Then it is probably using JavaScript, and you are left with Selenium as the only option.

I didn't know the results might be JavaScript, though.
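
A rough sketch of how Selenium could slot in, under several assumptions: Firefox and its driver are installed, and the result markup matches the div.g / h3 / span.st selectors, which are guesses rather than anything Google documents. The function names fetch_rendered_html and extract_results are made up for illustration.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Load url in a real browser so JavaScript runs, then return the page HTML."""
    # Imported here so the parsing half can be tried without Selenium installed
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: Firefox and geckodriver are available
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails


def extract_results(html):
    """Return (title, summary) pairs; the selectors are assumptions about the markup."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for block in soup.select('div.g'):
        title = block.find('h3')
        summary = block.find('span', {'class': 'st'})
        if title is not None and summary is not None:
            results.append((title.get_text(strip=True), summary.get_text(strip=True)))
    return results
```

Usage would be something like extract_results(fetch_rendered_html('https://google.com/search?q=python+hello+world+tutorial')). Content that loads late may not be in page_source yet, so an explicit wait might be needed.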

RE: How to use BeautifulSoup to parse google search results - DevinGP - Dec-21-2017

(Dec-21-2017, 07:33 PM)metulburr Wrote:
Quote:When I try this all it returns is:

"[]"
Then it is probably using JavaScript, and you are left with Selenium as the only option.

I didn't know the results might be JavaScript, though.

Do you mind telling me how I would integrate Selenium into my current code, or at least pointing me to a tutorial that uses it to scrape the titles and summaries? Thank you!