Learning WebScraping
#1
I am very new to web scraping and am learning it online.
The website I am trying to scrape is http://econpy.pythonanywhere.com/ex/001.html.
I have written this code to scrape it:

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = urlopen("http://econpy.pythonanywhere.com/ex/001.html")

def getTitle():
    global url
    bs0bj = BeautifulSoup(url, "html.parser")
    for i in bs0bj.find_all(title="buyer-name"):
        print(i.get_text())
getTitle()


#def getTitle():
#    global url
#    bs0bj = BeautifulSoup(url, "html.parser")
#    for i in bs0bj.find_all(title="buyer-info"):
#        print(i.get_text())
#getTitle()

def getPrice():
    global url
    bs0bj = BeautifulSoup(url, "html.parser")
    for i in bs0bj.find_all("span", {"class":"item-price"}):
        print(i.get_text())
getPrice()
Now I have a few questions (please uncomment the commented code to try them):
Q1: When I run this code using buyer-info it prints the price too; how do I get the data from the next pages as well?
Q2: Why doesn't it print the price when run individually (just the buyer name)?
Q3: How do I write this data to a CSV file?
#2
I'd suggest watching these two tutorials: https://python-forum.io/Thread-Web-Scraping-part-1
and https://python-forum.io/Thread-Web-scraping-part-2
#3
(Aug-25-2017, 11:57 AM)Prince_Bhatia Wrote: Q1: When I run this code using buyer-info it prints the price too; how do I get the data from the next pages as well?
>>> url = "http://econpy.pythonanywhere.com/ex/00{}.html"
>>> for page in range(1,4):
...     print(url.format(page))
...     
http://econpy.pythonanywhere.com/ex/001.html
http://econpy.pythonanywhere.com/ex/002.html
http://econpy.pythonanywhere.com/ex/003.html
Prince_Bhatia Wrote: Q2: Why doesn't it print the price when run individually (just the buyer name)?
It prints the price for me when I run it.
Prince_Bhatia Wrote: Q3: How do I write this data to a CSV file?
You have to think about how to separate the data.
I can show an example where I use zip() on the names and prices.
Then item[0] and item[1] can be written together to a CSV file.
from bs4 import BeautifulSoup
import requests

def name_price(url):
    soup = BeautifulSoup(url, "html.parser")
    for item in zip(soup.find_all(title="buyer-name"), soup.find_all("span", {"class":"item-price"})):
        print(item[0].text, item[1].text)

if __name__ == '__main__':
    url = 'http://econpy.pythonanywhere.com/ex/001.html'
    url = requests.get(url).content
    name_price(url)
Output:
Carson Busses $29.95
Earl E. Byrd $8.37
Patty Cakes $15.26
Derri Anne Connecticut $19.2
.............
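
To actually write it to a CSV file, a minimal sketch using the csv module from the standard library (the file name buyers.csv and the Name/Price headers are just examples):
import csv
from bs4 import BeautifulSoup
import requests

def name_price(url):
    soup = BeautifulSoup(url, "html.parser")
    # Pair each buyer name with the matching price
    return zip(soup.find_all(title="buyer-name"),
               soup.find_all("span", {"class": "item-price"}))

url = requests.get('http://econpy.pythonanywhere.com/ex/001.html').content
with open('buyers.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Price'])  # header row
    for name, price in name_price(url):
        writer.writerow([name.text, price.text])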
Edit:
You see that global is gone and that url is passed in as an argument.
urllib is gone; I use Requests instead.
If you install lxml (pip install lxml), change this line to:
soup = BeautifulSoup(url, 'lxml')
Then you are using lxml, which is a faster parser.
#4
Hi,

Thank you for your answer. When I used this snippet it prints only the links, but what if I want the content inside those pages, which is the same as on the first page?

How can I print the data inside these pages using this code?
Quote:for page in range(1,4):
...     print(url.format(page))

Since I am new to web scraping I don't have much familiarity with lxml yet; I am going through the Python libraries one by one.
#5
(Aug-28-2017, 06:47 AM)Prince_Bhatia Wrote: Hi,

Thank you for your answer. When I used this snippet it prints only the links, but what if I want the content inside those pages, which is the same as on the first page?

How can I print the data inside these pages using this code?
Quote:for page in range(1,4):
...     print(url.format(page))
He was showing you how to loop over the page numbers to build the URL for every page. You still have to load each of those URLs with BeautifulSoup.
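
A minimal sketch of that loop, reusing the URL pattern and the name/price lookups from the posts above (everything else is the same Requests + BeautifulSoup pattern):
from bs4 import BeautifulSoup
import requests

url = "http://econpy.pythonanywhere.com/ex/00{}.html"
for page in range(1, 4):
    # Fetch and parse each page in turn
    html = requests.get(url.format(page)).content
    soup = BeautifulSoup(html, "html.parser")
    for name, price in zip(soup.find_all(title="buyer-name"),
                           soup.find_all("span", {"class": "item-price"})):
        print(name.text, price.text)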
#6
Hi,

I have written this code, but it is not working, for the same website:
from bs4 import BeautifulSoup
from urllib.request import urlopen

page_url = "http://econpy.pythonanywhere.com/ex/001.html"
new_file = "graphics_cards.csv"
f = open(new_file, "w")
Headers = "Header1, Header2\n"
f.write(Headers)


html = urlopen(page_url)
soup = BeautifulSoup(html, "html.parser")
buyer_info = soup.find_all("div", {"title":"buyer-info"})
for i in buyer_info:
    Header1 = soup.find_all("div", {"title":"buyer_name"})
    Header2 = soup.find_all("spam", {"class":"item-price"})
    print("Header1" + Header1)
    print("Header2"+ Header2)
    f.write(Header1 + Header2+"\n")
f.close()
       
But it is giving an error. Without adding any additional code, how do I make it work?
#7
Please give your error next time.

The error I get is from your attempt to concatenate a string object with a bs4.element.ResultSet (essentially a list).

If you want to inject the content for printing, use the format method like this:
    print("Header1 {}".format(Header1))
    print("Header2 {}".format(Header2))
    f.write('{} {}\n'.format(Header1, Header2))
Note: you have a typo in your search criteria, buyer_name (it should be buyer-name).
#8
(Aug-28-2017, 12:22 PM)Prince_Bhatia Wrote: But it is giving an error. Without adding any additional code, how do I make it work?
You cannot use soup.find_all() inside the loop.
You have to use i.find() on each item in buyer_info, since each buyer-info div already contains all the info.
You have to call .text before you can write anything.
Never write anything to a file before you have done a test print() of the output.
Example getting the name:
from bs4 import BeautifulSoup
from urllib.request import urlopen
 
page_url = "http://econpy.pythonanywhere.com/ex/001.html"
new_file = "graphics_cards.csv"
#f = open(new_file, "w")
Headers = "Header1, Header2\n"
#f.write(Headers) 
 
html = urlopen(page_url)
soup = BeautifulSoup(html, "html.parser")
buyer_info = soup.find_all("div", {"title":"buyer-info"})

# Using i and find() with .text
for i in buyer_info:
    print(i.find('div', {"title":"buyer-name"}).text)
Output:
Carson Busses
Earl E. Byrd
Patty Cakes
Derri Anne Connecticut
.........
#9
Thank you so much for all the help, but now I am back where I started: how do I write this to a CSV or Excel file?

I know this is painful, but since most people here have questions about how to scrape multiple pages and how to write the results to a file, one simple example could help everyone who browses this forum for help.

Can you please amend this code, using the above libraries, so it writes the data to a file and scrapes the next pages as well? You could make a real difference.
#10
Alright, I have got this far:

Quote:from bs4 import BeautifulSoup
from urllib.request import urlopen

page_url = "http://econpy.pythonanywhere.com/ex/001.html"
new_file = "Mynew.csv"
f = open(new_file, "w")
Headers = "Header1, Header2\n"
f.write(Headers)


html = urlopen(page_url)
soup = BeautifulSoup(html, "html.parser")
buyer_info = soup.find_all("div", {"title":"buyer-info"})
for i in buyer_info:
    Header1 = i.find("div", {"title":"buyer-name"})
    Header2 = i.find("span", {"class":"item-price"})
    salmon = print(Header1.get_text())
    salam = print(Header2.get_text())
    f.write("{}".format(salmon) + "{}".format(salam))
f.close()
Now it throws no error, but the file contains only Header1, Header2 and None values.
Please don't mind the indentation; it got mangled when pasting.
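
Edit: the None values come from assigning the return value of print(), which is always None. A minimal sketch of the corrected loop, assuming the rest of the script stays as quoted above:
for i in buyer_info:
    name = i.find("div", {"title": "buyer-name"}).get_text()
    price = i.find("span", {"class": "item-price"}).get_text()
    print(name, price)  # test print first, as suggested above
    f.write("{},{}\n".format(name, price))
f.close()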

