Python Forum
Not able to fetch data from a webpage
#1
Hi,
I studied Web Scraping Tutorial 1 and am trying some practical tests on a webpage. The HTML code below is a segment of it. I tried to pick out some data from this webpage (shown in bold in the text below), but I am not able to. There is much more data in the code given below, but I have bolded only a few items as examples. Is there any method I can use to identify the text data available in a web page?


I used the Python code below to pick the data:

import requests
from bs4 import BeautifulSoup

url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
lenth = len(soup)
for i in range(0,lenth):
    print("\n\n")
    print(soup.select('div')[i].text)
In the above code I used 'a' and 'span', but no data has been fetched. So I am quite confused now about how to do this on a real webpage. Please let me know if anyone can help me sort out this issue, and whether there is any study material/book available to learn this topic in depth.
Reply
#2
Quote:
lenth = len(soup)
for i in range(0,lenth):
Don't do this. Why are you doing that in the first place?

Just loop over the result directly:
for div in soup.select('div'):
    print(div.text)
Quote: In the above code I used 'a' and 'span', but no data has been fetched.
In the code above you did not search for those tags.

If you want to search for specific data, pass the tag name (e.g. 'a') together with its class name:
soup.find('tag_name_here', {'class': 'class_name_here'})
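A self-contained sketch of that pattern (the HTML snippet, tag, and class name here are made up purely for illustration):
from bs4 import BeautifulSoup

# Made-up HTML just to illustrate find() with a tag name and class
html = '<p><a class="profileLink" href="#">Some link text</a></p>'
soup = BeautifulSoup(html, 'lxml')

# First <a> tag whose class is "profileLink"
tag = soup.find('a', {'class': 'profileLink'})
print(tag.text)  # Some link text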
Reply
#3
Here's a quick demo with the text you want.
find() and find_all() use the CSS class name to find the right tag.
select() uses CSS selectors to find the right tag.
from bs4 import BeautifulSoup

# Simulate a web page
html = '''\
<div class="_6a _5u5j _6b"></div>
<div class="_c24 _50f4">Studied at <a class="profileLink" href="facebook">Govt. City College, Chittagong</a></div>
<div class="_50f8 _2ieq"></div>
<div class="fsm fwn fcg">Past:
  <a class="profileLink" href="facebook">Chittagong Government High School</a>
</div>'''

soup = BeautifulSoup(html, 'lxml')
print(soup.find(class_="_c24 _50f4").text)
print(soup.find(class_="profileLink").text)
print('---------------------------------')
# select is using CSS selector
print(soup.select('div._c24._50f4')[0].text)
print(soup.select('.profileLink')[0].text)
Output:
Studied at Govt. City College, Chittagong
Govt. City College, Chittagong
---------------------------------
Studied at Govt. City College, Chittagong
Govt. City College, Chittagong
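As a follow-up, find_all() and select() return every match as a list, so with the same simulated page all the links can be collected at once (a sketch reusing the soup object from the demo above):
# find_all() returns all tags with that class
for link in soup.find_all('a', class_='profileLink'):
    print(link.text)

# the CSS-selector equivalent
for link in soup.select('a.profileLink'):
    print(link.text)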
Reply
#4
I used this code to pick up the data as you suggested:
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content,'lxml')
print(soup.find(class_="_c24 _50f4").text)
In the 'url' variable I am putting the URL of my practice website, and then getting the source of the website into the soup variable.

But I am getting the following error, even though there are several classes where text is present:
Error:
Traceback (most recent call last):
  File "/home/csurv_4/PycharmProjects/web_parsing/data_fetching.py", line 18, in <module>
    print(soup.find(class_= "_c24 _50f4").text)
AttributeError: 'NoneType' object has no attribute 'text'
I want to know what mistake I am making, and what to do if there are multiple elements with the same class name. How can I retrieve all the data with that class name?
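For reference, a guarded version of that lookup, with find_all() to collect every element carrying the class, might look like this (a sketch; the URL is a placeholder for the practice page and the class name is taken from the demo above):
import requests
from bs4 import BeautifulSoup

url = 'http://example.com/practice-page'  # placeholder URL
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')

# find() returns None when no tag on the fetched page has that class,
# so check before reading .text
tag = soup.find(class_="_c24 _50f4")
if tag is not None:
    print(tag.text)
else:
    print('No tag with class "_c24 _50f4" on this page')

# find_all() returns a list of every tag with that class
for tag in soup.find_all(class_="_c24 _50f4"):
    print(tag.text)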
Reply

