webscraping - failing to extract specific text from data.gov

rontar · (This post was last modified: May-18-2018, 08:27 AM by rontar.)

I want to extract the number of datasets listed on 'https://catalog.data.gov/dataset#sec-organization_type'.

The relevant HTML is:
<body>
...
<div class="new-results">
184,298 datasets found
</div>

I used this Python code:

from lxml import html
import requests
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
doc = html.fromstring(response.text)
link = doc.cssselect('div.new-results')
for i in link:
    print(i.text)

It does not print the dataset count, and I don't know where the problem is.

buran · May-18-2018, 08:38 AM

from lxml import html
import requests
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
doc = html.fromstring(response.text)
link = doc.cssselect('div.new-results')
# text_content() returns the text of the div and all of its descendants
print(link[0].text_content().strip())
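
A note on why the original loop likely prints nothing useful: in lxml, `.text` holds only the text that appears before an element's first child, while `.text_content()` joins the text of the element and all of its descendants. The sketch below assumes the count on the live page sits inside a child tag (an `<h2>` here, which is a guess, since the snippet in the question does not show it). It parses an inline HTML string instead of making a request, so it runs offline.

```python
from lxml import html

# Hypothetical markup: the <h2> wrapper is an assumption, not taken from the live page.
SAMPLE = """
<html><body>
<div class="new-results">
  <h2>184,298 datasets found</h2>
</div>
</body></html>
"""

doc = html.fromstring(SAMPLE)
div = doc.xpath('//div[@class="new-results"]')[0]

# .text is only the text before the first child element: just whitespace here.
direct_text = div.text
# .text_content() concatenates the text of the element and all its descendants.
full_text = div.text_content().strip()

print(repr(direct_text))
print(full_text)
```

If the div had no child elements at all, `.text` and `.text_content()` would return the same string, so `.text_content()` is the safer default when the exact markup is not known.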

The same result with BeautifulSoup, using lxml as the parser:

import requests
from bs4 import BeautifulSoup
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
soup = BeautifulSoup(response.text, 'lxml')
div = soup.find('div', {'class':'new-results'})
print(div.text.strip())

Or with a CSS selector via select():

import requests
from bs4 import BeautifulSoup
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
soup = BeautifulSoup(response.text, 'lxml')
div = soup.select('div.new-results')
print(div[0].text.strip())
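
Since the goal was the number of datasets, the stripped string still has to be converted to an integer. Here is a minimal sketch using only the standard library. `parse_dataset_count` is a hypothetical helper name, and it assumes the text keeps the "184,298 datasets found" shape with commas as thousands separators.

```python
import re


def parse_dataset_count(text):
    """Return the first number in text as an int, ignoring comma separators."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return int(match.group().replace(",", ""))


count = parse_dataset_count("184,298 datasets found")
print(count)  # 184298
```

Raising `ValueError` when no number is present makes a change in the page layout fail loudly rather than silently returning a wrong count.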

rontar · May-19-2018, 08:01 AM

Thanks a lot Buran!
