webscraping - failing to extract specific text from data.gov

rontar · (This post was last modified: May-18-2018, 08:27 AM by rontar.)

I want to extract the number of datasets listed on 'https://catalog.data.gov/dataset#sec-organization_type'.

The relevant part of the HTML is:
<body>
...
<div class="new-results">



184,298 datasets found


</div>

I used this Python code:

from lxml import html
import requests
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
doc = html.fromstring(response.text)
link = doc.cssselect('div.new-results')
for i in link:
    print(i.text)

It does not extract the text, and I don't know where the problem is.

**buran** · May-18-2018, 08:38 AM

Use text_content() instead of text:

from lxml import html
import requests
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
doc = html.fromstring(response.text)
link = doc.cssselect('div.new-results')
# text_content() collects the text of the element and all of its descendants
print(link[0].text_content().strip())
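
A likely reason the original loop did not print the count: in lxml, as in the ElementTree API it follows, `.text` holds only the text that comes before an element's first child, while `text_content()` gathers the text of all descendants. The sketch below is an illustration under assumptions, not the real page: the `<strong>` wrapper is hypothetical (the full markup is not shown in the question), and the standard library's `xml.etree.ElementTree` with `itertext()` stands in for lxml's `text_content()` so the snippet runs without a network request.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the <strong> wrapper is an assumption, not the real page.
SNIPPET = """<div class="new-results">
  <strong>184,298 datasets found</strong>
</div>"""

div = ET.fromstring(SNIPPET)

# .text is only the text before the first child element: whitespace here.
direct_text = div.text

# itertext() walks the element and all descendants, much like lxml's text_content().
full_text = "".join(div.itertext()).strip()

print(repr(direct_text))
print(full_text)
```

If the text on the real page sits directly inside the div with no child nodes before it, `.text` would return it as well, only with the surrounding whitespace.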

Or, using BeautifulSoup with lxml as the parser:

import requests
from bs4 import BeautifulSoup
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
soup = BeautifulSoup(response.text, 'lxml')
div = soup.find('div', {'class':'new-results'})
print(div.text.strip())

Or, with a CSS selector:

import requests
from bs4 import BeautifulSoup
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
soup = BeautifulSoup(response.text, 'lxml')
div = soup.select('div.new-results')
print(div[0].text.strip())
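
Since the goal was the number of datasets, the extracted string can then be turned into an integer. This is a minimal sketch, assuming the text keeps the form '184,298 datasets found' shown in the question; `parse_dataset_count` is a hypothetical helper name, not part of lxml or BeautifulSoup.

```python
import re


def parse_dataset_count(text):
    """Return the first number in text such as '184,298 datasets found' as an int."""
    # Match a digit followed by any run of digits and thousands separators.
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    # Drop the commas before converting.
    return int(match.group().replace(",", ""))


count = parse_dataset_count("184,298 datasets found")
print(count)
```

Pass it the stripped string from any of the snippets above, for example `parse_dataset_count(div[0].text.strip())`.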

rontar · May-19-2018, 08:01 AM

Thanks a lot, buran!
