Python Forum
webscraping - failing to extract specific text from data.gov
#1
I wanted to extract how many datasets are listed on 'https://catalog.data.gov/dataset#sec-organization_type'.

The relevant part of the page's HTML is:
<body>
...
<div class="new-results">

<!-- Snippet snippets/search_result_text.html start -->

184,298 datasets found
<!-- Snippet snippets/search_result_text.html end -->

</div>

I used this Python code:
from lxml import html
import requests
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
doc = html.fromstring(response.text)
link = doc.cssselect('div.new-results')
for i in link:
    print(i.text)
It doesn't print the dataset count, and I don't know where the problem is.
#2
from lxml import html
import requests
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
doc = html.fromstring(response.text)
link = doc.cssselect('div.new-results')
# text_content() collects all text inside the div, including text that follows the HTML comments
print(link[0].text_content().strip())
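
Why the loop in the question does not show the count: in lxml, an element's .text holds only the text before its first child node, and lxml.html keeps HTML comments as child nodes. The count sits after the first comment, so it is stored as that comment's .tail rather than in the div's .text. Here is a small offline sketch of that behaviour. The snippet string is copied from the question rather than fetched from the site, and the variable names are only for illustration:

```python
from lxml import html

# HTML fragment copied from the question (not fetched live).
snippet = """<div class="new-results">

<!-- Snippet snippets/search_result_text.html start -->

184,298 datasets found
<!-- Snippet snippets/search_result_text.html end -->

</div>"""

# A fragment with a single root element parses to that element.
div = html.fromstring(snippet)

# .text stops at the first child node, which here is the first comment.
print(repr(div.text))

# The count is stored as the tail of that first comment.
first_comment = div[0]
print(repr(first_comment.tail))

# text_content() joins every text node under the div and skips the comments.
count_text = div.text_content().strip()
print(count_text)
```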

Or, using BeautifulSoup with lxml as the parser:

import requests
from bs4 import BeautifulSoup
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
soup = BeautifulSoup(response.text, 'lxml')
div = soup.find('div', {'class':'new-results'})
print(div.text.strip())
Or, with a CSS selector via select():

import requests
from bs4 import BeautifulSoup
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
soup = BeautifulSoup(response.text, 'lxml')
div = soup.select('div.new-results')
print(div[0].text.strip())
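
All three versions print a string such as "184,298 datasets found". If you need the count as a number, a possible follow-up step is to pull out the digits and convert them. This is only a sketch: parse_dataset_count is a made-up helper name, and it assumes the text keeps the "<number> datasets found" shape shown in the question.

```python
import re


def parse_dataset_count(text):
    """Return the first number in text like '184,298 datasets found' as an int."""
    # Match a run of digits that may contain thousands separators.
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    # Drop the commas before converting.
    return int(match.group().replace(",", ""))


print(parse_dataset_count("184,298 datasets found"))  # 184298
```

Raising ValueError when no number is found makes it obvious if the page layout changes and the text no longer matches.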
#3
Thanks a lot Buran!