Python Forum
webscraping - failing to extract specific text from data.gov
#1
I wanted to extract the number of datasets listed on 'https://catalog.data.gov/dataset#sec-organization_type'.

The relevant part of the HTML is:
<body>
...
<div class="new-results">

<!-- Snippet snippets/search_result_text.html start -->

184,298 datasets found
<!-- Snippet snippets/search_result_text.html end -->

</div>

I used this Python code:
from lxml import html
import requests
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
doc = html.fromstring(response.text)
link = doc.cssselect('div.new-results')
for i in link:
    print(i.text)
This does not print the dataset count, and I can't tell where the problem is.
#2
from lxml import html
import requests
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
doc = html.fromstring(response.text)
link = doc.cssselect('div.new-results')
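# text_content() collects the text of the element and everything nested in it,
# including text that follows the comment nodes; .text alone stops at the first child node
print(link[0].text_content().strip())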
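
Why the original loop fails: lxml follows the ElementTree data model, where an element's .text holds only the text before its first child node. Text that comes after a child is stored in that child's .tail. In the HTML from the question, the first child of the div is a comment, so the count ends up in the comment's tail and div.text is just whitespace. Below is a minimal sketch of that behaviour. It runs on the snippet from the question rather than the live page, and it uses the standard library's xml.etree.ElementTree (Python 3.8+) in place of lxml so that nothing needs to be installed; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, used here instead of fetching the live page.
SNIPPET = """<div class="new-results">

<!-- Snippet snippets/search_result_text.html start -->

184,298 datasets found
<!-- Snippet snippets/search_result_text.html end -->

</div>"""

# Keep comment nodes in the tree (lxml keeps them by default).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
div = ET.fromstring(SNIPPET, parser=parser)

first_comment = div[0]

# .text is only the whitespace before the first comment.
text_only = div.text
# The count is stored as the tail of the first comment.
tail_of_comment = first_comment.tail

# Joining .text with every child's .tail recovers all of the div's own text,
# which for this snippet is what text_content() returns in lxml.
all_text = (div.text + "".join(child.tail or "" for child in div)).strip()

print(repr(text_only))
print(repr(tail_of_comment))
print(all_text)
```

With lxml itself you don't need the manual join, since text_content() does the collecting for you.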

Or, using BeautifulSoup with lxml as the parser:

import requests
from bs4 import BeautifulSoup
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
soup = BeautifulSoup(response.text, 'lxml')
div = soup.find('div', {'class':'new-results'})
print(div.text.strip())
Or, with a CSS selector:

import requests
from bs4 import BeautifulSoup
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
soup = BeautifulSoup(response.text, 'lxml')
div = soup.select('div.new-results')
print(div[0].text.strip())
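
All three variants print a string such as "184,298 datasets found". If you need the count as a number, one possible follow-up step is to pull the digits out with a regular expression. This is a small sketch; parse_dataset_count is a hypothetical helper name, and it assumes the first number in the text is the count and that it uses commas as thousands separators.

```python
import re


def parse_dataset_count(text):
    """Return the first number in text such as '184,298 datasets found' as an int."""
    # A digit followed by any run of digits and commas, e.g. "184,298".
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    # Drop the thousands separators before converting.
    return int(match.group().replace(",", ""))


count = parse_dataset_count("184,298 datasets found")
print(count)
```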
#3
Thanks a lot Buran!