 webscraping - failing to extract specific text from data.gov
#1
I want to extract the number of datasets listed on 'https://catalog.data.gov/dataset#sec-organization_type'.

The relevant HTML is:
<body>
...
<div class="new-results">

<!-- Snippet snippets/search_result_text.html start -->

184,298 datasets found
<!-- Snippet snippets/search_result_text.html end -->

</div>

I used this Python code:
from lxml import html
import requests
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
doc = html.fromstring(response.text)
link = doc.cssselect('div.new-results')
for i in link:
    print(i.text)
but it does not print the "184,298 datasets found" text, and I don't know where the problem is.
#2
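from lxml import html
import requests
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
doc = html.fromstring(response.text)
link = doc.cssselect('div.new-results')
# .text is only the text before the first child node (here an HTML comment);
# text_content() also picks up the text that follows the comments
print(link[0].text_content().strip())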
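
For what it's worth, the loop in the question most likely prints only blank lines because in lxml `.text` holds just the text before an element's first child. HTML comments are parsed as child nodes, so "184,298 datasets found" ends up in the `.tail` of the first comment. Here is a small offline sketch of that, using the markup from the question as a hard-coded string (the `snippet` variable and the `<html><body>` wrapper are my own additions for illustration, and the live page may look different):

```python
from lxml import html

# Markup copied from the question, wrapped in <html><body> so it parses as a full document.
snippet = """<html><body>
<div class="new-results">

<!-- Snippet snippets/search_result_text.html start -->

184,298 datasets found
<!-- Snippet snippets/search_result_text.html end -->

</div>
</body></html>"""

doc = html.fromstring(snippet)
div = doc.xpath('//div[@class="new-results"]')[0]

# .text stops at the first child node (here, the first comment): whitespace only.
print(repr(div.text))

# The visible text is stored as the tail of the first comment node.
first_comment = div[0]
print(repr(first_comment.tail))

# text_content() joins the text and tails of all descendants, skipping comment bodies.
print(div.text_content().strip())
```

So `.text` is not wrong as such, it just stops at the first comment, while `text_content()` walks the whole element.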

Or, using BeautifulSoup with lxml as the parser:

import requests
from bs4 import BeautifulSoup
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
soup = BeautifulSoup(response.text, 'lxml')
div = soup.find('div', {'class':'new-results'})
print(div.text.strip())
or, with a CSS selector:

import requests
from bs4 import BeautifulSoup
response = requests.get('https://catalog.data.gov/dataset#sec-organization_type')
soup = BeautifulSoup(response.text, 'lxml')
div = soup.select('div.new-results')
print(div[0].text.strip())
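
If you need the count as a number rather than the whole string, something along these lines could work. Note that `parse_dataset_count` is a hypothetical helper I made up for this post, not part of lxml or BeautifulSoup, and it assumes the text keeps the "N datasets found" wording shown in the question:

```python
import re


def parse_dataset_count(text):
    """Return the integer in front of 'dataset(s) found', or None if the text does not match."""
    match = re.search(r"([\d,]+)\s+datasets?\s+found", text)
    if match is None:
        return None
    # Drop the thousands separators before converting to int.
    return int(match.group(1).replace(",", ""))


count = parse_dataset_count("184,298 datasets found")
print(count)
```

You would pass it the stripped string printed by any of the three snippets above. Returning `None` on a mismatch is just one choice; raising an exception would also be reasonable if the wording on the page changes.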
#3
Thanks a lot Buran!