Python Forum
Scrape data from non-standardized pages?
#1
Hi guys,

I would like to scrape and organize data from HTML documents.
So far I have been learning scraping and presenting data on sites with a different structure (shopping sites, with offers in separate elements).

I am curious whether it is feasible to scrape and organize data from thousands of documents that are not standardized. What I mean is that the information is sometimes at the top of the document, sometimes at the bottom, and pretty much always in a different place.
Let's say that I would like to get the data from the "SUMMARY COMPENSATION TABLE" (in both of the files below).
For one specific page this is doable (using indexes, find, etc.).

Is there any approach that can be applied to thousands of files like that? I cannot rely on a specific div or other HTML element, because every table is named the same (only the font differs).

I just don't know how to tell Python to look for "SUMMARY COMPENSATION TABLE" and get all the data from the table below it.

Example of page #1
https://www.sec.gov/Archives/edgar/data/...def14a.htm
Example of page #2
https://www.sec.gov/Archives/edgar/data/...def14a.htm

Do you have any thoughts or ideas on whether this is even doable?
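
One possible way to approach this, as a rough sketch rather than a tested solution for the real filings: instead of relying on a div or class, scan the document text for the heading and then take the first table that starts after it. The example below uses only the standard library's html.parser. The names TableAfterHeading and extract_table, the regular expression, and the sample HTML (including the made-up name and figures) are assumptions for illustration and do not come from the linked pages.

```python
import re
from html.parser import HTMLParser

# Assumed heading pattern: any case, any whitespace (including &nbsp;) between words.
HEADING = re.compile(r"summary\s+compensation\s+table", re.IGNORECASE)


class TableAfterHeading(HTMLParser):
    """Collect the rows of the first <table> that starts after a heading text."""

    def __init__(self, heading_pattern):
        super().__init__(convert_charrefs=True)
        self.heading_pattern = heading_pattern
        self.seen_heading = False  # True once the heading text has been found
        self.depth = 0             # nesting depth inside the target table
        self.done = False          # True once the target table has been closed
        self.rows = []             # finished rows, each a list of cell strings
        self._row = None           # cells of the row currently being read
        self._cell = None          # text fragments of the cell being read
        self._recent = ""          # rolling tail of text seen before the heading

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row and any(self._row):  # skip rows with no text at all
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "table":
            if self.depth or self.seen_heading:
                self.depth += 1
        elif self.depth == 1:
            if tag == "tr":
                self._close_row()   # tolerate a missing </tr>
                self._row = []
            elif tag in ("td", "th"):
                self._close_cell()  # tolerate a missing </td> or </th>
                if self._row is None:
                    self._row = []
                self._cell = []

    def handle_endtag(self, tag):
        if self.done or not self.depth:
            return
        if tag == "table":
            self.depth -= 1
            if self.depth == 0:
                self._close_row()
                self.done = True
        elif self.depth == 1:
            if tag in ("td", "th"):
                self._close_cell()
            elif tag == "tr":
                self._close_row()

    def handle_data(self, data):
        if self.done:
            return
        if self.depth:
            if self._cell is not None:
                self._cell.append(data)
        elif not self.seen_heading:
            # The heading may be split across several tags, so match on a
            # rolling window of recent text instead of a single text node.
            self._recent = (self._recent + " " + data)[-200:]
            if self.heading_pattern.search(self._recent):
                self.seen_heading = True


def extract_table(html, heading_pattern=HEADING):
    """Return the rows of the first table after the heading, or [] if none."""
    parser = TableAfterHeading(heading_pattern)
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample document, only to show the call.
sample = """
<p><font size="2"><b>SUMMARY&nbsp;COMPENSATION TABLE</b></font></p>
<table>
  <tr><th>Name</th><th>Year</th><th>Salary ($)</th></tr>
  <tr><td>Jane Doe</td><td>2018</td><td>500,000</td></tr>
</table>
"""
rows = extract_table(sample)
```

For thousands of files, the same function could be called in a loop over the saved documents, and an empty result would flag the files that need a closer look. Known limits of this sketch: if the phrase also appears earlier in the document (for example in a table of contents), the wrong table is picked up, and if the heading is a row inside the data table itself, that table is skipped. A stricter pattern or a check on the extracted column headers would probably be needed for real filings.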


Messages In This Thread
Scrape data from non-standardized pages? - by zarize - Nov-20-2019, 02:27 PM
