Python Forum
Scrape for html based on url string and output into csv
#1
Crawl an email address from a specified website.

I have a list of specific company registration codes in CSV format, which is updated on a weekly basis.

I want to crawl the email address of each of those companies from the source website and put the email addresses into a new CSV file.

The source addresses from which the email needs to be crawled look like this:
http://www.somesite.com/result?country=en&q=1232498 (the "q" value is a variable, the company registration code, and each code has its own page where the email is).

Each registration code that needs to be crawled is located in the CSV file, in the second column, with the header "regcode".
(Source table structure: compname | regcode | othercol1 | othercol2; columns are separated by semicolons.)
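The input step described above could be sketched as follows. This is only a minimal sketch: the function names and the sample rows are hypothetical, and it assumes the file has a header row and that "country=en" stays fixed for every request.

```python
import csv
import io
from urllib.parse import urlencode

# Base address taken from the example URL in the post.
BASE_URL = "http://www.somesite.com/result"


def read_regcodes(csv_file):
    """Yield the non-empty 'regcode' values from a semicolon-separated CSV file object."""
    reader = csv.DictReader(csv_file, delimiter=";")
    for row in reader:
        code = (row.get("regcode") or "").strip()
        if code:
            yield code


def build_url(regcode):
    """Build the result-page URL for one registration code."""
    return BASE_URL + "?" + urlencode({"country": "en", "q": regcode})


# Hypothetical sample data in the layout described in the post.
sample = io.StringIO(
    "compname;regcode;othercol1;othercol2\n"
    "Example Ltd;1232498;a;b\n"
    "Other Ltd;7654321;c;d\n"
)
urls = [build_url(code) for code in read_regcodes(sample)]
print(urls)
```

For a real file, an object returned by open(path, newline="", encoding="utf-8") can be passed in place of the in-memory sample.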

The email that needs to be crawled is located between these HTML tags on each page:
Output:
<table class="table-info">
  <tr>..</tr>
  <tr>..</tr>
  <tr>..</tr>
  <tr>..</tr>
  <tr>..</tr>
  <tr>..</tr>
  <tr>..</tr>
  <tr>..</tr>
  <tr>..</tr>
  <tr>..</tr>
  <tr>
    <td class="col-1"><div class="col-1-text">E-mail:</div></td>
    <td class="col-2"><div class="col-2-text"><a href="mailto:[email protected]">[email protected]</a></div></td>
  </tr>
</table>
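One possible way to pull the address out of markup like this, using only the standard library, is sketched below. The class and function names are hypothetical, the sample address is a placeholder, and the sketch assumes the email always appears as a mailto: link inside the table with class "table-info".

```python
from html.parser import HTMLParser


class EmailFinder(HTMLParser):
    """Collect mailto: addresses found inside <table class="table-info">."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # greater than 0 while inside the table-info table
        self.emails = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            classes = (attrs.get("class") or "").split()
            # Count nested tables too, so the closing tags balance out.
            if self.table_depth or "table-info" in classes:
                self.table_depth += 1
        elif tag == "a" and self.table_depth:
            href = attrs.get("href") or ""
            if href.lower().startswith("mailto:"):
                # Drop the scheme and any "?subject=..." suffix.
                self.emails.append(href[len("mailto:"):].split("?")[0].strip())

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1


def extract_email(html):
    """Return the first email address in the table-info table, or None."""
    finder = EmailFinder()
    finder.feed(html)
    return finder.emails[0] if finder.emails else None


# Hypothetical page fragment with a placeholder address.
sample_html = """
<table class="table-info">
  <tr>
    <td class="col-1"><div class="col-1-text">E-mail:</div></td>
    <td class="col-2"><div class="col-2-text"><a href="mailto:info@example.com">info@example.com</a></div></td>
  </tr>
</table>
"""
print(extract_email(sample_html))
```

Fetching the page itself is left out here; any HTTP client that returns the page text could feed extract_email.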
The crawled email should be put into a new CSV file called extracted.csv.

The extracted.csv table structure should be as follows:
regcode | email

Explanation: the same company registration code that is used as the crawl string should be put into the new CSV file alongside the crawled email address.
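The output layout could be written roughly like this. The sketch assumes the output uses the same semicolon separator as the input file (the post does not say), and the function name and sample row are hypothetical.

```python
import csv
import io


def write_results(out_file, rows):
    """Write a header plus (regcode, email) pairs to an open text file object."""
    writer = csv.writer(out_file, delimiter=";", lineterminator="\n")
    writer.writerow(["regcode", "email"])
    writer.writerows(rows)


# In-memory demo; a real run would pass open("extracted.csv", "w", newline="").
buffer = io.StringIO()
write_results(buffer, [("1232498", "info@example.com")])
print(buffer.getvalue())
```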

This process should be triggered every week, and the automation should look only for new entries that have been added to the CSV file.
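One way to handle only new entries is to treat extracted.csv itself as the record of what has already been crawled, as in the sketch below. The function names are hypothetical, the lookup argument stands in for the real fetch-and-parse step, and the semicolon-separated output is an assumption. The weekly trigger itself would come from an external scheduler such as cron or Windows Task Scheduler, which is not shown.

```python
import csv
import os
import tempfile


def load_done(path):
    """Return the set of regcodes already present in the output file."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as f:
        return {row["regcode"] for row in csv.DictReader(f, delimiter=";")}


def append_new(path, regcodes, lookup):
    """Look up and append only the regcodes that are not yet in the output file."""
    done = load_done(path)
    # Decide about the header before opening, since append mode creates the file.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    added = []
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        if needs_header:
            writer.writerow(["regcode", "email"])
        for code in regcodes:
            if code in done:
                continue  # already crawled in an earlier run
            writer.writerow([code, lookup(code) or ""])
            done.add(code)
            added.append(code)
    return added


# Demo with a fake lookup instead of real HTTP requests.
fake_lookup = {"1232498": "info@example.com", "7654321": "sales@example.com"}.get
out_path = os.path.join(tempfile.mkdtemp(), "extracted.csv")
first_run = append_new(out_path, ["1232498"], fake_lookup)
second_run = append_new(out_path, ["1232498", "7654321"], fake_lookup)
print(first_run, second_run)
```

In the demo, the second run skips the code written by the first run and appends only the new one.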


Messages In This Thread
Scrape for html based on url string and output into csv - by dana - Jan-10-2021, 08:52 PM
