Python Forum
Scrape for html based on url string and output into csv
#5
So, I started by reading the CSV file to get the data, like so:

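import csv

# newline='' is what the csv module docs recommend when opening files for it
with open('data.csv', encoding='utf8', newline='') as csv_file:
    # The file is semicolon-delimited; DictReader keys each row by the header names
    csv_reader = csv.DictReader(csv_file, delimiter=';')

    for row in csv_reader:
        print(row['regcode'])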
Now I am stuck on how to loop over the CSV rows and pass each regcode as the request URL parameter q.

E.g. http://www.somesite.com/result?country=en&q=123456789
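One possible way to turn each row into a request URL is sketched below. It is only a sketch under a few assumptions: the column is called regcode as in the snippet above, the base URL is the example one from this post, and build_urls is a hypothetical helper name. The in-memory sample stands in for data.csv so the snippet runs on its own.

```python
import csv
import io
from urllib.parse import urlencode

# Example URL from the post; replace with the real endpoint.
BASE_URL = "http://www.somesite.com/result"


def build_urls(csv_file, column="regcode", delimiter=";"):
    """Yield one request URL per CSV row, using the row's value as q."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    for row in reader:
        value = (row.get(column) or "").strip()
        if not value:
            continue  # skip rows where the code is missing or empty
        # urlencode escapes the values, so odd characters cannot break the URL
        yield f"{BASE_URL}?{urlencode({'country': 'en', 'q': value})}"


# Hypothetical sample data standing in for data.csv
sample = io.StringIO("regcode;name\n123456789;Acme\n987654321;Foo\n")
urls = list(build_urls(sample))
for url in urls:
    print(url)
```

With the real file, the same function should work on the handle returned by open('data.csv', encoding='utf8', newline=''). If the requests library is used for the download, requests.get(BASE_URL, params={'country': 'en', 'q': value}) builds the same query string, so the manual urlencode step can be dropped.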



(Jan-11-2021, 12:06 PM)snippsat Wrote:
(Jan-11-2021, 12:19 AM)dana Wrote: I think I need to use Scrapy, because the csv file contains over 100K rows of data / companies and that means over 100K web requests.
Scrapy could possibly be used for this.
I would start with a smaller test file and just use a basic tool like the one shown, e.g. BS with lxml (a very fast parser, C speed).
Then see how long it takes on the sample file.
You can also look at the post there, where you can see I use concurrent.futures to speed it up.

Look at this post for splitting a csv with Pandas and then using it in Scrapy.
The chunked csv from Pandas can also be used in the method that I have talked about.
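The concurrent.futures idea from the quote could look roughly like the sketch below. This is not the code from the linked post, just an assumption of what such a setup might be: fetch and fetch_all are hypothetical names, the standard library urlopen is used for the download, and the number of workers is a guess that should be tuned on a small sample file first.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page; return (url, html), or (url, None) on a network error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return url, response.read().decode("utf-8", errors="replace")
    except OSError:  # URLError and socket timeouts are subclasses of OSError
        return url, None


def fetch_all(urls, worker=fetch, max_workers=8):
    """Run worker over urls in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, urls))
```

Calling fetch_all(urls) on a list of request URLs returns (url, html) pairs that can then be parsed with BS and written to the output csv. Threads fit here because the work is mostly waiting on the network. With over 100K requests it is probably wise to keep max_workers modest so the target site is not hammered.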


Messages In This Thread
RE: Scrape for html based on url string and output into csv - by dana - Jan-11-2021, 11:49 PM