Scrape for html based on url string and output into csv


dana

Jan-11-2021, 11:49 PM (This post was last modified: Jan-11-2021, 11:50 PM by dana.)

So, I started by reading the CSV file to get the data, like so:

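import csv

# newline='' is what the csv module docs recommend when opening CSV files
with open('data.csv', encoding='utf8', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=';')

    # Each row is a dict keyed by the header names, e.g. row['regcode']
    for row in csv_reader:
        print(row['regcode'])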

Now I'm not sure how to loop over the CSV rows and use each value as the request URL parameter q, e.g.:

http://www.somesite.com/result?country=en&q=123456789
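One way to do this is to build each URL from the row's regcode value with urllib.parse.urlencode, which also escapes any special characters. A minimal sketch, assuming the regcode column and ';' delimiter from the code above; the base URL is just the example URL from this post, and the sample rows are made up:

```python
import csv
import io
from urllib.parse import urlencode

# Assumption: base URL copied from the example in the post; replace with the real endpoint.
BASE_URL = "http://www.somesite.com/result"


def build_urls(csv_file, delimiter=";"):
    """Yield one request URL per CSV row, using the row's 'regcode' value as q."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    for row in reader:
        # urlencode escapes each value and joins the parameters with '&'
        query = urlencode({"country": "en", "q": row["regcode"]})
        yield f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # Made-up sample rows standing in for data.csv
    sample = io.StringIO("regcode;name\n123456789;Foo Ltd\n987654321;Bar Ltd\n")
    for url in build_urls(sample):
        print(url)
```

With the real file, pass open('data.csv', encoding='utf8', newline='') to build_urls. If you use Requests, requests.get(BASE_URL, params={'country': 'en', 'q': regcode}) builds the same query string for you.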

(Jan-11-2021, 12:06 PM) snippsat wrote:
(Jan-11-2021, 12:19 AM) dana wrote: I think I need to use Scrapy, because the CSV file contains over 100K rows of data (companies), and that means over 100K web requests.
Scrapy could possibly be used for this.
I would start with a smaller test file and just use basic tools like the ones shown earlier, e.g. BeautifulSoup (BS) with lxml (a very fast parser that runs at C speed).
Then see how long it takes on the sample file.
You can also look at the post where I use concurrent.futures to speed it up.
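A rough sketch of timing a small sample with concurrent.futures.ThreadPoolExecutor before committing to the full 100K rows. fetch is a hypothetical placeholder that only sleeps to simulate network latency (swap in a real request, e.g. with Requests); sample_size and max_workers are arbitrary values to tune, and a modest thread count keeps the load on the site reasonable:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def fetch(url):
    """Hypothetical placeholder: sleeps instead of doing a real HTTP request."""
    time.sleep(0.1)  # simulate network latency
    return url, "placeholder html"


def time_sample(urls, sample_size=20, max_workers=8):
    """Fetch the first sample_size URLs concurrently; return results and elapsed seconds."""
    sample = list(islice(urls, sample_size))
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the input URLs
        results = list(executor.map(fetch, sample))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    demo_urls = [f"http://www.somesite.com/result?country=en&q={n}" for n in range(20)]
    results, elapsed = time_sample(demo_urls)
    per_url = elapsed / len(results)
    # Naive extrapolation to 100K rows at the same concurrency
    print(f"{len(results)} URLs in {elapsed:.2f} s, roughly {per_url * 100_000 / 3600:.1f} h for 100K rows")
```

Because fetch runs in threads, the waits overlap: with 8 workers, 20 simulated 0.1 s requests take about 0.3 s instead of 2 s. Real timings depend on the site, so measure with the real fetch on the sample file.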

Look at this post for splitting a CSV with Pandas and then using the chunks in Scrapy.
The chunked CSV from Pandas can also be used with the method I have talked about.
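Pandas can read a large CSV in pieces with pd.read_csv(..., chunksize=N), which gives an iterable of DataFrames of up to N rows each. The sketch below shows the same chunking idea using only the standard library (csv plus itertools.islice) as a dependency-free stand-in, and writes results to an output CSV as each chunk finishes. process_chunk and the status column are hypothetical placeholders for the actual scraping and parsing:

```python
import csv
import io
from itertools import islice


def read_in_chunks(csv_file, chunk_size=1000, delimiter=";"):
    """Yield lists of at most chunk_size row dicts from a CSV file."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk


def process_chunk(chunk):
    """Hypothetical placeholder: a real version would fetch and parse each regcode."""
    return [{"regcode": row["regcode"], "status": "todo"} for row in chunk]


def run(in_file, out_file, chunk_size=1000):
    """Read in_file chunk by chunk and write one result row per input row to out_file."""
    writer = csv.DictWriter(out_file, fieldnames=["regcode", "status"], delimiter=";")
    writer.writeheader()
    for chunk in read_in_chunks(in_file, chunk_size):
        # Writing per chunk means the full result set never has to sit in memory
        writer.writerows(process_chunk(chunk))


if __name__ == "__main__":
    src = io.StringIO("regcode;name\n1;A\n2;B\n3;C\n")  # made-up sample rows
    out = io.StringIO()
    run(src, out, chunk_size=2)
    print(out.getvalue())
```

For the real files, pass open('data.csv', encoding='utf8', newline='') as in_file and an output file opened with open('results.csv', 'w', encoding='utf8', newline='') as out_file ('results.csv' is just an example name).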

Messages In This Thread

Scrape for html based on url string and output into csv - by dana - Jan-10-2021, 08:52 PM

RE: Scrape for html based on url string and output into csv - by snippsat - Jan-10-2021, 09:58 PM

RE: Scrape for html based on url string and output into csv - by dana - Jan-11-2021, 12:19 AM

RE: Scrape for html based on url string and output into csv - by snippsat - Jan-11-2021, 12:06 PM

RE: Scrape for html based on url string and output into csv - by dana - Jan-11-2021, 11:49 PM

RE: Scrape for html based on url string and output into csv - by snippsat - Jan-12-2021, 01:13 AM

RE: Scrape for html based on url string and output into csv - by dana - Jan-12-2021, 02:59 AM

RE: Scrape for html based on url string and output into csv - by snippsat - Jan-12-2021, 03:34 AM

RE: Scrape for html based on url string and output into csv - by dana - Jan-12-2021, 10:10 AM

RE: Scrape for html based on url string and output into csv - by snippsat - Jan-12-2021, 11:37 AM

RE: Scrape for html based on url string and output into csv - by dana - Jan-12-2021, 08:11 PM

RE: Scrape for html based on url string and output into csv - by dana - Jan-12-2021, 11:48 PM

RE: Scrape for html based on url string and output into csv - by dana - Jan-13-2021, 01:44 PM

RE: Scrape for html based on url string and output into csv - by snippsat - Jan-13-2021, 03:52 PM
