So, I started by reading the csv file to get the data like so:
import csv

with open('data.csv', encoding='utf8') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=';')
    for row in csv_reader:
        print(row['regcode'])

Now I am clueless about how to loop over the csv rows and use each value as the request URL parameter q.
eg. http://www.somesite.com/result?country=en&q=123456789
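A minimal sketch of the loop in question, using an in-memory stand-in for data.csv (the sample rows and the 'name' column are hypothetical; the real file uses ';' as delimiter and has a 'regcode' column):

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical sample data standing in for data.csv
sample = "regcode;name\n123456789;Acme Ltd\n987654321;Foo GmbH\n"

urls = []
with io.StringIO(sample) as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=';')
    for row in csv_reader:
        # Build the query string safely instead of concatenating by hand
        query = urlencode({'country': 'en', 'q': row['regcode']})
        urls.append(f"http://www.somesite.com/result?{query}")

print(urls[0])
# -> http://www.somesite.com/result?country=en&q=123456789
```

Each built URL could then be fetched, e.g. with requests.get(url).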
(Jan-11-2021, 12:06 PM)snippsat Wrote:
(Jan-11-2021, 12:19 AM)dana Wrote: I think I need to use Scrapy, because the csv file contains over 100K rows of data / companies, and that means over 100K web requests.

Scrapy could possibly be used for this.
I would start with a smaller test file and just use basic tools as shown, e.g. BeautifulSoup (BS) with lxml (a very fast parser, C speed).
Then see how long it takes on the sample file.
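A minimal sketch of BS with the lxml parser, run against a tiny HTML string instead of a live page (the markup and class names are assumptions, since the real site's structure isn't shown):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one result page
html = """
<html><body>
  <div class="company"><h2>Acme Ltd</h2><span class="reg">123456789</span></div>
</body></html>
"""

# 'lxml' selects the fast C-based parser backend for BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
name = soup.select_one('div.company h2').text
reg = soup.select_one('span.reg').text
print(name, reg)  # Acme Ltd 123456789
```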
You can also look at the post where I use concurrent.futures to speed it up.
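The concurrent.futures idea can be sketched like this; the fetch function here is a stand-in (with requests installed it would be e.g. requests.get(url, timeout=10).text), so the block runs without network access:

```python
from concurrent.futures import ThreadPoolExecutor

urls = [f"http://www.somesite.com/result?country=en&q={code}"
        for code in ('123456789', '987654321', '555555555')]

def fetch(url):
    # Stand-in for a real HTTP request (hypothetical response body)
    return f"<html>fetched {url}</html>"

# Threads overlap the network wait time, so many requests finish far
# sooner than a sequential loop would
with ThreadPoolExecutor(max_workers=20) as executor:
    results = list(executor.map(fetch, urls))

print(len(results))  # 3
```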
Look at this post for splitting the csv with Pandas and then using it in Scrapy.
The chunked csv from Pandas can also be used with the method I have talked about.
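The Pandas chunking looks roughly like this, again with a small in-memory stand-in for the 100K-row data.csv (the sample rows are hypothetical):

```python
import io
import pandas as pd

# Hypothetical ';'-separated data standing in for the large data.csv
sample = "regcode;name\n" + "\n".join(f"{i};Company{i}" for i in range(10))

# chunksize makes read_csv return an iterator of DataFrames instead of
# loading the whole file at once; each chunk can feed a separate run
chunks = pd.read_csv(io.StringIO(sample), sep=';', chunksize=4)
sizes = [len(chunk) for chunk in chunks]
print(sizes)  # [4, 4, 2]
```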