Python Forum
How can I make this code run faster?
#1
Hi,

I have a CSV with around 2 lakh (200,000) URLs and need to download the image from each of them. I am reading the CSV and downloading the images.

Below is my code:
from bs4 import BeautifulSoup
import requests
import csv
import urllib.request
import os
final_data = []

def get(url, values):
    res = requests.get(url, data=values)
    d = res.status_code
    if d!=200:
        print("Invalid Link")
    else:
        print("Valid link")
    data = res.text
    return data

def readfile():
    with open("./plans.csv", "r") as csvfile:
        reader = csv.reader(csvfile)
        next(reader)
        for row in reader:
            lists = row[0]
            ind = row[1]+".jpg"
            mj = os.path.abspath(ind)
            d = requests.get(lists)
            sublist = []
            try:
                d = urllib.request.urlretrieve(lists, ind)
                sublist.append(mj)
                sublist.append(lists)
            except Exception as e:
                sublist.append(mj)
                sublist.append(e)
            final_data.append(sublist)
    return final_data

def writefiles(alldata, filename):
    with open ("./"+ filename, "w") as csvfile:
        csvfile = csv.writer(csvfile, delimiter=",")
        csvfile.writerow("")
        for i in range(0, len(alldata)):
            csvfile.writerow(alldata[i])

def main():
    readfile()
    writefiles(final_data, "downloads.csv")
main()
 
It is downloading the images, but how can I make this code run faster, ideally much faster?

Any examples or alterations to the code are appreciated. Please help.
#2
Multithreading/multiprocessing. But in order to do that easily, I'd suggest starting by reorganizing your code so it populates and consumes queue.Queues.

For example, start by removing all references to the global variable final_data. Try introducing a queue.Queue to store the url/local-path pairs, as well as another for whatever readfile() returns (it isn't currently obvious what an mj or a lists is). A new function, whose only job is to process urls, can then be used to spawn multiple threads that handle them in parallel; see the sketch below.
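To make that concrete, here's a minimal sketch of the queue-based layout, assuming the same plans.csv layout as above (url in the first column, file name stem in the second). The worker count and names like download_worker are illustrative, not anything from the original code:

import csv
import os
import queue
import threading
import urllib.request

NUM_WORKERS = 8  # illustrative; tune for your bandwidth and the remote server

def download_worker(tasks, results):
    # Each worker pulls (url, path) pairs until the task queue runs dry.
    while True:
        try:
            url, path = tasks.get_nowait()
        except queue.Empty:
            return
        try:
            urllib.request.urlretrieve(url, path)
            results.put((os.path.abspath(path), url))
        except Exception as e:
            results.put((os.path.abspath(path), e))

def main():
    tasks = queue.Queue()
    results = queue.Queue()

    # Populate the task queue from the csv instead of a global list.
    with open("./plans.csv", "r") as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip the header row
        for row in reader:
            tasks.put((row[0], row[1] + ".jpg"))

    threads = [threading.Thread(target=download_worker, args=(tasks, results))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All workers have finished, so drain the results into the output csv.
    with open("./downloads.csv", "w", newline="") as out:
        writer = csv.writer(out)
        while not results.empty():
            writer.writerow(results.get())

main()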
#3
Like nilamo said, multithreading is your best bet for a major performance boost. For something more marginal, I made a few edits to remove superfluous lines and use some faster functions/methods.

import csv
import urllib.request
import os

# Dropped the unused bs4/requests imports, the never-called get() helper,
# the requests.get(lists) call that was downloading every image twice,
# and the writerow("") that put a blank line at the top of the output.

def readfile():
    final_data = []
    with open("./plans.csv", "r") as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip the header row
        for row in reader:
            url = row[0]
            ind = row[1] + ".jpg"
            sublist = [os.path.abspath(ind)]  # Eliminate mj variable
            try:
                urllib.request.urlretrieve(url, ind)
                sublist.append(url)
            except Exception as e:
                sublist.append(e)
            final_data.append(sublist)

    return final_data

def writefiles(alldata, filename):
    # newline="" stops the csv module from writing blank rows on Windows
    with open("./" + filename, "a+", newline="") as file:
        writer = csv.writer(file, delimiter=",")
        writer.writerows(alldata)  # csv.writer.writerows() instead of a loop

def main():
    writefiles(readfile(), "downloads.csv")

main()
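And for the threading itself, here's a hedged sketch using concurrent.futures.ThreadPoolExecutor from the standard library, again assuming the same plans.csv layout; max_workers=8 is just a starting point to tune:

import csv
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(row):
    # Download one image; return (local name, url) on success, (local name, error) on failure.
    url, name = row[0], row[1] + ".jpg"
    try:
        urllib.request.urlretrieve(url, name)
        return (name, url)
    except Exception as e:
        return (name, e)

def main():
    with open("./plans.csv", "r") as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip the header row
        rows = list(reader)

    # pool.map() preserves input order, so the output lines up with plans.csv
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(fetch, rows))

    with open("./downloads.csv", "w", newline="") as out:
        csv.writer(out).writerows(results)

main()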