 Download multiple large json files at once
I'm trying to get the auction data from the World of Warcraft API for all realms, so that I can scan for certain items posted on any of them.

import webbrowser
import urllib
import urllib.request
import urllib.error
import urllib, json
import re
import os.path
import time
import threading
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool
from concurrent.futures.thread import ThreadPoolExecutor
from queue import Queue
from threading import Thread
import concurrent.futures
import multiprocessing

q = Queue()
url = ""
apiUrls = []
apiRealms = []
curApi = 0
realms = []
realm = ""
items = [141582, 141583, 141585, 141581, 141580, 141590, 141589, 141588, 141587, 141564, 141565, 141566, 141571, 141567,
         141576, 141577, 141578, 141570, 141569, 141568, 141572, 141573, 141574, 141575, 141579]
itemnames = ["Fran's Intractable Loop", "Sameed's Vision Ring", "Six-Feather Fan", "Demar's Band of Amore",
             "Vastly Oversized Ring", "Cloak of Martayl Oceanstrider", "Treia's Handcrafted Shroud",
             "Talisman of Jaimil Lightheart", "Queen Yh'saerie's Pendant", "Telubis Binding of Patience",
             "Mir's Enthralling Grasp", "Serrinne's Maleficent Habit", "Mavanah's Shifting Wristguards",
             "Cyno's Mantle of Sin", "Aethrynn's Everwarm Chestplate", "Fists of Thane Kray-Tan",
             "Claud's War-Ravaged Boots", "Cainen's Preeminent Chestguard", "Samnoh's Exceptional Leggings",
             "Boughs of Archdruid Van-yali", "Geta of Tay'shute", "Shokell's Grim Cinch", "Ulfgor's Greaves of Bravery",
             "Gorrog's Serene Gaze", "Welded Hardskin Helmet"]
print(len(items))

# Import realms
with open('C:/test/RealmList.txt') as f:
    realms = f.read().splitlines()
    print("Realms: " + str(realms))
    f.close()


# realms = ["Shu'halo", "Eitrigg", "Stormrage", "Moonguard"]



def scanRealm(realmb):
    startTime = time.time()
    print("Scanning " + realmb)
    url = 'https://us.api.blizzard.com/wow/auction/data/' + realmb + '?locale=en_US&access_token=hidden'
    print(url)
    # Get auction json url
    with urllib.request.urlopen(url) as response:
        html = response.read()
        html = html.decode('utf-8')

    # Get url from string
    urls = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', html)
    print("Got Auction Api url for realm " + realmb + ": ", urls[0])
    jsonurl = urllib.request.urlopen(urls[0])
    print("request")
    text = json.load(jsonurl.read())
    print("load")
    # write to txt
    with open('c:/Test/' + realmb + '.txt', 'w') as f:
        f.write(str(text).replace("{'auc", "\n{'auc"))
    print("Completed scanning " + realmb + " in " + str(time.time() - startTime) + " seconds.")
    f.close()
    return 1


def getJsonUrl(realmb):
    startTime = time.time()
    print("Scanning " + realmb)
    url = 'https://us.api.blizzard.com/wow/auction/data/' + realmb + '?locale=en_US&access_token=hidden'
    print(url)
    # Get auction json url
    with urllib.request.urlopen(url) as response:
        html = response.read()
        print(html)
        html = html.decode('utf-8')

    # Get url from string
    urls = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', html)
    apiUrls.append(urls[0])
    apiRealms.append(realmb)
    print("Got Auction Api url for realm " + realmb + ": ", urls[0])

def downloadJson(realmb):
    print("request")
    response = urllib.request.urlopen(realmb)
    # print(response.read())
    text = json.load(response.read())
    print("load")
    # write to txt
    with open('c:/Test/' + apiRealms[curApi] + '.txt', 'w') as f:
      f.write(str(text).replace("{'auc", "\n{'auc"))
    print("Completed scanning " + realmb)
    f.close()
    curApi = curApi + 1
    return 1

starttime = time.time()
#Get the url's for the json files
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    pages = executor.map(getJsonUrl, realms)
print("Completed in " + str(time.time() - starttime) + " seconds.")

#download json
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    pages = executor.map(downloadJson, apiUrls)
print("Completed in " + str(time.time() - starttime) + " seconds.")
Initially I had it downloading one realm at a time, which worked fine but took almost 2 hours to complete. I'm trying to use threading to scan multiple realms at once and speed it up a lot. The scanRealm function is the one that does one realm at a time and works fine. getJsonUrl appears to work fine and outputs the URLs of the json files that I need, for example http://auction-api-us.worldofwarcraft.co...tions.json. The downloadJson function is where I believe things are going wrong. It never seems to reach the point where it prints "load". No files are ever created, and after fiddling around looking for solutions for the past few hours I'm stumped.
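
A possible explanation for the missing "load" output, offered as a sketch rather than a confirmed diagnosis: json.load expects a file-like object with a .read() method, so json.load(response.read()) raises AttributeError when it is handed bytes. On top of that, executor.map only re-raises a worker's exception when its results are iterated, and pages is never consumed, so the failure would stay silent. The helper names parse_payload and run_all below are hypothetical, and the bytes literal stands in for a real download so the example runs without network access.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_payload(raw_bytes):
    """Decode a downloaded body and parse it as JSON."""
    # json.loads parses a string; json.load would need a file-like object.
    return json.loads(raw_bytes.decode("utf-8"))


def run_all(func, args, max_workers=8):
    """Run func over args in threads and collect results and errors per argument."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(func, arg): arg for arg in args}
        for future in as_completed(futures):
            arg = futures[future]
            try:
                # result() re-raises whatever exception the worker hit.
                results[arg] = future.result()
            except Exception as exc:
                errors[arg] = exc
    return results, errors


# Stand-in for response.read(); not real auction data.
sample = b'{"ok": true}'
results, errors = run_all(parse_payload, [sample])
print(results, errors)
```

Passing json.load itself to run_all with the same sample puts an AttributeError into errors instead of a result, which is the kind of failure that goes unnoticed when the return value of executor.map is ignored.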

Sorry for the messy code. I'm no professional at Python and am mostly just trying to put together something functional that I can improve over time.
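
One small tidy-up that may help, sketched under assumptions: downloadJson assigns to curApi without declaring it global, so Python treats it as a local name and reading apiRealms[curApi] would raise UnboundLocalError. A shared counter is also fragile across threads. Since executor.map yields results in the same order as its inputs, each realm can be paired with its URL directly. The names lookup_all and fake_lookup are hypothetical, and fake_lookup only stands in for getJsonUrl returning its URL instead of appending to a global list.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_all(realms, lookup, max_workers=20):
    """Return (realm, url) pairs in the same order as realms."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Consuming the map iterator with list() also surfaces worker exceptions.
        urls = list(executor.map(lookup, realms))
    return list(zip(realms, urls))


def fake_lookup(realm):
    # Hypothetical placeholder: a real version would query the API for the realm.
    return "https://example.invalid/" + realm.lower() + "/auctions.json"


pairs = lookup_all(["Stormrage", "Eitrigg"], fake_lookup)
for realm, url in pairs:
    print(realm, url)
```

With the pairs in hand, the download step can receive both the realm name and the URL as one argument, so no global index is needed to pick the output file name.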