Python Forum
Need help Multiprocessing with BeautifulSoup
Figured out a solution. For anyone interested:
import math
import bs4 as bs
from multiprocessing import Pool, cpu_count

# results_list = list of requests.get(<url>).content items (HTML pages as bytes)
chunky_monkey = math.ceil(len(results_list) / cpu_count())
# chunky_monkey splits the list into equal-sized chunks, one per CPU core,
# and works for any machine and any list size. Came up with that myself :D


def parseru(requested):
    # parse_only1 (a bs4.SoupStrainer), date_string, and notified_list are
    # defined at module level elsewhere, so each worker process can see them.
    soup = bs.BeautifulSoup(requested, 'lxml', parse_only=parse_only1)
    tr_list = soup.find_all('tr')[3:13]  # skip the first 3 rows, keep the next 10
    for tr in tr_list:
        if date_string in tr.text and '8-K' in tr.text:
            if requested not in notified_list:
                return requested  # pool.map collects this; non-matches yield None
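The post never shows parse_only1; it is presumably a bs4.SoupStrainer that restricts parsing to the table rows the loop inspects. A minimal sketch under that assumption (the 'tr' filter is a guess, not from the original post):

```python
import bs4 as bs

# Hypothetical definition of parse_only1: a SoupStrainer so BeautifulSoup only
# builds <tr> elements and skips the rest of the page, which speeds up parsing.
parse_only1 = bs.SoupStrainer('tr')

# Quick check against a tiny table fragment
html = b"<table><tr><td>8-K</td></tr><tr><td>10-Q</td></tr></table>"
soup = bs.BeautifulSoup(html, 'lxml', parse_only=parse_only1)
rows = soup.find_all('tr')  # only the two <tr> elements survive the strainer
```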


if __name__ == '__main__':
    pool = Pool(cpu_count())
    fat_list = pool.map(func=parseru, iterable=results_list, chunksize=chunky_monkey)
    pool.close()
    pool.join()
    send_list = [x for x in fat_list if x is not None]
    # Worker processes can't easily append to a shared global list, so parseru
    # returns the page (or None) and the Nones are filtered out here instead.
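The whole pattern can be checked with a self-contained toy: a pure worker that returns matches or None, fanned out with the same ceil-based chunksize. Here match_even and the numeric data are stand-ins for illustration, not part of the original post:

```python
import math
from multiprocessing import Pool, cpu_count

def match_even(n):
    # Stand-in for parseru: return the item on a match, None otherwise.
    return n if n % 2 == 0 else None

if __name__ == '__main__':
    items = list(range(20))
    # Same chunking idea as chunky_monkey: equal-sized chunks, one per core.
    chunk = math.ceil(len(items) / cpu_count())
    with Pool(cpu_count()) as pool:
        results = pool.map(match_even, items, chunksize=chunk)
    matched = [x for x in results if x is not None]  # drop the Nones, as above
```

The `with Pool(...)` context manager handles `close()`/`join()` automatically.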
Messages In This Thread
RE: Need help Multiprocessing with BeautifulSoup - by HiImNew - Jun-07-2018, 05:31 AM
