Python Forum
Tkinter Web Scraping w/Multithreading Question....
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Tkinter Web Scraping w/Multithreading Question....
#1
I have a simple program in Tkinter that I'm using to scrape the age of various people from wikipedia just for webscraping practice.

I am able to scrape the age of each person one-by-one on one thread, but I'm trying to have one thread for each person to handle scraping their ages all at the same time so that the program will be much faster.

So, in other words, the program currently scrapes only 1 person at a time and can only return 1 row at a time in the Treeview, but I'd like to have it to where a thread works for each person at the same time (concurrently) so that the Treeview will return each person's age in one shot as well.

Here's the code that I've come up with so far:

from tkinter import Tk, Button, Listbox
from tkinter import ttk
import threading
import requests
import re #imports RegEx

    class MainWindow(Tk):

    def __init__(self):
        super().__init__()

        self.option_add("*Font", "poppins, 11 bold")

        self.lb1 = Listbox(self, width=22, cursor='hand2')
        self.lb1.pack(side='left', fill='y', padx=20, pady=20)

        #create list of names
        names = ['Adam Levine', 'Arnold Schwarzenegger', 'Artur Beterbiev', 'Chris Hemsworth',
                 'Dan Henderson', 'Dustin Poirier', 'Fedor Emelianenko', 'Gennady Golovkin', 'Igor Vovchanchyn', 'Ken Shamrock',
                 'Mirko Cro Cop', 'Oleksandr Usyk', 'Ronnie Coleman', 'Vasiliy Lomachenko']

        #populate listbox with names
        for name in names:
            self.lb1.insert('end', name)


        self.tv1 = ttk.Treeview(self, show='tree headings', cursor='hand2')

        columns = ('NAME', 'AGE')

        self.tv1.config(columns = columns)

        style = ttk.Style()
        style.configure("Treeview", highlightthickness=2, bd=0, rowheight=26,font=('Poppins', 11))  # Modify the font of the body
        style.configure("Treeview.Heading", font=('Poppins', 12, 'bold'))  # Modify the font of the headings

        #configure headers
        self.tv1.column('#0', width=0, stretch=0)
        self.tv1.column('NAME', width=190)
        self.tv1.column('AGE', width=80, stretch=0)

        #define headings
        self.tv1.heading('NAME', text='NAME', anchor='w')
        self.tv1.heading('AGE', text='AGE', anchor='w')

        self.tv1.pack(fill='both', expand=1, padx=(0, 20), pady=20)

        #create start button
        self.b1 = Button(self, text='START', bg='green', fg='white', cursor='hand2', command=self.start)
        self.b1.pack(pady=(0, 20))


    #scrape data from WikiPedia.org
    def start(self):
        for item in self.tv1.get_children():
            self.tv1.delete(item)

        t1 = threading.Thread(target=self.scrape_wiki, daemon=True)
        t1.start()


    def scrape_wiki(self):
        for i in range(self.lb1.size()):
            #select the name
            self.name = self.lb1.get(i).replace(' ', '_')

            # create a simple dictionary to hold the user agent inside of the headers
            headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0'}

            # the required first parameter of the 'get' method is the 'url':
            r = requests.get('https://en.wikipedia.org/wiki/' + self.name, headers=headers)

            # regex setup

            regex = re.search('(age \d+)', r.text)
            age = regex.group(0).replace('age ', '').replace(')', '')

            # Populate Treeview with row data
            self.name = self.name.replace('_', ' ')
            self.tv1.insert(parent='', index='end', iid=i, values=(self.name, age))



if __name__ == '__main__':
    app = MainWindow()
    #app.iconbitmap('imgs/logo-icon.ico')
    app.title('Main Window')
    app.configure(bg='#333333')

    #center the Main Window:
    w = 600  # Width
    h = 520  # Height

    screen_width = app.winfo_screenwidth()  # Width of the screen
    screen_height = app.winfo_screenheight()  # Height of the screen

    # Calculate Starting X and Y coordinates for Window
    x = (screen_width / 2) - (w / 2)
    y = (screen_height / 2) - (h / 2)

    app.mainloop()
So, how exactly do I create multiple threads to handle each person in the list and return them concurrently/simultaneously instead of row-by-row, one at a time?
I'd appreciate any support.
Reply
#2
Take a look at ProcessPoolExecutor.

https://docs.python.org/3/library/concur...olExecutor
Reply
#3
Hi Dean, and thank you for your reply.

So, I've already tried the ThreadPoolExector from concurrent.futures class, but it was not compatible with updating Tkinter's GUI. Is ProcessPoolExecutor going to be any different?

I'm currently away from home, but I'll give it a shot once I return if you have tested this class & have concluded that it will indeed work with Tkinter's GUI..

Anyhow, thanks again in the meantime. 👍👍


(Dec-15-2022, 06:52 PM)deanhystad Wrote: Take a look at ProcessPoolExecutor.

https://docs.python.org/3/library/concur...olExecutor
Reply
#4
After my post I gave ProcessPoolExecuter a better look. I thought you could launch all the processes and use futures to get the results when available. Unfortunately, it looks like the executor blocks until all processes are completed, or at least started.

Have you tried Asnyc IO? I think async io works great, but it is invasive. You only want to call one function asynchronously and before you know it you have async def sprinkled everywhere.

Here's an idea that uses threads, a list of queue objects, and the tkinter after() funciton.
import tkinter as tk
from threading import Thread
from queue import Queue
from time import sleep
from random import randint
import functools


class TkinterThreadManager:
    """A thread pool manager that maintains function call order
    and doesn't block tkinter
    """
    def __init__(self, window, period=100):
        self.queue = []
        self.window = window
        self.period = period

    def submit(self, func, args=None, callback=None):
        """Add thread to the pool
        func: Function executed in thread
        args: Optional arguments passed to function.
        callback:  Optional function that is called using return value
        """
        if callback is None:
            Thread(target=func, args=args).start()
        else:
            q = Queue(maxsize=1)
            q.callback = callback
            Thread(target=func, args=(q,)+args).start()
            self.queue.append(q)
            self.join()

    def join(self):
        """Periodically check for thread completion.  Execute callback."""
        while self.queue and self.queue[0].full():
            q = self.queue.pop(0)
            q.callback(q.get())
        if self.queue:
            self.window.after(self.period, self.join)


def thread_func(func):
    """TkinterThreadManager function wrapper.  Puts function return value
    in Queue to notify manager of thread completion.
    """
    @functools.wraps(func)
    def wrapper(q, *args, **kwargs):
        value = func(*args, **kwargs)
        print(value)  # For demonstration purposes
        q.put(value)
    return wrapper


@thread_func
def get_name(name, delay):
    """Dummy function to execute via thread"""
    sleep(randint(0, delay))
    return name


class MainWindow(tk.Tk):
    """Window to demonstrate TkinterThreadManager."""
    def __init__(self):
        super().__init__()
        self.threads = TkinterThreadManager(self)
        self.lb1 = tk.Listbox(self, width=22, height=30)
        self.lb1.pack(side='left', fill='y', padx=10, pady=10)
        button = tk.Button(self, text='Letters', command=self.letters)
        button.pack(padx=10, pady=10)
        button = tk.Button(self, text='Numbers', command=self.numbers)
        button.pack(padx=10, pady=10)

    def letters(self):
        """Add some letters to the listbox"""
        for delay, name in enumerate(('A', 'B', 'C')):
            self.threads.submit(
                func=get_name,
                args=(name, delay),
                callback=lambda x: self.lb1.insert(tk.END, x))

    def numbers(self):
        """Add some numbers to the listbox"""
        for delay, name in enumerate((1, 2, 3)):
            self.threads.submit(
                func=get_name,
                args=(name, delay),
                callback=lambda x: self.lb1.insert(tk.END, x))


if __name__ == '__main__':
    MainWindow().mainloop()
AaronCatolico1 likes this post
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Python Tkinter Simple Multithreading Question AaronCatolico1 5 1,666 Dec-14-2022, 11:35 PM
Last Post: deanhystad
  multithreading Hanyx 4 1,363 Jul-29-2022, 07:28 AM
Last Post: Larz60+
Question Problems with variables in multithreading Wombaz 2 1,364 Mar-08-2022, 03:32 PM
Last Post: Wombaz
  Multithreading question amadeok 0 1,799 Oct-17-2020, 12:54 PM
Last Post: amadeok
  How can i add multithreading in this example WoodyWoodpecker1 3 2,539 Aug-11-2020, 05:30 PM
Last Post: deanhystad
  matplotlib multithreading catosp 0 2,980 Jul-03-2020, 09:33 AM
Last Post: catosp
  Multithreading dynamically syncronism Rodrigo 0 1,557 Nov-08-2019, 02:33 AM
Last Post: Rodrigo
  Locks in Multithreading Chuonon 0 1,872 Oct-03-2019, 04:16 PM
Last Post: Chuonon
  multithreading issue with output mr_byte31 4 3,244 Sep-11-2019, 12:04 PM
Last Post: stullis
  Multithreading alternative MartinV279 1 2,816 Aug-01-2019, 11:41 PM
Last Post: scidam

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020