Python Forum
Web Scraping via Web App (Asking for Tips)
Hello, fellow programmers,

I am learning Python by working on personal projects, and right now I am trying to build a web application that compares products. I started working on it but paused for a while, because there are some things I do not understand as well as I would like, so I am humbly asking a few questions at the end of this post. The goal of the project is to take user input, quickly send requests to a predetermined set of e-shops, scrape the returned HTML or JSON, and show the findings to the user. First, the basics of the program.

Modules currently used:
  • For sending multiple requests at once:
    • asyncio
    • aiohttp
  • For scraping:
    • bs4
    • json
Framework planned:
  • flask or django
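
For context, the concurrency pattern these modules are used for can be sketched roughly as follows. This is a minimal stand-in, not the project code: `fake_fetch`, `fetch_all`, and the shop names are hypothetical, and `asyncio.sleep` takes the place of a real aiohttp request so the example runs without network access.

```python
import asyncio

# Hypothetical shop names; the real project targets actual e-shop URLs.
SHOPS = ["shop_1", "shop_2", "shop_3"]


async def fake_fetch(shop, query):
    """Stand-in for an aiohttp request; sleeps instead of doing network I/O."""
    await asyncio.sleep(0.1)
    return {"shop": shop, "query": query}


async def fetch_all(query):
    # gather() runs all coroutines concurrently and returns their results
    # in the same order as the arguments were passed.
    return await asyncio.gather(*(fake_fetch(shop, query) for shop in SHOPS))


if __name__ == "__main__":
    for result in asyncio.run(fetch_all("usb cable")):
        print(result)
```

Because the waits overlap, the total time should be close to that of the slowest single request rather than the sum of all of them.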

The project consists of two files:
app.py
import asyncio
import json

import test_module


def main():
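    user_input = input("Product to search for: ")

    # parse_shops() is the coroutine defined in test_module.py
    find_products = asyncio.run(test_module.MyTestClass.parse_shops(user_input))

    # Print each shop name followed by the items found there
    for shop_name, items in find_products.items():
        print(json.dumps(shop_name, ensure_ascii=False))
        print(json.dumps(items, indent=4, sort_keys=False, ensure_ascii=False))
        print("")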

if __name__ == "__main__":
    main()
test_module.py
import asyncio
import json

import aiohttp
import bs4


URL_SHOP_1 = "https://www.eshop_1.com/search/"
URL_SHOP_2 = "https://www.eshop_2.com/search/"
...

items_shop_1 = {}
items_shop_2 = {}
...

items_found = {}


class MyTestClass:

    @staticmethod
    async def parse_shops(user_input):
        await asyncio.gather(
            MyTestClass.parse_shop_1(user_input),
            MyTestClass.parse_shop_2(user_input),
            ...
        )

        return items_found

    @staticmethod
    async def parse_shop_1(user_input):
        payload = {"q": user_input.replace(" ", "-")}
        soup = await MyTestClass.send_request(URL_SHOP_1, payload)
        ...
        items_shop_1["price"] = soup.find(class_="price").text
        ...        
        items_found["shop_1"] = items_shop_1

    @staticmethod
    async def send_request(url, payload=None):
        ...
The code I wrote works exactly as intended, yet I find myself wondering whether there are any fundamental problems or things I could do better.

So my questions are:
  1. Was it a good idea to use the listed modules? Are there better modules for sending multiple requests at the same time?
  2. Was it right to put the variables outside of the class, or should they be inside the class or in a separate file?
  3. Is it OK to build everything around a class with static methods, or would it be better to use a class with instance methods, or no class at all and only functions? Basically, I do not know when to use static methods, instance methods, or plain functions.
  4. Which framework would be better for this project: Flask, Django, or something else?
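
To make questions 2 and 3 concrete, one alternative layout would be plain functions that return their results instead of writing into module-level dicts. The following is only a rough sketch under assumptions: the parser bodies and the returned values are made-up placeholders, and `asyncio.sleep(0)` stands in for the real request and BeautifulSoup parsing.

```python
import asyncio


async def parse_shop_1(user_input):
    # Placeholder for the real request + HTML parsing.
    await asyncio.sleep(0)
    return {"name": user_input, "price": "9.99"}  # made-up values


async def parse_shop_2(user_input):
    # Placeholder for the real request + JSON parsing.
    await asyncio.sleep(0)
    return {"name": user_input, "price": "10.49"}  # made-up values


async def parse_shops(user_input):
    # Map each shop name to its parser; nothing is stored at module level.
    parsers = {"shop_1": parse_shop_1, "shop_2": parse_shop_2}
    results = await asyncio.gather(*(p(user_input) for p in parsers.values()))
    # gather() preserves argument order, so zip() pairs names with results.
    return dict(zip(parsers, results))


if __name__ == "__main__":
    print(asyncio.run(parse_shops("usb cable")))
```

In this layout every call builds a fresh result dict, so two searches cannot overwrite each other's data, which may matter once the code runs inside a web framework that handles several users.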

If anyone wants to share an opinion about other things, please do so. If you have harsh criticism, please be so kind as to explain your reasoning.

Thanks to everyone who read through all of this, and even more to those willing to participate!


Sincerely
Martin