Image Scraper (beautifulsoup), stopped working, need to help see why

woodmister · Jan-05-2021, 04:46 PM

Holy crap dude. Haha. Seriously, holy crap. Hahaha. It's beautiful. You must just laugh at how easy that is. You're good, man. Mind blown.

Since I got the other one to work (ish), I started looking at fixing the multi-threaded version of the same script. I wasn't much help on this one. Take a look?

Original code:

##########################################
#######    This is section for the main imports

import requests
import os
from bs4 import BeautifulSoup
from tqdm import tqdm
from multiprocessing.pool import ThreadPool

def save_image(tag):
    dlthis = ('https:' + tag['href'])
    print(dlthis)
    path = os.path.join(folder, tag['download'])
    myfile = requests.get(dlthis, allow_redirects=True, stream = True)
    ##########################################
    #######    Section for Saving Files, both work  
    #    with open(path, 'wb') as f:
    #        f.write(myfile.content)
    open(path, 'wb').write(myfile.content)
    ##########################################


if __name__ == '__main__':
    ##########################################
    #######    This is section for choosing site and save folder
    url = ''
    folder = ''

    url = input("Website:")
    folder = input("Folder:")

    if not os.path.isdir(folder):
        os.makedirs(folder)

    ##########################################
    #######    This section I have NO idea what it does.  :)  Sets parser for sure
    r  = requests.get(url, stream = True)
    data = r.text
    soup = BeautifulSoup(data, features = "lxml")

    ##########################################
    #######    This section grabs all pictures tagged download and makes folders

    images = soup.select('a.parent[download]')
    ThreadPool().map(save_image, images)

And my bastardized attempt at getting your fix to work on it:

##########################################
#######    This is section for the main imports

import requests
import os
from bs4 import BeautifulSoup
from tqdm import tqdm
from multiprocessing.pool import ThreadPool

def save_image(tag):
    dlthis = (img.get('href'))
    strnum = str(number)
    newnum = " " + strnum
    namestr = name + newnum + ".jpg"
    path = os.path.join(folder, namestr)
    myfile=requests.get(dlthis, allow_redirects=True, stream = True)
    ##########################################
    #######    Section for Saving Files, both work  
    #    with open(path, 'wb') as f:
    #        f.write(myfile.content)
    open(path, 'wb').write(myfile.content)
    ##########################################


if __name__ == '__main__':
    ##########################################
    #######    This is section for choosing site and save folder
    url = ''
    folder = ''
    name = ''
    number = 1
    
    url = input("Website:")
    folder = input("Folder:")
    name = input("Name:")
    
    
    
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'}

    if not os.path.isdir(folder):
        os.makedirs(folder)

    ##########################################
    #######    This section I have NO idea what it does.  :)  Sets parser for sure
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'lxml')

    ##########################################
    #######    This section grabs all pictures tagged download and makes folders

    images = soup.select('div.thread_image_box > a')
    ThreadPool().map(save_image, images)

I think there is an issue with the ".map" call and then the "img.get" in the function.
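
For what it's worth, two things stand out in the second script. save_image takes a parameter called tag but reads from img, which is never defined, so each worker raises a NameError that ".map" then re-raises. And number is never incremented, so even with that fixed every download would be written to the same file name. Below is a rough sketch of one way around both, not a tested fix for the actual site: it pairs each link with its own number before handing the work to the pool. The helper names build_jobs and download_all are made up for illustration, and the "https:" prefix for links starting with "//" is an assumption carried over from the original script.

```python
import os
from multiprocessing.pool import ThreadPool


def build_jobs(tags, folder, name):
    """Pair every link with its own numbered file path (hypothetical helper)."""
    jobs = []
    for number, tag in enumerate(tags, start=1):
        url = tag.get('href')
        if not url:
            continue  # skip anchors without a link
        if url.startswith('//'):
            # Assumption: protocol-relative links, as in the original script
            url = 'https:' + url
        jobs.append((url, os.path.join(folder, f"{name} {number}.jpg")))
    return jobs


def save_image(job):
    # Imported here so build_jobs can be tried without requests installed
    import requests

    url, path = job
    response = requests.get(url, allow_redirects=True)
    with open(path, 'wb') as f:
        f.write(response.content)


def download_all(jobs):
    """Run save_image over all jobs in a thread pool (hypothetical helper)."""
    with ThreadPool() as pool:
        pool.map(save_image, jobs)
```

In the main block this would replace the last line, roughly: jobs = build_jobs(images, folder, name) followed by download_all(jobs). Because each job carries its own URL and path, save_image no longer depends on the globals number or name.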
