Python Forum
Image Scraper (beautifulsoup), stopped working, need to help see why
#8
Holy crap, dude. Haha. Seriously, holy crap. It's beautiful. You must just laugh at how easy that is. You're good, man. Mind blown.

Since I got the other one to work-ish, I started looking at fixing the multi-threaded one of the same model. I wasn't as much help on this one. Take a look?

Original code:

##########################################
#######    This is section for the main imports

import requests
import os
from bs4 import BeautifulSoup
from tqdm import tqdm
from multiprocessing.pool import ThreadPool

def save_image(tag):
    dlthis = ('https:' + tag['href'])
    print(dlthis)
    path = os.path.join(folder, tag['download'])
    myfile = requests.get(dlthis, allow_redirects=True, stream = True)
    ##########################################
    #######    Section for Saving Files, both work  
    #    with open(path, 'wb') as f:
    #        f.write(myfile.content)
    open(path, 'wb').write(myfile.content)
    ##########################################


if __name__ == '__main__':
    ##########################################
    #######    This is section for choosing site and save folder
    url = ''
    folder = ''

    url = input("Website:")
    folder = input("Folder:")

    if not os.path.isdir(folder):
        os.makedirs(folder)

    ##########################################
    #######    This section I have NO idea what it does.  :)  Sets parser for sure
    r  = requests.get(url, stream = True)
    data = r.text
    soup = BeautifulSoup(data, features = "lxml")

    ##########################################
    #######    This section grabs all pictures tagged download and makes folders

    images = soup.select('a.parent[download]')
    ThreadPool().map(save_image, images)



And my bastardized way of trying to get your fix to work on it.


##########################################
#######    This is section for the main imports

import requests
import os
from bs4 import BeautifulSoup
from tqdm import tqdm
from multiprocessing.pool import ThreadPool

def save_image(tag):
    dlthis = (img.get('href'))
    strnum = str(number)
    newnum = " " + strnum
    namestr = name + newnum + ".jpg"
    path = os.path.join(folder, namestr)
    myfile=requests.get(dlthis, allow_redirects=True, stream = True)
    ##########################################
    #######    Section for Saving Files, both work  
    #    with open(path, 'wb') as f:
    #        f.write(myfile.content)
    open(path, 'wb').write(myfile.content)
    ##########################################


if __name__ == '__main__':
    ##########################################
    #######    This is section for choosing site and save folder
    url = ''
    folder = ''
    name = ''
    number = 1
    
    url = input("Website:")
    folder = input("Folder:")
    name = input("Name:")
    
    
    
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'}

    if not os.path.isdir(folder):
        os.makedirs(folder)

    ##########################################
    #######    This section I have NO idea what it does.  :)  Sets parser for sure
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'lxml')

    ##########################################
    #######    This section grabs all pictures tagged download and makes folders

    images = soup.select('div.thread_image_box > a')
    ThreadPool().map(save_image, images)
I think there is an issue with the ".map" and then the "img.get" in the function.
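
One possible way to untangle this, offered as a sketch rather than a definitive fix: `ThreadPool().map(save_image, images)` calls `save_image` once per item and passes that item as the only argument. Inside the function the link therefore has to be read from the parameter (`tag`), not from `img`, which is never defined there. `number` also stays at 1, so every file would end up with the same name. The sketch below numbers the files before handing them to the pool. The names `build_jobs`, `download_all`, `fetch_with_requests`, and the `fetch` parameter are made up for illustration and are not from the thread.

```python
import os
from functools import partial
from multiprocessing.pool import ThreadPool


def build_jobs(hrefs, folder, name):
    # Number the files up front, in the main thread, so the workers
    # never have to share (and fight over) a counter.
    return [
        (href, os.path.join(folder, f"{name} {number}.jpg"))
        for number, href in enumerate(hrefs, start=1)
    ]


def fetch_with_requests(url):
    # requests is third-party; it is imported here so the rest of the
    # sketch can run without it (an assumption made for this example).
    import requests
    response = requests.get(url, allow_redirects=True)
    response.raise_for_status()
    return response.content


def save_image(job, fetch=fetch_with_requests):
    # ThreadPool.map passes exactly one item per call. Here that item is
    # a (url, path) tuple, so everything the worker needs arrives as its argument.
    url, path = job
    with open(path, "wb") as f:
        f.write(fetch(url))
    return path


def download_all(hrefs, folder, name, fetch=fetch_with_requests):
    os.makedirs(folder, exist_ok=True)
    jobs = build_jobs(hrefs, folder, name)
    with ThreadPool() as pool:
        # map blocks until every job is done and keeps the input order.
        return pool.map(partial(save_image, fetch=fetch), jobs)
```

Assuming the selector from the post still matches the page, the links could be collected with `hrefs = [a.get('href') for a in soup.select('div.thread_image_box > a')]` and passed to `download_all(hrefs, folder, name)`. If the site rejects the default User-Agent, the same `headers` dictionary may also be needed on the image requests.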


Messages In This Thread
RE: Image Scraper (beautifulsoup), stopped working, need to help see why - by woodmister - Jan-05-2021, 04:46 PM
