Jan-04-2021, 11:19 PM
I wrote a little script about six months ago with some help from a friend. It looked at a website and downloaded the images from it. It used to work, but about a week ago it stopped working on every machine I have. I'm really new to all this; I pieced together the first version myself before we cleaned it up together.
I don't get any error message at all, so it's hard to figure out what changed.
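The silence has a specific cause. `requests.get()` does not raise an exception when the server answers with an error code (you have to call `raise_for_status()` for that), and when the HTML that comes back contains no matching links, the download loop simply runs zero times. Below is a minimal sketch of a check you could print right after fetching the page. `explain_empty_result` is a hypothetical helper name, not part of requests or BeautifulSoup.

```python
from typing import Optional


def explain_empty_result(status_code: int, html: str, links_found: int) -> Optional[str]:
    """Return a short reason why nothing would be downloaded, or None if the page looks usable."""
    preview = html[:80]  # first characters of the page, enough to spot a block or error page
    if status_code >= 400:
        return f"server answered HTTP {status_code}; start of page: {preview!r}"
    if links_found == 0:
        return f"page loaded (HTTP {status_code}) but no download links matched; start of page: {preview!r}"
    return None


# Usage with the script's own variables (r from requests.get, soup from BeautifulSoup):
# problem = explain_empty_result(r.status_code, r.text, len(soup.select('a.parent[download]')))
# if problem:
#     print(problem)
```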
Here's the website I'm trying to get images from:
https://archive.4plebs.org/hr/thread/2866456/
Here is the code I've been using. I went through lots of iterations, but this is the last one I had.
```python
##########################################
####### Imports
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

##########################################
####### Choose the site and the save folder
url = input("Website:")
folder = input("Folder:")

##########################################
####### Download the page and parse its HTML with BeautifulSoup (lxml parser)
r = requests.get(url)
r.raise_for_status()  # raise on 4xx/5xx responses instead of silently finding no images
soup = BeautifulSoup(r.text, features="lxml")

##########################################
####### Grab every link marked as a download and save it into the folder
os.makedirs(folder, exist_ok=True)
for tag in soup.select('a.parent[download]'):
    # urljoin handles protocol-relative ("//host/file.jpg") and relative hrefs
    dlthis = urljoin(url, tag['href'])
    path = os.path.join(folder, tag['download'])
    myfile = requests.get(dlthis, allow_redirects=True)
    myfile.raise_for_status()
    # "with" closes the file even if writing fails
    with open(path, 'wb') as f:
        f.write(myfile.content)
##########################################
```

I have other versions that use multiple threads, and basic ones that just print out the links, but I can't get any of them to show anything at all. I'm sure it has something to do with the request or the BeautifulSoup parsing.
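Because the script fails silently, it helps to test the parsing step on its own. The sketch below looks for the same links that `soup.select('a.parent[download]')` targets (an `<a>` tag with class `parent` and a `download` attribute), but it swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra installs. It also resolves each `href` with `urljoin`, which handles both protocol-relative links (`//host/file.jpg`) and relative ones, whereas `'https:' + tag['href']` only works for the protocol-relative kind. `DownloadLinkParser` and `find_download_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DownloadLinkParser(HTMLParser):
    """Collect (absolute_url, filename) for <a> tags with class "parent" and a download attribute."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()  # class="thumb parent" counts too
        if "parent" in classes and "download" in attr and attr.get("href"):
            absolute = urljoin(self.base_url, attr["href"])
            self.links.append((absolute, attr["download"] or ""))


def find_download_links(html: str, base_url: str) -> list:
    """Parse html and return the download links found, in page order."""
    parser = DownloadLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling `find_download_links(r.text, url)` and printing the result is a quick check: an empty list for a thread that clearly shows images in the browser means the script received different HTML than the browser did.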
Any help you can give would be awesome. Thank you!
Before posting, I wanted to make sure I had tested everything I knew how to test, so I played around with it a little more. It looks like the site now has a security feature, probably meant to block exactly what I'm trying to do. Is there any way around it, or a better way to pull the pictures? Here's what I'm seeing:
```html
<h1>Access denied</h1>
<p>This website is using a security service to protect itself from online attacks.</p>
<ul class="cferror_details">
<li>Ray ID: 60c8a5d2cc2b3a02</li>
<li>Timestamp: 2021-01-04 23:13:01 UTC</li>
```

Thanks
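The "Access denied" page looks like a Cloudflare block page (the `cferror_details` class and the Ray ID are Cloudflare's). That is the site deliberately refusing the script, so rather than trying to get around it, a sketch like the one below makes the script stop with a clear message, identify itself honestly, and space out its requests. `STOP_CODES`, `make_headers`, `block_page_info`, `polite_get`, and the user-agent string are hypothetical names chosen for illustration; whether the site lets such a script through is entirely up to its operator. The Ray ID identifies the blocked request if you want to ask the operator about access.

```python
import re
import time
from typing import Optional

# Status codes after which the script should stop instead of retrying (assumed set).
STOP_CODES = {403, 429, 503}


def make_headers(contact: str) -> dict:
    """Headers that identify the script honestly; `contact` is your own email or URL."""
    return {"User-Agent": f"image-saver-script/0.1 (+{contact})"}


def block_page_info(html: str) -> Optional[dict]:
    """If html looks like the 'Access denied' block page, return its Ray ID and timestamp, else None."""
    if "Access denied" not in html or "cferror_details" not in html:
        return None
    ray = re.search(r"Ray ID:\s*([0-9a-f]+)", html)
    ts = re.search(r"Timestamp:\s*([^<]+)", html)
    return {
        "ray_id": ray.group(1) if ray else None,
        "timestamp": ts.group(1).strip() if ts else None,
    }


def polite_get(session, url: str, delay: float = 2.0):
    """GET url through `session` (e.g. a requests.Session), pause `delay` seconds,
    and raise RuntimeError with the block details instead of continuing silently."""
    resp = session.get(url, timeout=30)
    time.sleep(delay)  # space out requests so the script doesn't hammer the site
    info = block_page_info(resp.text)
    if resp.status_code in STOP_CODES or info is not None:
        raise RuntimeError(f"{url}: HTTP {resp.status_code}, block details: {info}")
    resp.raise_for_status()
    return resp
```

With requests, you would create `session = requests.Session()`, call `session.headers.update(make_headers("you@example.com"))`, and use `polite_get(session, url)` in place of `requests.get(url)` for both the thread page and each image.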