urlib - to use or not to use ( for web scraping )?


Truman

#46

Dec-19-2018, 10:45 PM

import os
import requests
from bs4 import BeautifulSoup

downloadDirectory = "downloaded"
baseUrl = "http://pythonscraping.com"

def getAbsoluteURL(baseUrl, source):
    # Turn an <img src> value into a full URL; return None for off-site links
    if source.startswith("http://www."):
        url = "http://" + source[11:]
    elif source.startswith("http://"):
        url = source
    elif source.startswith("www."):
        url = "http://" + source[4:]  # drop "www." before adding the scheme
    else:
        url = baseUrl + "/" + source
    if baseUrl not in url:
        return None
    return url

def getDownloadPath(baseUrl, absoluteUrl, downloadDirectory):
    # Map the URL to a path under downloadDirectory and create its folders
    path = absoluteUrl.replace("www.", "")
    path = path.replace(baseUrl, "")
    path = downloadDirectory + path
    directory = os.path.dirname(path)
    if not os.path.exists(directory):
        os.makedirs(directory)
    return path


html = requests.get("http://www.pythonscraping.com")
bsObj = BeautifulSoup(html.content, 'html.parser')
downloadList = bsObj.find_all('img')

for download in downloadList:
    fileUrl = getAbsoluteURL(baseUrl, download["src"])
    if fileUrl is None:
        continue  # off-site image; requests.get(None) would raise
    print(fileUrl)
    r = requests.get(fileUrl, allow_redirects=True)
    filename = fileUrl.split('/')[-1]
    with open(filename, 'wb') as out_file:
        out_file.write(r.content)
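The prefix checks in getAbsoluteURL miss several forms an `src` attribute can take: `https://` URLs, protocol-relative `//host/...` links and `../` paths all fall through to the `else` branch and get glued onto baseUrl. Since the thread is about urllib anyway, here is a minimal sketch that lets the standard library's `urllib.parse.urljoin` do the resolution instead. It assumes the same rule as the original (keep only URLs on the base site, ignoring a leading `www.`); the names `get_absolute_url` and `_site` are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def _site(url):
    """Hypothetical helper: the URL's host name without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def get_absolute_url(base_url, source):
    """Resolve an <img src> value against base_url.

    Returns None for images hosted on another site, like the original
    getAbsoluteURL does.
    """
    if source.startswith("www."):
        # A bare "www.example.com/..." has no scheme, so urljoin would treat
        # it as a relative path; give it one first.
        source = "http://" + source
    # The trailing slash makes relative sources resolve under base_url
    # instead of replacing its last path segment.
    url = urljoin(base_url.rstrip("/") + "/", source)
    if _site(url) != _site(base_url):
        return None
    return url
```

For example, `get_absolute_url("http://pythonscraping.com", "//pythonscraping.com/a.png")` gives `"http://pythonscraping.com/a.png"`, while a `src` pointing at another host gives `None`.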

I made some corrections to the last 10 lines, but now the script completely bypasses the 'downloaded' folder and the getDownloadPath function: the images are saved to the current working directory instead.
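The folder never appears because nothing in the loop calls getDownloadPath: the file name comes from `fileUrl.split('/')[-1]` and is opened relative to the current working directory, so `os.makedirs` never runs. The loop has to open the path that getDownloadPath returns. Below is a minimal, self-contained sketch of that idea. The names `get_download_path`, `save_file` and `main` are hypothetical, and `urllib.parse.urlparse` is swapped in for the string replacements, which also keeps any `?query` suffix out of the saved file name.

```python
import os
from urllib.parse import urljoin, urlparse

DOWNLOAD_DIRECTORY = "downloaded"


def get_download_path(absolute_url, download_directory):
    """Mirror the URL's path under download_directory, creating folders as needed."""
    # .path excludes the scheme, host and any ?query string
    relative = urlparse(absolute_url).path.lstrip("/")
    path = os.path.join(download_directory, *relative.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path


def save_file(absolute_url, content, download_directory=DOWNLOAD_DIRECTORY):
    """Write already-fetched bytes to the mirrored path and return that path."""
    path = get_download_path(absolute_url, download_directory)
    with open(path, "wb") as out_file:
        out_file.write(content)
    return path


def main(page_url="http://www.pythonscraping.com"):
    """Download every same-host <img> on page_url into DOWNLOAD_DIRECTORY."""
    # Third-party imports live here so the helpers above work without them.
    import requests
    from bs4 import BeautifulSoup

    page = requests.get(page_url)
    soup = BeautifulSoup(page.content, "html.parser")
    host = urlparse(page.url).hostname  # host after any redirects
    for img in soup.find_all("img"):
        src = img.get("src")
        if not src:
            continue  # <img> tag without a src attribute
        file_url = urljoin(page.url, src)
        if urlparse(file_url).hostname != host:
            continue  # skip images served from other hosts
        response = requests.get(file_url)
        response.raise_for_status()
        print(save_file(file_url, response.content))
```

Calling `main()` performs the actual download. Note that it compares hosts exactly, so `www.` and the bare domain count as different sites. In the original script, the equivalent one-line change is `filename = getDownloadPath(baseUrl, fileUrl, downloadDirectory)` in place of the `split('/')` line.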


