Mar-22-2017, 05:12 PM
Hello everyone,
I am trying to learn a bit about Python since I researched and saw it's a really powerful language.
So for my first "hello world" project (well, maybe this one is a bit more advanced) I decided to write a script that downloads all the images from a certain Facebook page (group).
First, I started with a nice example that builds a .csv file from Facebook Graph data: the Facebook Page Post Scraper from minimaxir (I can't link it since this is my 1st post, sorry author!).
That was my starting point. So from this I wanted to extract all the photo links and download images, simple :)
This is the code I wrote, and it works! But I'm sure it has many holes and could be improved.
import csv
import urllib.request
from collections import defaultdict

columns = defaultdict(list)  # each value in each column is appended to a list

with open('disu.txt') as f:
    reader = csv.DictReader(f)  # read rows into a dictionary format
    for row in reader:  # read a row as {column1: value1, column2: value2,...}
        for (k, v) in row.items():  # go over each column name and value
            columns[k].append(v)  # append the value into the appropriate list based on column name k

strings = columns['status_link']  # take only the status_link column

# loop and remove any '' empty values from the list, if any exist.
# This is to avoid errors in the later loop if there is no url to download.
while True:
    try:
        strings.remove("")
    except ValueError:
        break

# split each list value (it's always the same URL), and take only the img ID from the url
strings = [i.rsplit('/', 2)[-2] for i in strings]

# find the length of the list, so we can loop through all values
sumtotal = len(strings)
count = 0

# while (count < sumtotal-1): this is the "while" for an automatic loop based on list length,
# for now we just take a few elements to test
while (count < 20):
    count = count + 1
    one = strings[count]
    one = one.rsplit('/', 1)[-1]
    newurl = ('https://graph.facebook.com/' + one + '/picture')
    urllib.request.urlretrieve(newurl, 'slike/test' + str(count) + '.jpg')
    print(newurl, "Downloaded")

I am aware this is a bad way of downloading images, since:
- I don't use the API that Facebook provides. I used the csv file from the other script mentioned above.
- I found that using
('https://graph.facebook.com/'+ one +'/picture')
redirects to the real image, and it works! But it is probably a bad way to do it.
- I don't have any checks for 404 or 500 errors, so if one happens, my script stops.
- Also keep in mind that I just started to learn programming and Python, so the loops above may be so wrong, but that's why I am here.
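For the missing error checks and the loop, here is a minimal sketch of one possible direction, not a definitive fix. It assumes the same `status_link` column, the same `rsplit` ID extraction, and the same `https://graph.facebook.com/<id>/picture` URL as the script above. The function names (`extract_photo_ids`, `picture_url`, `download_all`, `main`) and the `fetch` parameter are made up for illustration. Note that the original `while` loop increments `count` before indexing, so `strings[0]` is never downloaded; iterating with `enumerate` avoids that.

```python
import csv
import os
import urllib.error
import urllib.request


def extract_photo_ids(rows, column="status_link"):
    """Collect the ID segment from every non-empty link, as in the original script."""
    ids = []
    for row in rows:
        link = row.get(column)
        if link:  # skip rows that have no URL at all
            ids.append(link.rsplit("/", 2)[-2])
    return ids


def picture_url(photo_id):
    """Build the same redirecting URL the original script uses."""
    return "https://graph.facebook.com/" + photo_id + "/picture"


def download_all(photo_ids, target_dir="slike", fetch=urllib.request.urlretrieve):
    """Try every ID and return (downloaded, failed) instead of stopping on an error.

    `fetch` is a hypothetical hook so the function can be tried without network access.
    """
    os.makedirs(target_dir, exist_ok=True)
    downloaded, failed = [], []
    for count, photo_id in enumerate(photo_ids):  # starts at 0, never runs past the end
        url = picture_url(photo_id)
        path = os.path.join(target_dir, "test" + str(count) + ".jpg")
        try:
            fetch(url, path)
        except urllib.error.URLError as err:
            # HTTPError (e.g. 404 or 500) is a subclass of URLError
            failed.append((url, str(err)))
            continue
        downloaded.append(url)
    return downloaded, failed


def main(csv_path="disu.txt", limit=20):
    """Read the csv, then download the first `limit` images."""
    with open(csv_path, newline="") as f:
        ids = extract_photo_ids(csv.DictReader(f))
    done, errors = download_all(ids[:limit])
    print(len(done), "downloaded,", len(errors), "failed")
```

Calling `main()` would reproduce the test run over the first 20 links. A failed download is recorded in the `failed` list and the loop moves on to the next image, so a single 404 no longer stops the whole script.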