Python Forum
Learning Python, need suggestions
#1
Hello everyone, 

I am trying to learn a bit of Python, since from what I've researched it seems to be a really powerful language.

So for my first "hello world" project (well, maybe this one is a bit more advanced) I decided to make myself a nice script that downloads all the images from a certain Facebook page (group).

First, I started with this nice example of getting a .csv file from Facebook Graph data: the Facebook Page Post Scraper from minimaxir (can't link it since it's my 1st post, sorry author!).

That was my starting point. From there I wanted to extract all the photo links and download the images, simple :)

This is the code I wrote, and it works! But I'm sure it has plenty of holes and things that could be improved.

import csv
import urllib.request
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('disu.txt') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list based on column name k

strings = columns['status_link'] # take only status_link column out of list

# loop and remove any '' empty values from the list, if any exist. This avoids errors in the download loop below when there is no URL to download.
while True:
  try:
    strings.remove("")
  except ValueError:
    break

# split each list value (it's always the same URL pattern) and keep only the img ID from the URL
strings = [i.rsplit('/', 2)[-2] for i in strings]

# find length of list, so we can loop through all values
sumtotal = len(strings)
count = 0

# while (count < sumtotal - 1):   use this condition to loop over the whole list; for now we just take a few elements to test

while (count < 20):
    count = count + 1
    one = strings[count]
    one = one.rsplit('/', 1)[-1]
    newurl = ('https://graph.facebook.com/' + one + '/picture')
    urllib.request.urlretrieve(newurl, 'slike/test' + str(count) + '.jpg')
    print(newurl, "Downloaded")
I am aware this is a bad way of downloading images, since:

  1. I don't use the API that Facebook provides; I used the csv file from the other script mentioned above.
  2. I found that using ('https://graph.facebook.com/'+ one +'/picture') redirects to the real image, and it works! But it's probably a bad way to do it.
  3. I don't have any checks for 404 or 500 errors, so if one happens my script stops (see the sketch after this list).
  4. Also keep in mind that I just started to learn programming and Python, so the loops above may be very wrong, but that's why I am here.
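
For point 3, this is roughly the kind of check I think I'm missing (just a sketch using urllib.error.HTTPError, not something my script does yet):

import urllib.error
import urllib.request

def download(url, filename):
    # try to fetch one image; report 404/500 instead of stopping the whole loop
    try:
        urllib.request.urlretrieve(url, filename)
        return True
    except urllib.error.HTTPError as e:
        # e.code holds the HTTP status, e.g. 404 or 500
        print("Skipping", url, "- HTTP error", e.code)
        return False
    except urllib.error.URLError as e:
        # network-level problem (DNS failure, connection refused, ...)
        print("Skipping", url, "-", e.reason)
        return False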
What I would like from you is help with suggestions to fix the code, if that's possible.
#2
You should use the Facebook API.
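
For example, something along these lines (a rough sketch only; you would need a real access token, and you should check the current Graph API docs for the exact endpoint and fields):

import json
import urllib.parse
import urllib.request

ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN'  # placeholder, create one at developers.facebook.com

def get_photo_source(photo_id):
    # ask the Graph API for the photo object itself instead of relying on the /picture redirect
    params = urllib.parse.urlencode({'fields': 'images', 'access_token': ACCESS_TOKEN})
    url = 'https://graph.facebook.com/v2.8/{}?{}'.format(photo_id, params)
    with urllib.request.urlopen(url) as resp:
        data = json.loads(resp.read().decode('utf-8'))
    # 'images' is a list of available sizes; the first entry is usually the largest
    return data['images'][0]['source']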

The while loop can be replaced with a for loop:
# enumerate returns sequence number for each iteration
# strings[:20] gets the first 20 elements. This is called slicing*
for count, one in enumerate(strings[:20]):
    one = one.rsplit('/', 1)[-1]
    newurl = ('https://graph.facebook.com/'+ one +'/picture')

    try:
        urllib.request.urlretrieve(newurl, 'slike/test{}.jpg'.format(str(count)))
        print(newurl, "Downloaded")
    except Exception as e:
        print(type(e))
        print(e.args)
        print(e)
* slicing: taking part of a sequence with [start:stop] notation
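
A quick demo of slicing and enumerate on a plain list:

letters = ['a', 'b', 'c', 'd', 'e']

print(letters[:3])     # first 3 elements -> ['a', 'b', 'c']
print(letters[1:4])    # elements at index 1, 2 and 3 -> ['b', 'c', 'd']
print(letters[-2:])    # last 2 elements -> ['d', 'e']

# enumerate pairs each element with its index
for count, letter in enumerate(letters[:3]):
    print(count, letter)   # prints 0 a, 1 b, 2 c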
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org