Python Forum

G'day All.

I'm working on a .py that will scrape tweets from Twitter and write them to a CSV file. Full disclosure: I got the majority of the script from nocodewebscraping.
However, I have modified some of the code to drill down a bit further and get some specific data that I'm chasing.

The part that I am stuck on is getting the .py to loop through an xlsx file containing a list of usernames, as I need to scrape approx. 400 names a week.

Effectively, what I need/want it to do is this:

openpyxl.load_workbook('ClientList.xlsx') - Use the data from cell A1 (User1) in the search_name query, then post (User1) as the title in (ScrapingFile) cell A1, then post the last 20 tweets in cells A2:A21 underneath the title, then
return to ClientList cell A2 and use (User2) in the search_name query, then post (User2) as the title in (ScrapingFile) cell A22, then post the last 20 tweets in cells A23:A42 underneath the title - rinse/repeat.
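
The layout described above can be pinned down with a little arithmetic. With one title row plus 20 tweets, each user takes up 21 rows, so the second user's tweets would end at A42. The snippet below is only a sketch of that calculation; the function name block_rows and its parameters are hypothetical, not part of the original script.

```python
def block_rows(user_index, tweets_per_user=20):
    """Return (title_row, first_tweet_row, last_tweet_row) for one user.

    user_index is 0-based: 0 for the name in ClientList A1, 1 for A2, etc.
    Rows are 1-based, matching spreadsheet row numbers.
    """
    rows_per_user = tweets_per_user + 1          # one title row + the tweets
    title_row = user_index * rows_per_user + 1   # A1, A22, A43, ...
    first_tweet_row = title_row + 1
    last_tweet_row = title_row + tweets_per_user
    return title_row, first_tweet_row, last_tweet_row


# Show where the first three users would land in column A.
for i in range(3):
    title, first, last = block_rows(i)
    print("User%d: title A%d, tweets A%d:A%d" % (i + 1, title, first, last))
```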

I will freely admit I've spent the better part of 2 days scratching my noggin, and have searched git/stack/reddit/google/here without much luck. Perhaps it is my wording or a misunderstanding of something basic, but I would love some assistance.

I am using Atom.io on Windows 7, with the csv, Tweepy and openpyxl modules.

Thanks in advance!

import csv

import openpyxl
import tweepy

# Twitter API credentials (placeholders; substitute your own values)
consumer_key = "Secret"
consumer_secret = "Secret"
access_key = "Secret"
access_secret = "Secret"


# This is where I "think" I use the "for" loop, but freely admit to being lost
def get_all_tweets(screen_name):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    # Limit tweets to last 20 *modifiable*
    # NOTE: this search loop has no body yet, so its results are unused
    for tweet in tweepy.Cursor(api.search, q="google", rpp=1, count=20,
                               result_type="recent", include_entities=True,
                               lang="en").items(20):
        pass

    alltweets = []

    # First instance of screen_name being repeated; I assume I simply change
    # screen_name= to whatever the for loop above get_all_tweets provides
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)
    alltweets.extend(new_tweets)

    # Keep paging backwards until a request returns no tweets
    while len(new_tweets) > 0:
        # ID of the oldest tweet fetched so far, minus one
        oldest = alltweets[-1].id - 1
        print("getting tweets before %s" % oldest)

        # Second instance of screen_name being repeated; same assumption as above
        new_tweets = api.user_timeline(screen_name=screen_name, count=200,
                                       max_id=oldest)
        alltweets.extend(new_tweets)
        print("...%s tweets downloaded so far" % len(alltweets))

    # Hashtag modifiers to drill further into tweet content:
    # keep only tweets whose text contains both tags
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text]
                 for tweet in alltweets
                 if '#firsttag' in tweet.text and 'secondtag' in tweet.text]

    # I want to create a CSV file that will dump all records into ScrapingFile.csv
    # once it has looped all names in ClientList.xlsx.
    # Format will be name from ClientList cell A1, then 20 tweets,
    # then name from ClientList cell A2 (rinse/repeat)
    with open('ScrapingFile_tweets.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        # Header matches the three columns collected in outtweets
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(outtweets)


if __name__ == '__main__':
    # Final instance of screen_name being repeated; same assumption as above
    get_all_tweets("screen_name")
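
For what it's worth, the loop being asked about could be sketched roughly as follows. This is only a sketch under assumptions, not the original script's method: read_usernames, fetch_with_tweepy and write_scraping_file are hypothetical helper names, ClientList.xlsx is assumed to hold one username per row in column A of the active sheet, and short timelines are padded with blank rows so that every user occupies the same number of rows.

```python
import csv


def read_usernames(xlsx_path):
    """Return the non-empty values from column A of the active sheet."""
    # Imported here so the rest of the sketch runs without openpyxl installed.
    import openpyxl

    workbook = openpyxl.load_workbook(xlsx_path)
    sheet = workbook.active
    names = []
    for row in sheet.iter_rows(min_col=1, max_col=1, values_only=True):
        value = row[0]
        if value:
            names.append(str(value).strip())
    return names


def fetch_with_tweepy(api, screen_name, count=20):
    """Return the text of the most recent tweets for one user.

    `api` is assumed to be an authenticated tweepy.API object.
    """
    statuses = api.user_timeline(screen_name=screen_name, count=count)
    return [status.text for status in statuses]


def write_scraping_file(usernames, fetch_tweets, csv_path, tweets_per_user=20):
    """Write each username as a title row, followed by that user's tweets.

    `fetch_tweets` is any callable taking a username and returning a list
    of tweet texts; passing it in keeps this function independent of Twitter.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for name in usernames:
            writer.writerow([name])  # title row for this user
            texts = list(fetch_tweets(name))[:tweets_per_user]
            # Pad short timelines so every block has the same height.
            texts += [""] * (tweets_per_user - len(texts))
            for text in texts:
                writer.writerow([text])
```

With a real Tweepy API object, a call along the lines of write_scraping_file(read_usernames('ClientList.xlsx'), lambda name: fetch_with_tweepy(api, name), 'ScrapingFile.csv') would tie the pieces together. Opening the CSV once, outside the loop over names, is what keeps each user from overwriting the previous one.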

http://www.pythonexcel.com/openpyxl.php

mcarthur2086

Larz60+