Dec-20-2016, 11:32 PM
G'day All.
I'm working on a .py that will scrape tweets from Twitter and write them to a CSV file. Full disclosure: I got the majority of the script from nocodewebscraping.
However, I have modified some of the code to drill down a bit further and get some specific data that I'm chasing.
The part I am stuck on is getting the .py to loop through an .xlsx file with a list of user names, as I need to scrape approximately 400 names a week.
Effectively, what I need/want it to do is this:
load_workbook('ClientList') - use the data from cell A1 (User1) in the search_name query, then post User1 as the title in ScrapingFile cell A1 and the last 20 tweets in cells A2:A21 underneath the title; then
return to ClientList cell A2 (User2), use it in the search_name query, post User2 as the title in ScrapingFile cell A22 and the last 20 tweets in cells A23:A42 underneath - rinse/repeat.
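Something like this sketch is what I'm picturing (written in Python 3 style; `fetch_tweets` is just a hypothetical stand-in for the Tweepy call, and I'm assuming the names sit in column A of ClientList.xlsx):

```python
import csv
from openpyxl import load_workbook

def scrape_names(xlsx_path, csv_path, fetch_tweets):
    # Walk column A of the client workbook; for each name, write a
    # title row followed by up to 20 tweet rows into one CSV.
    # `fetch_tweets` stands in for the Tweepy call and should return
    # a list of tweet texts for a given screen name.
    ws = load_workbook(xlsx_path).active
    rows = []
    for (cell,) in ws.iter_rows(min_col=1, max_col=1):
        if cell.value is None:
            continue
        rows.append([cell.value])                # title row (A1, A22, ...)
        for tweet in fetch_tweets(cell.value)[:20]:
            rows.append([tweet])                 # tweets underneath the title
    with open(csv_path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```

That would stack each name and its tweets vertically in one file, which is the layout I described above.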
I will freely admit I've spent the better part of two days scratching my noggin, and have searched GitHub/Stack Overflow/Reddit/Google/here without much luck. Perhaps it's my wording or a misunderstanding of something basic, but I would love some assistance.
I am using Atom.io on Windows 7, with the csv, Tweepy and openpyxl libraries.
Thanks in advance!
import tweepy
import csv
import openpyxl

consumer_key = Secret
consumer_secret = Secret
access_key = Secret
access_secret = Secret

# This is where I "think" I use the "for" loop, but freely admit to being lost
def get_all_tweets(screen_name):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    # Limit tweets to last 20 *modifiable*
    for tweet in tweepy.Cursor(api.search, q="google", rpp=1, count=20,
                               result_type="recent", include_entities=True,
                               lang="en").items(20):
        alltweets = []

        # First instance of screen_name being repeated; I assume I simply
        # change screen_name= to whatever name the for loop is up to
        new_tweets = api.user_timeline(screen_name=screen_name, count=200)
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1

        while len(new_tweets) > 0:
            print "getting tweets before %s" % (oldest)
            # Second instance of screen_name being repeated; same assumption
            new_tweets = api.user_timeline(screen_name=screen_name, count=200,
                                           max_id=oldest)
            alltweets.extend(new_tweets)
            oldest = alltweets[-1].id - 1
            print "...%s tweets downloaded so far" % (len(alltweets))

        # Hashtag modifiers to drill further into tweet content
        outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")]
                     for tweet in alltweets
                     if '#firsttag' in tweet.text.encode("utf-8")
                     and '#secondtag' in tweet.text.encode("utf-8")]

        # I want to create a CSV file that dumps all records into
        # ScrapingFile.csv once it has looped through all names in
        # ClientList.xlsx. Format: name from ClientList cell A1, then
        # 20 tweets, then the name from ClientList cell A2 (rinse/repeat)
        with open('%s_tweets.csv' % screen_name, 'wb') as f:
            writer = csv.writer(f)
            writer.writerow(["id", "created_at", "text",
                             "retweet_count", "favorite_count"])
            writer.writerows(outtweets)

if __name__ == '__main__':
    # Final instance of screen_name being repeated; again I assume I simply
    # change this to whatever name the for loop is up to
    get_all_tweets("screen_name")