Loop a list of usernames from a xlsx file - Python Forum (https://python-forum.io), Thread: /thread-1289.html
Loop a list of usernames from a xlsx file - mcarthur2086 - Dec-20-2016

G'day all. I'm working on a .py script that scrapes tweets from Twitter and writes them to a CSV file. Full disclosure: I got the majority of the script from nocodewebscraping, but I have modified some of the code to drill down a bit further and get the specific data I'm chasing.

The part I'm stuck on is getting the script to loop through an xlsx file containing a list of usernames, as I need to scrape approximately 400 names a week. Effectively, what I want it to do is this:

- Load the workbook 'ClientList'.
- Use the value from cell A1 (User1) as the search_name query, post User1 as a title in ScrapingFile cell A1, then post the last 20 tweets in cells A2:A21 underneath the title.
- Return to ClientList cell A2 (User2), use it as the search_name query, post User2 as a title in ScrapingFile cell A22, then post the last 20 tweets in cells A23:A42 underneath the title.
- Rinse and repeat.

I will freely admit I've spent the better part of two days scratching my noggin. I have searched git/stack/reddit/google/here without much luck. Perhaps it is my wording, or a misunderstanding of something basic, but I would love some assistance.

I am using Atom.io, Windows 7, and csv, Tweepy and openpyxl.

Thanks in advance!
    import tweepy
    import csv
    import openpyxl

    consumer_key = Secret
    consumer_secret = Secret
    access_key = Secret
    access_secret = Secret


    # This is where i "think" i use the "FOR" loop, but freely admit to being lost
    def get_all_tweets(screen_name):
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_key, access_secret)
        api = tweepy.API(auth)

        #Limit tweets to last 20 *modifiable*
        for tweet in tweepy.Cursor(api.search, q="google", rpp=1, count=20, result_type="recent", include_entities=True, lang="en").items(20):
            alltweets = []

        #First instance of screen_name being repeated, however i assume i simply change screen_name= to whatever the for loop is up on line 11
        new_tweets = api.user_timeline(screen_name = screen_name,count=200)
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1

        while len(new_tweets) > 0:
            print "getting tweets before %s" % (oldest)
            #Second instance of screen_name being repeated, however i assume i simply change screen_name= to whatever the for loop is up on line 11
            new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)
            alltweets.extend(new_tweets)
            oldest = alltweets[-1].id - 1
            print "...%s tweets downloaded so far" % (len(alltweets))

        # Hashtag modifiers to drill further into tweets content
        outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")]for tweet in alltweets if '#firsttag' if 'secondtag' in tweet.text.encode("utf-8")]

        #I want to create a CSV file, that will dump all records into ScrapingFile.csv once it has looped all names in ClientList.xlsx
        #format will be name from Clientlist CellA1, then 20 tweets then name from ClientList Cell A2 (rinse repeat)
        with open('%s_tweets.csv' % ScrapingFile, 'wb') as f:
            writer = csv.writer(f)
            writer.writerow(["id","created_at","text","retweet_count","favorite_count"])
            writer.writerows(outtweets)

        pass

    if __name__ == '__main__':
        #final instance of screen_name being repeated, however i assume i simply change screen_name= to whatever the for loop is up on line 11
        get_all_tweets("screen_name")

RE: Loop a list of usernames from a xlsx file - Larz60+ - Dec-21-2016

http://www.pythonexcel.com/openpyxl.php
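
To make the openpyxl pointer above a little more concrete, here is a minimal sketch of the loop described in the question. It is not the original poster's script, and it makes several assumptions: it targets Python 3 (the code in the thread is Python 2), the usernames sit in column A of the first sheet of the workbook, and the output is a single-column CSV so that each row corresponds to cells A1, A2, and so on. The names `read_usernames`, `write_report` and `fetch_tweets` are hypothetical helpers invented for this illustration; `fetch_tweets` stands in for whatever Tweepy call returns the tweet texts for one user.

```python
import csv


def read_usernames(xlsx_path):
    """Return the non-empty values from column A of the workbook's active sheet."""
    # Imported here so the CSV half of this sketch can be tried without openpyxl.
    from openpyxl import load_workbook

    sheet = load_workbook(xlsx_path).active
    names = []
    for row in sheet.iter_rows(min_col=1, max_col=1, values_only=True):
        value = row[0] if row else None
        if value:  # skip blank cells
            names.append(str(value).strip())
    return names


def write_report(usernames, fetch_tweets, csv_path, limit=20):
    """Write each username as a title row, followed by up to `limit` tweets.

    `fetch_tweets` is a hypothetical callable: it takes a username and
    returns an iterable of tweet texts, newest first.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for name in usernames:
            writer.writerow([name])  # title row, e.g. A1, then A22, ...
            for text in list(fetch_tweets(name))[:limit]:
                writer.writerow([text])  # tweets underneath the title
```

With an authenticated `api` object as in the question, the fetcher could plausibly be something like `lambda name: [t.text for t in api.user_timeline(screen_name=name, count=20)]`, and the whole run would then be `write_report(read_usernames('ClientList.xlsx'), fetcher, 'ScrapingFile.csv')`. Passing the fetcher in as an argument keeps the spreadsheet and CSV handling separate from the Twitter calls, so the loop can be checked with a stub before spending API requests on 400 names.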