Python Forum

pytrends problem
I've gone through pytrends, but I still can't understand why Python keeps reporting invalid syntax on line 44. Can anyone help me? Thanks a million.
# -*- coding: utf-8 -*-
import time
import codecs
import random
import glob  
from pytrends.request import TrendReq

google_username = "[email protected]"
google_password = "XXXXXXX"

f = open('stocks_tw.txt', 'r')
stocks_no_name = []
for line in f.readlines():
    data = line.split('\t')
    stock_no = data[0].strip()
    stock_name = data[1].strip()
    stocks_no_name.append([stock_no, stock_name])
f.close()


files=glob.glob('*.csv')  
downloaded_files = [fd.title().lower()[0:4] for fd in files]

stocks_no_name_new = [] 
for stock_no_name in stocks_no_name:
    if not stock_no_name[0] in downloaded_files:
        stocks_no_name_new.append(stock_no_name)
stocks_no_name = stocks_no_name_new       
    
print(len(stocks_no_name))

# connect to Google
pytrends = TrendReq(google_username, google_password, custom_useragent='My Pytrends Script')

while stocks_no_name:
    stock_index = random.randint(0,len(stocks_no_name)-1)
    stock_no_name = stocks_no_name[stock_index]
    stock_no = stock_no_name[0]
    stock_name = stock_no_name[1]
    print(stock_no, stock_name)

    try:
        one_stock_data = []    
        trend_payload = {'q': stock_name, 'date': ''2013-12-29 2016-12-31', 'geo': 'TW','tz': 'Etc/GMT+8'}
        # trend
        trend = pytrend.trend(trend_payload)
        time.sleep(random.randint(120, 360))
    
        table = trend['table']
        rows = table['rows']
        for i in range(len(rows)):
            row_data = []
            for j in range(len(rows[0]['c'])):
                row_data.append(rows[i]['c'][j]['v'])
            one_stock_data.append(row_data) 
                
        # output one_stock_data to a file
        filename = unicode(stock_no, errors='ignore') + '.csv'
        outfile = codecs.open(filename, "wb", "utf-8")
        for i in range(len(one_stock_data)):
            one_stock_data_str =  str(one_stock_data[i][0]) + ", " + str(one_stock_data[i][1])
            if i != len(one_stock_data) - 1:
                one_stock_data_str =  one_stock_data_str + "\r\n"
            outfile.write(one_stock_data_str)
        outfile.close()
        
        stocks_no_name.pop(stock_index)
    
    except:
        time.sleep(random.randint(120, 360))
        continue
Moderator zivoni: removed login information
There are two single quotes at the start of the date string ''2013-12-29 2016-12-31'; there should be only one.
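If you want to check a suspect line like this without running the whole script, one option is the built-in compile(), which raises SyntaxError for source that doesn't parse. A small sketch (the helper name compiles and the sample strings are just for illustration; the strings mimic the payload line from the first post):

```python
def compiles(source):
    """Return True if `source` parses as valid Python, False on SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# The payload line with and without the stray leading quote.
broken = "p = {'q': 'x', 'date': ''2013-12-29 2016-12-31', 'geo': 'TW'}"
fixed = "p = {'q': 'x', 'date': '2013-12-29 2016-12-31', 'geo': 'TW'}"

print(compiles(broken))  # False
print(compiles(fixed))   # True
```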
After fixing that line, I have another problem.
Below is what the IPython console shows. I'm using Anaconda Python. The script runs, but no file is downloaded. Why?

debugfile('C:/Users/user/Desktop/論文/資料來源/download_trend_past3years.py', wdir='C:/Users/user/Desktop/論文/資料來源')
Traceback (most recent call last):
  File "<ipython-input-1-3457bb3b7f77>", line 1, in <module>
    debugfile('C:/Users/user/Desktop/論文/資料來源/download_trend_past3years.py', wdir='C:/Users/user/Desktop/論文/資料來源')
  File "C:\Users\user\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 888, in debugfile
    debugger.run("runfile(%r, args=%r, wdir=%r)" % (filename, args, wdir))
  File "C:\Users\user\Anaconda2\lib\bdb.py", line 400, in run
    exec cmd in globals, locals
  File "<string>", line 1, in <module>
  File "C:\Users\user\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Users\user\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 80, in execfile
    scripttext = builtins.open(fname).read()+ '\n'
IOError: [Errno 22] invalid mode ('r') or filename: 'c:/users/user/desktop/\xe8\xab\x96\xe6?/\xe8\xb3\x87\xe6?\xe4\xbe\x86\xe6?/download_trend_past3years.py'
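For what it's worth, the bytes in that error message look like the UTF-8 encoding of the folder names: \xe8\xab\x96\xe6 is the start of 論文 in UTF-8, with the rest mangled into ?. My guess (not verified) is that somewhere along the way (probably Spyder on Python 2) the path is handled as UTF-8 bytes, while Windows on a Traditional Chinese system expects the local code page, usually cp950. The sketch below just compares the two encodings of the folder name:

```python
# -*- coding: utf-8 -*-
# Compare how the folder name from the traceback is encoded in UTF-8
# versus cp950 (the usual ANSI code page on Traditional Chinese Windows).
folder = u"論文"

utf8_bytes = folder.encode("utf-8")
cp950_bytes = folder.encode("cp950")

print(repr(utf8_bytes))   # b'\xe8\xab\x96\xe6\x96\x87' on Python 3
print(repr(cp950_bytes))  # 4 bytes, a different byte sequence
```

If the two encodings disagree like this, running the script from a folder with an ASCII-only path is a cheap way to rule the problem out.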


runfile('C:/Users/user/Desktop/論文/資料來源/download_trend_past3years.py', wdir='C:/Users/user/Desktop/論文/資料來源')
849
('1718', '\xa4\xa4\xc5\xd6')
('5608', '\xa5|\xba\xfb\xaf\xe8')
('2395', '\xac\xe3\xb5\xd8')
('2103', '\xa5x\xbe\xf3')

('2430', '\xc0\xe9\xa9[\xb9\xea\xb7~')
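Side note on that output: in Python 2, print(stock_no, stock_name) prints a tuple, so you see the repr of the raw bytes read from stocks_tw.txt. Those bytes look like Big5/cp950 (for example \xa4\xa4 is 中 in Big5). The keyword sent to Google Trends is probably better passed as Unicode text than as cp950 bytes. A minimal sketch, assuming stocks_tw.txt is tab-separated and saved in cp950, that reads it with io.open (available in Python 2.6+ and 3); the function name load_stocks is hypothetical:

```python
# -*- coding: utf-8 -*-
import io


def load_stocks(path, encoding="cp950"):
    """Read 'stock_no<TAB>stock_name' lines into [stock_no, stock_name] pairs.

    Assumes the file is saved in cp950 (Big5); io.open decodes it, so the
    names come back as unicode text on Python 2 and str on Python 3.
    """
    stocks = []
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:
            parts = line.rstrip("\r\n").split("\t")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            stocks.append([parts[0].strip(), parts[1].strip()])
    return stocks
```

It would replace the open()/readlines() loop at the top of the script; if the file turns out to be UTF-8 instead, pass encoding="utf-8".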
That's not related to your problem, but perhaps you want to edit your first post and remove your password from the script. And change it! :-)
(Apr-12-2017, 08:36 PM) buran wrote: That's not related to your problem, but perhaps you want to edit your first post and remove your password from the script. And change it! :-)

Wow, LOL. I'm on it, I'm on it.
From that error output it looks like it could be a Spyder problem; maybe Spyder has trouble running a file with Chinese characters in its path? Try running your file directly from the command prompt, in your working directory, with python download_trend_past3years.py
Can anyone tell me how to do that? I haven't run a .py file from the command prompt before.

I've figured out how to run it from the command prompt, but it still doesn't work.
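One likely reason nothing is downloaded: in the first post the loop calls pytrend.trend(...), but the object created earlier is named pytrends. That raises NameError on every pass, and the bare except: catches it, sleeps 2 to 6 minutes and picks another stock, so the script keeps running without writing anything or showing an error. A small sketch of the difference between a bare except and one that reports the exception (the function names are made up for illustration, and the sleep is left out):

```python
def one_pass_silent(stock_name):
    """Same shape as the original loop body: 'pytrend' (no 's') is undefined."""
    try:
        pytrend.trend({"q": stock_name})
        return "would write CSV"
    except:  # bare except: also swallows NameError, TypeError, KeyError, ...
        return "skipped silently"


def one_pass_reporting(stock_name):
    """Same body, but the exception is printed before moving on."""
    try:
        pytrend.trend({"q": stock_name})
        return "would write CSV"
    except Exception as exc:
        print("%s failed with %s: %s" % (stock_name, type(exc).__name__, exc))
        return type(exc).__name__


print(one_pass_silent("demo"))     # skipped silently
print(one_pass_reporting("demo"))  # prints the NameError message, then NameError
```

While debugging, removing the try/except altogether is another option, so the full traceback shows up.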
What do these lines in the script mean?

files=glob.glob('*.csv')
downloaded_files = [fd.title().lower()[0:4] for fd in files]
glob.glob(pattern) returns a list of paths matching the given pattern. So if your working directory contains the files
Output:
second.csv  test_file.csv  Tracker.csv  test.txt  boo.doc
then files will be a list containing the names of the .csv files:
Output:
>>> files = glob.glob("*.csv")
>>> files
['Tracker.csv', 'second.csv', 'test_file.csv']
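To try this without touching your real folder, here is a small self-contained sketch that creates those five example files in a temporary directory and globs for *.csv there. glob doesn't guarantee any particular order, hence the sorted():

```python
import glob
import os
import shutil
import tempfile

# Create the example files from above in a throwaway directory.
tmp = tempfile.mkdtemp()
for name in ["second.csv", "test_file.csv", "Tracker.csv", "test.txt", "boo.doc"]:
    open(os.path.join(tmp, name), "w").close()

# glob returns the matching paths; keep just the file names.
files = sorted(os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*.csv")))
print(files)  # ['Tracker.csv', 'second.csv', 'test_file.csv']

shutil.rmtree(tmp)  # clean up the temporary directory
```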
downloaded_files = [fd.title().lower()[0:4] for fd in files] 
The right side is a list comprehension: it takes each name from the files list, uppercases the first character of each word in the name, then lowercases the whole name, then truncates it to its first four characters.
Output:
>>> downloaded_files = [fd.title().lower()[0:4] for fd in files]
>>> downloaded_files
['trac', 'seco', 'test']
The .title() part is completely redundant (the following .lower() undoes it), and the 0 in [0:4] can be omitted too.
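The script uses these prefixes to skip stocks whose CSV already exists (if not stock_no_name[0] in downloaded_files), so it relies on file names like 1718.csv. A sketch checking that the simplified comprehension gives the same result on some example names, plus an alternative of my own using os.path.splitext that drops the extension instead of cutting at four characters (in case some stock numbers aren't exactly four characters long):

```python
import os

files = ["Tracker.csv", "second.csv", "test_file.csv", "1718.csv"]

original = [fd.title().lower()[0:4] for fd in files]
simplified = [fd.lower()[:4] for fd in files]
print(original == simplified)  # True

# Alternative: strip the extension instead of truncating to 4 characters.
stems = [os.path.splitext(fd)[0].lower() for fd in files]
print(stems)  # ['tracker', 'second', 'test_file', '1718']

# A set makes the "already downloaded?" check a fast lookup.
downloaded = set(stems)
print("1718" in downloaded)  # True
print("2395" in downloaded)  # False
```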
Up to line 42 the code seems to work, but the pytrends part doesn't. I've rewritten lines 44 to 46 as

try:
    one_stock_data = []
    # trend
    trend = pytrends.build_payload(kw_list=[stock_name], timeframe='2013-12-29 2016-12-31', geo='TW')
    time.sleep(random.randint(120, 360))

But it still doesn't work.
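As far as I know, in the pytrends versions that have build_payload, that method only prepares the request and returns None, so trend ends up as None and trend['table'] raises a TypeError, which the bare except then hides. The data comes from a separate call such as interest_over_time(), which returns a pandas DataFrame indexed by date with one column per keyword. The TrendReq constructor has probably changed too (the README shows something like TrendReq(hl='en-US', tz=360) without a username or password), so check the docs for the version you have installed. A rough sketch of one download step under those assumptions; save_interest is a hypothetical helper, and the client is passed in so the function can be tried without network access:

```python
def save_interest(client, stock_no, stock_name,
                  timeframe="2013-12-29 2016-12-31", geo="TW"):
    """Download interest over time for one keyword and save it as <stock_no>.csv.

    `client` is assumed to be a pytrends TrendReq-like object: build_payload()
    only stores the query (its return value is not the data), and
    interest_over_time() returns a pandas DataFrame.
    """
    client.build_payload(kw_list=[stock_name], timeframe=timeframe, geo=geo)
    df = client.interest_over_time()
    filename = stock_no + ".csv"
    df.to_csv(filename, encoding="utf-8")
    return filename


# Hypothetical usage (needs pytrends, pandas and network access):
# from pytrends.request import TrendReq
# pytrends = TrendReq(hl='en-US', tz=360)
# try:
#     save_interest(pytrends, stock_no, stock_name)
#     stocks_no_name.pop(stock_index)
# except Exception as exc:
#     print('%s failed: %r' % (stock_no, exc))
```

Writing the DataFrame with to_csv also replaces the manual loop over trend['table']['rows'], which assumed the response format of the older .trend() call.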