Python Forum
Problem of getting XLS file data into python - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Problem of getting XLS file data into python (/thread-1669.html)



Problem of getting XLS file data into python - klllmmm - Jan-19-2017

[attachment=112][attachment=113]
I'm trying to get the data from 2 XLS files into python. These XLS files were exported files from SAP system.

However i can't get the data into python.

https://drive.google.com/open?id=0B6D2lP-sz4CZX3lPWUlYdHFMdHM

https://drive.google.com/open?id=0B6D2lP-sz4CZSEd0c0VoR054a1k

Method 1

import glob
import pandas as pd

path = "C:/Users/Desktop/MJ"
file_identifier = "*.XLS"

all_data = pd.DataFrame()
for f in glob.glob(path + "/*" + file_identifier):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)
No data comes to python variable all_data

Method 2
Also i tried through xlrd

import xlrd

workbook = xlrd.open_workbook('C:/Users/Desktop/MJ/Apr.xls')
Error:
Traceback (most recent call last):   File "<ipython-input-14-9f59ece120e8>", line 2, in <module>     workbook = xlrd.open_workbook('C:/Users/Gishan/Desktop/MJ/Apr.xls')   File "C:\Program Files\Anaconda3\lib\site-packages\xlrd\__init__.py", line 395, in open_workbook     with open(filename, "rb") as f: FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/Gishan/Desktop/MJ/Apr.xls'
Can someone help me to solve this.


RE: Problem of getting XLS file data into python - buran - Jan-19-2017

Are you sure the path C:/Users/Desktop/MJ is correct?
I think between Users and Desktop should be the username that is logged in the system. i.e. C:/Users/<Some User Name>/Desktop/MJ assuming MJ is folder on the Desktop


RE: Problem of getting XLS file data into python - snippsat - Jan-19-2017

As mention bye @buran the path is wrong.
The file is also not good had to save in Excel(which complain that file is corrupt).
So save it to test.xlsx,now it work in xlrd(as you don't need pandas can read excel).
E.g pandas.read_excel()
>>> import pandas

>>> df = pandas.read_excel('test.xlsx')
>>> df.head()
  Unnamed: 0 Data statistics     Unnamed: 2  Number of   Unnamed: 4  \
0         NaN             NaN            NaN        NaN          NaN   
1         NaN  Records passed            NaN      3.079          NaN   
2         NaN             NaN            NaN        NaN          NaN   
3         NaN             NaN            NaN        NaN          NaN   
4         NaN  JOURNAL_NUMBER  DOCUMENT TYPE        NaN  POSTING KEY   

      Unnamed: 5   Unnamed: 6    Unnamed: 7          Unnamed: 8  \
0             NaN          NaN           NaN                 NaN   
1             NaN          NaN           NaN                 NaN   
2             NaN          NaN           NaN                 NaN   
3             NaN          NaN           NaN                 NaN   
4  GL_ACCT_NUMBER  DESCRIPTION  DEBIT/CREDIT              AMOUNT   

      Unnamed: 9 Unnamed: 10 Unnamed: 11  Unnamed: 12  
0             NaN         NaN         NaN          NaN  
1             NaN         NaN         NaN          NaN  
2             NaN         NaN         NaN          NaN  
3             NaN         NaN         NaN          NaN  
4  EFFECTIVE_DATE  ENTRY_DATE      PERIOD  PREPARER_ID  

>>> df['Data statistics'][:15]
0                NaN
1     Records passed
2                NaN
3                NaN
4     JOURNAL_NUMBER
5                NaN
6         1003001182
7         1003001182
8         1003001183
9         1003001183
10        1003001184
11        1003001184
12        1003001184
13        1003001184
14        1003001185
Name: Data statistics, dtype: object



RE: Problem of getting XLS file data into python - klllmmm - Jan-19-2017

Thanks for your help. pd.read_excel() works.

Now i have to find a way to convert/ save as xls files to xlsx. ( abt 100+ files)

Thank you very much.