Python Forum

Full Version: Problem of getting XLS file data into python
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
[attachment=112][attachment=113]
I'm trying to get the data from 2 XLS files into python. These XLS files were exported files from SAP system.

However i can't get the data into python.

https://drive.google.com/open?id=0B6D2lP...UlYdHFMdHM

https://drive.google.com/open?id=0B6D2lP...0VoR054a1k

Method 1

import glob
import pandas as pd

path = "C:/Users/Desktop/MJ"
file_identifier = "*.XLS"

all_data = pd.DataFrame()
for f in glob.glob(path + "/*" + file_identifier):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)
No data comes to python variable all_data

Method 2
Also i tried through xlrd

import xlrd

workbook = xlrd.open_workbook('C:/Users/Desktop/MJ/Apr.xls')
Error:
Traceback (most recent call last):   File "<ipython-input-14-9f59ece120e8>", line 2, in <module>     workbook = xlrd.open_workbook('C:/Users/Gishan/Desktop/MJ/Apr.xls')   File "C:\Program Files\Anaconda3\lib\site-packages\xlrd\__init__.py", line 395, in open_workbook     with open(filename, "rb") as f: FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/Gishan/Desktop/MJ/Apr.xls'
Can someone help me to solve this.
Are you sure the path C:/Users/Desktop/MJ is correct?
I think between Users and Desktop should be the username that is logged in the system. i.e. C:/Users/<Some User Name>/Desktop/MJ assuming MJ is folder on the Desktop
As mention bye @buran the path is wrong.
The file is also not good had to save in Excel(which complain that file is corrupt).
So save it to test.xlsx,now it work in xlrd(as you don't need pandas can read excel).
E.g pandas.read_excel()
>>> import pandas

>>> df = pandas.read_excel('test.xlsx')
>>> df.head()
  Unnamed: 0 Data statistics     Unnamed: 2  Number of   Unnamed: 4  \
0         NaN             NaN            NaN        NaN          NaN   
1         NaN  Records passed            NaN      3.079          NaN   
2         NaN             NaN            NaN        NaN          NaN   
3         NaN             NaN            NaN        NaN          NaN   
4         NaN  JOURNAL_NUMBER  DOCUMENT TYPE        NaN  POSTING KEY   

      Unnamed: 5   Unnamed: 6    Unnamed: 7          Unnamed: 8  \
0             NaN          NaN           NaN                 NaN   
1             NaN          NaN           NaN                 NaN   
2             NaN          NaN           NaN                 NaN   
3             NaN          NaN           NaN                 NaN   
4  GL_ACCT_NUMBER  DESCRIPTION  DEBIT/CREDIT              AMOUNT   

      Unnamed: 9 Unnamed: 10 Unnamed: 11  Unnamed: 12  
0             NaN         NaN         NaN          NaN  
1             NaN         NaN         NaN          NaN  
2             NaN         NaN         NaN          NaN  
3             NaN         NaN         NaN          NaN  
4  EFFECTIVE_DATE  ENTRY_DATE      PERIOD  PREPARER_ID  

>>> df['Data statistics'][:15]
0                NaN
1     Records passed
2                NaN
3                NaN
4     JOURNAL_NUMBER
5                NaN
6         1003001182
7         1003001182
8         1003001183
9         1003001183
10        1003001184
11        1003001184
12        1003001184
13        1003001184
14        1003001185
Name: Data statistics, dtype: object
Thanks for your help. pd.read_excel() works.

Now i have to find a way to convert/ save as xls files to xlsx. ( abt 100+ files)

Thank you very much.