Python Forum
first project read from xslx get only 1 column\row?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
first project read from xslx get only 1 column\row?
#1
Hello
I have xlsx file with 6 columns with 100 rows ~ .
the first thing I want to do is :
Read column A see the data , insert him to a list if it's not appear already
so I will have a list of all the names in this column


which package it will be best to use ?

Thanks,
Reply
#2
openpyxl will do this
PyPi page: https://pypi.org/project/openpyxl/
Docs: https://openpyxl.readthedocs.io/en/stable/:
install with pip install openpyxl from command line.
Reply
#3
OK
I have start to work with this
now I have anoter question

it's seem that it take too much time
if I understand correct it load the all file that start to run on it
so I took another file with 1000000 lines
and get Error:
you have  1048574  lines of data
MemoryError

During handling of the above exception, another exception occurred:
is there any wayto fix this ?

this is what I have done:
def ReadTxt():
    PID_Size = 0
    Error = False
    xlsx_file = Path('SimData', 'C:\\Users\\Computer\\Desktop\\Data.xlsx')
    wb_obj = openpyxl.load_workbook(xlsx_file)
    sheet = wb_obj.active
    print('you have ', str(sheet.max_row), ' lines of data')
    for row in sheet.iter_rows(min_row=None):
        PID_Name = row[0].value # this is where the PID located - first place
        try:
            PID_Size = len(PID_Name)
        except TypeError as e:
            print(PID_Name)
          #  print(e)
            PID_name = str(PID_Name)
            PID_Size = len(str(PID_Name))
            Error = True
        if not Error:
            if PID_Name not in str(PID_LIST) and PID_Size > 4:
                data = ""
                for cell in row:
                    data = (data + ":" + str(cell.value))
                data = data[1:]
                index = data.find(':')
                # print (index)
                data = data[0:index+1] + "    " + data[index+5:]
                PID_LIST.append(data)

    for pid in PID_LIST:
        print(pid)
and this is the resualt I'm getting (which is what I wanted)
18FC08F4:    D0:2:3D:28:61:26:18:A9
18F3C999:    0:0:0:0:0:FF:5A:0
18FC07F4:    B0:13:B0:13:77:1:48:36
18FC1621:    3:C7:0:0:D1:2:FF:E0
0CFE5A2F:    11:0:0:0:D3:F5:FF:F0
18FC1721:    F8:2A:20:4E:FF:FF:FE:EE
18FE592F:    D2:7C:83:7C:C9:7C:EA:7C
0C000027:    23:FF:FF:F3:FF:FF:FF:F3
.
.
.
.
Reply
#4
I have used this successfully for quite a while, but never with so much data.
I can only suggest that you try some of the other packages available.
Here's what available: https://pypi.org/search/?q=xslx&o=-zscore
This shows all packages having anything to do with xslx, so a lot of it won't be appropriate for your application.
I wish I could help more, but haven't tried any of these.

Perhaps others will have a suggestion for you.
Reply
#5
You could try to convert the xlsx file to csv through external command. Example with libreoffice:
Output:
soffice -env:UserInstallation=file:///tmp/test --headless --convert-to csv --outdir ~/tmp foo.xlsx
This creates file ~/tmp/foo.csv
Reply
#6
ok
and CVS is much easier to work with ? (I'm not familiar with all excel option at all)
it will take less memory ?

Thanks ,
Reply
#7
Well, the csv reader in the standard csv module processes the file line by line for example. You don't need to keep more than one line at a time in memory.
Reply
#8
Ok
What I will need to change in the code ?

Thanks,
Reply
#9
korenron Wrote:What I will need to change in the code ?
You wouldn't use openpyxl any more. Instead you would iterate on the rows of an object returned by a call to csv.reader() as shown in the documenrtation of the csv module.
Reply
#10
I will try it

Thank you for the help!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  .JSON conversion to .CSV or .xslx tccompton 4 3,869 Mar-23-2018, 03:51 PM
Last Post: snippsat

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020