Hello
I have xlsx file with 6 columns with 100 rows ~ .
the first thing I want to do is :
Read column A see the data , insert him to a list if it's not appear already
so I will have a list of all the names in this column
which package it will be best to use ?
Thanks,
OK
I have start to work with this
now I have anoter question
it's seem that it take too much time
if I understand correct it load the all file that start to run on it
so I took another file with 1000000 lines
and get Error:
you have 1048574 lines of data
MemoryError
During handling of the above exception, another exception occurred:
is there any wayto fix this ?
this is what I have done:
def ReadTxt():
PID_Size = 0
Error = False
xlsx_file = Path('SimData', 'C:\\Users\\Computer\\Desktop\\Data.xlsx')
wb_obj = openpyxl.load_workbook(xlsx_file)
sheet = wb_obj.active
print('you have ', str(sheet.max_row), ' lines of data')
for row in sheet.iter_rows(min_row=None):
PID_Name = row[0].value # this is where the PID located - first place
try:
PID_Size = len(PID_Name)
except TypeError as e:
print(PID_Name)
# print(e)
PID_name = str(PID_Name)
PID_Size = len(str(PID_Name))
Error = True
if not Error:
if PID_Name not in str(PID_LIST) and PID_Size > 4:
data = ""
for cell in row:
data = (data + ":" + str(cell.value))
data = data[1:]
index = data.find(':')
# print (index)
data = data[0:index+1] + " " + data[index+5:]
PID_LIST.append(data)
for pid in PID_LIST:
print(pid)
and this is the resualt I'm getting (which is what I wanted)
18FC08F4: D0:2:3D:28:61:26:18:A9
18F3C999: 0:0:0:0:0:FF:5A:0
18FC07F4: B0:13:B0:13:77:1:48:36
18FC1621: 3:C7:0:0:D1:2:FF:E0
0CFE5A2F: 11:0:0:0:D3:F5:FF:F0
18FC1721: F8:2A:20:4E:FF:FF:FE:EE
18FE592F: D2:7C:83:7C:C9:7C:EA:7C
0C000027: 23:FF:FF:F3:FF:FF:FF:F3
.
.
.
.
I have used this successfully for quite a while, but never with so much data.
I can only suggest that you try some of the other packages available.
Here's what available:
https://pypi.org/search/?q=xslx&o=-zscore
This shows all packages having anything to do with xslx, so a lot of it won't be appropriate for your application.
I wish I could help more, but haven't tried any of these.
Perhaps others will have a suggestion for you.
You could try to convert the xlsx file to csv through external command. Example with libreoffice:
Output:
soffice -env:UserInstallation=file:///tmp/test --headless --convert-to csv --outdir ~/tmp foo.xlsx
This creates file ~/tmp/foo.csv
ok
and CVS is much easier to work with ? (I'm not familiar with all excel option at all)
it will take less memory ?
Thanks ,
Well, the csv reader in the standard
csv module processes the file line by line for example. You don't need to keep more than one line at a time in memory.
Ok
What I will need to change in the code ?
Thanks,
korenron Wrote:What I will need to change in the code ?
You wouldn't use openpyxl any more. Instead you would iterate on the rows of an object returned by a call to
csv.reader()
as shown in the documenrtation of the csv module.
I will try it
Thank you for the help!