Python Forum
pyreadstat write_sav inefficient
#1
Hello,

My colleagues work mostly in SPSS, so for their sake I export my DataFrames to SPSS using the pyreadstat module (installed via conda, version 1.1.2).
Everything works fine, except that the exported files are more than 7x larger than I'd expect.

A file that should be about 10 MB is exported as roughly 75 MB.
When I open the file in SPSS, make any change and save it again, the file shrinks to the expected 10 MB.

Does anyone know what could cause this (and, more importantly, how to fix it)?
I assumed it had to do with something like specifying encoding='utf8', but write_sav doesn't seem to have that parameter.

FYI, I export using this code:
pyreadstat.write_sav(df, export_result_path)
I've already tried specifying the column formats. For example, all fields containing "ftq10a_" are single-digit integer variables, so I tried setting that format explicitly:

import pyreadstat

# Columns whose names contain 'ftq10a_' hold single-digit integers
integer_vars = [col for col in df.columns if 'ftq10a_' in col]
# Map each of those columns to the SPSS display format F1.0
formats = {col: 'F1.0' for col in integer_vars}
pyreadstat.write_sav(df, export_result_path, variable_format=formats)
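A possible reason the F1.0 formats made no difference: as far as I understand the .sav layout (my reading of the PSPP system-file documentation, not something stated in this thread), a variable's format only controls how it is displayed. Every numeric value is still stored as an 8-byte double. When SPSS saves a file it normally applies "bytecode" compression, in which small integers take a single byte. The sketch below estimates the data size both ways for numeric data. The function names, the bias of 100 and the sample data are assumptions for illustration, and headers and string variables are ignored.

```python
import math

# Assumed compression bias; PSPP's notes describe it as usually 100.
BIAS = 100


def uncompressed_bytes(rows):
    """Data size without compression: every numeric value is an 8-byte double,
    regardless of its display format (F1.0, F8.2, ...)."""
    return sum(8 * len(row) for row in rows)


def bytecode_compressed_bytes(rows):
    """Rough data size under SPSS bytecode compression, numeric values only.

    Each value costs one 1-byte command code; codes are packed 8 per block.
    - integers from 1 - BIAS to 251 - BIAS fit entirely in their code
    - None stands for system-missing (code 255), also with no extra bytes
    - any other value gets code 253 plus its raw 8-byte double
    """
    n_codes = 0
    n_raw = 0
    for row in rows:
        for value in row:
            n_codes += 1
            if value is None:
                continue  # system-missing: code only
            if float(value).is_integer() and 1 - BIAS <= value <= 251 - BIAS:
                continue  # small integer: stored as code = value + BIAS
            n_raw += 1  # incompressible: code 253 + 8 raw bytes
    # Round the code count up to whole 8-byte blocks, then add raw values
    return 8 * math.ceil(n_codes / 8) + 8 * n_raw


# Hypothetical data: 1,000 cases x 20 single-digit integer variables
rows = [[(i + j) % 10 for j in range(20)] for i in range(1000)]
print(uncompressed_bytes(rows))         # 160000
print(bytecode_compressed_bytes(rows))  # 20000
```

For single-digit integers the estimate gives an 8x difference, in the same range as the 75 MB vs 10 MB reported above. Note that pyreadstat's compress=True writes a .zsav file, which uses a different (zlib-based) compression, so real file sizes will not match this estimate exactly.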
#2
Have a look at https://stackoverflow.com/questions/4717...nwrite-sav
You can also try passing compress=True, which creates a zsav file.
#3
(Jun-21-2021, 08:38 AM) buran wrote: have a look at https://stackoverflow.com/questions/4717...nwrite-sav
Also you can try to pass compress=True and create zsav file.

Thanks, that does seem to be the same issue; I hadn't seen that question.
The solution was indeed to pass compress=True to write_sav, which makes it write a compressed .zsav file.