pyreadstat write_sav inefficient
#1
Hello,

My colleagues work mostly in SPSS, so for their sake I export my dataframes to SPSS format using the pyreadstat module (installed via conda, version 1.1.2).
Everything works fine, except that the exported files are more than 7x larger than I'd expect.

A file that should be around 10 MB exports as roughly 75 MB.
When I open the file in SPSS, make any change, and save it again, the file size shrinks to the expected 10 MB.

Does anyone have an idea of what could cause this issue (and, more importantly, how to resolve it)?
I assumed it might be something like specifying encoding='utf8', but write_sav doesn't seem to have such a parameter.

FYI, this is the code I use to export:
pyreadstat.write_sav(df, export_result_path)
I've already tried specifying the column format/width explicitly. For example, all columns whose names contain "ftq10a_" hold single-digit integers, so I tried setting that explicitly:

integer_vars = [col for col in df.columns.tolist() if 'ftq10a_' in col]  # df.columns, not df.fields
integer_format = ['F1.0'] * len(integer_vars)  # single-digit integer format
formats = dict(zip(integer_vars, integer_format))
pyreadstat.write_sav(df, export_result_path, variable_format=formats)
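One more thing I noticed in the docs: write_sav also accepts compress and row_compress keyword arguments (both off by default). I haven't tested whether either of them explains the size difference, and I'm not certain row_compress is available in my version, but this is the (untested) variant I was going to try next:

# untested sketch: compress=True should produce a zlib-compressed .zsav,
# while row_compress=True writes a row-compressed .sav, which I believe is
# what SPSS itself uses when it re-saves the file
pyreadstat.write_sav(df, export_result_path, variable_format=formats, row_compress=True)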