pyreadstat write_sav inefficient
#1
Hello,

My colleagues work mostly in SPSS, so for their sake I export my dataframes to SPSS format using the pyreadstat module (installed via conda, version 1.1.2).
Everything works fine, except that the exported files are more than 7x larger than I'd expect.

A file that should be around 10 MB exports as roughly 75 MB.
When I open the file in SPSS, make any change, and save it again, the file size shrinks to the expected 10 MB.

Does anyone have an idea of what could cause this issue (and, more importantly, how to resolve it)?
I assumed it might be something like specifying encoding='utf8', but write_sav doesn't seem to have such a parameter.

FYI, this is the code I use to export:
pyreadstat.write_sav(df, export_result_path)
I've already tried specifying the column format/width explicitly. For example, all columns whose names contain "ftq10a_" hold single-digit integers, so I tried setting that explicitly:

integer_vars = [col for col in df.columns.tolist() if 'ftq10a_' in col]  # df.columns, not df.fields
integer_format = ['F1.0'] * len(integer_vars)  # single-digit integer format
formats = dict(zip(integer_vars, integer_format))
pyreadstat.write_sav(df, export_result_path, variable_format=formats)
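One more thing I noticed in the docs: write_sav also accepts compress and row_compress keyword arguments (both off by default). I haven't tested whether either of them explains the size difference, and I'm not certain row_compress is available in my version, but this is the (untested) variant I was going to try next:

# untested sketch: compress=True should produce a zlib-compressed .zsav,
# while row_compress=True writes a row-compressed .sav, which I believe is
# what SPSS itself uses when it re-saves the file
pyreadstat.write_sav(df, export_result_path, variable_format=formats, row_compress=True)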