Jun-21-2021, 08:32 AM
(This post was last modified: Jun-21-2021, 08:32 AM by mikisDeWitte.)
Hello,
My colleagues work mostly in SPSS, so for their sake I export my dataframes to SPSS using the pyreadstat module (installed via conda, version 1.1.2).
Everything works fine, except that my exports are more than 7x as large as I'd expect them to be (in terms of file size).
A file that should be about 10 MB exports as ~75 MB.
When I open the file in SPSS, make any change, and save it again, the file size shrinks to the expected 10 MB.
Does anyone have an idea what could cause this issue (and, more importantly, how to resolve it)?
I assume it has to be something like specifying encoding='utf8', but write_sav doesn't have this parameter?
FYI, I export using this code:
pyreadstat.write_sav(df, export_result_path)
I've already tried to specify the column size. For example, all fields containing "ftq10a_" are variables holding single-digit integers, so I tried to set this explicitly:
# Columns whose names contain 'ftq10a_' hold single-digit integers
integer_vars = [i for i in df.columns.tolist() if 'ftq10a_' in i]
integer_format = ['F1.0'] * len(integer_vars)
formats = dict(zip(integer_vars, integer_format))
pyreadstat.write_sav(df, export_result_path, variable_format=formats)
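To make sure the mapping itself is not the problem, the same logic can be written as a small helper and checked on its own. This is only a sketch: `build_formats`, `export_sav`, and the example column names are hypothetical, and the only pyreadstat call used is the `write_sav(..., variable_format=...)` call shown above.

```python
def build_formats(columns, marker="ftq10a_", spss_format="F1.0"):
    """Map every column whose name contains `marker` to one SPSS format string."""
    return {name: spss_format for name in columns if marker in name}


def export_sav(df, path, marker="ftq10a_"):
    """Write `df` to `path`, passing the mapping as `variable_format`."""
    import pyreadstat  # imported lazily so build_formats can be checked without it

    formats = build_formats(df.columns.tolist(), marker)
    pyreadstat.write_sav(df, path, variable_format=formats)
    return formats


# Hypothetical column names, used only to check the mapping.
example_columns = ["id", "ftq10a_1", "ftq10a_2", "age"]
example_formats = build_formats(example_columns)
print(example_formats)
```

With a real dataframe, `export_sav(df, export_result_path)` would perform the same export as the snippet above and return the mapping that was applied, which makes it easy to confirm which columns received the format.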