Python Forum

Full Version: Text files
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hi, is it OK for my PC when my python code works with 200 000 small text files? My python program opens every second new text file writes there one number and then closes it and then again and again. I am relatively new to programming so I am not sure if such high number of files is OK. Is it better to write everything to one bigger (but still nothing huge) file and loose clarity in data or is it OK to make 200 000 small text files and open/close/write into them in intervals which last only few milliseconds/seconds? Thanks for advice!
I would think having a different file for each datum would lose a lot of clarity in the data.

Each file requires a certain amount of overhead. 200,000 files are going to take up a lot more space than one file with 200,000 lines. Opening files also requires overhead, so your program is going to run a lot slower with 200,00 files than with one. One file is more efficient (up to a point).
What will happen with these files? What they are used for?
I´m gonna draw a pixel map from them. For my colleagues it would be more intuitive to use such organization of data even if it is 200 000 files. I have 600 groups each of them has 300 subgroups and each subgroup = one text file = has ca. 50 numbers inside (it means that I have exactly 180 000 .txt files). For example 600 countries, 300 biggest cities in each country, 50 citizens of each city. I work with numbers so for my colleagues it is different when I say USA - New York - Mr. Smith than 25 - 97 - 14578.789. Do you have any other idea how to organize such data set? I tried .json but they were not satisfied with it...
I would use a csv file. Columns for country, city, citizen. Easy to load into Python with the csv module, opens in Excel if they want to view the whole data set. At most have separate files for each country, but only so the files are easier for Excel to handle.
I don't understand how you access the data you write into files.

For my mind it easier to have nested dictionary (have O(1) complexity to access data) or some other datastructure.