Python Forum
Pandas read csv file in 'date/time' chunks
#1
I have written code to read a large time series CSV file (X million rows) using pandas read_csv() with chunking. The chunked reading works as expected, but unfortunately the chunking itself is causing a problem.

The problem is that I want to resample the data once it has been read. Because the data is read in chunks, the resampling works on a per-chunk basis rather than on the file as a whole, so the chunk boundaries can fall at a different point in time for each chunk.

For instance, I have a file that contains minute data, and as there are 1440 minutes in a day, if I set the chunk size to 1440, in a perfect world, each chunk would contain data from 00:00 to 23:59. However, if there are minutes missing, reading 1440 rows would also end up reading all of the data for 1 day plus some data from the following day, and this is causing issues with the resampled data.
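One possible way around the boundary problem described above is to hold back the last (possibly incomplete) day of each chunk and prepend it to the next chunk. The following is only a minimal sketch: it assumes the rows are sorted by time, and the function name `iter_whole_days` and the column names `timestamp` and `value` are made up for illustration.

```python
import io

import pandas as pd


def iter_whole_days(csv_source, chunksize, ts_col="timestamp"):
    """Yield DataFrames that contain only complete calendar days.

    Assumes the rows are sorted by time. `ts_col` is a hypothetical
    column name; adjust it to match the real file.
    """
    carry = None  # rows of a day that may continue in the next chunk
    reader = pd.read_csv(csv_source, parse_dates=[ts_col], chunksize=chunksize)
    for chunk in reader:
        if carry is not None:
            chunk = pd.concat([carry, chunk], ignore_index=True)
        # Midnight of the last day seen in this chunk.
        last_day = chunk[ts_col].iloc[-1].normalize()
        is_last_day = chunk[ts_col] >= last_day
        carry = chunk[is_last_day]      # may be incomplete, so hold it back
        complete = chunk[~is_last_day]  # every earlier day is complete
        if not complete.empty:
            yield complete
    if carry is not None and not carry.empty:
        yield carry  # the final day in the file


# Demo data: three days of minute rows with 50 minutes missing from day 1.
idx = pd.date_range("2023-01-01", periods=3 * 1440, freq="min")
demo = pd.DataFrame({"timestamp": idx, "value": range(len(idx))})
demo = demo.drop(index=range(100, 150))
csv_text = demo.to_csv(index=False)

parts = list(iter_whole_days(io.StringIO(csv_text), chunksize=1440))
for part in parts:
    # Each part can now be resampled without crossing a day boundary.
    print(part.resample("D", on="timestamp")["value"].count())
```

The same idea should carry over to weeks or months by comparing week or month periods instead of normalized dates, although that variant is not shown here.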

Is there a way to get pandas (or perhaps another library) to read data one day/week/month at a time?

The only option I can currently think of is to split the large file into smaller day/week/month files, and then process those files without chunking.
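For what it's worth, the splitting step itself can also be done with chunked reads, so the large file never has to fit in memory. This is a rough sketch rather than a tested tool: `split_by_day`, the `timestamp` column and the one-file-per-day naming are all assumptions, and the output directory is assumed to start out empty because rows are appended.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd


def split_by_day(csv_source, out_dir, chunksize=100_000, ts_col="timestamp"):
    """Append each chunk's rows to one CSV file per calendar day.

    Assumes `out_dir` holds no files from an earlier run, since rows
    are appended. `ts_col` is a hypothetical column name.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = set()
    reader = pd.read_csv(csv_source, parse_dates=[ts_col], chunksize=chunksize)
    for chunk in reader:
        # Group the chunk's rows by calendar date; a chunk may span days.
        for day, rows in chunk.groupby(chunk[ts_col].dt.date):
            path = out_dir / f"{day}.csv"
            # Write the header only the first time a day's file is touched.
            rows.to_csv(path, mode="a", header=path not in written, index=False)
            written.add(path)
    return sorted(written)


# Demo data: two days of minute rows with 10 minutes missing from day 1.
idx = pd.date_range("2023-01-01", periods=2 * 1440, freq="min")
demo = pd.DataFrame({"timestamp": idx, "value": range(len(idx))})
demo = demo.drop(index=range(10, 20))

with tempfile.TemporaryDirectory() as tmp:
    files = split_by_day(io.StringIO(demo.to_csv(index=False)), tmp, chunksize=1000)
    sizes = {f.name: len(pd.read_csv(f)) for f in files}
print(sizes)
```

Each per-day file can then be read and resampled on its own without chunking.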

I'm hoping there is a better solution than the one I have thought of.
#2
I'd give a thought to a Pandas alternative. There are several, and when you run into a pandas limitation (or speed issue) take a look.

Polars - listen to the recent Talk Python To Me Podcast for some details (episode 402)

Vaex - supports up to a billion rows

Dask

PySpark - Python wrapper for Spark, which is written in Scala and supports large datasets and distributed computing.
#3
Put your data in a "real" database and work from there
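As a rough illustration of that approach, here is a minimal sketch using SQLite from the standard library. The table name `readings`, the column names and the helper `read_day` are all made up for the example, and an in-memory database stands in for a real database file. Timestamps are stored as ISO-style text, which is what makes the string range comparison in the query work.

```python
import io
import sqlite3

import pandas as pd

# Demo data: two days of minute rows with 10 minutes missing from day 1.
idx = pd.date_range("2023-01-01", periods=2 * 1440, freq="min")
demo = pd.DataFrame({"timestamp": idx, "value": range(len(idx))})
demo = demo.drop(index=range(10, 20))
csv_text = demo.to_csv(index=False)

conn = sqlite3.connect(":memory:")  # a file path would be used for real data

# Load the CSV in chunks; the timestamp column is stored as text.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1000):
    chunk.to_sql("readings", conn, if_exists="append", index=False)
conn.execute("CREATE INDEX idx_readings_ts ON readings (timestamp)")


def read_day(conn, day):
    """Return all rows whose timestamp falls on `day` ('YYYY-MM-DD')."""
    query = (
        "SELECT * FROM readings "
        "WHERE timestamp >= ? AND timestamp < date(?, '+1 day') "
        "ORDER BY timestamp"
    )
    return pd.read_sql_query(
        query, conn, params=(day, day), parse_dates=["timestamp"]
    )


day1 = read_day(conn, "2023-01-01")
day2 = read_day(conn, "2023-01-02")

# Each day can be resampled on its own, regardless of missing minutes.
resampled = day1.resample("15min", on="timestamp")["value"].mean()
print(len(day1), len(day2), len(resampled))
```

Week or month ranges could be queried the same way by changing the bounds in the WHERE clause.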
#4
(Feb-12-2023, 06:42 PM)jefsummers Wrote: I'd give a thought to a Pandas alternative. There are several, and when you run into a pandas limitation (or speed issue) take a look.

Polars - listen to the recent Talk Python To Me Podcast for some details (episode 402)

Vaex - supports up to a billion rows

Dask

PySpark - Python wrapper for Spark, which is written in Scala and supports large datasets and distributed computing.

Thanks for the feedback.

I'd already started coding up putting the data into a database and reading the data from there, but I will definitely look into your suggestions to see if they can do what I want in the future.
#5
(Feb-12-2023, 08:39 PM)buran Wrote: Put your data in a "real" database and work from there

I'd already started coding this up when I got your reply (great minds think alike). The reason I didn't go down this route originally is that I have a time constraint, i.e. I can't run processes that take too long. However, I have now factored this in, so hopefully all will be well once I have completed the coding.