Insert missing data in a dataframe

amdi40 · (This post was last modified: Jan-17-2022, 02:40 PM by amdi40.)

Thanks for the replies.
It is not that I am lazy; I am just not very good at this.
The code so far is:

import os
from datetime import timedelta
from pathlib import Path

import pandas as pd

STEP = timedelta(minutes=2)  # interval between measurements


def convert_file():
    # Set start directory same as script
    os.chdir(os.path.abspath(os.path.dirname(__file__)))

    infile = Path('.') / 'gauga20211_19790101-20120101.km2'
    outfile = Path('.') / 'newfile.csv'

    with infile.open() as fp, outfile.open('w') as fout:
        nexttime = None
        for line in fp:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if fields[0] == '1':
                # Header line: fields 1 and 2 are the event's start date and time
                startdate = pd.to_datetime(fields[1] + fields[2], format='%Y%m%d%H%M')
                nexttime = startdate + STEP
            else:
                # Data line: one intensity value per 2-minute step
                for item in fields:
                    fout.write(f"{nexttime},{item}\n")
                    nexttime += STEP


if __name__ == '__main__':
    convert_file()

# newfile.csv has no header row, so name the columns while reading;
# otherwise the first data row is used up as the header
data = pd.read_csv('newfile.csv', header=None, names=['time', 'intensity'])

data.to_csv('newfile1.csv', index=False, sep=',')

The original file is structured as follows (the text after "<-" is my annotation, not part of the file):
1 19790111 1007 20211 43 1 2.8   <- header with start time for rain event
3.333 3.333 3.333 0.556 0.556 0.556 0.556 0.556 0.556 3.333   <- rain intensities measured at an interval of two minutes
0.370 0.370 0.370 0.370 0.370 0.370 0.370 0.370 0.370 0.476
0.476 0.476 0.476 0.476 0.476 0.476 6.667 1.667 1.667 6.667
0.667 0.667 0.667 0.667 0.667 0.417 0.417 0.417 0.417 0.417
0.417 0.417 0.417
1 19790125 1208 20211 30 1 3.0 1   <- header with start time for new rain event
3.333 0.833 0.833 0.833 0.833 3.333 1.667 1.667 1.667 1.667
1.111 1.111 1.111 1.667 1.667 0.833 0.833 0.833 4.167 3.333
3.333 3.333 3.333 1.111 1.111 1.111 0.833 0.833 0.833 0.833

Larz60+ was a great help with that script!
But the problem remains: I need a way to insert the times when it does not rain. My idea was to create a file with a time interval of 1 minute and rain intensities of 0:

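df1 = pd.read_csv('newfile1.csv')
# first_line (start timestamp) and minutes (number of 1-minute steps) must
# already be defined; their definitions are not shown in this post
timerange = pd.date_range(first_line, periods=minutes, freq='1min')
df2 = pd.DataFrame()
df2['time'] = timerange
df2['intensity'] = 0  # numeric zero rather than the string '0'
df2.to_csv('zeroserie.csv', sep=',', index=False, header=['time', 'intensity'])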

This script gives me:

Output:
time,intensity
1979-03-23 09:11:00,0
1979-03-23 09:12:00,0
1979-03-23 09:13:00,0
1979-03-23 09:14:00,0
.......

I then want to merge this file with the old file, and remove duplicates:

df3=pd.concat([df1,df2]).drop_duplicates().reset_index(drop=True)

df3.to_csv('Newfile25.csv',sep=',', index=False, header=['time', 'intensity'])

However, this does not work as intended.
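
A possible reason, offered as a guess rather than a confirmed diagnosis: drop_duplicates() without arguments compares whole rows, so a measured row and a zero row with the same timestamp are not duplicates and both survive. The times also have to be the same type in both frames (real timestamps, not strings in one of them). The sketch below uses small made-up frames in place of df1 and df2 and deduplicates on the time column only:

```python
import pandas as pd

# Toy stand-ins for df1 (measured rain) and df2 (zero series); values are made up
df1 = pd.DataFrame({
    'time': pd.to_datetime(['1979-01-11 10:09', '1979-01-11 10:11']),
    'intensity': [3.333, 0.556],
})
df2 = pd.DataFrame({
    'time': pd.date_range('1979-01-11 10:08', periods=5, freq='1min'),
    'intensity': 0.0,
})

# Whole-row comparison: nothing is dropped, because the intensities differ
naive = pd.concat([df1, df2]).drop_duplicates()

# Compare on 'time' only; df1 comes first, so keep='first' keeps the measurement
merged = (
    pd.concat([df1, df2])
    .drop_duplicates(subset='time', keep='first')
    .sort_values('time')
    .reset_index(drop=True)
)
```

With the real files, reading them with pd.read_csv(..., parse_dates=['time']) should make the two time columns comparable.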
The next step would then be to delete rows with a rain intensity of 0 if they lie less than 2 minutes from a row with a rain intensity > 0.

Since the events can start on either an even or an odd minute, I'm not sure how to do it. :/
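
One way this could be approached, as a rough sketch only: for every row, measure the distance to the nearest row with intensity > 0, and drop zero rows closer than 2 minutes. Because it works on time differences, it should not matter whether an event starts on an even or an odd minute. The function name drop_zeros_near_rain and the toy data are made up for illustration, and "less than 2 minutes" is my reading of the rule above. A measured 0 inside an event would also be dropped by this.

```python
import numpy as np
import pandas as pd


def drop_zeros_near_rain(df, gap=pd.Timedelta(minutes=2)):
    """Drop zero-intensity rows closer than `gap` to a row with intensity > 0."""
    df = df.sort_values('time').reset_index(drop=True)
    rain_times = df.loc[df['intensity'] > 0, 'time'].to_numpy()
    if len(rain_times) == 0:
        return df  # no rain at all, nothing to drop

    t = df['time'].to_numpy()
    # Position of the first rain time >= t, then its neighbour on each side
    pos = np.searchsorted(rain_times, t)
    prev_idx = np.clip(pos - 1, 0, len(rain_times) - 1)
    next_idx = np.clip(pos, 0, len(rain_times) - 1)
    dist = np.minimum(np.abs(t - rain_times[prev_idx]),
                      np.abs(rain_times[next_idx] - t))

    near_rain = dist < gap.to_timedelta64()
    keep = (df['intensity'] > 0) | ~near_rain
    return df[keep].reset_index(drop=True)


# Toy merged series: rain at 10:09 and 10:11, zeros elsewhere (made-up values)
toy = pd.DataFrame({
    'time': pd.to_datetime([
        '1979-01-11 10:06', '1979-01-11 10:07', '1979-01-11 10:08',
        '1979-01-11 10:09', '1979-01-11 10:10', '1979-01-11 10:11',
        '1979-01-11 10:12', '1979-01-11 10:13', '1979-01-11 10:14',
    ]),
    'intensity': [0, 0, 0, 3.333, 0, 0.556, 0, 0, 0],
})
cleaned = drop_zeros_near_rain(toy)
```

In the toy data the zeros at 10:08, 10:10 and 10:12 are one minute from a rain row and are removed; the zeros at 10:07 and 10:13 are exactly two minutes away and stay.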
