I think pandas could do this very easily. If you do it by hand, it's a bit more complicated.
But you can learn more about iterators and about using functions to reuse code. I commented everything.
I worked out this localization and normalization stuff for myself when I struggled with exchanging
datetimes between a PLC and simulators, where someone had forgotten to save the date together with its timezone.
So later we decided to define everything on the server side as UTC (offset 0). Data exchange expects this offset.
On the client side the local timezone is used. In formats where you exchange data and ISO 8601 is possible,
use it. Most languages can parse it and also read the timezone offset. Only the name of the timezone is lost.
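To illustrate the point about ISO 8601, here is a small sketch using only the standard library. The fixed +01:00 offset and the name "CET" are assumptions made up for the example; they stand in for a real client timezone.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical client zone: a fixed +01:00 offset labelled "CET" (illustration only).
cet = timezone(timedelta(hours=1), "CET")

local_dt = datetime(2010, 1, 1, 2, 1, tzinfo=cet)
text = local_dt.isoformat()            # the string you would exchange
parsed = datetime.fromisoformat(text)  # needs Python 3.7 or newer

print(text)
print(parsed == local_dt)              # same instant after the round trip
print(parsed.utcoffset())              # the offset survives
print(local_dt.tzname(), "->", parsed.tzname())  # the name does not
```

The parsed object compares equal to the original and keeps its offset, but its tzinfo is a plain fixed offset that no longer knows the name.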
To convert a naive dt to utc_dt and then to local_dt, you should follow this order:
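- Parse the datetime string with strptime.
- Attach UTC to it, either with dt.replace(tzinfo=pytz.utc) or with pytz.utc.localize(dt).
- Convert it to local time with utc_date_time.astimezone(pytz.timezone(...)) or pytz.timezone(...).normalize(utc_date_time). astimezone is the standard datetime method and the one the pytz documentation uses for conversions. normalize gives the same result for pytz timezones, but it is really meant for correcting the offset after date arithmetic.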
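The three steps can also be sketched without pytz, using only the standard library. This is a variant, not the pytz code from this post: the helper name to_local and the fixed +01:00 offset are assumptions for illustration, and a fixed offset knows nothing about daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def to_local(dt_str, local_tz, fmt="%Y-%m-%d %H:%M:%S.%f"):
    """Hypothetical helper: parse a UTC timestamp string and convert it to local_tz."""
    naive = datetime.strptime(dt_str, fmt)       # step 1: parse, result is naive
    utc_dt = naive.replace(tzinfo=timezone.utc)  # step 2: declare it to be UTC
    return utc_dt.astimezone(local_tz)           # step 3: convert to local time


# Assumed local zone for the example: fixed +01:00, no DST rules.
fixed_plus_one = timezone(timedelta(hours=1))
local = to_local("2010-01-01 01:01:00.0000", fixed_plus_one)
print(local.isoformat())
```

For real local zones with DST you still need a timezone database such as pytz.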
If you use CSV data, you can use the csv module.

```python
import csv

with open('file', newline='') as fd:
    reader = csv.reader(fd, delimiter=';')
    # reader is an iterator
```

But it's not needed. You have used the split function with a whitespace as delimiter. If you use split without an argument, all whitespace characters are used for delimiting and they are stripped away.
```python
print('Just     a      normal        sentence.'.split())
print('Just     a      normal        sentence.'.split(' '))
```

Output:

```
['Just', 'a', 'normal', 'sentence.']
['Just', '', '', '', '', 'a', '', '', '', '', '', 'normal', '', '', '', '', '', '', '', 'sentence.']
```
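For comparison, this is roughly how the csv module mentioned above would read the same kind of data if it were separated by semicolons. The sample text is made up for the example, and io.StringIO stands in for an open file.

```python
import csv
import io

# Hypothetical semicolon-separated variant of the bird data.
sample = "2010-01-01;01:01:00.0000;left\n2010-11-01;01:01:00.0000;right\n"

with io.StringIO(sample, newline="") as fd:
    reader = csv.reader(fd, delimiter=";")
    rows = list(reader)  # each row is a list of strings

for date, time, movement in rows:
    print(date, time, movement)
```

Each row comes back as a list of strings, so the date and time fields can go straight into a parsing function.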
OK, now the rest. I have kept your formatting of the date and time. I have to read the specs for the format codes every time. I can't memorize them.
```python
import datetime
import io
from itertools import islice

import pytz

TIMEZONE = pytz.timezone('Europe/Copenhagen')


def parse_date(date, time, local_tz):
    # join date and time together, delimiter is a whitespace
    # put both formats together
    dt_fmt = "%Y-%m-%d %H:%M:%S.%f"
    dt_str = ' '.join((date, time))
    dt = datetime.datetime.strptime(dt_str, dt_fmt)
    # the resulting object is a naive datetime object, which means
    # it has no information about the timezone

    # pytz can localize, which means it takes the naive datetime
    # and sets the timezone
    utc_dt = pytz.utc.localize(dt)
    utc_dt_by_replace = dt.replace(tzinfo=pytz.utc)
    # if not sure, you can check
    if utc_dt == utc_dt_by_replace:
        print('utc_dt == utc_dt_by_replace')
    # now we have a timezone aware datetime object with utc as timezone

    # astimezone takes a timezone aware datetime
    # and converts it to the destination timezone
    # (local_tz.normalize(utc_dt) gives the same result with pytz timezones)
    local_dt = utc_dt.astimezone(local_tz)

    # debugging print :-)
    # just printing them in one row
    print(dt, utc_dt, utc_dt_by_replace, local_dt, local_dt.isoformat())

    # best method for csv data is ISO 8601
    # the local_dt is a classical datetime object with timezone information
    # from pytz, which is just a database with an api for us which holds
    # the information about the timezones
    # https://en.wikipedia.org/wiki/ISO_8601
    # since python 3.7 we also have datetime.datetime.fromisoformat()
    # so let's make the test:
    temp = datetime.datetime.fromisoformat(local_dt.isoformat())
    if local_dt == temp:
        print('Conversion back and forth with isoformat was successful '
              'and still holds the timezone offset')
    print(temp.tzinfo)
    # the information which is lost with this conversion
    # is the political timezone name (EST, PST, CST...), the offset is still there
    # UTC is not a timezone, but this goes too deep.
    return local_dt


def parse_bird(file_or_iterator, local_tz):
    """
    A generator which lazily yields date_time and movements
    """
    for line in file_or_iterator:
        date, time, movements = line.split()
        date_time = parse_date(date, time, local_tz)
        yield date_time, movements


def transpose(iterator):
    """
    Read birds from iterator with global TIMEZONE
    and transpose the results.

    The iterator could be a file like object in text mode or
    another sequence which contains strings as elements for each line.
    """
    return zip(*parse_bird(iterator, TIMEZONE))  # transpose


def read_bird(file):
    """
    Open the bird file, read the contents,
    transpose them and return timestamps and movements
    """
    with open(file) as fd:
        timestamps, movements = transpose(fd)
    return timestamps, movements


def read_bird_lines(file, lines):
    # little trick with islice
    # if you want to test your data next time,
    # you can use islice to limit the iteration
    # to x lines, but you can also skip the first iterations
    # an open file is an iterator, iterating over the iterator
    # yields lines, even if the file was opened in binary mode,
    # which is strange
    with open(file) as fd:
        iterator = islice(fd, 0, lines)
        # pass the limited iterator on, not the whole file
        timestamps, movements = transpose(iterator)
    return timestamps, movements


def read_bird_str(text):
    fake_file = io.StringIO(text)
    timestamps, movements = transpose(fake_file)
    return timestamps, movements


testdata = """
2010-01-01 01:01:00.0000 left
2010-11-01 01:01:00.0000 right
2010-10-01 01:01:00.0000 right
""".strip()

# file = 'bird_jan25jan16.txt'
# timestamps, movements = read_bird(file)
ts, mvs = read_bird_str(testdata)
```

parse_date without the comments and the print calls:
```python
def parse_date(date, time, local_tz):
    """
    Expected date format: "%Y-%m-%d"
    Expected time format: "%H:%M:%S.%f"
    local_tz: timezone object

    returns: local date_time object
    """
    dt_fmt = "%Y-%m-%d %H:%M:%S.%f"
    dt_str = ' '.join((date, time))
    dt = datetime.datetime.strptime(dt_str, dt_fmt)
    utc_dt = pytz.utc.localize(dt)
    local_dt = utc_dt.astimezone(local_tz)
    return local_dt
```

To show them:
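```python
import matplotlib.pyplot as plt

# plot_date is deprecated in recent Matplotlib releases;
# plot with a marker format handles datetime objects as well
plt.plot(ts, mvs, 'o')
plt.show()
```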
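The islice trick from read_bird_lines can be tried on in-memory text before pointing it at a real file. A minimal sketch, where the sample text is made up and io.StringIO stands in for an open file:

```python
import io
from itertools import islice

# Made-up sample text; io.StringIO behaves like a file opened in text mode.
text = "line 0\nline 1\nline 2\nline 3\n"

# limit the iteration to the first two lines
first_two = [line.strip() for line in islice(io.StringIO(text), 2)]

# skip the first line (for example a header) and read the rest
without_first = [line.strip() for line in islice(io.StringIO(text), 1, None)]

print(first_two)
print(without_first)
```

islice never reads more than it has to, so limiting the lines of a big file this way is cheap.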
Here is some additional info about datetime and pytz:
Working with datetime objects and timezones in Python