Python Forum

First of all: I am not familiar with Python, so I am asking for help fixing an existing script. It is a Python script (a Lambda function) that deletes timestamped files from an S3 bucket:
https://gist.github.com/pwilken/b548b589...9ebadae59f

However it is failing at:

Error:    [ERROR] IndexError: list index out of range
    Traceback (most recent call last):
      File "/var/task/lambda_function.py", line 34, in lambda_handler
        hour = dateArray[1].split(":")[0]

The timestamp in S3 looks like this: Jan 27, 2020 4:36:02 PM GMT+0000

So the script is probably not expecting a year here:

            # Start Policy for everything older then one week
            date = obj["Key"].replace('prefix_before_time', '')
            dateArray = date.split("T")
            date = dateArray[0]
            hour = dateArray[1].split(":")[0]
            print(f'  {date} - {hour}')

Does anyone know how to fix it?

(Jan-28-2020, 09:03 PM) localsystemuser wrote: time stamp in S3 looks like: Jan 27, 2020 4:36:02 PM GMT+0000

Are you sure? That is not what the script expects. It splits the string on the letter "T": the part before it is treated as the date, and the part after it as the time.

dateArray = date.split("T")
date = dateArray[0]
hour = dateArray[1]

So I think the program expects a date in ISO 8601 format. Like this: "2020-01-28T20:32:25+00:00".
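To see where the IndexError comes from, the same split can be run on two strings. This is only a minimal sketch: both strings are made up for illustration and are not taken from the actual bucket.

```python
# Hypothetical input strings, for illustration only.
iso_key = "2020-01-28T20:32:25+00:00"
plain_key = "folder/file2"

iso_parts = iso_key.split("T")
print(iso_parts)                    # two elements: date part and time part
print(iso_parts[1].split(":")[0])   # the hour

plain_parts = plain_key.split("T")
print(plain_parts)                  # one element only, there is no "T" to split on

try:
    hour = plain_parts[1].split(":")[0]
except IndexError as exc:
    # Same failure as in the Lambda traceback.
    print(f"IndexError: {exc}")
```

Whenever the string contains no "T", split returns a list with a single element, so index 1 does not exist.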

So changing "T" to a blank space should fix the problem? I am asking because my Python knowledge is zero.

Well, in the AWS CLI the timestamp looks different:

aws s3 ls bucketname
                           PRE folder/
2019-02-27 10:24:00        271 file
2019-06-06 14:41:47       2353 file2

I have replaced "T" with a blank space, but I am still getting the error:

Error:{
  "errorMessage": "list index out of range",
  "errorType": "IndexError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 34, in lambda_handler\n    hour = dateArray[1].split(\":\")[0]\n"
  ]
}
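Judging from the quoted snippet, the string being split is the object key (obj["Key"]), not the timestamp shown by the console or by aws s3 ls. If a key such as "file2" contains neither "T" nor a space, changing the separator cannot help. A minimal sketch of a guard follows; the helper name split_date_and_hour is hypothetical and not part of the original script.

```python
def split_date_and_hour(text, separator="T"):
    """Return (date, hour), or None if the separator is missing.

    Hypothetical helper, not part of the original Lambda function.
    """
    date_part, found, time_part = text.partition(separator)
    if not found:
        # No separator in the string: nothing to parse, so do not index into it.
        return None
    return date_part, time_part.split(":")[0]


print(split_date_and_hour("2020-01-28T20:32:25+00:00"))
print(split_date_and_hour("2019-06-06 14:41:47", separator=" "))
print(split_date_and_hour("file2"))
```

With such a guard the script could skip keys that carry no timestamp instead of crashing on them.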

If this is the format: Jan 27, 2020 4:36:02 PM GMT+0000
then this is the matching format string for datetime: "%b %d, %Y %I:%M:%S %p %Z%z"

%b: Month as locale’s abbreviated name.
%d: Day of the month as a zero-padded decimal number.
%Y: Year with century as a decimal number.
%I: Hour (12-hour clock) as a zero-padded decimal number.
%M: Minute as a zero-padded decimal number.
%S: Second as a zero-padded decimal number.
%p: Locale’s equivalent of either AM or PM.
%Z: Time zone name (empty string if the object is naive).
%z: UTC offset in the form ±HHMM[SS[.ffffff]] (empty string if the object is naive).

The method datetime.strptime converts the string into a datetime object. The format codes are listed in the documentation of the datetime module.

If the format is ISO 8601, it is much easier, because the method datetime.datetime.fromisoformat (available since Python 3.7) parses this format directly.

from datetime import datetime as dt

my_format = "%b %d, %Y %I:%M:%S %p %Z%z"
my_timestamp = "Jan 27, 2020 4:36:02 PM GMT+0000"

# strptime stands for "string parse time".
# It creates a datetime object from a str:
# datetime.datetime.strptime(your_date, your_format)

# strftime stands for "string format time".
# It converts a datetime object to a str.


my_date = dt.strptime(my_timestamp, my_format)

ISO 8601 example:

from datetime import datetime as dt


my_timestamp = "2020-01-29T14:12:53.439493+01:00"
my_date = dt.fromisoformat(my_timestamp)

In both cases the offset from UTC is known. The resulting datetime object has the timezone offset included.
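One way to check this is to inspect utcoffset() on the parsed objects. The sketch below reuses the two timestamps from the examples above; the variable names are chosen only for illustration.

```python
from datetime import datetime as dt, timedelta

from_strptime = dt.strptime(
    "Jan 27, 2020 4:36:02 PM GMT+0000",
    "%b %d, %Y %I:%M:%S %p %Z%z",
)
from_iso = dt.fromisoformat("2020-01-29T14:12:53.439493+01:00")

for value in (from_strptime, from_iso):
    # utcoffset() returns None for naive objects and a timedelta for aware ones.
    print(value.isoformat(), value.utcoffset())
```

Note that %b depends on the locale, so "Jan" is only recognized when the month names are English.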

Parsing the ISO 8601 timestamp manually with strptime is also possible, but not required.

from datetime import datetime as dt


iso8601_fmt = '%Y-%m-%dT%H:%M:%S%z'
ts = "2020-01-28T20:32:25+10:00"

my_date = dt.strptime(ts, iso8601_fmt)

Don't forget that there are two different kinds of datetime objects:

aware datetime, with timezone information
naive datetime, without timezone information

You can't compare or subtract a naive datetime and an aware datetime.
For example, suppose you create a datetime object to compare with your timestamps:

from datetime import datetime as dt


not_before = dt(2020, 1, 10) # no timezone
my_timestamp = dt.fromisoformat("2020-01-28T20:32:25+10:00")

if my_timestamp < not_before:
    print(f'{my_timestamp} is before {not_before}')
else:
    print(f'{my_timestamp} is not before {not_before}')

Error:
TypeError: can't compare offset-naive and offset-aware datetimes
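With only the standard library, one way around this is to make both sides aware, for example by attaching timezone.utc to the reference date. Using UTC for not_before is an assumption here; pick whichever zone the cutoff is actually meant in.

```python
from datetime import datetime as dt, timezone

# Aware reference date; UTC is assumed for illustration.
not_before = dt(2020, 1, 10, tzinfo=timezone.utc)
my_timestamp = dt.fromisoformat("2020-01-28T20:32:25+10:00")

# Both objects are aware now, so the comparison no longer raises TypeError.
if my_timestamp < not_before:
    print(f'{my_timestamp} is before {not_before}')
else:
    print(f'{my_timestamp} is not before {not_before}')
```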

To fix this problem with an external dependency (pendulum):

from datetime import datetime as dt

# pip install pendulum
import pendulum

my_timezone = pendulum.timezone('Europe/Berlin')

not_before = my_timezone.datetime(2020, 1, 10)
my_timestamp = dt.fromisoformat("2020-01-28T20:32:25+10:00")

if my_timestamp < not_before:
    print(f'{my_timestamp} is before {not_before}')
else:
    print(f'{my_timestamp} is not before {not_before}')

Handling timezones directly with datetime is a bit annoying.
Depending on your input, you can decide what to do.
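Since the original script is about deleting everything older than one week, the age check itself could look roughly like the sketch below. The helper is_older_than is hypothetical, and "now" is fixed so that the example is reproducible.

```python
from datetime import datetime, timedelta, timezone


def is_older_than(timestamp, max_age, now=None):
    """Return True if the aware `timestamp` lies more than `max_age` before `now`.

    Hypothetical helper, not part of the original script.
    """
    if timestamp.tzinfo is None or timestamp.utcoffset() is None:
        raise ValueError("timestamp must be timezone-aware")
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timestamp > max_age


# Fixed reference time, chosen for illustration.
now = datetime(2020, 1, 29, tzinfo=timezone.utc)
old = datetime.fromisoformat("2020-01-10T08:00:00+00:00")
recent = datetime.fromisoformat("2020-01-28T20:32:25+10:00")

print(is_older_than(old, timedelta(weeks=1), now=now))     # True
print(is_older_than(recent, timedelta(weeks=1), now=now))  # False
```

If the age should be based on the modification time rather than on the file name, boto3 listings report it per object as LastModified, which as far as I know is already a timezone-aware datetime, so it could be passed to such a helper without any string parsing.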

localsystemuser

ibreeden

localsystemuser

DeaD_EyE