Python Forum

Full Version: date validation
Hi,

I started learning Python not long ago. I have a program that searches for dates in a string (this part works OK) and then checks whether each date is valid (e.g. February cannot have 30 days). I know that there is a datetime module, but I wanted to check myself and do it on my own. Please note that February has 29 days in leap years. Leap years are every year evenly divisible by 4, except for years evenly divisible by 100, unless the year is also evenly divisible by 400.

import re
 
dates = []
eachDate = []
text = '30/02/2000, 29/02/2000, 30/02/2100, 29/02/2100, 31/02/2004, 30/02/2004, 29/02/2004'
 
 
dateRegex = re.compile(r'''(
(0\d|1\d|2\d|30|31)   #day
(/)
(0\d|10|11|12)   #month
(/)
(1\d\d\d|2\d\d\d)
)''', re.VERBOSE) 
 
for groups in dateRegex.findall(text):
    eachDate = []
    eachDate.append(groups[1])
    eachDate.append(groups[3])
    eachDate.append(groups[5])
    dates.append(eachDate)
print(dates)
 
for item in dates:
    print(item)
    if item[1] in ('04', '06', '09', '11'):
        if item[0] == '31':
            dates.remove(item)
    elif item[1] == '02':
        if item[0] == '30':
            dates.remove(item)
        elif item[0] == '31':
            dates.remove(item)
        elif item[0] == '29':
            if int(item[2]) % 400 == 0:
                continue
            elif int(item[2]) % 100 == 0:
                dates.remove(item)
            elif int(item[2]) % 4 == 0:
                continue
            else:
                dates.remove(item)
 
print(dates)
results:
[['30', '02', '2000'], ['29', '02', '2000'], ['30', '02', '2100'], ['29', '02', '2100'], ['31', '02', '2004'], ['30', '02', '2004'], ['29', '02', '2004']]
['30', '02', '2000']
['30', '02', '2100']
['31', '02', '2004']
['29', '02', '2004']
[['29', '02', '2000'], ['29', '02', '2100'], ['30', '02', '2004'], ['29', '02', '2004']]

The list dates is correct (7 elements), but the for loop only visits four of them (I added an extra print because I got confused). The final result is a list with 4 dates, and it still contains mistakes, e.g. ['30', '02', '2004']. What did I do wrong?
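A likely cause of the skipped items: the loop calls dates.remove(item) while iterating over dates. Each removal shifts the remaining items one position to the left, so the item that slides into the current slot is never visited. The sketch below keeps the same checks but builds a new list instead of removing in place. The helper name is_plausible and the list valid_dates are made up for illustration and are not part of the original program.

```python
# Dates as produced by the regex step in the original program.
dates = [['30', '02', '2000'], ['29', '02', '2000'], ['30', '02', '2100'],
         ['29', '02', '2100'], ['31', '02', '2004'], ['30', '02', '2004'],
         ['29', '02', '2004']]


def is_plausible(item):
    """Same checks as the original loop, returning True/False instead of removing.

    Hypothetical helper: item is a [day, month, year] list of strings.
    """
    day, month, year = item[0], item[1], int(item[2])
    if month in ('04', '06', '09', '11'):
        return day != '31'
    if month == '02':
        if day in ('30', '31'):
            return False
        if day == '29':
            # Leap year rule: divisible by 400, or by 4 but not by 100.
            return year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
    return True


# Building a new list never mutates the list being iterated over.
valid_dates = [item for item in dates if is_plausible(item)]
print(valid_dates)
```

Iterating over a copy (for item in dates[:]) would also avoid the skipping, at the cost of keeping the in-place removals.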
To validate the days in a month:
Note: I wrote this without testing, so check for typos.
  • split date on '/'
  • create list with number of days in each month
    daymon = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
  • check if leap year
        def is_leap(year):
            retval = 0
            if ((not year % 400) or ((not year % 4) and (year % 100))):
                retval = 1
            return retval
        
  • look up the number of days for the month, adding one for February in a leap year
        n = int(month) - 1
        days_in_month = daymon[n]
        if n == 1:
            days_in_month = daymon[n] + is_leap(int(year))
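Put together, the steps above might look like the following sketch. It assumes the input has already matched the dd/mm/yyyy pattern, so every part converts cleanly to int. The function name is_valid_date is an illustrative choice, and is_leap here returns a bool rather than 0/1.

```python
# Days per month in a non-leap year, January first.
daymon = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_leap(year):
    """True if year (an int) is a leap year in the Gregorian calendar."""
    return year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)


def is_valid_date(date_string):
    """Check a 'dd/mm/yyyy' string; assumes all three parts are digits."""
    day, month, year = (int(part) for part in date_string.split('/'))
    if not 1 <= month <= 12:
        return False
    days_in_month = daymon[month - 1]
    if month == 2 and is_leap(year):
        days_in_month += 1
    return 1 <= day <= days_in_month


text = '30/02/2000, 29/02/2000, 30/02/2100, 29/02/2100, 31/02/2004, 30/02/2004, 29/02/2004'
print([d for d in text.split(', ') if is_valid_date(d)])
```

The range checks on month and day also reject values such as 00, which the regex in the question would let through.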
       
This is a good use for datetime.datetime.strptime. Everything you are trying to solve manually is already implemented in the datetime module.

But solving it manually is a good way to learn more about our crazy calendar.
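For comparison, a minimal sketch of the strptime route: datetime.strptime raises ValueError when the string does not match the format or names a day that does not exist. The wrapper name parses_as_date is made up for this example.

```python
from datetime import datetime


def parses_as_date(date_string, fmt='%d/%m/%Y'):
    """Return True if date_string is a real calendar date in the given format."""
    try:
        datetime.strptime(date_string, fmt)
    except ValueError:
        # Raised both for a format mismatch and for impossible dates
        # such as 30 February.
        return False
    return True


for candidate in ('29/02/2000', '30/02/2000', '29/02/2100'):
    print(candidate, parses_as_date(candidate))
```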
A better regex for your task:

import re


day_month_year = re.compile(r'^(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})$')
year_month_day = re.compile(r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$')
The objects day_month_year and year_month_day are compiled regex objects.
The match objects they return have methods like group(), groups() and groupdict().
If you give the groups names, groupdict() returns the captured values as a dict keyed by those names.
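A short illustration of those three methods on a single match, using one of the dates from the question as input:

```python
import re

day_month_year = re.compile(r'^(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})$')

match = day_month_year.search('29/02/2004')

print(match.group('day'))   # one named group: '29'
print(match.groups())       # all groups as a tuple: ('29', '02', '2004')
print(match.groupdict())    # named groups as a dict:
                            # {'day': '29', 'month': '02', 'year': '2004'}
```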

Using the compiled regex:

match = day_month_year.search('01/01/2038')
# match is either a re.Match object or None, so you have to check it:
# if nothing was found, search() returns None.

if match:
    print('Found a match')
    print(match.groupdict())
else:
    print('Nothing found')
The values are still str, so they must be converted to int before doing any calculation.
With exception handling you can deal with the case where date_string is not valid.
import re


def get_date(date_string, regex):
    match = regex.search(date_string)
    if match:
        return int(match['year']), int(match['month']), int(match['day'])
    raise ValueError(f'date_string {date_string} is invalid')


day_month_year_regex = re.compile(r'^(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})$')
year_month_day_regex = re.compile(r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$')


try:
    # get_date returns (year, month, day), so unpack in that order
    year, month, day = get_date('2038-10-01', year_month_day_regex)
except ValueError as error:
    print(error)
else:
    print(year, month, day)

try:
    year, month, day = get_date('01/10/2038', day_month_year_regex)
except ValueError as error:
    print(error)
else:
    print(year, month, day)
BTW: To test regex, you could visit https://regex101.com/

PS: The regex I used is very strict. Whitespace before or after the date results in an exception.
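One way to tolerate surrounding whitespace, sketched here as a suggestion rather than part of the code above, is to call str.strip() on the input before matching. The variable padded is just sample input.

```python
import re

day_month_year_regex = re.compile(r'^(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})$')

padded = ' 01/10/2038 '

# The anchors ^ and $ reject the surrounding spaces, so this is None.
print(day_month_year_regex.search(padded))

# After stripping, the same pattern matches.
print(day_month_year_regex.search(padded.strip()))
```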
Quote: I know that there is a datetime module, but I wanted to check myself and do it on my own.
OP specifically states they don't want to use datetime.