Python Forum
Invalid Date Format for Cached Files
#1
In this Python code I am trying to retrieve files from an FTP folder and then search through them for an invoice number. I run it with the command window open, and it keeps getting stuck on the part where the code pulls the date out of the file name: it reports "Invalid date format" and says it is skipping that file. The folder contains a lot of files, and I really only need to look at the last 30 days' worth. We have another Python script that does something similar with a different file-name layout, and I based this code on it. A screenshot of the files in the FTP folder is attached. Below is the beginning of the code, which is where I am stuck; I will deal with the search part once I figure this out:
import os
import datetime
from ftplib import FTP
import csv
from tkinter import *
from tkinter import messagebox
import tkinter as tk

# FTP and path configurations
outputPath = r'\\xxxxxx\xxxxxxxxxx\SanMar Invoice'
cacheDir = 'C:/temp/SanMarRoi/cache'
cacheSize = 60
ftp_site = "xxxxxxxxx"
ftp_username = "xxxxxxxx"
ftp_password = "xxxxxxxx"

# Ensure directories exist
if not os.path.exists(cacheDir):
    os.makedirs(cacheDir)
if not os.path.exists(outputPath):
    os.makedirs(outputPath)

rows = []
invoice = set()
ponumber = ""

# Function to fetch files from FTP and sync cache
def sync_cache():
    ftp = FTP(ftp_site)
    ftp.login(user=ftp_username, passwd=ftp_password)
    ftp.cwd("Outbound")

    filenames = []
    ftp.retrlines("LIST", lambda line: filenames.append(line.split()[-1]))
    
    print("Files retrieved from FTP server:")
    for file in filenames:
        print(file)

    # Get current date
    now = datetime.datetime.now()
    
    valid_filenames = []
    for file in filenames:
        try:
            # Extract the date from the filename
            file_date_str = file.split('-')[-1].split('.')[0]  # Get the last part of the filename and remove extension
            file_date = datetime.datetime.strptime(file_date_str, "%m-%d-%y")
            valid_filenames.append((file, file_date))
        except (ValueError, IndexError):
            # If parsing fails, skip the file
            print(f"Skipping file {file}: Invalid date format")

    # Filter to include only files from the last 30 days
    recent_files = [file for file in valid_filenames if (now - file[1]).days <= 30]
    
    print("Recent files from the last 30 days:")
    for file in recent_files:
        print(file[0])

    recent_files.sort(key=lambda filename: filename[1], reverse=True)

    print("Syncing cache. Please wait...")
    for i, (filename, _) in enumerate(recent_files):
        if i == cacheSize:
            break
        local_path = os.path.join(cacheDir, filename)
        if os.path.exists(local_path):
            print(f"File already in cache: {filename}")
            continue
        # Write the raw bytes; decoding each chunk separately can split a multi-byte UTF-8 character
        with open(local_path, "wb") as cacheFile:
            ftp.retrbinary(f"RETR {filename}", cacheFile.write)
        print(f"Downloaded and cached file: {filename}")
    print("Cache sync complete")
    ftp.quit()  # close the FTP session before returning
    return [file[0] for file in recent_files]

#2
"64727_Inventory_Details-05-02-24.txt".split("-") returns ["64727_Inventory_Details", "05", "02", "24.txt"]. Probably not what you expected, but easy to work out if you think about it.

I would start solving this problem like this:
filename = "64727_Inventory_Details-05-02-24.txt"
print(filename.split("-"))
Output:
['64727_Inventory_Details', '05', '02', '24.txt']
This little program demonstrates why your approach will not work.
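Tracing the exact expression from the original sync_cache() loop on one of the sample names shows where it goes wrong: after the split, only the year is left, and strptime() rejects a bare "24" for the "%m-%d-%y" format. That ValueError is what the original code catches and reports as an invalid date.

```python
from datetime import datetime

name = "64727_Inventory_Details-05-02-24.txt"

# Same expression as in the original loop: last hyphen-separated piece, extension removed
file_date_str = name.split("-")[-1].split(".")[0]
print(repr(file_date_str))  # prints '24': only the year survives

try:
    datetime.strptime(file_date_str, "%m-%d-%y")
except ValueError as err:
    # This is the exception the original code turns into "Invalid date format"
    print("ValueError:", err)
```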

You could split only on the first hyphen:
filename = "64727_Inventory_Details-05-02-24.txt"
print(filename.split("-", maxsplit=1))
Output:
['64727_Inventory_Details', '05-02-24.txt']
Your original code should have been written like this:
from datetime import datetime

filenames = (
    "64727_Inventory_Details-05-02-24.txt",
    "64728_Inventory_Details-05-04-24.txt",
    "64728_Inventory_Details.txt",
)

valid_files = []
for name in filenames:
    try:
        datestr = name.split("-", maxsplit=1)[1].split(".")[0]
        valid_files.append((datetime.strptime(datestr, "%m-%d-%y").date(), name))
    except (ValueError, IndexError):
        pass
print(*valid_files, sep="\n")
Output:
(datetime.date(2024, 5, 2), '64727_Inventory_Details-05-02-24.txt')
(datetime.date(2024, 5, 4), '64728_Inventory_Details-05-04-24.txt')
Notice how my example includes sample filenames. If you want help, make it easy for others to help you. Don't make them transcribe filenames from a fuzzy screenshot.
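Splitting on the first hyphen assumes the part before the date never contains a hyphen of its own. If that might ever happen (an assumption; the sample names don't do it), splitting from the right is a little safer: strip the extension, then peel off the last three hyphen-separated fields. parse_file_date is a hypothetical helper name, and the third sample name below is made up to show the extra-hyphen case.

```python
import os
from datetime import datetime


def parse_file_date(name):
    """Return the MM-DD-YY date at the end of name as a date, or None."""
    stem, _ext = os.path.splitext(name)  # drop ".txt"
    parts = stem.rsplit("-", 3)          # at most 3 splits, counted from the right
    if len(parts) != 4:
        return None                      # fewer than three hyphens: no date here
    try:
        return datetime.strptime("-".join(parts[1:]), "%m-%d-%y").date()
    except ValueError:
        return None                      # three fields, but not a valid date


for name in (
    "64727_Inventory_Details-05-02-24.txt",
    "64728_Inventory_Details.txt",
    "Some-Other-Report-12-31-23.txt",  # hypothetical name with hyphens in the prefix
):
    print(name, parse_file_date(name))
```

Returning None instead of raising lets the caller skip undated files with a simple check rather than a try/except around every name.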

Splitting on hyphens works, but I think I'd take a different approach and use pattern matching (a regular expression).
import re
from datetime import datetime


filenames = (
    "64727_Inventory_Details-05-02-24.txt",
    "64728_Inventory_Details-05-04-24.txt",
    "64728_Inventory_Details.txt",
)
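# Raw string so "\d" isn't treated as a string escape; the "-" before the group
# stops the greedy ".*" from swallowing the first digit of a two-digit month
pattern = r".*-(\d+-\d+-\d+)\.txt$"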
valid_files = []
for name in filenames:
    try:
        datestr = re.match(pattern, name).group(1)
        valid_files.append((datetime.strptime(datestr, "%m-%d-%y").date(), name))
    except (ValueError, AttributeError):
        pass
print(*valid_files, sep="\n")
Results are the same.
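Since the first post only needs the last 30 days, here is a sketch that ties the regex parse to that filter and returns the matches newest first, so the caller can cap the list the way sync_cache() does with cacheSize. recent_files, DATE_PATTERN, and the today/limit parameters are hypothetical names, and the pattern assumes the date is always -MM-DD-YY immediately before .txt, as in the sample names.

```python
import re
from datetime import date, datetime, timedelta

# Assumed layout: "...-MM-DD-YY.txt" at the very end of the name
DATE_PATTERN = re.compile(r"-(\d{2}-\d{2}-\d{2})\.txt$")


def recent_files(filenames, days=30, today=None, limit=None):
    """Return (date, name) pairs dated within the last `days` days, newest first."""
    if today is None:
        today = date.today()
    cutoff = today - timedelta(days=days)
    dated = []
    for name in filenames:
        match = DATE_PATTERN.search(name)
        if match is None:
            continue  # no date in the name
        try:
            file_date = datetime.strptime(match.group(1), "%m-%d-%y").date()
        except ValueError:
            continue  # date-shaped but invalid, e.g. 13-45-24
        if file_date >= cutoff:
            dated.append((file_date, name))
    dated.sort(reverse=True)  # newest first
    return dated if limit is None else dated[:limit]


filenames = (
    "64727_Inventory_Details-05-02-24.txt",
    "64728_Inventory_Details-05-04-24.txt",
    "64728_Inventory_Details.txt",
)
for file_date, name in recent_files(filenames, today=date(2024, 5, 20)):
    print(file_date, name)
```

With today fixed at 2024-05-20, both dated sample files fall inside the window and come back newest first, and the undated name is skipped. Comparing date objects (rather than datetime.now() minus a midnight datetime) keeps the time of day out of the 30-day check.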