Posts: 24
Threads: 11
Joined: Apr 2019
Jul-14-2020, 10:35 AM
(This post was last modified: Jul-14-2020, 10:39 AM by NewBeie.)
I have this function:
def get_file_number(filename):
    """ Get the file number from the file name
    Finds the part between xml and .txt and converts that to a number"""
    try:
        file_number = re.findall('xml(.*).txt'.lower(), filename.lower())[0]
        return file_number
    except Exception:  # Noqa
        return -1
When I run it with the file 'test_file.xml20200714090702_01.txt' I get nothing back. However, if I do it like this:
def get_file_number(filename):
    """ Get the file number from the file name
    Finds the part between xml and .txt and converts that to a number"""
    file_number = re.findall('xml(.*?).txt'.lower(), filename.lower())[0]
    print(file_number)
    # try:
    #     file_number = re.findall('xml(.*?).txt'.lower(), filename.lower())[0]
    #     return file_number
    # except Exception:  # Noqa
    #     return -1
I am getting a value back:
Output: 20200714090702_01
So I want to ask: what am I doing wrong in the try/except?
Posts: 2,125
Threads: 11
Joined: May 2017
Jul-14-2020, 10:51 AM
(This post was last modified: Jul-14-2020, 10:53 AM by DeaD_EyE.)
You could make your function more generic.
import re

def get_file_numbers(file_name):
    return list(
        map(
            int,
            re.findall(r"(\d+)", file_name)
        )
    )

file_numbers = get_file_numbers('test_file.xml20200714090702_01.txt')
print(file_numbers)
Output: [20200714090702, 1]
By the way, it looks like the second value is an index and the first one is a datetime.
Maybe you do not want to have integers.
Posts: 2,168
Threads: 35
Joined: Sep 2016
Your first findall pattern does not have a '?'; the second one does.
Don't catch every possible exception; catch only the specific exception you want to handle.
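As a rough sketch of what catching a specific exception could look like here: `re.findall` returns an empty list when nothing matches, so indexing it with `[0]` raises `IndexError`, and that is the only exception handled below. The escaped dot in the pattern and the second file name are assumptions added for illustration; the `-1` fallback is kept from the original function.

```python
import re


def get_file_number(filename):
    """Return the text between 'xml' and '.txt', or -1 if there is no match."""
    matches = re.findall(r"xml(.*)\.txt", filename.lower())
    try:
        return matches[0]
    except IndexError:
        # findall returned an empty list, so the pattern did not match
        return -1


print(get_file_number("test_file.xml20200714090702_01.txt"))  # 20200714090702_01
print(get_file_number("no_number_here.csv"))                  # -1
```

Any other exception (for example passing `None` as the file name) is no longer hidden and shows up as a normal traceback.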
Posts: 1,838
Threads: 2
Joined: Apr 2017
(Jul-14-2020, 10:35 AM)NewBeie Wrote: So I want to ask: what am I doing wrong in the try/except?
I'm not sure the exception handling code is really the problem - the two implementations of your function are different: one prints the value of file_number and the other returns it. You haven't shown the code that calls the function, so for the latter case, are you printing the return value?
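To make the difference concrete, here is a minimal sketch that assumes the calling code looks roughly like this (the `result` variable is hypothetical). When run as a script, a bare call to a function that only returns a value shows nothing; the value has to be stored or printed.

```python
import re


def get_file_number(filename):
    """Same logic as in the question: return the value instead of printing it."""
    try:
        return re.findall("xml(.*).txt", filename.lower())[0]
    except Exception:
        return -1


filename = "test_file.xml20200714090702_01.txt"

get_file_number(filename)           # return value is discarded, nothing is shown
result = get_file_number(filename)  # keep the return value...
print(result)                       # ...and print it: 20200714090702_01
```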
Posts: 24
Threads: 11
Joined: Apr 2019
Jul-14-2020, 12:18 PM
(This post was last modified: Jul-14-2020, 12:18 PM by NewBeie.)
(Jul-14-2020, 11:24 AM)Yoriz Wrote: Your first findall string does not have a '?' The second one does.
Don't capture every possible exception, capture a specific exception you want to handle.
even without ? nothing changes, it still doesn't return anything
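A quick check of that claim, as a sketch: for this particular file name the greedy `.*` and the lazy `.*?` capture the same text, so the `?` is indeed not what makes the difference here. The second file name is made up to show a case where the two patterns do differ.

```python
import re

name = "test_file.xml20200714090702_01.txt"
greedy = re.findall(r"xml(.*).txt", name)
lazy = re.findall(r"xml(.*?).txt", name)
print(greedy)  # ['20200714090702_01']
print(lazy)    # ['20200714090702_01']

# Hypothetical name with two ".txt" parts: greedy runs to the last one,
# lazy stops at the first one.
doubled = "test_file.xml2020_01.txt.bak.txt"
greedy_doubled = re.findall(r"xml(.*).txt", doubled)
lazy_doubled = re.findall(r"xml(.*?).txt", doubled)
print(greedy_doubled)  # ['2020_01.txt.bak']
print(lazy_doubled)    # ['2020_01']
```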
(Jul-14-2020, 11:25 AM)ndc85430 Wrote: (Jul-14-2020, 10:35 AM)NewBeie Wrote: So I want to ask: what am I doing wrong in the try/except?
I'm not sure the exception handling code is really the problem - the two implementations of your function are different: one prints the value of file_number and the other returns it. You haven't shown the code that calls the function, so for the latter case, are you printing the return value?
even without ? nothing changes, it still doesn't return anything.
filename = 'test_file.xml20200714090702_01.txt'

def get_file_number(filename):
    """ Get the file number from the file name
    Finds the part between xml and .txt and converts that to a number"""
    # file_number = re.findall('xml(.*).txt'.lower(), filename.lower())[0]
    # print(file_number)
    try:
        file_number = re.findall('xml(.*).txt'.lower(), filename.lower())[0]
        return file_number
    except Exception:
        return -1

get_file_number(filename)
and
filename = 'test_file.xml20200714090702_01.txt'

def get_file_number(filename):
    """ Get the file number from the file name
    Finds the part between xml and .txt and converts that to a number"""
    file_number = re.findall('xml(.*).txt'.lower(), filename.lower())[0]
    print(file_number)
    # try:
    #     file_number = re.findall('xml(.*).txt'.lower(), filename.lower())[0]
    #     return file_number
    # except Exception:
    #     return -1

get_file_number(filename)
Yes I didn't show the code that calls this function, which doesn't matter if I'm not returning the correct value.
(Jul-14-2020, 10:51 AM)DeaD_EyE Wrote: You could make your function more generic.
import re

def get_file_numbers(file_name):
    return list(
        map(
            int,
            re.findall(r"(\d+)", file_name)
        )
    )

file_numbers = get_file_numbers('test_file.xml20200714090702_01.txt')
print(file_numbers)
Output: [20200714090702, 1]
By the way, it looks like the second value is an index and the first one is a datetime.
Maybe you do not want to have integers.
Thanks for this solution. But what if I have "test_claims_rca_0.xml20200714140759_01.txt" as the file_name? Then I get this output:
Output: [0, 20200714140759, 1]
It would be nice to return everything after .xml, so that I can always specify a [n] to get a file name
Posts: 2,125
Threads: 11
Joined: May 2017
You can improve your regex.
Use this if you only want the first number block, which is a datetime:
re.findall(r'xml(\d+)_\d+\.txt', filename.lower())
Or use this if you also want to capture the second number after the underscore:
re.findall(r'xml(\d+)_(\d+)\.txt', filename.lower())
In addition, an unescaped . in a regex matches any character (except a newline, by default).
To match a literal . you need to escape it: \.
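A small sketch of the point about the dot, plus the two patterns applied to the file name from the question above. The name `odd_name`, with an underscore where the final dot would be, is made up for the demonstration.

```python
import re

# Hypothetical name without a literal ".txt" at the end
odd_name = "test_file.xml20200714090702_01_txt"

unescaped = re.findall(r"xml(.*).txt", odd_name)   # "." matches the "_"
escaped = re.findall(r"xml(.*)\.txt", odd_name)    # needs a real "."
print(unescaped)  # ['20200714090702_01']
print(escaped)    # []

name = "test_claims_rca_0.xml20200714140759_01.txt"
first_only = re.findall(r"xml(\d+)_\d+\.txt", name)
both = re.findall(r"xml(\d+)_(\d+)\.txt", name)
print(first_only)  # ['20200714140759']
print(both)        # [('20200714140759', '01')]
```

With two capture groups, `re.findall` returns a list of tuples, so the second number is available as `both[0][1]`. The leading `0` from `rca_0` is no longer picked up because the pattern is anchored on `xml`.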
To parse the datetime:
import re
import datetime


def get_file_numbers(file_name):
    # get the first element from findall and assign it to timestamp
    # get the rest of the elements and assign them to numbers
    # if there is no rest, numbers is an empty list
    timestamp, *numbers = re.findall(r"(\d+)", file_name)
    try:
        # see the datetime.datetime documentation for the format syntax
        dt = datetime.datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    except ValueError:
        # if the text is not a date or has a different format,
        # strptime fails with a ValueError;
        # in that case convert the str to an int instead
        dt = int(timestamp)
    # return dt and all elements from numbers as integers
    # the * in front of map unpacks the iterable
    return [dt, *map(int, numbers)]


# it works with all file names that contain at least one block of digits
file_numbers = get_file_numbers('ffffffffffffffxxxxxx20200714090702_yyyyyyyyyy_-01-02')
print(file_numbers)
Output: [datetime.datetime(2020, 7, 14, 9, 7, 2), 1, 2]
Posts: 1,838
Threads: 2
Joined: Apr 2017
Jul-14-2020, 01:33 PM
(This post was last modified: Jul-14-2020, 01:34 PM by ndc85430.)
(Jul-14-2020, 12:18 PM)NewBeie Wrote: even without ? nothing changes, it still doesn't return anything.
That just isn't true. Your function returns a value whether there is an exception or not, but you throw it away in the final get_file_number(filename) call in the code below. How would you expect to see any output without printing the return value?
Quote:
filename = 'test_file.xml20200714090702_01.txt'

def get_file_number(filename):
    """ Get the file number from the file name
    Finds the part between xml and .txt and converts that to a number"""
    # file_number = re.findall('xml(.*).txt'.lower(), filename.lower())[0]
    # print(file_number)
    try:
        file_number = re.findall('xml(.*).txt'.lower(), filename.lower())[0]
        return file_number
    except Exception:
        return -1

get_file_number(filename)