##### Listing files with glob.
MathCommander (Aug-10-2020, 02:47 PM; last modified Aug-11-2020, 07:58 PM):

Hi. I have a directory which contains a set of files named with the format "name" + "six-digit number" + ".txt". The following names are examples of this format: `dog000001.txt`, `cat000054.txt`, `lion010101.txt`, `mouse123456.txt`.

I have to make a list of all the files whose trailing number belongs to an interval. To get this, I have used the following instruction

`files_list = glob.glob('*' + '[' + initial + '-' + final + ']' + '.txt')`

where initial and final are the left and right boundaries of the interval, respectively. For example, if I execute this instruction with `initial = '000000'` and `final = '010000'`, I expect it to return

`files_list = ['dog000001.txt', 'cat000054.txt']`

but that doesn't happen. I have also tried using ranges, but the range doesn't consider 6 digits if the number has fewer than 6 digits. Can anybody help me? Thanks.

deanhystad (Aug-10-2020, 02:55 PM):

What is your pattern? I would create the pattern as a string and print it so that I could verify I got the pattern correct. Independently I might test the logic that uses the pattern. Testing both at the same time makes it difficult to zoom in on the problem.

MathCommander (Aug-10-2020, 03:39 PM):

The pattern depends on the variables initial and final. For the initial and final that I showed before, the glob statement would be

`files_list = glob.glob('*[000000-010000].txt')`

I know that what is wrong is the pattern, because [000000-010000] doesn't work as a range, but I don't know how to do it otherwise.
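A small sketch of why `[000000-010000]` does not behave as a numeric range: in glob and fnmatch patterns a bracket expression matches exactly one character, so this one reduces to the set of characters `0` and `1`. The check below uses `fnmatch.fnmatchcase` (the same matching rules that `glob` applies to names) on the example names from the question, so no real files are needed; the `names` list is only an illustration.

```python
import fnmatch

# Example names taken from the question; no real files are needed.
names = ["dog000001.txt", "cat000054.txt", "lion010101.txt", "mouse123456.txt"]

# The bracket expression matches ONE character: the listed characters
# plus the range '0'-'0', which together are just '0' and '1'.
pattern = "*[000000-010000].txt"

matched = [n for n in names if fnmatch.fnmatchcase(n, pattern)]
print(matched)
```

This prints `['dog000001.txt', 'lion010101.txt']`: the names whose last character before `.txt` is `0` or `1`, regardless of the numeric value of the six digits.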
deanhystad (Aug-10-2020, 04:43 PM):

You need to provide glob.glob with a path name and a pattern (and optional recursive flag). Where are those? I see only 1 string. If you had tested glob using a simpler pattern, say 'test.txt' where you knew there was a test.txt file, you probably would have found this problem yourself. If you don't know why something doesn't work, the first thing to do is determine what could be causing it to fail. You didn't know if your pattern was wrong or if you were using glob wrong. You are using glob wrong. I don't know about the pattern.

MathCommander (Aug-10-2020, 04:59 PM; last modified Aug-10-2020, 04:59 PM):

I disagree. My glob statement has only one argument because before it I execute an os.chdir call to change the current working directory to the correct path. I know that I used glob successfully because I have used it with other patterns and it worked well. The problem is the pattern, which doesn't match the correct files. Does anyone know what pattern I should use to achieve my goal?

bowlofred (Aug-10-2020, 06:22 PM):

This is tricky to do as a glob. I'd prefer instead to os.listdir() the files and pass through a regex.

```
import re
import os

# Match names that end in six digits followed by ".txt"
pattern = re.compile(r"\d{6}\.txt$")
files = [x for x in os.listdir() if pattern.search(x)]
print(files)
```

deanhystad (Aug-10-2020, 06:44 PM; last modified Aug-10-2020, 06:44 PM):

Aaah, pattern is only for recursive search. This is like doing globbing in the bash shell. That should make it easy to test with `ls pattern`. I don't think you can do what you want to do, but you can do something close.
The pattern is character based and it is not going to understand integers at all. But you can specify patterns that will match an integer range, just not any range. If you wanted to match 10 through 29 your pattern would be '*000[1-2][0-9].txt'. This says match any number of characters followed by 000, then a 1 or 2, then any digit 0-9, and ending with .txt. I do not see how I could make my range 1..15 though, or 15..25, where part of the match depends on how the previous match was achieved. If my range is 10..25, the first digit has to be 1 or 2, but the range of the second digit is 0-9 if the first digit is 1, and only 0-5 if the first digit is 2.

I think this is easier solved by parsing the filenames yourself. Extract the part of the filename that is digits, convert to an integer and compare to your range. You could use glob to limit the files you test to only those that match your naming pattern '*[0-9][0-9][0-9][0-9][0-9].txt'

DeaD_EyE (Aug-10-2020, 11:05 PM; last modified Aug-10-2020, 11:05 PM):

Here is an example of how you can write a program which starts with 2 lines and then .... ah let's improve it, enhance it, ... But the easiest approach is: first filter coarsely, then filter finely.

`pathlib.Path.glob("*.txt")` or `glob.glob("*.txt")` to filter coarsely. `pathlib.Path.rglob("*.txt")` or `glob.glob("**/*.txt", recursive=True)` to filter coarsely and recursively.

`Path.glob` and `Path.rglob` return a generator which yields `pathlib.Path` objects. `glob.glob` returns a `list` of `str`. It's not a generator, just a normal function. The iterative function in `glob` is `iglob`.

I am not sure how far they implemented the fnmatch, but often it's not powerful enough. Then use a regex to filter each finding finely. Visit https://regex101.com/ to test your regex against a string or multiline string. There is also an explanation of what the single characters are doing.
For example, `\w` matches `a-zA-Z0-9_` and `\w+` matches one or more from this set. `\d` matches the digits `0-9` and `\d{6}` matches exactly 6 digits. Putting parentheses around the character groups will group them.

Using the regex `(\w+)(\d{6})` will also match `anything99999999123456.txt`. The first group will be `anything99999999` and the second group will be `123456`.

If you want to prevent this, you can use a range of characters in square brackets. Here too, the `+` after the bracket means that it will match one or more. The `^` at the start marks the start of a str. This prevents the regex from shifting to the right until it matches. The `$` marks the end of the `str`. This forces the str to end right after the 6 digits.

The regex: `^([a-zA-Z]+)(\d{6})$`

The `\.txt` is not included. Instead, you can use `Path.stem`, which is the file name without its last suffix, as a `str`. If you use `glob.glob`, then you can't do this.

The classic safest way to split the last suffix from a path with the low-level API:

```
import os

path, file = os.path.split("/home/deadeye/file.txt.foo.bar.gz.py")
name, suffix = os.path.splitext(file)
print(path)
print(file)
print(name)
print(suffix)
```

And with pathlib:

```
from pathlib import Path

my_path = Path("/home/deadeye/file.txt.foo.bar.gz.py")
print(my_path.name)
print(my_path.suffix)
print(my_path.suffixes)
print(my_path.parent)
```

A pattern like `\w+.txt` also matches `some123Word&txt`. To prevent this, the `.` must be escaped with `\`.

But the simple rule is: use glob to filter by extension and use regex for complex tasks. Constructs like [0-9][0-9][0-9][0-9][0-9] won't help you. The wildcard destroys everything and is the same as the weak regex. Often a bad regex leads to security issues.

Here is an example. Hopefully, the regex is now right. The type annotation stuff is not required to understand it. The type annotations have no influence at runtime, but they can help linters or IDEs to display problems with types. Python is strongly typed.
Knowing the return type and the argument types a function takes is half the work.

```
#!/usr/bin/env python3

"""
This program find files by following pattern:
    > dog000001.txt
    > cat000054.txt
    > lion010101.txt
    > mouse123456.txt

Visit for more info: https://python-forum.io/Thread-Listing-files-with-glob

The files are returned in sorted order by name and then by integer.
By the way, it's totally overengineered and you should not use this.
"""

import re
import sys
import json
from argparse import ArgumentParser, Namespace
from pathlib import Path
from typing import Union, Optional, Generator, Callable, List, Tuple, Any

PATH = Union[str, Path]
PATH_GLOB = Generator[Path, None, None]
FIND_RESULT = Tuple[str, int, Path]
FIND_RESULTS = List[FIND_RESULT]
SORT_FUNC = Optional[Callable[[FIND_RESULT], Tuple[Any, ...]]]
# The + sits inside the group so that group 1 captures the whole name
REGEX = re.compile(r"^([a-zA-Z]+)(\d{6})$")


def get_txt(root: PATH) -> PATH_GLOB:
    """
    :param root: Path where to find the txt files
    :return: A generator which yields Path objects
    """
    yield from Path(root).glob("*.txt")


def find(root: PATH, sort_func: SORT_FUNC = None) -> FIND_RESULTS:
    """
    :param root: Path where to find the txt files
    :param sort_func: Sort Function which takes a FIND_RESULT
    :return: A sorted list of FIND_RESULT
    """
    regex = REGEX
    files: FIND_RESULTS = []
    for file in get_txt(root):
        # file.stem is the name without ".txt"
        if match := regex.search(file.stem):
            name = match.group(1)
            number = int(match.group(2))
            files.append((name, number, file))
    files.sort(key=sort_func)
    return files


def by_number(result: FIND_RESULT) -> Tuple[int, str]:
    """
    Key function to sort results by number and then by name
    :param result:
    :return:
    """
    return result[1], result[0]


def print_results(results: FIND_RESULTS, count: bool = False) -> None:
    """
    Print results to stdout
    :param results: The results from find
    :param count: Print the count of matches at the end
    :return:
    """
    for *_, file in results:
        print(file)
    if count:
        print(f"{len(results)} matching files found.")


# def print_json_stream(results: FIND_RESULTS) -> None:
#     """
#     DoNotUseThisCode
#     otherwise implement escaping
#     """
#     count = len(results)
#     header = f'{{"count":{count},"results":['
#     print(header, end="", flush=True)
#     for index, (*_, file) in enumerate(results, 1):
#         time.sleep(2)
#         print(f'"{file}"', flush=True, end="")
#         if index < count:
#             print(",", flush=True, end="")
#     print("]}")


def print_json(results: FIND_RESULTS) -> None:
    """
    Print the results as json to stdout
    :param results:
    :return:
    """
    files = [str(file) for *_, file in results]
    result = {
        "count": len(files),
        "results": files,
    }
    print(json.dumps(result))


def get_args() -> Namespace:
    """
    Get arguments
    :return: parsed arguments
    """
    desc = f"""
    This example program finds txt files and
    uses the internal specified regex to match them.
    The following regex is used: {REGEX.pattern}
    """
    parser = ArgumentParser(description=desc)
    # noinspection PyTypeChecker
    parser.add_argument("root", type=Path, help="Root path where to search the files.")
    parser.add_argument(
        "-n",
        action="store_const",
        const=by_number,
        default=None,
        help="Sort first by numeric value and then by name.",
    )
    parser.add_argument(
        "-c", action="store_true", help="Show at the end the file count."
    )
    parser.add_argument("-j", action="store_true", help="Json stream output")
    arguments = parser.parse_args()
    if not arguments.root.exists():
        raise FileNotFoundError("Path does not exist.")
    if not arguments.root.is_dir():
        raise FileNotFoundError("Path is not a directory.")
    return arguments


if __name__ == "__main__":
    try:
        args = get_args()
    except FileNotFoundError as e:
        print(e, file=sys.stderr)
        sys.exit(1)
    if args.j:
        print_json(find(args.root, args.n))
    else:
        print_results(results=find(args.root, args.n), count=args.c)
```

snippsat (Aug-12-2020, 08:20 AM; last modified Aug-12-2020, 08:20 AM):
Nice write-up @DeaD_EyE, even if it's a little over-engineered.

Quote: Does anyone know what pattern I should use to achieve my goal?

This task can be a little tricky. I had a look at it; for more advanced regex stuff I do not try to fit it in with a glob and fnmatch combo. Read with os.scandir(), which is a more modern way than using `listdir()`. So, as @deanhystad suggests, I extract the part that's digits and pass it to `int()` to get a clean number without leading zeros. Use a range regex to find all numbers that match 0-1000, then find the index where they are in the file list.

```
import os
import re

def read(path):
    # Collect the names of all regular .txt files in path
    file_lst = []
    with os.scandir(path) as it:
        for file in it:
            if file.name.endswith('.txt') and file.is_file():
                file_lst.append(file.name)
    return file_lst

def find_files(file_lst):
    # Range regex for the integers 0-1000
    pattern = re.compile(r"^(1000|[1-9]?[0-9]?[0-9])$")
    # First run of digits in each name, with leading zeros stripped by int()
    numbers = [str(int(re.search(r'\d+', i).group())) for i in file_lst]
    range_match = list(filter(None, [pattern.search(i) for i in numbers]))
    range_match = [i.group() for i in range_match]
    index_match = [i for i, val in enumerate(numbers) if val in range_match]
    for index, file in enumerate(file_lst):
        if index in index_match:
            yield file

if __name__ == '__main__':
    path = r'G:\div_code\div'
    file_lst = read(path)
    found_files = find_files(file_lst)
    for file in found_files:
        print(file)
```

Files in folder:

```
>>> file_lst
['cat000054.txt',
 'dog000000007000.txt',
 'dog000001.txt',
 'dog000999.txt',
 'dog001002.txt',
 'dog77.txt',
 'lion00000005000.txt',
 'lion010000.txt',
 'lion010101.txt',
 'lion5.txt',
 'mouse000888.txt',
 'mouse100090 .txt',
 'mouse123456.txt']
```

Output:

```
cat000054.txt
dog000001.txt
dog000999.txt
dog77.txt
lion5.txt
mouse000888.txt
```

MathCommander (Oct-26-2020, 02:04 AM):

Ok, thank you all for the responses. I couldn't do it with glob so I filtered the files inside a loop. Thanks for the help!
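The approach the thread converges on (filter coarsely, then parse the number and compare it as an integer) might look like the sketch below. The function name `files_in_range`, its parameters, and the strict name pattern are assumptions made for illustration, not code from any of the posts. It takes a plain list of names so that it can be fed from `glob.glob('*.txt')` or `os.listdir()`.

```python
import re

# Assumed naming scheme: letters, then exactly six digits, then ".txt".
NAME_RE = re.compile(r"([a-zA-Z]+)(\d{6})\.txt")


def files_in_range(names, low, high):
    """Return the names whose six-digit number n satisfies low <= n <= high."""
    selected = []
    for name in names:
        match = NAME_RE.fullmatch(name)
        # int() drops the leading zeros, so the comparison is numeric
        if match and low <= int(match.group(2)) <= high:
            selected.append(name)
    return selected


# Example names from the question, with the boundaries 000000 and 010000.
names = ["dog000001.txt", "cat000054.txt", "lion010101.txt", "mouse123456.txt"]
result = files_in_range(names, 0, 10000)
print(result)
```

With the question's boundaries this selects `dog000001.txt` and `cat000054.txt`. Using `fullmatch` plays the role of the `^` and `$` anchors discussed above, so names with extra digits or other characters are rejected.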
