The problem is strange. I haven't reproduced it; instead, I'll take a different approach.
First of all, your regex is wrong.
```none
{LON=-\d\d.\d\d\d\d\d\d}{LAT=\d\d.\d\d\d\d\d\d}
```

The corrected version:

```none
{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}
```
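- `[+-]?`: a `+` or `-` sign, or no sign at all, in front of the number. The `?` means zero or one occurrence.
- `\d{2}` and `\d{6}`: quantifiers instead of repeating `\d` by hand.
- `\.`: the dot is escaped; unescaped, `.` matches any character.
- `(...)`: longitude and latitude are each in a capturing group.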
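To see what these fixes change, here is a small sketch that runs the question's pattern and the corrected one against a few lines. The sample strings are hypothetical, made up only to hit the unescaped dot and the missing sign on the latitude.

```python
import re

# The question's pattern and the corrected pattern from this answer.
OLD = re.compile(r"{LON=-\d\d.\d\d\d\d\d\d}{LAT=\d\d.\d\d\d\d\d\d}")
NEW = re.compile(r"{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}")

# Hypothetical sample lines, made up for illustration.
samples = [
    "{LON=-78.555550}{LAT=39.111222}",   # well-formed line
    "{LON=-78x555550}{LAT=39.111222}",   # 'x' where the decimal point belongs
    "{LON=-78.555550}{LAT=-39.111222}",  # negative latitude
]

# (old pattern matches?, new pattern matches?) for each sample
results = [(bool(OLD.search(s)), bool(NEW.search(s))) for s in samples]
for s, (old_ok, new_ok) in zip(samples, results):
    print(s, "old:", old_ok, "new:", new_ok)

# The capturing groups hand back longitude and latitude directly.
print(NEW.search(samples[0]).groups())
```

The old pattern accepts the `x` because its unescaped `.` matches any character, and it rejects the negative latitude because it allows no sign there.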
Use regex101 to check it.
Instead of loading all the data into memory, you could use an iterative solution: process the file line by line.
You could return a dict or a list for each result.
```python
import re


def parse(file, regex, *, to_float=False):
    # Read the file lazily, one line at a time, and yield one dict per match.
    with open(file) as fd:
        for line in fd:
            match = regex.search(line)
            if match:
                lon, lat = match.group(1), match.group(2)
                if to_float:
                    lon, lat = float(lon), float(lat)
                yield {'lon': lon, 'lat': lat}


filename = "bugs_bunny2.txt"
pattern = re.compile(r"{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}")

for data in parse(filename, pattern):
    print(data)
```

Since Python 3.8, you can save one line with an assignment expression (`:=`):
```python
import re


def parse(file, regex, *, to_float=False):
    with open(file) as fd:
        for line in fd:
            if match := regex.search(line):
                lon, lat = match.group(1), match.group(2)
                if to_float:
                    lon, lat = float(lon), float(lat)
                yield {'lon': lon, 'lat': lat}


filename = "bugs_bunny2.txt"
pattern = re.compile(r"{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}")

for data in parse(filename, pattern):
    print(data)
```
Output:

```none
{'lon': '-78.555550', 'lat': '39.111222'}
{'lon': '-78.555551', 'lat': '39.111223'}
{'lon': '-78.456432', 'lat': '38.999999'}
{'lon': '-78.555593', 'lat': '39.111199'}
{'lon': '-78.555594', 'lat': '39.111190'}
{'lon': '-78.555565', 'lat': '39.111191'}
{'lon': '-78.555516', 'lat': '38.111065'}
```
I also tried it with a file of one line and a file of zero lines, and it works as expected. This output shows the values as strings, without converting them to float (pass `to_float=True` to get floats).
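The one-line and zero-line checks can be reproduced with a sketch like the following. It assumes the line format implied by the regex, writes hypothetical content to a temporary file (the `run` helper is made up for this test), and also exercises `to_float=True`.

```python
import os
import re
import tempfile


def parse(file, regex, *, to_float=False):
    with open(file) as fd:
        for line in fd:
            if match := regex.search(line):
                lon, lat = match.group(1), match.group(2)
                if to_float:
                    lon, lat = float(lon), float(lat)
                yield {'lon': lon, 'lat': lat}


pattern = re.compile(r"{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}")

# Hypothetical one-line file content in the format the regex expects.
ONE_LINE = "{LON=-78.555550}{LAT=39.111222}\n"


def run(content, **kwargs):
    """Hypothetical helper: write content to a temp file, return parse() results."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(content)
        name = tmp.name
    try:
        return list(parse(name, pattern, **kwargs))
    finally:
        os.remove(name)


print(run(""))                       # empty file: no results
print(run(ONE_LINE))                 # one line, values kept as strings
print(run(ONE_LINE, to_float=True))  # one line, values converted to float
```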
Even if the file is 100 TiB, you can still use this code,
because it never loads the whole file into memory, only one line at a time.
That is a nice side effect of an iterative solution.
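To make the memory argument concrete, here is a sketch with a hypothetical `parse_lines` variant that accepts any iterable of lines instead of a filename. Fed from an endless generator that stands in for a huge file, it still returns the first three matches, because lines are only produced as they are consumed.

```python
import itertools
import re

PATTERN = re.compile(r"{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}")


def parse_lines(lines, regex):
    """Hypothetical variant of parse() that takes any iterable of lines."""
    for line in lines:
        if match := regex.search(line):
            yield {'lon': match.group(1), 'lat': match.group(2)}


def endless_lines():
    """Simulates a file far too big for memory: it never ends."""
    for i in itertools.count():
        yield "{LON=-78.%06d}{LAT=39.111222}\n" % i


# islice stops after three results, so only three lines are ever generated.
first_three = list(itertools.islice(parse_lines(endless_lines(), PATTERN), 3))
print(first_three)
```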
Using `re.findall` requires the whole content to be in memory. For toy applications, that's OK.
With medium data (fits on disk, but not in memory), you need an iterative solution.
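For comparison, here is a sketch of the `re.findall` approach next to the line-by-line one, using `io.StringIO` as a stand-in for a small file with hypothetical content. Both give the same `(lon, lat)` tuples here. One difference to keep in mind: `findall` would also pick up a second match on the same line, while `search` stops at the first.

```python
import io
import re

PATTERN = re.compile(r"{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}")

# Hypothetical file content; io.StringIO stands in for a real file.
TEXT = (
    "{LON=-78.555550}{LAT=39.111222}\n"
    "no coordinates on this line\n"
    "{LON=-78.555551}{LAT=39.111223}\n"
)

# Toy approach: read everything at once, then findall -> list of (lon, lat).
with io.StringIO(TEXT) as fd:
    all_at_once = PATTERN.findall(fd.read())


def iter_matches(fd, regex):
    """Iterative approach: only one line is held in memory at a time."""
    for line in fd:
        if match := regex.search(line):
            yield match.groups()


with io.StringIO(TEXT) as fd:
    line_by_line = list(iter_matches(fd, PATTERN))

print(all_at_once == line_by_line)
```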
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!