Python Forum
"for loop" not indexing correctly?
#3
The problem is strange. I couldn't reproduce it, so I took a different approach instead.

First of all, your regex is wrong.
{LON=-\d\d.\d\d\d\d\d\d}{LAT=\d\d.\d\d\d\d\d\d}
The corrected version:
{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}

  1. Allow +, - or no sign in front of the number. The ? means zero or one occurrence
  2. Use quantifiers for \d ({2} and {6}) instead of repeating it
  3. Escape the dot ., otherwise it matches any character
  4. Group longitude and latitude, so each value can be extracted on its own

Use regex101 to check it.
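
If you'd rather check it in Python than on regex101, here is a minimal sketch that compares the two patterns. The sample strings are made up for illustration and are not taken from your file:

```python
import re

# The original pattern: fixed minus sign, unescaped dots, no groups.
original = re.compile(r"{LON=-\d\d.\d\d\d\d\d\d}{LAT=\d\d.\d\d\d\d\d\d}")
# The corrected pattern from above.
corrected = re.compile(r"{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}")

# Hypothetical sample lines, invented for this check.
good = "{LON=-78.555550}{LAT=39.111222}"
bad_dot = "{LON=-78X555550}{LAT=39X111222}"    # 'X' where the dot should be
positive = "{LON=+78.555550}{LAT=-39.111222}"  # other sign combinations

# The unescaped dot lets the original pattern accept the broken line.
original_accepts_bad = original.search(bad_dot) is not None
corrected_accepts_bad = corrected.search(bad_dot) is not None

# The groups give direct access to longitude and latitude.
groups = corrected.search(good).groups()

# The original pattern only accepts a negative longitude and an unsigned latitude.
original_accepts_positive = original.search(positive) is not None
positive_groups = corrected.search(positive).groups()

print(original_accepts_bad, corrected_accepts_bad)
print(groups)
print(original_accepts_positive, positive_groups)
```

The original pattern accepts the line with X in place of the dot, while the corrected one rejects it and still handles the other sign combinations.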


Instead of loading the whole file into memory, you could use an iterative solution that reads it line by line.
For each result you could yield a dict or a list.


import re


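def parse(file, regex, *, to_float=False):
    """Yield one {'lon': ..., 'lat': ...} dict per matching line of the file."""
    with open(file) as fd:
        # Iterating over the file object reads one line at a time.
        for line in fd:
            match = regex.search(line)
            if match:
                # Group 1 is the longitude, group 2 the latitude.
                lon, lat = match.group(1), match.group(2)
                if to_float:
                    lon, lat = float(lon), float(lat)
                yield {'lon': lon, 'lat': lat}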


filename = "bugs_bunny2.txt"
pattern = re.compile(r"{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}")
for data in parse(filename, pattern):
    print(data)
Since Python 3.8, you can save one line by using an assignment expression (the := operator):

import re


def parse(file, regex, *, to_float=False):
    with open(file) as fd: 
        for line in fd:
            if match := regex.search(line):
                lon, lat = match.group(1), match.group(2)
                if to_float:
                    lon, lat = float(lon), float(lat)
                yield {'lon': lon, 'lat': lat}


filename = "bugs_bunny2.txt"
pattern = re.compile(r"{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}")
for data in parse(filename, pattern):
    print(data)
Output:
{'lon': '-78.555550', 'lat': '39.111222'}
{'lon': '-78.555551', 'lat': '39.111223'}
{'lon': '-78.456432', 'lat': '38.999999'}
{'lon': '-78.555593', 'lat': '39.111199'}
{'lon': '-78.555594', 'lat': '39.111190'}
{'lon': '-78.555565', 'lat': '39.111191'}
{'lon': '-78.555516', 'lat': '38.111065'}
I also tried this with files containing one line and zero lines, and it works as expected.
This output is without converting the str values to float.
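
If you want numbers instead of strings, pass to_float=True. A small sketch of that, assuming a temporary file with made-up content (the file names and sample lines are hypothetical, not your bugs_bunny2.txt):

```python
import os
import re
import tempfile


def parse(file, regex, *, to_float=False):
    with open(file) as fd:
        for line in fd:
            if match := regex.search(line):
                lon, lat = match.group(1), match.group(2)
                if to_float:
                    lon, lat = float(lon), float(lat)
                yield {'lon': lon, 'lat': lat}


pattern = re.compile(r"{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}")

# Hypothetical sample content: two matching lines and one without coordinates.
sample = (
    "{LON=-78.555550}{LAT=39.111222}\n"
    "no coordinates here\n"
    "{LON=-78.456432}{LAT=38.999999}\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as fd:
        fd.write(sample)

    # An empty file, to check the zero-lines case.
    empty_path = os.path.join(tmp, "empty.txt")
    open(empty_path, "w").close()

    # list() consumes the generators while the files still exist.
    as_str = list(parse(path, pattern))
    as_float = list(parse(path, pattern, to_float=True))
    empty = list(parse(empty_path, pattern))

print(as_str)
print(as_float)
print(empty)
```

The line without coordinates is skipped, and the empty file simply yields nothing.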


Even if the file is 100 TiB in size, you're still able to use this code,
because it doesn't load the whole content of the file into memory.
That is a nice side effect of an iterative solution.

Using re.findall on the file content, by contrast, requires the whole content to be in memory.
For toy applications, that's OK.

With medium-sized data (fits on disk, but not in memory), you need an iterative solution.
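
To make the difference concrete, here is a rough sketch that compares re.findall on the full text with a lazy, line-by-line generator. io.StringIO stands in for a real file, and iter_coords is a hypothetical helper name I made up for this example. It assumes that a match never spans more than one line:

```python
import io
import itertools
import re

pattern = re.compile(r"{LON=([+-]?\d{2}\.\d{6})}{LAT=([+-]?\d{2}\.\d{6})}")


def iter_coords(lines, regex):
    """Yield (lon, lat) tuples lazily, one input line at a time."""
    for line in lines:
        for match in regex.finditer(line):
            yield match.groups()


# Made-up sample data; a real file would be opened with open() instead.
text = (
    "{LON=-78.555550}{LAT=39.111222}\n"
    "{LON=-78.555551}{LAT=39.111223}\n"
    "{LON=-78.456432}{LAT=38.999999}\n"
)

# findall needs the complete string in memory.
all_at_once = pattern.findall(text)

# The generator only pulls as many lines as the consumer asks for.
stream = io.StringIO(text)
first_two = list(itertools.islice(iter_coords(stream, pattern), 2))

# The third line has not been read yet, so it is still in the stream.
remaining_line = stream.readline()

print(all_at_once)
print(first_two)
print(repr(remaining_line))
```

Both approaches find the same tuples, but the generator only ever holds the current line.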


Messages In This Thread
"for loop" not indexing correctly? - by melblanc - Jan-24-2020, 01:03 PM
RE: "for loop" not indexing correctly? - by DeaD_EyE - Jan-24-2020, 02:31 PM
RE: "for loop" not indexing correctly? - by buran - Jan-24-2020, 02:35 PM
