Jan-24-2020, 01:03 PM

I am using a for loop with regular expressions to scrape a file
for ordered pairs of geographic coordinates. Everything seemed to
work until I noticed that the scrape failed on a file with a
single entry. Closer inspection then revealed that when multiple
entries were found, every other entry was being skipped.
Paring everything back to minimal code showed that the following
fails to capture alternating entries in the scraped file:

    import re

    filename = "bugs_bunny2.txt"
    f = open(filename)
    for x in f:
        text_string = f.readline()
        scraped_data = re.findall("{LON=-\d\d.\d\d\d\d\d\d}{LAT=\d\d.\d\d\d\d\d\d}", text_string)
        print(scraped_data)

The data in the bugs_bunny2.txt file is:

    Source = Goniometer, Accuracy < 1 % {LON=-78.555550}{LAT=39.111222}
    Source = Goniometer, Accuracy < 7 % {LON=-78.555551}{LAT=39.111223}
    Source = Altimeter, Accuracy <15 % {LON=-78.456432}{LAT=38.999999}
    Source = GPS, Accuracy < .1% {LON=-78.555593}{LAT=39.111199}
    Source = Goniometer Accuracy < 2 % {LON=-78.555594}{LAT=39.111190} GPS CLOCK CORRECTED
    Source = Goniometer Accuracy < 1 % {LON=-78.555565}{LAT=39.111191} GPS CLOCK CORRECTED
    Source = Goniometer Accuracy < .9% {LON=-78.555516}{LAT=38.111065} GPS CLOCK CORRECTED

The above is a structured sample of the file being scraped. The
longitude entries were deliberately forged so that the last digit
of each longitude increments by one, from 0 through 6, giving
seven entries that should be recovered.
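Independently of the loop, the pattern is looser than it looks. An unescaped `.` matches any character, not just a decimal point, and `\d` inside an ordinary (non-raw) string literal is an invalid escape sequence that recent Python versions warn about. Below is a minimal sketch, with a hypothetical `COORD_RE` and `parse_pairs`: a raw-string pattern with the braces and dots escaped, capture groups for the two numbers, and conversion to `float`. Accepting an optional minus sign and any number of digits is an assumption; tighten it back to the original digit counts if the format is fixed.

```python
import re

# Hypothetical pattern: a raw string, with braces and decimal points escaped
# so they only match themselves. The two groups capture the numbers.
# "-?" and "\d+" are an assumption (either sign, any digit count), which is
# looser than the pattern in the question.
COORD_RE = re.compile(r"\{LON=(-?\d+\.\d+)\}\{LAT=(-?\d+\.\d+)\}")


def parse_pairs(text):
    """Return every (lon, lat) pair found in text as a tuple of floats."""
    # With two groups, findall returns a list of (lon, lat) string tuples.
    return [(float(lon), float(lat)) for lon, lat in COORD_RE.findall(text)]


line = "Source = GPS, Accuracy < .1% {LON=-78.555593}{LAT=39.111199}"
print(parse_pairs(line))
```

Because `\.` only matches a literal dot, a corrupted value such as `-78x555593` is rejected, whereas the bare `.` in the original pattern would accept it.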
When the script is run, only the entries whose longitude ends in
an even digit (0, 2, 4 or 6) are returned. On the outside chance
that the last digit was influencing the result, I incremented the
last digit of each longitude by 1, so the digits ran from 1
through 7 instead of 0 through 6. The same entries were displayed
with the altered digits: entries 0, 2, 4 and 6.
Original entries:

    {LON=-78.555550}{LAT=39.111222}
    {LON=-78.456432}{LAT=38.999999}
    {LON=-78.555594}{LAT=39.111190}
    {LON=-78.555516}{LAT=38.111065}

Altered entries:

    {LON=-78.555551}{LAT=39.111222}
    {LON=-78.456433}{LAT=38.999999}
    {LON=-78.555595}{LAT=39.111190}
    {LON=-78.555517}{LAT=38.111065}

If the file being scraped has only one entry matching the regular
expression, it is not displayed: printing scraped_data shows an
empty list, "[]".
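Both symptoms (alternating entries, and nothing at all from a one-entry file), together with the fact that altering the digits did not change which entries came back, suggest the selection depends on each line's position rather than its contents. That points at the loop rather than the regular expression: in Python 3, `for x in f:` already reads one line into `x` on each pass, and `f.readline()` inside the body then reads the following line, so each iteration consumes two lines but searches only the second. Below is a hypothetical reproduction in which `io.StringIO` stands in for `bugs_bunny2.txt`; the helper `trace_reads` and its sample text are made up for illustration.

```python
import io


def trace_reads(text):
    """Mimic the loop from the question and record what each name receives.

    Returns a list of (x, text_string) pairs, one per loop iteration.
    """
    f = io.StringIO(text)  # in-memory stand-in for open(filename)
    pairs = []
    for x in f:  # the for statement reads one line into x...
        text_string = f.readline()  # ...and readline() reads the next one
        pairs.append((x.rstrip("\n"), text_string.rstrip("\n")))
    return pairs


sample = "line 0\nline 1\nline 2\n"
for x, text_string in trace_reads(sample):
    print(f"x={x!r:10} searched={text_string!r}")
```

With this sample, `line 0` lands in `x` and is never searched, and the second pass's `readline()` reaches end of file and returns an empty string. A one-line file behaves the same way: its only line goes into `x`, `readline()` returns `''`, and `findall` on that finds nothing, hence `[]`. Exactly which entries survive in the real file depends on its layout (a header or blank first line, for instance, shifts the pattern).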
What am I overlooking here? I see nothing that should skip any
line in the text file. At this point, even straws are welcome.
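A minimal fix, sketched under the assumption that each coordinate pair sits on a single line: search the line the `for` statement already provides and drop the `readline()` call. The helper `scrape_lines` is hypothetical, and the pattern is the question's own rewritten as a raw string with the dots and braces escaped. Taking any iterable of strings means an open file (ideally opened in a `with` block so it is closed afterwards) or an in-memory `io.StringIO` can be passed in.

```python
import io
import re

# The question's pattern as a raw string, with dots and braces escaped.
PATTERN = re.compile(r"\{LON=-\d\d\.\d{6}\}\{LAT=\d\d\.\d{6}\}")


def scrape_lines(lines):
    """Search every line of an iterable of strings (e.g. an open file).

    Returns a list of all matches, in input order.
    """
    matches = []
    for line in lines:  # each pass yields exactly one line; no readline() here
        matches.extend(PATTERN.findall(line))
    return matches


# A one-entry stand-in for the file. With the real file, something like:
#     with open("bugs_bunny2.txt") as f:
#         print(scrape_lines(f))
single = io.StringIO("Source = GPS, Accuracy < .1% {LON=-78.555593}{LAT=39.111199}\n")
print(scrape_lines(single))
```

A one-entry input now yields that entry instead of `[]`. If a pair could ever be split across lines, reading the whole file with `f.read()` and calling `findall` once on the result would be the alternative.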
Mel