Python Forum
Reading an ASCII text file and parsing data...
#1
I have attached a sample scores file from an HR Derby app (qaz.txt). It is an ASCII text file.
I am trying to write a Python program that reads this data file (prompting the user for the filename),
tallies a W-L record and winning percentage for both players, and finds the longest winning
streak for each player. The output should look like these next 2 lines:
Mincher 49-50 .495 Longest winning streak 10
Prescott 50-49 .505 Longest winning streak 7

(Player names will vary with each new file read.)
The number of games played is listed on line 2 of
the data file (in this case 99), and any data past the line that reads "SERIES STATS"
is to be ignored. I have attached the data file (qaz.txt) and the Python code
that I cannot get to work (derby_read.py).

Any help would be appreciated.

Attached Files

.py   derby_read.py (Size: 1.89 KB / Downloads: 10)
.txt   qaz.txt (Size: 35.87 KB / Downloads: 11)
#2
What do you want as a result? Just the win/loss/longest streak info for each participant?

A start:
from collections import defaultdict


# Maps each player name to a list of game results: 1 for a win, 0 otherwise.
players = defaultdict(list)

with open("qaz.txt", "r") as file:
    for line in file:
        if line.startswith("Game #"):
            next(file)  # Skip blank
            next(file)  # Skip ---
            # Each result line is: year, player name, then scores; the last column is the total.
            _, p1, *p1_scores = next(file).split()
            _, p2, *p2_scores = next(file).split()

            p1_total = int(p1_scores[-1])
            p2_total = int(p2_scores[-1])

            players[p1].append(1 if p1_total > p2_total else 0)
            players[p2].append(1 if p2_total > p1_total else 0)
Another, more robust way to get the player results is to do some pattern matching. This code assumes every line containing game results starts with a 4-digit year. It tolerates additional lines between the "Game #" line and the first player.
from collections import defaultdict
from itertools import islice
import re


# itertools.batched was added in Python 3.12; this is a fallback for older versions.
def batched(iterable, n):
    """batched('ABCDEFG', 3) → ABC DEF G"""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    while batch := tuple(islice(iterator, n)):
        yield batch


# Maps each player name to a list of game results: 1 for a win, 0 otherwise.
players = defaultdict(list)
lines = []

with open("qaz.txt", "r") as file:
    for line in file:
        # Keep only result lines, which start with a 4-digit year.
        if re.match(r"\d{4} ", line):
            lines.append(line.split())

# Result lines come in pairs, one per player; the last column is the total.
for p1, p2 in batched(lines, n=2):
    p1_total = int(p1[-1])
    p2_total = int(p2[-1])
    players[p1[1]].append(int(p1_total > p2_total))
    players[p2[1]].append(int(p2_total > p1_total))
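One requirement from the original post that the code above leaves open is ignoring everything after the "SERIES STATS" line. If the stats section also has lines starting with a year, the pattern match would pick them up as games. A minimal sketch of one way to cut the input off at that marker, using itertools.takewhile. The helper name game_section and the sample lines are made up for illustration; the real layout of qaz.txt may differ.

```python
import io
from itertools import takewhile


def game_section(lines, marker="SERIES STATS"):
    """Yield lines up to, but not including, the first line containing marker."""
    return takewhile(lambda line: marker not in line, lines)


# In-memory stand-in for the real file; this layout is invented for the example.
sample = io.StringIO(
    "Game # 1\n"
    "1969 Mincher 3 2 5\n"
    "1969 Prescott 1 1 2\n"
    "SERIES STATS\n"
    "1969 Mincher 49 50\n"
)

kept = list(game_section(sample))
print(len(kept))  # 3
```

With a real file, the loop would become something like "for line in game_section(file):", and the filename could come from input() as the original post asks.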
Now you have a list of game results for each player, with a 1 when they win and a 0 when they lose. Sum the game results to get the number of wins. The length of the list is how many games were played. The trickiest part remaining is computing the longest winning streak.
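For the streak, one possible approach (a sketch, not the only way) is to group consecutive equal results with itertools.groupby and take the longest run of 1s. The function names longest_streak and summary_line are made up here, and the code assumes every non-win counts as a loss, so a tied game would be tallied as a loss for both players.

```python
from itertools import groupby


def longest_streak(results):
    """Length of the longest run of consecutive 1s in a list of 1/0 results."""
    return max(
        (sum(1 for _ in run) for won, run in groupby(results) if won),
        default=0,
    )


def summary_line(name, results):
    """Format one player's record in the style the original post asks for."""
    wins = sum(results)
    losses = len(results) - wins  # assumes no ties
    pct = wins / len(results) if results else 0.0
    # Baseball style drops the leading zero: 0.495 becomes .495
    pct_text = f"{pct:.3f}".lstrip("0")
    return (
        f"{name} {wins}-{losses} {pct_text} "
        f"Longest winning streak {longest_streak(results)}"
    )


# Made-up results: win, win, loss, win, win, win, loss
print(summary_line("Mincher", [1, 1, 0, 1, 1, 1, 0]))
# Mincher 5-2 .714 Longest winning streak 3
```

Applied to the players dictionary, this would be "for name, results in players.items(): print(summary_line(name, results))".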

Neither snippet above handles player names that contain a space. They also don't handle different players having the same name. I was looking at a list of MLB players and saw there have been 10 Abreus, three playing right now. If Jose Abreu was playing Bryan Abreu, the players dictionary would only have one Abreu who played a lot of games and had a .500 winning percentage.
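If names with spaces turn out to matter, one option is to treat everything between the year and the first numeric column as the name. This is only a sketch: the function parse_result_line is made up, and it assumes each result line is a year, then a name of one or more words, then nothing but whole-number columns, with the total last. That layout is a guess and should be checked against the real file.

```python
def parse_result_line(line):
    """Split a result line into (year, name, total).

    Assumed layout: year, name (one or more words), then only numeric
    columns, the last of which is the total.
    """
    tokens = line.split()
    year = tokens[0]
    # The name runs from the second token up to the first all-digit token.
    end = next(i for i in range(1, len(tokens)) if tokens[i].isdigit())
    name = " ".join(tokens[1:end])
    return year, name, int(tokens[-1])


print(parse_result_line("1969 Don Mincher 3 2 5"))
# ('1969', 'Don Mincher', 5)
```

Using the (year, name) pair as the dictionary key instead of the name alone might also separate some same-named players, but only if the year column differs between them, which is another assumption about the data.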
#3
(Jun-07-2024, 08:06 PM)deanhystad Wrote: What do you want as a result? Just the win/loss/longest streak info for each participant?

The output should look like these next 2 lines:
Mincher 49-50 .495 Longest winning streak 10
Prescott 50-49 .505 Longest winning streak 7