Reading an ASCII text file and parsing data...

oradba4u · (This post was last modified: Jun-07-2024, 05:01 PM by oradba4u.)

I have attached a sample scores file from an HR Derby app (qaz.txt). It is an ASCII text file.
I am trying to write a Python program that reads this data file (prompting the user for the filename),
tallies a W-L record and winning percentage for both players, and finds the longest winning
streak for each player. The output should look like these next 2 lines:
Mincher 49-50 .495 Longest winning streak 10
Prescott 50-49 .505 Longest winning streak 7

(Player names will vary with each new file read.)
The number of games played is listed on line 2 of
the data file (in this case 99), and any data past the line that reads "SERIES STATS"
is to be ignored. I have attached the data file (qaz.txt) and the Python code
that I cannot get to work (derby_read.py).

Any help would be appreciated.

**deanhystad** · (This post was last modified: Jun-07-2024, 08:06 PM by deanhystad.)

What do you want as a result? Just the win/loss/longest streak info for each participant?

A start:

from collections import defaultdict


players = defaultdict(list)  # player name -> list of 1 (win) / 0 (loss)

with open("qaz.txt", "r") as file:
    for line in file:
        if line.startswith("Game #"):
            next(file)  # Skip blank
            next(file)  # Skip ---
            year, p1, *p1_results = next(file).split()
            year, p2, *p2_results = next(file).split()

            p1_results = int(p1_results[-1])
            p2_results = int(p2_results[-1])

            players[p1].append(1 if p1_results > p2_results else 0)
            players[p2].append(1 if p2_results > p1_results else 0)

Another, more robust way to get the player results is to do some pattern matching. This code assumes that every line containing game results starts with a 4-digit year. It tolerates additional lines between the "Game #" line and the first player.

from collections import defaultdict
from itertools import islice
import re


# itertools.batched was added in Python 3.12; this is an equivalent for older versions
def batched(iterable, n):
    """batched('ABCDEFG', 3) → ABC DEF G"""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    while batch := tuple(islice(iterator, n)):
        yield batch


players = defaultdict(list)
lines = []

with open("qaz.txt", "r") as file:
    for line in file:
        if re.match(r"\d{4} ", line):
            lines.append(line.split())

for p1, p2 in batched(lines, n=2):
    p1_total = int(p1[-1])
    p2_total = int(p2[-1])
    players[p1[1]].append(int(p1_total > p2_total))
    players[p2[1]].append(int(p2_total > p1_total))

Now you have a list of game results for each player, with a 1 for each win and a 0 for each loss. Sum the list to get the number of wins. The length of the list is the number of games played. The trickiest part remaining is computing the longest winning streak.
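
One possible way to finish from here is sketched below. It is not taken from derby_read.py; the helper names longest_streak and summary_line are made up for illustration, and it assumes a game that is not a win counts as a loss (a tie would be recorded as 0 for both players). The streak is found by grouping consecutive equal results with itertools.groupby and taking the longest group of wins.

```python
from itertools import groupby


def longest_streak(results):
    """Return the length of the longest run of 1s in a list of 0/1 results."""
    # groupby yields one (value, run) pair per run of consecutive equal values;
    # only runs of wins (value 1) are counted. default=0 covers "no wins at all".
    return max(
        (sum(1 for _ in run) for won, run in groupby(results) if won),
        default=0,
    )


def summary_line(name, results):
    """Format one line such as 'Mincher 49-50 .495 Longest winning streak 10'."""
    wins = sum(results)
    losses = len(results) - wins  # assumption: every non-win is a loss
    pct = wins / len(results) if results else 0.0
    # "0.495" -> ".495"; a perfect record stays "1.000"
    pct_text = f"{pct:.3f}".lstrip("0")
    return f"{name} {wins}-{losses} {pct_text} Longest winning streak {longest_streak(results)}"


# Hypothetical stand-in for the players dictionary built above.
example_players = {"A": [1, 1, 0, 1], "B": [0, 0, 1, 0]}
for name, results in example_players.items():
    print(summary_line(name, results))
```

With the players dictionary from either version above, the same loop over players.items() would print one line per player.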

Neither version above handles player names that contain a space. They also don't handle different players having the same name. I was looking at a list of MLB players and saw there have been 10 Abreus, three playing right now. If Jose Abreu was playing Bryan Abreu, the players dictionary would only have one Abreu, who played twice as many games and had a .500 winning percentage.
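
The original post also asks for the filename to be prompted and for everything after the "SERIES STATS" line to be ignored, and neither version above checks for that line explicitly. A minimal sketch of one way to do it, assuming the marker text appears somewhere on a line of its own, is to wrap the file in itertools.takewhile. The function names game_lines and read_game_lines are hypothetical.

```python
from itertools import takewhile


def game_lines(lines, marker="SERIES STATS"):
    """Yield lines up to, but not including, the first line containing marker."""
    return takewhile(lambda line: marker not in line, lines)


def read_game_lines(filename):
    """Read a scores file and return only the lines before the marker."""
    with open(filename, "r") as file:
        return list(game_lines(file))


# Typical use (left as comments so the sketch does not block waiting for input):
# filename = input("Scores file: ")
# for line in read_game_lines(filename):
#     ...  # apply the year-pattern matching shown earlier
```

This fits the pattern-matching version most directly, since that version only needs to loop over lines. The first version calls next(file) inside the loop, so it would need to call next() on the takewhile iterator instead.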

oradba4u · Jun-08-2024, 12:41 AM

(Jun-07-2024, 08:06 PM)deanhystad Wrote: What do you want as a result? Just the win/loss/longest streak info for each participant?

The output should look like these next 2 lines:
Mincher 49-50 .495 Longest winning streak 10
Prescott 50-49 .505 Longest winning streak 7
