Python Forum

I have attached a sample Scores file from an HR Derby app (qaz.txt). It is an ASCII text file.
I am attempting to write a Python program that reads this data file (prompting the user for the filename)
and tallies a W-L record and winning percentage for both players, plus the longest winning
streak for each player. The output should look like these next 2 lines:
Mincher 49-50 .495 Longest winning streak 10
Prescott 50-49 .505 Longest winning streak 7

(Player names will vary with each new file read)
The number of games played is listed on line 2 of
the data file (in this case 99), and any data past the line that reads "SERIES STATS"
is to be ignored. I have attached the data file (qaz.txt) and the Python code
that I cannot get to work (derby_read.py).
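A minimal sketch of the two file rules above (game count on line 2, nothing after "SERIES STATS"), working on a list of lines so it can be tested without the real file. The helper name `split_scores` is hypothetical, and it assumes line 2 contains the game count as its first integer and that the cutoff line starts with "SERIES STATS"; the actual layout of qaz.txt is not shown here.

```python
import re


def split_scores(lines):
    """Return (games, game_lines) for the lines of a scores file.

    Hypothetical assumptions, not confirmed against qaz.txt:
    - line 2 holds the number of games as its first integer
    - a line starting with "SERIES STATS" ends the game data
    """
    match = re.search(r"\d+", lines[1])  # line 2 is index 1
    games = int(match.group()) if match else None

    game_lines = []
    for line in lines:
        if line.strip().startswith("SERIES STATS"):
            break  # everything from here on is ignored
        game_lines.append(line)
    return games, game_lines


# Made-up sample lines, only to show the behavior
sample = ["HR DERBY", "Games played: 3", "Game #1", "SERIES STATS", "Mincher 49-50"]
games, game_lines = split_scores(sample)
print(games, len(game_lines))  # 3 3
```

With a real file, the filename prompt could be wired in as `with open(input("Scores file: ")) as f: games, game_lines = split_scores(f.read().splitlines())`.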

Any HELP would be appreciated

What do you want as a result? Just the win/loss/longest streak info for each participant?

A start:

from collections import defaultdict


players = defaultdict(list)  # player name -> list of 1 (win) / 0 (loss)

with open("data.csv", "r") as file:
    for line in file:
        if line.startswith("Game #"):
            next(file)  # Skip blank line
            next(file)  # Skip --- separator line
            # Result line: year, player name, other columns..., game total
            _year, p1, *p1_scores = next(file).split()
            _year, p2, *p2_scores = next(file).split()

            p1_total = int(p1_scores[-1])  # last column is the game total
            p2_total = int(p2_scores[-1])

            players[p1].append(1 if p1_total > p2_total else 0)
            players[p2].append(1 if p2_total > p1_total else 0)

A more robust way to get the player results is to use pattern matching. This code assumes that every line containing a game result starts with a four-digit year. It tolerates additional lines between the "Game #" line and the first player's line.

from collections import defaultdict
from itertools import islice
import re

try:
    from itertools import batched  # Added to itertools in Python 3.12
except ImportError:
    def batched(iterable, n):
        """batched('ABCDEFG', 3) → ABC DEF G"""
        if n < 1:
            raise ValueError("n must be at least one")
        iterator = iter(iterable)
        while batch := tuple(islice(iterator, n)):
            yield batch


players = defaultdict(list)  # player name -> list of 1 (win) / 0 (loss)
lines = []

with open("data.csv", "r") as file:
    for line in file:
        # Result lines start with a 4-digit year followed by a space
        if re.match(r"\d{4} ", line):
            lines.append(line.split())

# Consecutive result lines pair up as (player 1, player 2) for one game
for p1, p2 in batched(lines, n=2):
    p1_total = int(p1[-1])  # last column is the game total
    p2_total = int(p2[-1])
    players[p1[1]].append(int(p1_total > p2_total))
    players[p2[1]].append(int(p2_total > p1_total))

Now you have a list of game results for each player, with 1 for a win and 0 for a loss. Sum the list to get the number of wins; its length is the number of games played. The trickiest part remaining is computing the longest winning streak.
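A possible sketch of those last steps, assuming each player's results are the 1/0 lists built above. The function names `record` and `longest_streak` are hypothetical. The streak uses `itertools.groupby` to split the list into runs of equal values and takes the longest run of 1s; a plain loop with a running counter would work just as well. Note that with the code above a tied game is stored as 0 for both players, so here it counts as a loss for each.

```python
from itertools import groupby


def record(results):
    """Return (wins, losses, pct) from a list of 1 (win) / 0 (loss)."""
    wins = sum(results)             # each win contributes 1
    losses = len(results) - wins    # every other game is a loss
    pct = wins / len(results) if results else 0.0
    return wins, losses, pct


def longest_streak(results):
    """Length of the longest run of consecutive wins (1s)."""
    # groupby turns [1, 1, 0, 1] into runs (1, [1, 1]), (0, [0]), (1, [1])
    return max((len(list(run)) for won, run in groupby(results) if won), default=0)


games = [1, 1, 0, 1, 1, 1, 0, 0, 1]
print(record(games))          # (6, 3, 0.6666666666666666)
print(longest_streak(games))  # 3
```

For the real data this would be called once per entry, e.g. `for name, results in players.items(): ...`.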

Neither example above handles player names that contain a space, and neither handles different players who have the same name. I was looking at a list of MLB players and saw there have been 10 Abreus, three of them playing right now. If Jose Abreu played Bryan Abreu, the players dictionary would have only one Abreu, who played a lot of games and had a .500 average.
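One way to approach both problems is sketched below, assuming (hypothetically) that a result line looks like "<year> <name words...> <numbers...> <total>", so the name is every token between the year and the first all-digit token. The helper `parse_result_line` is illustrative only, and the sample lines are made up. Keying on `(year, name)` keeps players apart when their full names or years differ; it still cannot separate two entries whose year and name text are identical.

```python
def parse_result_line(line):
    """Split a result line into ((year, name), total).

    Hypothetical layout: "<year> <name words...> <numbers...> <total>".
    """
    tokens = line.split()
    year = tokens[0]
    end = 1
    while end < len(tokens) and not tokens[end].isdigit():
        end += 1  # still inside the player's name
    name = " ".join(tokens[1:end])
    total = int(tokens[-1])  # last column is the game total
    return (year, name), total


print(parse_result_line("2023 Jose Abreu 3 4 2 9"))  # (('2023', 'Jose Abreu'), 9)
print(parse_result_line("1968 Mincher 5 1 6"))       # (('1968', 'Mincher'), 6)
```

The returned key can replace `p1[1]` / `p2[1]` as the `players` dictionary key.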

(Jun-07-2024, 08:06 PM) deanhystad wrote: What do you want as a result? Just the win/loss/longest streak info for each participant?

The output should look like these next 2 lines:
Mincher 49-50 .495 Longest winning streak 10
Prescott 50-49 .505 Longest winning streak 7
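A small sketch of producing exactly that line format, assuming wins, losses and the streak have already been computed. The name `format_line` is hypothetical. The percentage is rounded to three places and the leading zero is dropped, baseball style (0.495 becomes .495).

```python
def format_line(name, wins, losses, streak):
    """Build e.g. 'Mincher 49-50 .495 Longest winning streak 10'."""
    games = wins + losses
    pct = f"{wins / games:.3f}" if games else "0.000"
    if pct.startswith("0"):
        pct = pct[1:]  # drop the leading zero: 0.495 -> .495
    return f"{name} {wins}-{losses} {pct} Longest winning streak {streak}"


print(format_line("Mincher", 49, 50, 10))  # Mincher 49-50 .495 Longest winning streak 10
print(format_line("Prescott", 50, 49, 7))  # Prescott 50-49 .505 Longest winning streak 7
```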

oradba4u

deanhystad

oradba4u