Working with data from the forum:
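from collections import Counter
from urllib.request import urlopen, Request

# Download the attachment (sending a browser-like User-Agent),
# split the raw bytes on whitespace and convert each token to int.
values = list(
    map(
        int,
        urlopen(
            Request(
                "https://python-forum.io/attachment.php?aid=2273",
                headers={
                    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0"
                },
            )
        )
        .read()
        .split(),
    )
)

# The five most frequent values as (value, count) pairs.
top_5 = Counter(values).most_common(5)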
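
To see what most_common returns without touching the network, here is a minimal sketch. The sample list is made up for illustration and is not the forum attachment:

```python
from collections import Counter

# Made-up sample values (an assumption, not the real attachment data).
sample = [390, 202, 390, 137, 202, 390, 75]

# most_common(n) returns up to n (value, count) pairs, highest count first.
top_2 = Counter(sample).most_common(2)
print(top_2)  # [(390, 3), (202, 2)]
```

Each entry is a tuple, so the value itself is entry[0] and its frequency is entry[1].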
The solution for crap_data.txt could be:
from itertools import chain

with open("crap_data.txt") as fd:
    values = list(map(int, chain.from_iterable(line.split() for line in fd)))
chain.from_iterable chains iterables together and returns an iterator.
Here it is called with the generator expression: line.split() for line in fd
Iterating over an open file yields lines (line separator included). Calling split() without arguments splits on any run of whitespace, so the line separator and any extra spaces are discarded.
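
The flattening step can be tried in isolation. In this sketch io.StringIO stands in for the open file, and the text in it is invented for illustration:

```python
from io import StringIO
from itertools import chain

# StringIO behaves like an open text file (stand-in, made-up content).
fd = StringIO("202 137\n  390\t235 \n\n114\n")

# Iterating yields one string per line, newline included.
lines = list(fd)
print(lines)  # ['202 137\n', '  390\t235 \n', '\n', '114\n']

# split() turns each line into a list of words (an empty list for a
# blank line); chain.from_iterable flattens those lists into one stream.
words = list(chain.from_iterable(line.split() for line in lines))
print(words)  # ['202', '137', '390', '235', '114']
```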
The map function calls int on each element produced by chain.from_iterable.
map returns an iterator, which must be consumed, e.g. with list.
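
A short sketch of why the iterator has to be consumed, and that it can only be consumed once. The input strings are made up:

```python
# map does no work until something iterates over it.
numbers = map(int, ["1", "2", "3"])

first = list(numbers)   # consumes the iterator
second = list(numbers)  # nothing left: the iterator is exhausted

print(first)   # [1, 2, 3]
print(second)  # []
```

That is why the result is stored as a list if it is needed more than once, for example by Counter and by a later loop.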
This style does not allow exception handling inside the hidden loops: a single token that int cannot parse raises ValueError and aborts the whole conversion.
If you expect even crappier data, then the naive approach is better:
values = []  # <- we want the ints here

with open("crap_data.txt") as fd:
    for line in fd:
        for word in line.split():
            try:
                value = int(word)
            except ValueError:
                # skip anything that is not an integer
                continue
            values.append(value)
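
If you also want to know what was thrown away, the same loop can be wrapped in a function. This is only a sketch: parse_ints and the skipped list are names invented here, and StringIO with made-up content stands in for the file:

```python
from io import StringIO


def parse_ints(lines):
    """Hypothetical helper: return (values, skipped) from an iterable of lines."""
    values, skipped = [], []
    for line in lines:
        for word in line.split():
            try:
                values.append(int(word))
            except ValueError:
                # Keep the rejected token so it can be inspected later.
                skipped.append(word)
    return values, skipped


# Made-up input containing two tokens that int() rejects.
values, skipped = parse_ints(StringIO("202 abc 137\n3.5 390\n"))
print(values)   # [202, 137, 390]
print(skipped)  # ['abc', '3.5']
```

Passing an open file works the same way, e.g. parse_ints(fd) inside the with block.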
Or this example.
crap_data.txt (with additional whitespace):
202 137 390 235 114 369 198 110 350 396 390 383 225 258 38 291 75 324 401 142 288 397
202 137 390 235 114 369 198 110 350 396 390 383 225 258 38 291 75 324 401 142 288 397
202 137 390 235 114 369 198 110 350 396 390 383 225 258 38 291 75 324 401 142 288 397
202 137 390 235 114 369 198 110 350 396 390 383 225 258 38 291 75 324 401 142 288 397
202 137 390 235 114 369 198 110 350 396 390 383 225 258 38 291 75 324 401 142 288 397
202 137 390 235 114 369 198 110 350 396 390 383 225 258 38 291 75 324 401 142 288 397
202 137 390 235 114 369 198 110 350 396 390 383 225 258 38 291 75 324 401 142 288 397
202 137 390 235 114 369 198 110 350 396 390 383 225 258 38 291 75 324 401 142 288 397
202 137 390 235 114 369 198 110 350 396 390 383 225 258 38 291 75 324 401 142 288 397