(Oct-10-2023, 01:32 PM)cartonics Wrote: A stupid question... why, if the link in the source code contains
serie-a&quote
does the scraped link become
=serie-a"e
Is it an encoding problem?

Yes, and the reason is your code 😉
Remove the encoding stuff you start with and use lxml as the parser, then the links will work.
from bs4 import BeautifulSoup
import requests

parser = 'lxml'  # preferred; 'html5lib' is an alternative, if installed
resp = requests.get("https://www.sbostats.com/soccer/league/italy/serie-a")
soup = BeautifulSoup(resp.content, parser)
table = soup.find_all('table', attrs={'class': 'updated_next_results_table'})
table = table[0]
tr = table.find_all('tr')
base_url = '*https://www.sbostats.com'
with open('matches.txt', 'a') as fp:
    for row in tr:
        link = row.find('a')
        # Skip rows that have no link (e.g. header rows)
        if link is None:
            continue
        # First three tokens of the row text, e.g. "Verona - Napoli"
        fp.write(f"{' '.join(row.text.replace('STATS', '-').split()[:3])}\n")
        fp.write(f"{base_url}{link['href']}\n\n")
Output:
Verona - Napoli
*https://www.sbostats.com/soccer/stats?country=italy&league=serie-a&quote=1.50&direction=away&id=NDAxMTg3OA==
Torino - Inter
*https://www.sbostats.com/soccer/stats?country=italy&league=serie-a&quote=1.83&direction=away&id=NDAxMTg3OQ==
Sassuolo - Lazio
*https://www.sbostats.com/soccer/stats?country=italy&league=serie-a&quote=2.30&direction=away&id=NDAxMTg4MA==
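
If you want to see where the stray quote comes from without hitting the site, here is a minimal sketch using only the standard library. It assumes the mangling is ordinary HTML entity decoding: "&quot" is a legacy entity that is recognised even without the trailing semicolon, so "&quote" decodes to a double quote followed by "e". The query strings are shortened, made-up examples, and html.unescape only stands in for the parser here; it is not necessarily what BeautifulSoup does internally.

```python
import html

# Shortened, hypothetical query string with a bare "&" before "quote"
raw_query = "league=serie-a&quote=1.50"

# "&quot" is matched as an entity prefix, leaving the trailing "e" behind
decoded = html.unescape(raw_query)
print(decoded)

# If the "&" were written as "&amp;" in the page source, it would survive
escaped_query = "league=serie-a&amp;quote=1.50"
decoded_escaped = html.unescape(escaped_query)
print(decoded_escaped)
```

So it is not a character-encoding problem as such, it is the unescaped "&" in the href being read as the start of an entity.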