May-10-2018, 09:50 PM
(May-10-2018, 06:44 PM)snippsat Wrote: Use a parser, e.g. BeautifulSoup/lxml; see Web-Scraping part-1.
Example:
from bs4 import BeautifulSoup

html = '''\
<umpires>
  <umpire first="Mark" id="427533" last="Wegner" name="Mark Wegner" position="home"></umpire>
  <umpire first="Paul" id="427361" last="Nauert" name="Paul Nauert" position="first"></umpire>
  <umpire first="Gerry" id="427103" last="Davis" name="Gerry Davis" position="second"></umpire>
  <umpire first="Laz" id="427113" last="Diaz" name="Laz Diaz" position="third"></umpire>
  <umpire first="Bill" id="427344" last="Miller" name="Bill Miller" position="left"></umpire>
  <umpire first="Dan" id="427248" last="Iassogna" name="Dan Iassogna" position="right"></umpire>
</umpires>'''

soup = BeautifulSoup(html, 'lxml')

Use:
>>> soup.find('umpire')
<umpire first="Mark" id="427533" last="Wegner" name="Mark Wegner" position="home"></umpire>

>>> soup.find('umpire', position="second")
<umpire first="Gerry" id="427103" last="Davis" name="Gerry Davis" position="second"></umpire>

>>> soup.find('umpire', position="second").get('id')
'427103'

>>> [i.get('position') for i in soup.find_all('umpire')]
['home', 'first', 'second', 'third', 'left', 'right']

>>> [i.get('id') for i in soup.find_all('umpire')]
['427533', '427361', '427103', '427113', '427344', '427248']

>>> # Lastly, a little more advanced: name and position in a dictionary
>>> dict(zip([i.get('name') for i in soup.find_all('umpire')], [i.get('position') for i in soup.find_all('umpire')]))
{'Bill Miller': 'left', 'Dan Iassogna': 'right', 'Gerry Davis': 'second', 'Laz Diaz': 'third', 'Mark Wegner': 'home', 'Paul Nauert': 'first'}
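As a side note, the name-to-position dictionary above could probably also be built in one pass with a dict comprehension, so find_all only runs once. This is just a sketch, not part of the original post: it uses a shortened copy of the XML, swaps in the built-in 'html.parser' so it runs without lxml installed, and the names position_by_name and umpire_id are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the umpire data from the example above.
html = '''\
<umpires>
  <umpire first="Mark" id="427533" last="Wegner" name="Mark Wegner" position="home"></umpire>
  <umpire first="Paul" id="427361" last="Nauert" name="Paul Nauert" position="first"></umpire>
  <umpire first="Gerry" id="427103" last="Davis" name="Gerry Davis" position="second"></umpire>
</umpires>'''

# Assumption: 'html.parser' (standard library) instead of 'lxml',
# only to avoid the extra dependency in this sketch.
soup = BeautifulSoup(html, 'html.parser')

# One pass over the tags: each umpire's name mapped to their position.
position_by_name = {
    tag.get('name'): tag.get('position')
    for tag in soup.find_all('umpire')
}


def umpire_id(soup, position):
    """Hypothetical helper: id of the umpire at `position`, or None if absent."""
    tag = soup.find('umpire', position=position)
    # find() returns None when nothing matches, so guard before calling .get()
    return tag.get('id') if tag is not None else None


print(position_by_name)
print(umpire_id(soup, 'second'))
print(umpire_id(soup, 'catcher'))
```

The guard in umpire_id is there because soup.find() returns None when no tag matches, and calling .get('id') on that directly would raise an AttributeError.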
Works great, thank you! I had tried using soup on my own but now realize I was approaching it completely wrong.