When using soup.find(), it stops at the first hit, and there are 6 tags with class="container-fluid".
Find a tag that has more specific info and contains the table.
Example:
from bs4 import BeautifulSoup
import requests

url = 'http://www.vgchartz.com/gamedb/?page=&results=1000&name=&platform=&minSales=0.01&publisher=&genre=&sort=GL'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
chart = soup.find("div", id="generalBody")
tr_tag = chart.find_all('tr')

Test:
>>> tr_tag[4]
<tr style="background-image:url(../imgs/chartBar_alt_large.gif); height:70px">
  <td>2</td>
  <td>
    <div id="photo3">
      <a href="/games/game.php?id=6455&region=All">
        <div style="height:60px; width:60px; overflow:hidden;">
          <img alt="Boxart Missing" border="0" src="/games/boxart/8972270ccc.jpg" width="60"/>
        </div>
      </a>
    </div>
  </td>
  <td style="font-size:12pt;">
    <a href="http://www.vgchartz.com/game/6455/super-mario-bros/?region=All">Super Mario Bros. </a>
  </td>
  <td>
    <center>
      <img alt="NES" src="/images/consoles/NES_b.png"/>
    </center>
  </td>
  <td width="100">Nintendo </td>
  <td align="center">N/A </td>
  <td align="center">10.0 </td>
  <td align="center">N/A </td>
  <td align="center">40.24m</td>
  <td align="center" width="75">18th Oct 85 </td>
  <td align="center" width="75">N/A</td>
</tr>
>>> tr_tag[4].find_all('a')[1].text
'Super Mario Bros. '
>>> td = tr_tag[4].find_all('td', align="center")
>>> for item in td:
...     item.text
...
'N/A '
'10.0 '
'N/A '
'40.24m'
'18th Oct 85 '
'N/A'

Look at Web-Scraping part-1; as you can see, there is no use of urllib, always Requests.
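
To see the first-hit behaviour without hitting the site, here is a small offline sketch. The HTML string is a made-up miniature page (the game names and sales numbers are placeholders, not real data), and it uses the built-in "html.parser" instead of lxml so nothing extra is needed; only the id "generalBody" and the class "container-fluid" are taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page: three divs share a class, only one holds the table.
html = """
<div class="container-fluid"><p>header</p></div>
<div class="container-fluid"><p>menu</p></div>
<div class="container-fluid" id="generalBody">
  <table>
    <tr><th>Game</th><th>Global</th></tr>
    <tr><td><a href="/game/1/">Game A </a></td><td align="center">1.23m</td></tr>
    <tr><td><a href="/game/2/">Game B </a></td><td align="center">0.45m</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first match, and that div has no table inside.
first = soup.find("div", class_="container-fluid")
print(first.find("table"))

# find_all() shows how many tags share the same class.
print(len(soup.find_all("div", class_="container-fluid")))

# Searching by the more specific id lands on the div that contains the table.
chart = soup.find("div", id="generalBody")
rows = chart.find_all("tr")

# Skip the header row and strip the trailing whitespace from each cell.
games = []
for row in rows[1:]:
    name = row.find("a").text.strip()
    sales = row.find("td", align="center").text.strip()
    games.append((name, sales))
print(games)
```

Calling .strip() on the text is optional, but it gets rid of the trailing spaces you see in output like 'Super Mario Bros. ' above.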