Python Forum

Full Version: Need help scraping wikipedia table
Hey guys,

I am fairly new to Python. I have been trying to use Beautiful Soup to scrape a Wikipedia table, https://en.wikipedia.org/wiki/List_of_ne...in_Chicago, and am having a lot of difficulty. I keep getting an empty DataFrame with just the column headers. Can someone please walk me through the code needed to scrape this table? I would greatly appreciate it. Again, I don't need someone to do it for me, but I would like someone to talk me through it.



Here is what I've tried so far

Thanks! Hoping this site works!
Please do not post links to code.
Post the code within the thread, using bbcode tags
Pandas can scrape tables directly into a DataFrame, so you don't need BS for this.
Example
Sorry about that, still learning. Here is the code:

from bs4 import BeautifulSoup
import numpy as np 
import requests
import pandas as pd 

list_url = "https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago"
source = requests.get(list_url)

soup = BeautifulSoup(source.text, 'html.parser')

neighborhood_table=soup.find('table')

df=pd.read_html(str(neighborhood_table))

df.head()
[error]---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-62-c42a15b2c7cf> in <module>
----> 1 df.head()

AttributeError: 'list' object has no attribute 'head'[/error]
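The traceback above comes from the return type: pd.read_html gives back a list of DataFrames (one per table it finds), not a single DataFrame, so .head() has to be called on an element of that list. Here is a minimal sketch of that, assuming a small hand-written HTML string as a stand-in for str(neighborhood_table); the two sample rows and the name `tables` are only for illustration. The string is wrapped in StringIO because newer pandas versions warn when literal HTML is passed directly.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for str(neighborhood_table) from the post above.
html = """
<table>
  <thead>
    <tr><th>Neighborhood</th><th>Community area</th></tr>
  </thead>
  <tbody>
    <tr><td>Albany Park</td><td>Albany Park</td></tr>
    <tr><td>Andersonville</td><td>Edgewater</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of DataFrames, even when only one table is found.
tables = pd.read_html(StringIO(html))
print(type(tables).__name__, len(tables))

# Pick a DataFrame out of the list; this object does have .head().
df = tables[0]
print(df.head())
```

In the original code the same idea would mean indexing the result of pd.read_html before calling .head().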
(Dec-01-2020, 05:21 PM)Larz60+ Wrote: Please do not post links to code.
Post the code within the thread, using bbcode tags

I followed those instructions, did that work better?
(Dec-01-2020, 05:27 PM)snippsat Wrote: Pandas can scrape tables directly into a DataFrame, so you don't need BS for this.
Example

I clicked on the link and was told I need authorization. Can you please recommend next steps? Thank you.
Try the link now.
import pandas as pd

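# read_html returns a list of DataFrames, one per table found on the page
tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago")
# The first table is the neighborhood list
df = tables[0]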
print(df.head())
Output:
      Neighborhood  Community area
0      Albany Park     Albany Park
1  Altgeld Gardens       Riverdale
2    Andersonville       Edgewater
3   Archer Heights  Archer Heights
4    Armour Square   Armour Square
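Taking the first table by position works here, but it can break if the page layout changes. One alternative, sketched below, is the match argument of pd.read_html, which keeps only tables containing text that matches a given string or regex. The HTML is a made-up two-table page (the "Side"/"Count" table is invented for illustration), not the real Wikipedia markup, so treat this as a sketch and not as the exact call for that page.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second one is wanted.
html = """
<table>
  <thead><tr><th>Side</th><th>Count</th></tr></thead>
  <tbody><tr><td>North</td><td>1</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Neighborhood</th><th>Community area</th></tr></thead>
  <tbody>
    <tr><td>Albany Park</td><td>Albany Park</td></tr>
    <tr><td>Altgeld Gardens</td><td>Riverdale</td></tr>
  </tbody>
</table>
"""

# match filters the result to tables whose text matches the pattern,
# so the unrelated first table is left out of the returned list.
matching = pd.read_html(StringIO(html), match="Community area")
df = matching[0]
print(df)
```

The result is still a list, so the [0] index is needed even when only one table matches.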