Python Forum
Need help scraping Wikipedia table
#1
Hey guys,

I am fairly new to Python. I have been trying to use Beautiful Soup to scrape a Wikipedia table, https://en.wikipedia.org/wiki/List_of_ne...in_Chicago, and am having a lot of difficulty. I keep getting an empty DataFrame with just column headers. Can someone please walk me through the code needed to scrape this table? I would greatly appreciate it. Again, I don't need someone to do it for me, but I would like someone to talk me through it.



Here is what I've tried so far

Thanks! Hoping this site works!
Reply
#2
Please do not post links to code.
Post the code within the thread, using bbcode tags
Reply
#3
Pandas can scrape tables directly into a DataFrame, so you don't need BS for this.
Example
Reply
#4
Sorry about that, still learning. Here is the code:

from bs4 import BeautifulSoup
import numpy as np 
import requests
import pandas as pd 

list_url = "https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago"
source = requests.get(list_url)

soup = BeautifulSoup(source.text, 'html.parser')

neighborhood_table=soup.find('table')

df=pd.read_html(str(neighborhood_table))

df.head()
Error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-62-c42a15b2c7cf> in <module>
----> 1 df.head()

AttributeError: 'list' object has no attribute 'head'
Reply
#5
(Dec-01-2020, 05:21 PM)Larz60+ Wrote: Please do not post links to code.
Post the code within the thread, using bbcode tags

I followed those instructions, did that work better?
Larz60+ likes this post
Reply
#6
(Dec-01-2020, 05:27 PM)snippsat Wrote: Pandas can scrape tables directly into a DataFrame, so you don't need BS for this.
Example

I clicked on the link and was told I need authorization. Can you please recommend next steps? Thank you.
Reply
#7
Try the link now.
import pandas as pd

df = pd.read_html("https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago")
df = df[0]
print(df.head())
Output:
      Neighborhood  Community area
0      Albany Park     Albany Park
1  Altgeld Gardens       Riverdale
2    Andersonville       Edgewater
3   Archer Heights  Archer Heights
4    Armour Square   Armour Square
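To explain the AttributeError from post #4: pd.read_html() returns a list of DataFrames, one per table it finds, even when there is only one table. That is why the list has to be indexed before .head() will work. Here is a minimal sketch of that behavior. The inline HTML is a made-up stand-in for the Wikipedia page so that it runs without network access, and the name tables is just for illustration. It assumes a parser backend such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the Wikipedia page: a tiny inline table,
# so the example runs offline.
html = """
<table>
  <thead>
    <tr><th>Neighborhood</th><th>Community area</th></tr>
  </thead>
  <tbody>
    <tr><td>Albany Park</td><td>Albany Park</td></tr>
    <tr><td>Altgeld Gardens</td><td>Riverdale</td></tr>
  </tbody>
</table>
"""

# read_html always returns a list of DataFrames, one per <table> found.
# Wrapping the markup in StringIO avoids the deprecation warning that
# newer pandas versions emit for literal HTML strings.
tables = pd.read_html(StringIO(html))
print(type(tables).__name__, len(tables))

# Pick the table you want by position; only then is it a DataFrame.
df = tables[0]
print(df.head())
```

The same indexing should fix the Beautiful Soup version in post #4 as well: pd.read_html(str(neighborhood_table)) also gives back a list, so taking element [0] of the result before calling .head() should work there too.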
Larz60+ likes this post
Reply

