Python Forum
Scraping Columns with Pandas (Column Entries w/ more than 1 word writes two columns)
#6
snippsat,

Thank you for this! It is a great starting point for me on images, and since you wrote the comment "First image", I believe I will need to learn loops to get the rest. I will be coming back to this.

In the meantime, I ran into some trouble scraping Wikipedia tables with pandas. I changed from Counties to "Municipalities", and whether I use tables[0], tables[1], or tables[2], none of them gives me the data shown in the table on the Wikipedia article.

Code is as follows:

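import pandas as pd

url = "https://en.wikipedia.org/wiki/List_of_municipalities_in_Alabama"
# read_html returns a list of DataFrames, one for each table it parses from the page
tables = pd.read_html(url)

# Index 1 is a guess; it does not give me the municipalities table
df = tables[1]

# Writing .ods with engine="odf" requires the odfpy package
df.to_excel("AL_Alabama_Municipalities.ods", index=False, engine="odf")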
Thank you again for this forum! How do I determine the right tables[#] index? Is it a guessing game, or is there an attribute or property in the page's HTML source, as seen in the browser, that could help me find the correct tables[#]?

Best Regards,

Brandon Kastning
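
On the tables[#] question above, one way to avoid guessing is to print a short summary of every DataFrame that pd.read_html returns and pick the index whose shape and column names match the table on the page. The following is only a minimal sketch: summarize_tables is a hypothetical helper name, and the two demo frames are made-up stand-ins so the example runs without network access.

```python
import pandas as pd


def summarize_tables(tables):
    """Return one (index, n_rows, n_cols, first_columns) tuple per DataFrame."""
    summary = []
    for i, table in enumerate(tables):
        n_rows, n_cols = table.shape
        # Column labels may be tuples (multi-row headers), so stringify them
        first_columns = [str(c) for c in table.columns[:4]]
        summary.append((i, n_rows, n_cols, first_columns))
    return summary


# Made-up stand-in data (an assumption, not the real Wikipedia tables).
# With the real page you would use:  tables = pd.read_html(url)
demo_tables = [
    pd.DataFrame({"Note": ["legend"]}),
    pd.DataFrame({"Name": ["Town A", "Town B"], "County": ["County X", "County Y"]}),
]

for index, n_rows, n_cols, columns in summarize_tables(demo_tables):
    print(f"tables[{index}]: {n_rows} rows x {n_cols} cols, columns={columns}")
```

pd.read_html also accepts a match argument (a string or regex), so passing a piece of text that appears only in the wanted table should narrow the returned list to tables containing it.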

(Jan-10-2022, 06:52 PM)snippsat Wrote:
(Jan-09-2022, 09:52 PM)BrandonKastning Wrote: Should I open a new thread?
It's part of the same task, so no problem.
(Jan-09-2022, 09:52 PM)BrandonKastning Wrote: How to download the map images and store them (Either DB or Local) ?
You'll have to give it a try; for this you need a more general scraping tool.
Here is a demo of how to start.
import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_counties_in_Alabama"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.find('h1').text)

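# First image: take the first <a class="image"> link and read its <img> src
img = soup.find_all('a', class_="image")
img_link = img[0].find('img').get('src')
# src is protocol-relative ("//upload..."), so add a scheme to the leading slashes only
img_link = img_link.replace('//', 'http://', 1)
print(img_link)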
Output:
List of counties in Alabama
http://upload.wikimedia.org/wikipedia/commons/thumb/5/54/Map_of_Alabama_highlighting_Autauga_County.svg/75px-Map_of_Alabama_highlighting_Autauga_County.svg.png
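
The demo above stops at the first image; getting the rest comes down to looping over every match. The following is a rough sketch under stated assumptions: collect_image_urls is a hypothetical helper, the HTML fragment and example.org URLs are invented so the code runs offline, and the class="image" selector is taken from the demo, so the live page's markup may differ. It uses urljoin from the standard library instead of a string replace to turn protocol-relative src values into full URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_image_urls(html, page_url):
    """Return absolute URLs for every <img> nested in an <a class="image"> link."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for link in soup.find_all("a", class_="image"):
        img = link.find("img")
        if img is None or not img.get("src"):
            continue  # skip links without a usable <img src=...>
        # urljoin resolves protocol-relative ("//host/...") and relative paths
        urls.append(urljoin(page_url, img["src"]))
    return urls


# Hypothetical HTML fragment standing in for response.content
sample_html = """
<a class="image" href="/wiki/File:A.svg"><img src="//upload.example.org/a.png"></a>
<a class="image" href="/wiki/File:B.svg"><img src="/static/b.png"></a>
<a class="other" href="/wiki/C"><img src="//upload.example.org/c.png"></a>
"""

for url in collect_image_urls(sample_html, "https://en.example.org/wiki/Page"):
    print(url)
```

With a real page, the same function could be called with response.content and the page URL, and each returned URL could then be fetched with requests.get and written to a local file.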



Messages In This Thread
RE: Scraping Columns with Pandas (Column Entries w/ more than 1 word writes two columns) - by BrandonKastning - Jan-13-2022, 04:56 AM
