Python Forum

Full Version: Scrape Multiple items from a webpage
I am trying to scrape an image and a table from a Wikipedia page and write them into a CSV file, but I am confused about how to combine them and write the data to the CSV.

Below is my code:
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Kevin_Bacon"
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")

newfile = "Newlyout.csv"
f = open(newfile, "w")
Headers = "Year, Association, Category, Nominated, Results, Imagelink\n"
f.write(Headers)

soup1 = soup.find_all("img")
for i in soup1:
    Image = i['src']
    
    #ddprint(Image['src'])
    soup3 = soup.find("table", {"class":"wikitable sortable"})
    for tag in soup3.find_all("tr"):
        cell = tag.find_all("td")
        
        if len(cell) == 5:
            Year = cell[0].find(text=True)
            Association = cell[2].find(text=True)
            Category = cell[3].find(text=True)
            Nominated = cell[4].find(text=True)
            Results = cell[4].find(text=True)
            f.write("{}".format(Year)+ ",{}".format(Association)+ ",{}".format(Category) + ",{}".format(Nominated) + ",{}".format(Results)+ ",{}".format(Image)+"\n")
            
f.close()
I got it working up to this point, but it is repeating the data. Also, in some cases there are multiple images in one single cell. All I need is the table, and against it all the images on that page.
Quote:
soup1 = soup.find_all("img")
for i in soup1:
    Image = i['src']
     
    #ddprint(Image['src'])
    soup3 = soup.find("table", {"class":"wikitable sortable"})

So for every image on the page... find the table with a certain class, and then do more stuff.
Those sound like two different things.
The problem is that I am not able to combine them into the CSV file. It keeps repeating the data, and there are multiple images in one single cell. All I need is the table, with all the images on the page written against it.
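Following the reply's point that these are two different things, one way to restructure it is to scrape the table rows once and the image links once, join all the image URLs into a single field, and let the `csv` module handle the writing (it quotes fields safely, unlike manual string concatenation). A minimal sketch of just the combining step, using made-up sample rows in place of what the two BeautifulSoup passes would return:

```python
import csv
import io

# Hypothetical stand-ins for the two separate scraping passes:
# one pass over the wikitable rows, one pass over the page's <img> tags.
table_rows = [
    ["2004", "Boston Society of Film Critics", "Best Actor", "Mystic River", "Won"],
    ["2010", "Screen Actors Guild Awards", "Outstanding Cast", "Frost/Nixon", "Nominated"],
]
image_links = [
    "//upload.wikimedia.org/a.jpg",
    "//upload.wikimedia.org/b.jpg",
]

# Join every image URL into one field with a separator, so each table
# row is written once instead of once per image.
all_images = "|".join(image_links)

buf = io.StringIO()  # swap in open("Newlyout.csv", "w", newline="") for a real file
writer = csv.writer(buf)
writer.writerow(["Year", "Association", "Category", "Nominated", "Results", "Imagelinks"])
for row in table_rows:
    writer.writerow(row + [all_images])

print(buf.getvalue())
```

Because the image loop and the table loop no longer nest, each row appears exactly once, with all image links packed into the last column.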