Sep-09-2017, 10:59 PM
(This post was last modified: Sep-10-2017, 01:13 PM by Prince_Bhatia.)
I am trying to scrape the images and a table from a Wikipedia page and write them to a CSV file, but I am not sure how to combine the two and write the result to the CSV.
Below is my code:
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Kevin_Bacon"
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")

newfile = "Newlyout.csv"
f = open(newfile, "w")
Headers = "Year, Association, Category, Nominated, Results, Imagelink\n"
f.write(Headers)

soup1 = soup.find_all("img")
for i in soup1:
    Image = i['src']
    #ddprint(Image['src'])
    soup3 = soup.find("table", {"class":"wikitable sortable"})
    for tag in soup3.find_all("tr"):
        cell = tag.find_all("td")
        if len(cell) == 5:
            Year = cell[0].find(text=True)
            Association = cell[2].find(text=True)
            Category = cell[3].find(text=True)
            Nominated = cell[4].find(text=True)
            Results = cell[4].find(text=True)
            f.write("{}".format(Year)+ ",{}".format(Association)+ ",{}".format(Category) + ",{}".format(Nominated) + ",{}".format(Results)+ ",{}".format(Image)+"\n")
f.close()

I got it working up to this point, but the data is being repeated, and for the images there are multiple images in a single cell. All I need is the table, with all the images on that page listed against it.
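
One possible way to combine the two, as a rough sketch rather than a definitive fix: collect the table rows and the image links into two separate lists first, then write them side by side so that each table row and each image link appears exactly once. This assumes that "one image link per CSV line" is the layout wanted. The function name write_rows_with_images and the sample values are hypothetical stand-ins for the scraped data, not part of the original code. It may also be worth checking the column indices in the original loop, since cell[4] is read for both Nominated and Results and cell[1] is never used, which might not be intended.

```python
import csv
import io
from itertools import zip_longest

HEADERS = ["Year", "Association", "Category", "Nominated", "Results", "Imagelink"]


def write_rows_with_images(rows, image_links, out):
    """Write each table row once, pairing it with at most one image link.

    rows        -- list of 5-item lists already extracted from the table
    image_links -- list of image URLs already extracted from the page
    out         -- any writable text file object
    """
    writer = csv.writer(out)  # handles quoting of commas inside cell text
    writer.writerow(HEADERS)
    empty_row = [""] * (len(HEADERS) - 1)
    # zip_longest keeps going until the longer list is exhausted,
    # yielding None for the shorter one; replace None with blanks.
    for row, link in zip_longest(rows, image_links):
        table_cells = list(row) if row is not None else empty_row
        writer.writerow(table_cells + [link or ""])


# Hypothetical sample data standing in for the scraped values.
rows = [["1995", "Some Association", "Best Actor", "Some Film", "Nominated"]]
image_links = ["//example.org/a.jpg", "//example.org/b.jpg"]

buffer = io.StringIO()
write_rows_with_images(rows, image_links, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of the in-memory buffer, the csv module documentation recommends opening it with newline="", for example open("Newlyout.csv", "w", newline=""), so that no extra blank lines appear on Windows.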