Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Looping with Beautifulsoup
#1
Hi!

This is my very first python script, I will probably make some noob mistakes and I apologise for that.
So I have a txt file that is full of URLS, they are all in new lines. I created a scrips which basically goes onto one of these sites and gets 7 things we need and it put's it into an excel file. It works perfectly if I use for example (and remove the looping):

my_url = 'https://www.londonstockexchange.com/exchange/prices-and-markets/stocks/summary/company-summary/GB00BH4HKS39GBGBXSET1.html?lang=en'
However if I want to create a loop for it, which means going into the "input.txt" file and reading a line, webscraping it and putting it into excel and repeating it until it reaches the end of the list there is just simply nothing that gets created.

Where does the loop go wrong?

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

#counts the number of lines in input.txt (how many times the loop has to run)
filename = "input.txt"
myfile= open(filename)
linescounted = len(myfile.readlines())
numberoflines = linescounted + 1

#creates the output excel file
filename = "products.csv"
f= open(filename, "w")
	headers = "name, special_cond, sector, subsector, index, marketcond, isin\n"
f.write(headers)

urlcount = 0
while urlcount < numberoflines:
	my_url = myfile.readline()

#my_url = 'https://www.londonstockexchange.com/exchange/prices-and-markets/stocks/summary/company-summary/GB00BH4HKS39GBGBXSET1.html?lang=en'
# Opens up the connection and grabs the page
	uClient = uReq(my_url)
# Offloads content into a variable
	page_html = uClient.read()
# Closes the client
	uClient.close()
# html parsing
	page_soup = soup(page_html, "html.parser")

	
	name = page_soup.find('h1', attrs={'class','tesummary'}).text.replace("\n", "") 
	spec = page_soup.find('div', attrs={'class':'commonTable table-responsive'}).find('tr', attrs={'class':'even'}).find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').text.replace("\r", "")
	sect = page_soup.find('div', attrs={'id':'pi-colonna2'}).find('div', attrs={'class':'table-responsive'}).find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').text
	subsect = page_soup.find('div', attrs={'id':'pi-colonna2'}).find('div', attrs={'class':'table-responsive'}).find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').text
	index = page_soup.find('div', attrs={'id':'pi-colonna2'}).find('div', attrs={'class':'table-responsive'}).find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').text
	mainmarket = page_soup.find('div', attrs={'id':'pi-colonna2'}).find('div', attrs={'class':'table-responsive'}).find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').text
	isin = page_soup.find('div', attrs={'id':'pi-colonna2'}).find('div', attrs={'class':'table-responsive'}).find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').find_next('td').text


	f.write(name.replace("\r", " ") + "," + spec.replace("\n", "") + "," + sect + "," + subsect + "," + index.replace(",", "|") + "," + mainmarket + "," + isin + "\n"
	
	
	numberoflines = numberoflines + 1


f.close()

	
Reply


Messages In This Thread
Looping with Beautifulsoup - by CaptainCsaba - Jan-22-2019, 11:55 AM
RE: Looping with Beautifulsoup - by ichabod801 - Jan-22-2019, 04:20 PM
RE: Looping with Beautifulsoup - by CaptainCsaba - Jan-23-2019, 07:17 AM
RE: Looping with Beautifulsoup - by buran - Jan-23-2019, 09:41 AM
RE: Looping with Beautifulsoup - by CaptainCsaba - Jan-23-2019, 10:49 AM
RE: Looping with Beautifulsoup - by snippsat - Jan-23-2019, 11:18 AM
RE: Looping with Beautifulsoup - by buran - Jan-23-2019, 11:20 AM
RE: Looping with Beautifulsoup - by CaptainCsaba - Jan-23-2019, 11:30 AM
RE: Looping with Beautifulsoup - by buran - Jan-23-2019, 12:38 PM

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020