Jul-09-2021, 07:10 AM
Hey guys,
I'm trying to learn more about web scraping, so I've set myself a challenge: scrape data off a few pages within the same website.
Each page has the same attributes (handy for web scraping each page!) but obviously the end part of the URL for each page is different. So I've gathered the different page URLs and exported them to a spreadsheet.
What I'm trying to do now (and failing miserably) is to tell Python to use a column in my Excel file, which contains each page URL, as the page to be scraped. Once it grabs a page URL, it should then parse the page with BeautifulSoup, extract certain elements, and export those onto another Excel spreadsheet.
It will then need to loop through each URL and do the same thing until it has gone through all the URLs on the spreadsheet.
The code I've got so far to open the spreadsheet and refer to the column is:
from bs4 import BeautifulSoup
import pandas as pd
import openpyxl
import requests

for page in current_url:
    book = openpyxl.load_workbook("url_list.xlsx")
    sheet = book['Sheet2']
    column_name = 'Full Page Url'
    for column_cell in sheet.iter_cols(1, sheet.max_column):  # iterate column cell
        if column_cell[0].value == column_name:  # check for your column
            j = 0
            for data in column_cell[1:]:  # iterate your column
                url_component = data.value
            break
    page = requests.get(url_component)
    soup = BeautifulSoup(page.text, 'html.parser')
    print(soup)

I've tried print(soup) there just to check that it's referencing a URL from the spreadsheet.
The result I get is:
Output:
Process finished with exit code 0
But there's no HTML data, so it doesn't appear to be working. If I run this:
from bs4 import BeautifulSoup
import pandas as pd
import openpyxl
import requests

book = openpyxl.load_workbook("url_list.xlsx")
sheet = book['Sheet2']
column_name = 'Full Page Url'
for column_cell in sheet.iter_cols(1, sheet.max_column):  # iterate column cell
    if column_cell[0].value == column_name:  # check for your column
        j = 0
        for data in column_cell[1:]:  # iterate your column
            url_component = data.value
        break

It's correctly giving me each URL (so it's reading and referencing the Excel file and column correctly). For example, the code above gives:
Output:
https://www.samplesite.com/360/
https://www.samplesite.com/3d-checker/
Could someone please help me understand where I'm going wrong? Thanking you.
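For reference, the intended flow can be sketched like this. This is a minimal sketch, not a definitive fix: it assumes the sheet name 'Sheet2' and the column header 'Full Page Url' from the post, and the helper names read_urls and scrape_all are made up for illustration. The idea is to collect all the URLs into a list first, and only then loop over that list making one request per URL, so the request/parse step sits inside the URL loop rather than running once with whatever url_component last held.

```python
from bs4 import BeautifulSoup
import openpyxl
import requests


def read_urls(path, sheet_name="Sheet2", column_name="Full Page Url"):
    """Collect every value under the given header column into a list."""
    book = openpyxl.load_workbook(path)
    sheet = book[sheet_name]
    urls = []
    for column_cell in sheet.iter_cols(1, sheet.max_column):
        if column_cell[0].value == column_name:   # found the right column
            for data in column_cell[1:]:          # every row below the header
                if data.value:
                    urls.append(data.value)
            break                                 # stop searching other columns
    return urls


def scrape_all(urls, out_path="scraped.xlsx"):
    """Fetch each URL, parse it, and write one row per page to a new workbook."""
    out = openpyxl.Workbook()
    ws = out.active
    ws.append(["url", "title"])                   # header row
    for url in urls:                              # one request per URL
        page = requests.get(url)
        soup = BeautifulSoup(page.text, "html.parser")
        title = soup.title.string if soup.title else ""
        ws.append([url, title])                   # extract whatever elements you need here
    out.save(out_path)
```

Usage would then be something like `scrape_all(read_urls("url_list.xlsx"))`. Note that in the sketch the break only ends the search across columns; the inner row loop is allowed to finish first, which matches the behaviour you saw where every URL was read.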