Python Forum
Using Excel Cell As A Variable In A Loop
Hey guys,

I'm trying to learn more about web scraping, so I've set myself a challenge: scrape data from a few pages within the same website.

Each page has the same attributes (handy for scraping each page!), but the last part of the URL is different for each page. So I've gathered the page URLs and exported them to a spreadsheet.

What I'm trying to do now (and failing miserably) is tell Python to use the column in my Excel file that contains the page URLs as the list of pages to be scraped. Once it grabs a page URL, it should parse the page with BeautifulSoup, extract certain elements, and export them to another Excel spreadsheet.

It then needs to loop through the URLs and do the same thing until it has gone through all of them in the spreadsheet.

The code I've got so far to open the spreadsheet and refer to the column is:

from bs4 import BeautifulSoup
import pandas as pd
import openpyxl
import requests 

for page in current_url:
    book = openpyxl.load_workbook("url_list.xlsx")
    sheet = book['Sheet2']
    column_name = 'Full Page Url'
    for column_cell in sheet.iter_cols(1, sheet.max_column):  # iterate column cell
        if column_cell[0].value == column_name:    # check for your column
            j = 0
            for data in column_cell[1:]:    # iterate your column
                url_component = data.value
                
            break

    page = requests.get(url_component)
    soup = BeautifulSoup(page.text, 'html.parser')
    print(soup)
I've added print(soup) there just to check that it's picking up a URL from the spreadsheet.

The result I get is:
Output:
Process finished with exit code 0
But there's no HTML data, so it doesn't appear to be working.

If I run this:

from bs4 import BeautifulSoup
import pandas as pd
import openpyxl
import requests 

book = openpyxl.load_workbook("url_list.xlsx")
sheet = book['Sheet2']
column_name = 'Full Page Url'
for column_cell in sheet.iter_cols(1, sheet.max_column):  # iterate over the columns
    if column_cell[0].value == column_name:    # check the header cell for the target column
        j = 0
        for data in column_cell[1:]:    # iterate over the cells below the header
            url_component = data.value
            print(url_component)  # assumed: a print like this produced the output shown
        break
It correctly gives me each URL (so it's reading the Excel file and referencing the column correctly). For example, the code above gives:

Output:
https://www.samplesite.com/360/
https://www.samplesite.com/3d-checker/
Could someone please help me understand where I'm going wrong?

Thanking you.
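
One possible reading of the first snippet: requests.get() sits outside the inner loop, so at best only the last URL read from the column would ever be fetched, and the outer loop iterates over current_url, which the posted code never defines. A minimal sketch of one way to restructure it follows, assuming the goal is to collect the URLs first and then fetch them one by one. The function names urls_from_column and scrape_all are hypothetical, the timeout value is an arbitrary choice, and printing soup.title is only a stand-in for whatever elements actually need extracting.

```python
def urls_from_column(sheet, column_name):
    """Return the non-empty values found under the header `column_name`."""
    for column in sheet.iter_cols(1, sheet.max_column):
        if column[0].value == column_name:  # header cell matches
            # Skip the header itself and any blank cells below it.
            return [cell.value for cell in column[1:] if cell.value]
    return []  # no column with that header


def scrape_all(workbook_path="url_list.xlsx", sheet_name="Sheet2",
               column_name="Full Page Url"):
    """Fetch and parse every URL listed in the spreadsheet column."""
    # Third-party imports are kept here so the helper above can be
    # exercised without them being installed.
    import openpyxl
    import requests
    from bs4 import BeautifulSoup

    book = openpyxl.load_workbook(workbook_path)
    urls = urls_from_column(book[sheet_name], column_name)

    for url in urls:
        # The request happens inside the loop, once per URL.
        response = requests.get(url, timeout=10)  # timeout is an assumption
        soup = BeautifulSoup(response.text, "html.parser")
        print(url, soup.title)  # placeholder for the real extraction step


if __name__ == "__main__":
    scrape_all()
```

Separating "read the URLs" from "fetch each URL" makes it easy to print the list on its own first and confirm the spreadsheet part works before any requests are sent.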


Messages In This Thread
Using Excel Cell As A Variable In A Loop - by knight2000 - Jul-09-2021, 07:10 AM
