Python Forum
Using Excel Cell As A Variable In A Loop
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Using Excel Cell As A Variable In A Loop
#1
Hi all,

As a relative newbie to Python and webscrapping, I've been trying to learn more about headers and proxies- in terms of rotating them from a excel list I've created. I've watched lots of different vids and read posts, but I'm a little stuck in it's application.

Starting off with the header components, I've an Excel file with a collection of various headers. The URL will have many pages, so the goal is to try and have a different header for each page.

I'm trying to open the Excel file and grab a header from the cell and use it to form the variable for headers.

Here's what I started with:

import requests
import pandas as pd
from bs4 import BeautifulSoup
import openpyxl

url = 'mytesturl'

wb = openpyxl.load_workbook('RandomUserAgentList.xlsx')
ws = wb['Sheet1']
headers = []

for cell in ws['A']:
    random_header_variable = cell.value
    headers = "{'User-Agent': " + random_header_variable + "}"
    print(headers)
The output from this is:
Output:
{'User-Agent': Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36} {'User-Agent': Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36} {'User-Agent': Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36 OPR/66.0.3515.72} {'User-Agent': Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36} {'User-Agent': Mozilla/5.0 (X11; CrOS aarch64 13421.99.0) AppleWebKit/537.36 (KHTML; like Gecko) Chrome/86.0.4240.198 Safari/537.36}
So that looks good.

But then from there, I'm not sure how to incorporate it into requests for the page. When I've used:

import requests
import pandas as pd
from bs4 import BeautifulSoup
import openpyxl

url = 'mytesturl'

wb = openpyxl.load_workbook('RandomUserAgentList.xlsx')
ws = wb['Sheet1']
headers = []

for cell in ws['A']:
    random_header_variable = cell.value
    headers = "{'User-Agent': " + random_header_variable + "}"
    r = requests.get(url, headers = headers)
I get a whole series of errors:
Error:
Traceback (most recent call last): File "C:/Users/test_headers.py", line 15, in <module> r = requests.get(url, headers = headers) File "C:\Users\anaconda3\lib\site-packages\requests\api.py", line 76, in get return request('get', url, params=params, **kwargs) File "C:\Users\anaconda3\lib\site-packages\requests\api.py", line 61, in request return session.request(method=method, url=url, **kwargs) File "C:\Users\anaconda3\lib\site-packages\requests\sessions.py", line 528, in request prep = self.prepare_request(req) File "C:\Users\anaconda3\lib\site-packages\requests\sessions.py", line 456, in prepare_request p.prepare( File "C:\Users\anaconda3\lib\site-packages\requests\models.py", line 317, in prepare self.prepare_headers(headers) File "C:\Users\anaconda3\lib\site-packages\requests\models.py", line 449, in prepare_headers for header in headers.items(): AttributeError: 'str' object has no attribute 'items'
I'm presuming it's because the format is wrong?

I also tried to format it:

headers = f"{'User-Agent': {random_header_variable} }"
but then I get:
Error:
headers = f"{'User-Agent': {random_header_variable} }" ValueError: Invalid format specifier
Could someone please enlighten me how to format this correctly?

Thanks a lot.
Reply
#2
Looks like it's saying that params can't be a str. You should probably build it as a dictionary instead.
Reply
#3
Hi Bowlofred,

Thanks for your suggestion- I understand what you're saying, but to be honest, I don't know how to go about applying that- particularly since I'm importing part of that as a variable.



(Aug-23-2021, 06:40 AM)bowlofred Wrote: Looks like it's saying that params can't be a str. You should probably build it as a dictionary instead.
Reply
#4
Try
headers = {'User-Agent': random_header_variable}
Reply
#5
(Aug-23-2021, 10:15 AM)Yoriz Wrote: Try
headers = {'User-Agent': random_header_variable}

Thank you Yoriz.

I gave that a go and good news is that I'm not getting any error message...bad news is that nothing seems to happen- when I run it, it's doing something but never ends until I manually stop it! I'm guessing it could be stuck in a loop.
Reply
#6
You can add a print statement in your loop to verify it's making progress.

...
r = requests.get(url, headers = headers)
print(f"Got {r.status} code from {r.url} lookup")
In your original code, you don't seem to be doing anything with the result of the get. How are you using it?
Reply
#7
Thank you.

I was running small print statements as I was going to verify and everything looked good up until adding the last requests line.

So the last print statement where everything was working was:
or cell in ws['A']:
    random_header_variable = cell.value
    headers = "{'User-Agent': " + random_header_variable + "}"
    print(headers) 
That printed all the headers and it looked correct. As soon as I added:
r = requests.get(url, headers = headers)
(I didn't try a print statement at that point), I got the errors as posted above.

I think I may give up on this one! Maybe I might use a few full header variations in a list in the code rather than get the code to pick part of it from a spreadsheet.

Really appreciate your help and time though.

(Aug-24-2021, 02:18 AM)bowlofred Wrote: You can add a print statement in your loop to verify it's making progress.

...
r = requests.get(url, headers = headers)
print(f"Got {r.status} code from {r.url} lookup")
In your original code, you don't seem to be doing anything with the result of the get. How are you using it?
Reply
#8
Using the data you posted.
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://python-forum.io/'
agent_lst = [
    {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'},
    {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'},
    {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36 OPR/66.0.3515.72'},
    {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'},
    {'User-Agent': 'Mozilla/5.0 (X11; CrOS aarch64 13421.99.0) AppleWebKit/537.36 (KHTML; like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
]
for agent in agent_lst:
    response = requests.get(url, headers=agent)
    print(response.status_code)
    soup = BeautifulSoup(response.content, 'lxml')
    print(soup.select_one('head > title'))
Output:
200 <title>Python Forum</title> 200 <title>Python Forum</title> 200 <title>Python Forum</title> 200 <title>Python Forum</title> 200 <title>Python Forum</title>
You most be making a working dictionary in headers list,now you mess it up and make string.
from pprint import pprint

# Example data from Excel
lst = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
]

headers = []
d = {}
for agent in lst:
    d['User-Agent'] = agent
    headers.append(d)
pprint(headers)
Output:
[{'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) ' 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 ' 'Safari/537.36'}, {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) ' 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 ' 'Safari/537.36'}]
So now contain headers list working dictionary.
>>> headers[0]
{'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) '
               'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 '
               'Safari/537.36'}
>>> headers[0].get('User-Agent') # get a dictionary  method 
('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, '
 'like Gecko) Chrome/87.0.4280.88 Safari/537.36')  
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Variable definitions inside loop / could be better? gugarciap 2 373 Jan-09-2024, 11:11 PM
Last Post: deanhystad
  How to create a variable only for use inside the scope of a while loop? Radical 10 1,523 Nov-07-2023, 09:49 AM
Last Post: buran
  Problem with print variable in print.cell (fpdf) muconi 0 632 Dec-25-2022, 02:24 PM
Last Post: muconi
  How to loop through all excel files and sheets in folder jadelola 1 4,329 Dec-01-2022, 06:12 PM
Last Post: deanhystad
  Deleting rows based on cell value in Excel azizrasul 11 2,473 Oct-19-2022, 02:38 AM
Last Post: azizrasul
  export into excel, how to implement pandas into for-loop deneme2 6 2,357 Sep-01-2022, 05:44 AM
Last Post: deneme2
  Nested for loops - help with iterating a variable outside of the main loop dm222 4 1,531 Aug-17-2022, 10:17 PM
Last Post: deanhystad
  loop (create variable where name is dependent on another variable) brianhclo 1 1,101 Aug-05-2022, 07:46 AM
Last Post: bowlofred
  Multiple Loop Statements in a Variable Dexty 1 1,175 May-23-2022, 08:53 AM
Last Post: bowlofred
Big Grin Variable flag vs code outside of for loop?(Disregard) cubangt 2 1,129 Mar-16-2022, 08:54 PM
Last Post: cubangt

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020