Using Excel Cell As A Variable In A Loop - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: Using Excel Cell As A Variable In A Loop (/thread-34698.html) |
Using Excel Cell As A Variable In A Loop - knight2000 - Aug-23-2021 Hi all, As a relative newbie to Python and webscrapping, I've been trying to learn more about headers and proxies- in terms of rotating them from a excel list I've created. I've watched lots of different vids and read posts, but I'm a little stuck in it's application. Starting off with the header components, I've an Excel file with a collection of various headers. The URL will have many pages, so the goal is to try and have a different header for each page. I'm trying to open the Excel file and grab a header from the cell and use it to form the variable for headers. Here's what I started with: import requests import pandas as pd from bs4 import BeautifulSoup import openpyxl url = 'mytesturl' wb = openpyxl.load_workbook('RandomUserAgentList.xlsx') ws = wb['Sheet1'] headers = [] for cell in ws['A']: random_header_variable = cell.value headers = "{'User-Agent': " + random_header_variable + "}" print(headers)The output from this is: So that looks good. But then from there, I'm not sure how to incorporate it into requests for the page. When I've used: import requests import pandas as pd from bs4 import BeautifulSoup import openpyxl url = 'mytesturl' wb = openpyxl.load_workbook('RandomUserAgentList.xlsx') ws = wb['Sheet1'] headers = [] for cell in ws['A']: random_header_variable = cell.value headers = "{'User-Agent': " + random_header_variable + "}" r = requests.get(url, headers = headers)I get a whole series of errors: I'm presuming it's because the format is wrong? I also tried to format it: headers = f"{'User-Agent': {random_header_variable} }"but then I get: Could someone please enlighten me how to format this correctly?Thanks a lot. RE: Using Excel Cell As A Variable In A Loop - bowlofred - Aug-23-2021 Looks like it's saying that params can't be a str. You should probably build it as a dictionary instead. RE: Using Excel Cell As A Variable In A Loop - knight2000 - Aug-23-2021 Hi Bowlofred, Thanks for your suggestion- I understand what you're saying, but to be honest, I don't know how to go about applying that- particularly since I'm importing part of that as a variable. (Aug-23-2021, 06:40 AM)bowlofred Wrote: Looks like it's saying that params can't be a str. You should probably build it as a dictionary instead. RE: Using Excel Cell As A Variable In A Loop - Yoriz - Aug-23-2021 Try headers = {'User-Agent': random_header_variable} RE: Using Excel Cell As A Variable In A Loop - knight2000 - Aug-24-2021 (Aug-23-2021, 10:15 AM)Yoriz Wrote: Try Thank you Yoriz. I gave that a go and good news is that I'm not getting any error message...bad news is that nothing seems to happen- when I run it, it's doing something but never ends until I manually stop it! I'm guessing it could be stuck in a loop. RE: Using Excel Cell As A Variable In A Loop - bowlofred - Aug-24-2021 You can add a print statement in your loop to verify it's making progress. ... r = requests.get(url, headers = headers) print(f"Got {r.status} code from {r.url} lookup")In your original code, you don't seem to be doing anything with the result of the get. How are you using it? RE: Using Excel Cell As A Variable In A Loop - knight2000 - Aug-25-2021 Thank you. I was running small print statements as I was going to verify and everything looked good up until adding the last requests line. So the last print statement where everything was working was: or cell in ws['A']: random_header_variable = cell.value headers = "{'User-Agent': " + random_header_variable + "}" print(headers)That printed all the headers and it looked correct. As soon as I added: r = requests.get(url, headers = headers)(I didn't try a print statement at that point), I got the errors as posted above. I think I may give up on this one! Maybe I might use a few full header variations in a list in the code rather than get the code to pick part of it from a spreadsheet. Really appreciate your help and time though. (Aug-24-2021, 02:18 AM)bowlofred Wrote: You can add a print statement in your loop to verify it's making progress. RE: Using Excel Cell As A Variable In A Loop - snippsat - Aug-25-2021 Using the data you posted. import requests import pandas as pd from bs4 import BeautifulSoup url = 'https://python-forum.io/' agent_lst = [ {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'}, {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}, {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36 OPR/66.0.3515.72'}, {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'}, {'User-Agent': 'Mozilla/5.0 (X11; CrOS aarch64 13421.99.0) AppleWebKit/537.36 (KHTML; like Gecko) Chrome/86.0.4240.198 Safari/537.36'} ] for agent in agent_lst: response = requests.get(url, headers=agent) print(response.status_code) soup = BeautifulSoup(response.content, 'lxml') print(soup.select_one('head > title')) You most be making a working dictionary in headers list,now you mess it up and make string.from pprint import pprint # Example data from Excel lst = [ 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36' ] headers = [] d = {} for agent in lst: d['User-Agent'] = agent headers.append(d) pprint(headers) So now contain headers list working dictionary.>>> headers[0] {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) ' 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 ' 'Safari/537.36'} >>> headers[0].get('User-Agent') # get a dictionary method ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, ' 'like Gecko) Chrome/87.0.4280.88 Safari/537.36') |