Python Forum

Returning Column and Row Data From Spreadsheet

Hello all,

I'm trying to feed two columns of a spreadsheet into a test web scraper script as two variables, using pandas, but I'm stumped on how to step through both columns together. For example, the first iteration should use A1 and B1, the next A2 and B2, then A3 and B3, and so on.

Here is my code:

import requests
from bs4 import BeautifulSoup
import pandas as pd
import openpyxl

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

heading_type = []
heading = []
keyword1 = []
url1 =[]


data = {'keyword': keyword1, 'Url':url1}




wb = openpyxl.load_workbook('D:/Share/Documents/importurl.xlsx')
ws = wb['Sheet1']

for cell in ws['A']:
    print(cell.value)
    url = cell.value
    url1.append(url)

#     r = requests.get(url, headers=headers)
#     soup = BeautifulSoup(r.text, features="html.parser")

for cell in ws['B']:
    keyword = cell.value
    print(keyword)
    keyword1.append(keyword)


    df = pd.DataFrame(data=data)
    df.index += 1
    df.to_excel(f"D:/Share/Documents/summary.xlsx")


I get the error:

Error:
Traceback (most recent call last):
  File "D:\Share\Documents\PycharmProjects\websitelearning\main.py", line 103, in <module>
    df = pd.DataFrame(data=data)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\me\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py", line 709, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\me\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\internals\construction.py", line 481, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\me\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\internals\construction.py", line 115, in arrays_to_mgr
    index = _extract_index(arrays)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\me\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\internals\construction.py", line 655, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
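
For what it's worth, this ValueError looks consistent with the DataFrame being built inside the second loop: by then url1 already holds every value from column A, while keyword1 holds only the values appended so far. Here is a rough sketch of that, using plain lists and no pandas. The values in column_a and column_b are made up for illustration, and I'm assuming the sheet has more than one row.

```python
# Stand-in values; the real ones would come from columns A and B of the workbook.
column_a = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
column_b = ["alpha", "beta", "gamma"]

url1 = []
keyword1 = []

# The first loop fills url1 completely before the second loop starts.
for url in column_a:
    url1.append(url)

lengths_seen = []
for keyword in column_b:
    keyword1.append(keyword)
    # The posted code builds the DataFrame at this point, inside the loop,
    # so record the list lengths pandas would see on each pass.
    lengths_seen.append((len(keyword1), len(url1)))

print(lengths_seen)  # [(1, 3), (2, 3), (3, 3)]
```

On the first pass the lists have lengths 1 and 3, which is the kind of mismatch pd.DataFrame rejects. If both columns have the same number of rows, building the DataFrame once after the loop has finished would probably avoid the error.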
I've attached the file here, so hopefully it is clear; I'm happy to clarify further if what I'm trying to achieve is still not clear. I guess I need a single loop that reads the contents of column A and column B at the same time and then moves on to the next row, but I'm not sure how to do this.
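
One way the single loop might look is sketched below. It relies on openpyxl's ws.iter_rows(min_col=1, max_col=2, values_only=True), which yields one (A, B) tuple per row. The helper names read_pairs, clean_pairs and to_columns are hypothetical, as are the sample values, and skipping rows with an empty cell is an assumption about what is wanted rather than something from the script above.

```python
def read_pairs(path, sheet_name="Sheet1"):
    """Return a list of (url, keyword) tuples, one per spreadsheet row."""
    import openpyxl  # imported here so the helpers below run without it

    wb = openpyxl.load_workbook(path)
    ws = wb[sheet_name]
    # values_only=True yields plain cell values instead of Cell objects.
    return list(ws.iter_rows(min_col=1, max_col=2, values_only=True))


def clean_pairs(rows):
    """Drop rows where either the URL or the keyword is missing (assumed behaviour)."""
    return [(url, keyword) for url, keyword in rows if url and keyword]


def to_columns(pairs):
    """Build a dict of two equal-length lists, suitable for pd.DataFrame."""
    return {
        "keyword": [keyword for _, keyword in pairs],
        "Url": [url for url, _ in pairs],
    }


# Made-up rows standing in for read_pairs('D:/Share/Documents/importurl.xlsx').
sample_rows = [
    ("https://example.com/a", "alpha"),
    ("https://example.com/b", "beta"),
    ("https://example.com/c", None),  # incomplete row, skipped
]

pairs = clean_pairs(sample_rows)
for url, keyword in pairs:
    # Each pass has the URL and keyword from the same row,
    # so the requests.get(url, ...) call could go here.
    print(url, keyword)

data = to_columns(pairs)
```

Because both lists in data come from the same list of pairs, they always have the same length, so pd.DataFrame(data=data) can be called once after the loop.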

Thank you for your time.