Jul-18-2021, 10:45 AM
Hi Snippsat,
A big thanks for your help and direction. I spent time learning from and applying what you suggested, which included fixing some problems with my initial code, such as defining the variable's value at the top, for starters.
I then used your code to test opening an Excel file and reading a specific column, and it worked beautifully. After trying your example, I also tried my own URLs in Excel, and it read those perfectly too.
One of the key takeaways I learnt from your code was how to loop over a column.
For example:
for cell in ws['C']:
    url = cell.value
Thank you for teaching me how to do this - really appreciate your time.
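A minimal sketch of that column loop, assuming some cells in the column may be empty or hold a header instead of a URL. The helper name `urls_from_cells` and the sample values are hypothetical; it takes plain values so it runs without a workbook.

```python
from urllib.parse import urlparse


def urls_from_cells(values):
    """Return the cell values that look like http(s) URLs, in order."""
    urls = []
    for value in values:
        # Empty cells come back as None; skip anything that is not text.
        if not isinstance(value, str):
            continue
        candidate = value.strip()
        parsed = urlparse(candidate)
        # Keep only values with an http/https scheme and a host part.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            urls.append(candidate)
    return urls


# Stand-in for the values of column C (a header, two URLs, an empty cell).
column_values = ["URL", "https://python-forum.io/", None, " https://www.google.no/ "]
for url in urls_from_cells(column_values):
    print(url)
```

With openpyxl, the same helper could be fed from the worksheet as `urls_from_cells(cell.value for cell in ws['C'])`.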
(Jul-10-2021, 11:58 AM) snippsat wrote: There are several problems or missing pieces in your first code.
Start with for page in current_url: there is no current_url defined to loop over.
You read from an Excel file that has the URL list, so it looks like the first loop is not needed at all.
Lines 18, 19 and 20 need to be inside the loop block.
Here is a basic test if you have URLs in an Excel file and iterate over column A.
import openpyxl

wb = openpyxl.load_workbook('url.xlsx')
ws = wb['url_info']
for cell in ws['A']:
    print(cell.value)
Output:
https://python-forum.io/
https://www.google.no/
https://edition.cnn.com/
So if you want to open the URLs in BS, it would be like this.
import openpyxl
from bs4 import BeautifulSoup
import requests

wb = openpyxl.load_workbook('url.xlsx')
ws = wb['url_info']
for cell in ws['A']:
    print(cell.value)
    response = requests.get(cell.value)
    soup = BeautifulSoup(response.content, 'lxml')
    print(soup.find('title'))
Output:
https://python-forum.io/
<title>Python Forum</title>
https://www.google.no/
<title>Google</title>
https://edition.cnn.com/
<title>CNN International - Breaking News, US News, World News and Video</title>
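In the quoted loop, one unreachable URL would stop the whole run. A hedged sketch of one way to keep going, assuming failures should be reported and skipped: `collect_titles` and `fake_fetch_title` are hypothetical names, and the fetcher is a stand-in so the example runs offline.

```python
def collect_titles(urls, fetch_title):
    """Return {url: title}; a URL that fails maps to None instead of stopping the loop."""
    titles = {}
    for url in urls:
        try:
            titles[url] = fetch_title(url)
        except Exception as exc:  # deliberately broad for this sketch
            print(f"Skipping {url}: {exc}")
            titles[url] = None
    return titles


def fake_fetch_title(url):
    """Hypothetical stand-in for a real request, so the sketch runs offline."""
    if "broken" in url:
        raise ValueError("could not fetch page")
    return f"Title of {url}"


result = collect_titles(
    ["https://python-forum.io/", "https://broken.example/"],
    fake_fetch_title,
)
print(result)
```

In a real script, `fetch_title` could wrap the quoted `requests.get(...)` and `soup.find('title')` calls, ideally passing a `timeout` to `requests.get` so a slow site cannot hang the loop.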