Python Forum
how to scrape a website from a keyword list
#1
Hello,
I am new to Python.
I am trying to scrape a website using search keywords from a list (a text file), looping through the file line by line until every keyword has been searched.
The search block of code works and prints the result.

Here is my code:

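from bs4 import BeautifulSoup

# `driver` is assumed to be an already-created Selenium WebDriver
driver.get('https://www.website.com/?q=dog%20care')
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
# print the title attribute of every link inside div.class_name
for a in soup.select('div.class_name a'):
    print(a['title'])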

# read file section:

with open("keyword_list.txt", "r") as f:
    for line in f:
        print(line.strip())

The read file block of code works, but I am not sure how to make it work together with the search function.

Can anyone help me with this code, please?
#2
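You can just use readlines, assuming each keyword is on its own line:
with open("keyword_list.txt") as f:
    # readlines() returns a list, so strip each line individually
    lines = [line.strip() for line in f.readlines()]
...
for line in lines:
    # check the page's text rather than the soup object itself
    if line in soup.get_text():
        print(f"keyword {line} is in website")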
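
The check above only reports whether a keyword appears in a page that is already loaded. To run a separate search for each keyword, as the original question asks, one possible approach is to build the search URL from each line of the file and repeat the scrape inside the loop. The following is a minimal sketch, not a definitive implementation: the helper names (`read_keywords`, `build_search_url`, `scrape_titles`, `scrape_all`) are hypothetical, the URL and the `div.class_name a` selector are the placeholders from the first post, and `driver` is assumed to be a Selenium WebDriver created elsewhere.

```python
from urllib.parse import quote

# Placeholder search URL taken from the first post (assumption: the site
# accepts the keyword in the `q` query parameter).
SEARCH_URL = "https://www.website.com/?q="


def read_keywords(path):
    """Return the stripped, non-empty lines of the keyword file."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def build_search_url(keyword):
    """URL-encode the keyword, so 'dog care' becomes '...?q=dog%20care'."""
    return SEARCH_URL + quote(keyword)


def scrape_titles(driver, keyword):
    """Load the search page for one keyword and return the link titles."""
    # Imported here so the two helpers above work without bs4 installed.
    from bs4 import BeautifulSoup

    driver.get(build_search_url(keyword))
    soup = BeautifulSoup(driver.page_source, "html.parser")
    # Same selector as in the first post; links without a title are skipped.
    return [a["title"] for a in soup.select("div.class_name a")
            if a.has_attr("title")]


def scrape_all(driver, path="keyword_list.txt"):
    """Search every keyword in the file and print the titles found."""
    for keyword in read_keywords(path):
        for title in scrape_titles(driver, keyword):
            print(f"{keyword}: {title}")
```

With a driver already set up as in the first post, calling `scrape_all(driver)` would load one search page per keyword and print each title next to the keyword that produced it. Encoding the keyword with `quote` is what turns the space in `dog care` into the `%20` seen in the hard-coded URL.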
#3
Thanks for the reply,
I'll try it.