So far you have helped me put together the code below - thank you for that.
I am not sure how to write a new line into the CSV without overwriting the same line over and over again. I messed up the loop and I am not sure how to fix it.
The second thing is that it needs to crawl only new entries in the future (every week), so I think it also needs to check extracted.csv every time, to avoid duplicate content before it writes a new line into extracted.csv.
I hope you can give me a hint.
Thank You buddy.
import csv
import requests
import datetime
import time
from requests import get
from bs4 import BeautifulSoup

with open('data.csv', encoding='utf8') as csvfile:
    reader = csv.reader(csvfile, delimiter=';')
    next(reader)
    count = 0
    for row in reader:
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        url = f'https://www.somedomain.com/result?country=en&q={row[1]}'
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
        cookies = {'__test': '1bb6e881021f013463740eeb74840b18'}
        content = get(url, headers=headers, cookies=cookies).content
        soup = BeautifulSoup(content, "lxml")
        table_info = soup.select_one('.table-info')
        mail = table_info.select_one('.col-2 a[href^=mailto]')
        mail = mail.get('href')
        mail_clean = mail.split(':')[1]
        website = soup.find(text='Website:')
        website = table_info.select_one('.col-2 a[target^=_blank]')
        website = website.get('href')
        collected_data = row[1], mail_clean, website, timestamp
        data_list = [["Regcode", "Email", "Website", "Timestamp"], collected_data]
        with open('extracted.csv', 'w', newline='') as file:
            writer = csv.writer(file, delimiter=';')
            writer.writerows(data_list)
        print(row[1], "|", mail_clean, "|", website, "|", timestamp)
        # print("Waiting 3 seconds...")
        # time.sleep(3)
        count += 1

The code sort of works, but there are problems with the CSV-writing part.
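To show what I mean, something like this append-and-skip-duplicates helper is roughly what I imagine replacing the writing part with (an untested sketch; the function names are made up, and I am assuming the Regcode in the first column uniquely identifies a row):

```python
import csv
import os

HEADER = ['Regcode', 'Email', 'Website', 'Timestamp']

def load_seen_regcodes(path):
    """Return the set of Regcodes already in the CSV, so reruns can skip them."""
    if not os.path.exists(path):
        return set()
    with open(path, newline='', encoding='utf8') as f:
        reader = csv.reader(f, delimiter=';')
        next(reader, None)  # skip the header row
        return {row[0] for row in reader if row}

def append_rows(path, rows):
    """Append rows in mode 'a' (not 'w', which truncates the file every time);
    write the header only when the file is first created; skip seen Regcodes."""
    seen = load_seen_regcodes(path)
    new_file = not os.path.exists(path)
    written = 0
    with open(path, 'a', newline='', encoding='utf8') as f:
        writer = csv.writer(f, delimiter=';')
        if new_file:
            writer.writerow(HEADER)
        for row in rows:
            if row[0] in seen:
                continue  # already extracted in an earlier run, skip it
            writer.writerow(row)
            seen.add(row[0])
            written += 1
    return written
```

The idea would be to call `load_seen_regcodes` once before the crawl loop and `append_rows` for each collected row, instead of reopening extracted.csv with mode 'w' inside the loop. Does that look like the right direction?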