Loop a list and write to .xlsx

urobee · Apr-11-2020, 03:41 PM

This is a part of my scraper code.

There is a .txt file with some keywords (data.txt). The code loops through all the keywords to find them on a page. At the end, I would like to save the results for every keyword from data.txt in export.xlsx.

The code runs fine, but saving doesn't work the way I want.

On every loop iteration the .xlsx file is cleared and only the newest results appear in it.

Can you help me with this? Thank you!

import time

import pandas as pd
from selenium.webdriver.common.by import By

# driver is a Selenium WebDriver created earlier in the script (not shown)
with open(r'C:\Users\MyName\Documents\fbm\data.txt') as f:
    for line in f:

        str1 = 'first_part_of_url'
        str2 = 'second_part_of_url'
        str3 = 'third_part_of_url'
        url = str1 + str2 + line + str3
        driver.get(url)
        time.sleep(3)

        ids = driver.find_elements(By.XPATH, "//a[contains(@href,'urlthing/')]")

        # keep only the first n result links
        n = 1
        links = [ii.get_attribute('href') for ii in ids[:n]]

        # datalist starts empty for each keyword
        datalist = []

        for link in links:
            driver.get(link)
            time.sleep(1)

            Keyword = driver.find_element(By.XPATH, '//*[@id="data"]/some_div/span').text
            datalist.append((Keyword, link, line))

            # the DataFrame is rebuilt and written inside the loop
            df = pd.DataFrame(datalist, columns=['Keyword', 'link', 'line'])
            df.to_excel(r'C:\Users\MyName\Documents\fbm\export.xlsx', index=True)

**Larz60+** · (This post was last modified: Apr-11-2020, 04:02 PM by Larz60+.)

Quote: The code run fine but the save method doesn't work as i wanted

This code cannot fetch a webpage as written.
What is the value of url after line 7 (the url = str1 + str2 + line + str3 assignment)?
Print it out and see for yourself. Add this after line 7:
print(f"url: {url}")
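A minimal way to try that suggestion without a browser is sketched below. It reuses the placeholder strings from the question, feeds in a two-line stand-in for data.txt through io.StringIO (an assumption for illustration; iterating a real open file works the same way), and prints each URL with !r so hidden characters become visible. One thing this shows is that every line read from a text file keeps its trailing '\n', which ends up in the middle of the URL unless it is stripped first.

```python
import io

# Placeholder URL parts, copied from the question
str1 = 'first_part_of_url'
str2 = 'second_part_of_url'
str3 = 'third_part_of_url'

# io.StringIO stands in for data.txt; iterating an open text file yields
# lines the same way, each one still ending in '\n'
data_txt = io.StringIO("sample1\nsample2\n")

raw_urls, clean_urls = [], []
for line in data_txt:
    raw_urls.append(str1 + str2 + line + str3)            # newline lands inside the URL
    clean_urls.append(str1 + str2 + line.strip() + str3)  # whitespace removed first

for url in raw_urls:
    print(f"url: {url!r}")  # !r uses repr(), so the hidden '\n' is printed as \n
```

With the real file, the equivalent check is print(f"url: {url!r}") right after the url assignment.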

In addition, you need to attach a portion of data.txt (or the entire file, if it is small).

urobee · Apr-11-2020, 04:24 PM

I changed the names of the URLs and elements for this post; that part works fine.

data.txt looks like this:
sample1
sample2
sample3
sample4

I'm looking for these words on the pages. When the program finishes, export.xlsx contains only the results related to "sample4" (the last keyword).
If I stop it after "sample2", export.xlsx contains only the "sample2" results.

I think I somehow need to move df.to_excel out of the loop, but when I do that, nothing gets saved to export.xlsx.
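A minimal sketch of that idea, under some assumptions: scrape_keyword is a hypothetical placeholder for the Selenium part of the loop (it returns one fake row so the sketch runs without a browser), and the example.com links are made up. The two changes that matter are that datalist is created once before the keyword loop, so rows from every keyword accumulate, and that the DataFrame is built and written once after the loop. If datalist = [] stays inside the for line in f: loop, moving to_excel out of it would still write only the last keyword's rows, which may be one reason nothing useful ended up in the file.

```python
import pandas as pd


def scrape_keyword(keyword):
    """Hypothetical placeholder for the Selenium part of the loop.

    The real version would build the URL, call driver.get(), follow the
    result links and return one (Keyword, link, line) tuple per link.
    This stub returns a single fake row so the sketch runs without a browser.
    """
    return [(f"result for {keyword}", f"https://example.com/{keyword}", keyword)]


def collect_results(lines):
    """Scrape every keyword in `lines` and return one DataFrame with all rows."""
    datalist = []  # created once, BEFORE the loop, so rows from all keywords accumulate
    for line in lines:
        keyword = line.strip()  # drop the trailing newline
        if not keyword:
            continue  # skip blank lines
        datalist.extend(scrape_keyword(keyword))
    # built once, AFTER the loop, from every keyword's rows
    return pd.DataFrame(datalist, columns=['Keyword', 'link', 'line'])


def save_results(df, out_path):
    """Write all results in a single call (needs an Excel engine such as openpyxl)."""
    df.to_excel(out_path, index=True)


df = collect_results(["sample1\n", "sample2\n", "\n", "sample3\n", "sample4"])
print(df)
```

In the real script you would pass the open data.txt file object to collect_results and then call save_results(df, r'C:\Users\MyName\Documents\fbm\export.xlsx') once, after the loop has finished.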

**Larz60+** · Apr-11-2020, 09:07 PM

That's not the way it works.

You supply the driver with a valid URL. A valid URL looks like: https://python-forum.io
You fetch the webpage HTML using Selenium (driver in your case).
You then use various means to locate the data you want within the raw HTML.
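The first of those steps can be sketched as follows, assuming a hypothetical search page at https://example.com/search that takes the keyword in a q parameter (the real site's URL pattern was not shared in this thread). keyword_url strips the newline and percent-encodes the keyword with urllib.parse.urlencode; looks_valid is a rough check (scheme plus host) that the placeholder concatenation from the question would fail. The resulting string is what you would pass to driver.get(), after which the data is located with calls such as driver.find_element(By.XPATH, ...).

```python
from urllib.parse import urlencode, urlsplit

# Hypothetical search endpoint and parameter name (assumption, not the real site)
BASE_URL = "https://example.com/search"


def keyword_url(keyword):
    """Build an absolute, properly encoded URL for one keyword."""
    query = urlencode({"q": keyword.strip()})  # spaces become '+', special chars are escaped
    return f"{BASE_URL}?{query}"


def looks_valid(url):
    """Rough check: an absolute http(s) URL has a scheme and a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(keyword_url("sample1\n"))                            # https://example.com/search?q=sample1
print(keyword_url("red car"))                              # https://example.com/search?q=red+car
print(looks_valid("first_part_of_urlsecond_part_of_url"))  # False
```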

I suggest:
web scraping part 1
web scraping part 2

urobee · Apr-12-2020, 11:21 AM

There is a .txt file with some keywords (data.txt). The code loops through all the keywords to get some info from a webpage. At the end, I would like to save the results for every keyword from data.txt in export.xlsx.

The code runs fine, but saving doesn't work the way I want.

On every loop iteration the .xlsx file is cleared and only the newest (last) results appear in it.

Can you help me with this? Thank you!

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import pandas as pd
import time
from datetime import datetime

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service("C:/projects/chromedriver_win32/chromedriver.exe"))
#driver.fullscreen_window()

with open(r'C:\Users\MyName\Documents\fbm\data.txt') as f:
    for line in f:

        #not relevant parts of the code (builds the URL and finds ids, as in the first post)

        # keep only the first n result links
        n = 1
        links = [ii.get_attribute('href') for ii in ids[:n]]

        # datalist starts empty for each keyword
        datalist = []

        for link in links:
            driver.get(link)
            time.sleep(1)

            carName = driver.find_element(By.XPATH, '//*[@id="something"]/some_divs/span').text
            datalist.append((carName, link, line))

            # the DataFrame is rebuilt and written inside the loop
            df = pd.DataFrame(datalist, columns=['Keyword', 'link', 'line'])
            df.to_excel(r'C:\Users\MyName\Documents\fbm\export.xlsx', index=True)
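If results should also survive when the script is stopped partway through (as in the "sample2" test earlier in the thread), one alternative is to append each keyword's rows to a CSV file as soon as they are scraped and convert to .xlsx once at the end. This is a swapped-in technique using the standard csv module, not something proposed in the thread; append_rows, the file name and the sample rows are hypothetical, and the demo writes to a throwaway temporary directory.

```python
import csv
import os
import tempfile


def append_rows(csv_path, rows):
    """Append rows to csv_path, writing the header only when the file is new or empty."""
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["Keyword", "link", "line"])
        writer.writerows(rows)  # rows are on disk as soon as the file is closed


# Demo with a temporary file; in the real loop, csv_path would be a fixed path
# and append_rows would be called once per keyword with that keyword's datalist.
demo_path = os.path.join(tempfile.mkdtemp(), "export_progress.csv")
append_rows(demo_path, [("car A", "https://example.com/a", "sample1")])
append_rows(demo_path, [("car B", "https://example.com/b", "sample2")])

with open(demo_path, newline="", encoding="utf-8") as f:
    saved = list(csv.reader(f))
print(saved)
```

After the loop, pd.read_csv(csv_path).to_excel(r'C:\Users\MyName\Documents\fbm\export.xlsx', index=False) would turn everything collected into the spreadsheet. Because the CSV is opened in append mode, delete it before a fresh run if old rows should not be mixed in.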
