Python Forum
Loop a list and write to .xlsx
#1
This is part of my scraper code.

There is a .txt file with some keywords (data.txt). The code loops through all the keywords to find them on a page. At the end, I would like to save the results for all the keywords from data.txt in export.xlsx.

The code runs fine, but the save step doesn't work as I wanted.

On every iteration the .xlsx file is cleared, and only the newest results appear in it.

Can you help me with this? Thank you!

with open(r'C:\Users\MyName\Documents\fbm\data.txt') as f:
    for line in f:

        str1 = 'first_part_of_url'
        str2 = 'second_part_of_url'
        str3 = 'third_part_of_url'
        url = str1 + str2 + line + str3
        driver.get(url)
        time.sleep(3)

        ids = driver.find_elements_by_xpath("//a[contains(@href,'urlthing/')]")

        n = 1
        links = ()

        for ii in ids[:n]:
            links += (ii.get_attribute('href'),)

        datalist = []

        for link in links:
            driver.get(link)
            time.sleep(1)

            Keyword = driver.find_element_by_xpath('//*[@id="data"]/some_div/span').text

            df_text = [Keyword, link, line]
            datalist.append((Keyword, link, line))
            ColName = ['Keyword', 'link', 'line']

            df = pd.DataFrame(datalist, columns=['Keyword', 'link', 'line'])
            df.to_excel(r'C:\Users\MyName\Documents\fbm\export.xlsx', index=True, encoding='utf-8')
#2
Quote:The code run fine but the save method doesn't work as i wanted
This code cannot fetch a web page as written.
What is the value of url after line 7?
Print it out and see for yourself.
Add after line 7:
print(f"url: {url}")

In addition, you need to attach a portion of data.txt (or the entire file if it is small).
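
As a rough illustration of what that print might reveal: when a file is read line by line, each line normally keeps its trailing newline, so the newline ends up in the middle of the assembled URL. The sketch below uses an in-memory stand-in for data.txt (an assumption, so that it runs anywhere) and the placeholder URL parts from the first post; printing with !r makes the hidden "\n" visible, and line.strip() is one way to remove it.

```python
import io

# In-memory stand-in for data.txt (assumption, so the sketch is self-contained).
fake_file = io.StringIO("sample1\nsample2\n")

# Placeholder URL parts, copied from the original post.
str1 = 'first_part_of_url'
str2 = 'second_part_of_url'
str3 = 'third_part_of_url'

raw_urls = []    # built the way the original code builds them
clean_urls = []  # built after stripping surrounding whitespace

for line in fake_file:
    raw_urls.append(str1 + str2 + line + str3)
    clean_urls.append(str1 + str2 + line.strip() + str3)

# repr() shows the embedded newline that a plain print would hide.
for url in raw_urls:
    print(f"url: {url!r}")
```

Whether the stray newline actually matters depends on the real URL pattern, which is not shown in the thread.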
#3
(Apr-11-2020, 04:02 PM)Larz60+ Wrote:
Quote:The code run fine but the save method doesn't work as i wanted
This code cannot fetch a web page as written.
What is the value of url after line 7?
Print it out and see for yourself.
Add after line 7:
print(f"url: {url}")

In addition, you need to attach a portion of data.txt (or the entire file if it is small).

I changed the names of the URLs and elements; that part works fine.

The data.txt looks like this:
sample1
sample2
sample3
sample4


I'm looking for these words on the pages. When the program has finished, export.xlsx contains only the "sample4" (i.e. the last) results.
If I stop it after "sample2", export.xlsx contains only the "sample2" results.

I think I somehow need to move df.to_excel out of the loop, but when I do that, nothing is saved to export.xlsx.
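
One possible way to do that, as a minimal sketch rather than a confirmed fix for the real scraper: in the posted code, datalist is re-created for every keyword and the file is rewritten inside the loops, so each keyword overwrites the previous one. Creating the list once before the keyword loop and writing once after it should keep all rows. The names fake_lookup, collect_rows and save_rows are hypothetical, and fake_lookup only stands in for the Selenium part so the example runs without a browser.

```python
import pandas as pd


def fake_lookup(keyword):
    """Hypothetical stand-in for the Selenium steps: returns (Keyword, link) pairs."""
    return [(f"result for {keyword}", f"https://example.com/{keyword}")]


def collect_rows(lines, lookup):
    """Gather the rows for ALL keywords into a single list."""
    rows = []  # created once, before the keyword loop
    for line in lines:
        keyword = line.strip()  # drop the trailing newline
        if not keyword:
            continue  # skip blank lines such as those at the end of data.txt
        for name, link in lookup(keyword):
            rows.append((name, link, keyword))
    return rows


def save_rows(rows, path):
    """Build the DataFrame and write it once, after the loops have finished."""
    df = pd.DataFrame(rows, columns=['Keyword', 'link', 'line'])
    df.to_excel(path, index=False)


# Lines as they would come from iterating over data.txt.
lines = ["sample1\n", "sample2\n", "sample3\n", "sample4\n", "\n"]
rows = collect_rows(lines, fake_lookup)
df = pd.DataFrame(rows, columns=['Keyword', 'link', 'line'])
print(df)
```

In the real script, save_rows(rows, r'C:\Users\MyName\Documents\fbm\export.xlsx') would be called a single time, after the with block. Writing .xlsx files through pandas needs an Excel engine such as openpyxl to be installed.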
#4
That's not the way it works.
  1. You supply the driver with a valid URL.
  2. A valid URL looks like: https://python-forum.io
  3. You fetch the web page's HTML using Selenium (driver in your case).
  4. You then use various means to locate the data you want within the raw HTML.

Suggested reading:
web scraping part 1
web scraping part 2
#5
There is a .txt file with some keywords (data.txt). The code loops through all the keywords to get some info from a web page. At the end, I would like to save the results for all the keywords from data.txt in export.xlsx.

The code runs fine, but the save step doesn't work as I wanted.

On every iteration the .xlsx file is cleared, and only the newest (last) results appear in it.

Can you help me with this? Thank you!

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import pandas as pd
import time
from datetime import datetime

driver = webdriver.Chrome(executable_path = "C:/projects/chromedriver_win32/chromedriver.exe")
#driver.fullscreen_window()

with open(r'C:\Users\MyName\Documents\fbm\data.txt') as f:
    for line in f:

        #not relevant parts of the code

        n = 1
        links = ()

        for ii in ids[:n]:
            links += (ii.get_attribute('href'),)

        datalist = []

        for link in links:
            driver.get(link)
            time.sleep(1)

            carName = driver.find_element_by_xpath('//*[@id="something"]some_divs/span').text

            df_text = [carName, link, line]
            datalist.append((carName, link, line))
            ColName = ['carName', 'link', 'line']

            df = pd.DataFrame(datalist, columns=['Keyword', 'link', 'line'])
            df.to_excel(r'C:\Users\MyName\Documents\fbm\export.xlsx', index=True, encoding='utf-8')