(Jul-11-2021, 02:32 AM)jacklee26 Wrote: what if the HTML page has double html like this, I went to get but it is empty

Adding /source will break the XPath. There is a tag that makes this task different, which is iframe. Also, a common mistake is not giving the page time to load. I use time.sleep as a first test, but there are Waits that deal with this. This is tested against the code you gave; on the real page you may need to switch into the iframe first with browser.switch_to.frame(iframe). I get the source text from the iframe
and then parse that text (it is now just text, not HTML) with BS to get the tag wanted.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from time import sleep
from bs4 import BeautifulSoup

#--| Setup
options = Options()
#options.add_argument("--headless")
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)
#--| Parse or automation
browser.get('file:///E:/div_code/scrape/local4.html')
# Crude wait so the page has time to load
sleep(3)
video_tag = browser.find_elements_by_xpath('//*[@id="allmyplayer"]')
print(video_tag)
# Send text html to BS for parse
soup = BeautifulSoup(video_tag[0].text, 'lxml')
print(soup.find('source').get('src', 'Not Found'))
Output:
[<selenium.webdriver.remote.webelement.WebElement (session="d0f2629448eb9fb9baabc6dc77342fb9", element="7cc91c06-2934-4243-9645-a334805ce2c4")>]
https://vs02.520call.me/files/mp4/1/13cDq.m3u8?t=1625961526
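
On the point about giving the page time to load: a fixed sleep either waits too long or not long enough. Here is a rough sketch of the polling idea behind a wait, using only the standard library. It is not Selenium's own implementation; wait_until and its timeout and poll parameters are names made up for this illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Pause briefly before checking again
        time.sleep(poll)
```

In the script above, the sleep(3) call could be swapped for wait_until with a lambda that runs the same element lookup, since an empty list is falsy and the call returns as soon as the element shows up. Selenium ships its own version of this idea as WebDriverWait, which is probably the better choice on a real page.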