Posts: 39
Threads: 11
Joined: Jan 2017
Apr-29-2017, 08:03 PM
(This post was last modified: May-01-2017, 03:40 PM by Ofnuts.)
I'm trying to scrape the stock tickers from the chart embedded on the right of this page (under the graph): http://investsnips.com/list-of-publicly-...companies/
When inspecting the HTML, the stock symbols seem to be embedded in the title attribute (e.g. title="NASDAQ:ADMA"). This is the markup for one symbol:
<td class="symbol-short-name-container" title="NASDAQ:ADMA" style="cursor: pointer;">
<a href="https://www.tradingview.com/chart/?symbol=NASDAQ%3AADMA" target="_blank">ADMA Biologics</a></td>
However, I'm failing to capture this markup with find_all.
import bs4 as bs
import urllib.request

url = 'http://investsnips.com/list-of-publicly-traded-micro-cap-diversified-biotechnology-and-pharmaceutical-companies/'
source = urllib.request.urlopen(url).read()
soup = bs.BeautifulSoup(source, 'lxml')
body = soup.body  # it seems to be under <body>
After that:
body.find_all('tr', class_="ticker quote-ticker-inited")
[]  # empty list
body.find_all('td', class_="symbol-short-name-container")
[]  # empty list
So it seems that the site uses JavaScript. I have been scouring a web-scraping book (an old one) and the net, but I can't figure out what I'm supposed to do.
Do I need a different module?
Thank you.
Posts: 5,151
Threads: 396
Joined: Sep 2016
Apr-29-2017, 08:30 PM
(This post was last modified: Apr-29-2017, 08:30 PM by metulburr.)
(Apr-29-2017, 08:03 PM)bigmit37 Wrote: So it seems that the site uses JavaScript. I have been scouring a web-scraping book (an old one) and the net, but I can't figure out what I'm supposed to do.
Do I need a different module?
Yes. If the site uses JavaScript, the page changes on the fly, so the data doesn't exist in the HTML that a plain urllib.request call returns. You need to automate a browser with Selenium. If the HTML your browser gets differs from the HTML Python gets, you need Selenium. You can use PhantomJS to "hide" the browser in the background, so to speak. Since you're just parsing the content, all you really need to do is get the HTML via Selenium instead and then pass that to BeautifulSoup.
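A minimal sketch of that approach with a current Selenium release. Assumptions: Selenium 4 and geckodriver are installed; recent Selenium versions have dropped PhantomJS support, so headless Firefox stands in for it here; the function name and the `wait_css` parameter are made up for illustration. It waits until an element matching a CSS selector is present, then returns the rendered HTML for BeautifulSoup:

```python
def get_rendered_html(url, wait_css="body", timeout=20):
    """Load `url` in headless Firefox and return the HTML after JavaScript has run.

    Hypothetical helper: waits until an element matching the CSS selector
    `wait_css` is present, then returns driver.page_source.
    """
    # Imported here so the module can be imported even where Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # run Firefox without a visible window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

Usage would be along the lines of `soup = bs.BeautifulSoup(get_rendered_html(url, wait_css="iframe"), 'lxml')`. One caveat: if the data lives inside an iframe, the top page's page_source only contains the `<iframe>` tag, not the frame's contents.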
Posts: 39
Threads: 11
Joined: Jan 2017
Apr-29-2017, 08:35 PM
(This post was last modified: Apr-29-2017, 08:50 PM by bigmit37.)
Thanks. I will look into Selenium now and report back here when I get stuck or accomplish my task.
Actually, I noticed something that confuses me. When I use soup.find_all('p')[-2], the list of tickers seems to be embedded in the returned output:
<p><script src="https://d33t3vvu2t2yu5.cloudfront.net/tv.js" type="text/javascript"></script><br/>
<script type="text/javascript">
new TradingView.MiniWidget({
"container_id": "tv-miniwidget-c316c",
"tabs": [
"Micro Cap Biotech"
],
"symbols": {
"Micro Cap Biotech": [
[
"Abeona Thera",
"NASDAQ:ABEO|3m"
],
[
"Actinium Pharma",
"AMEX:ATNM|3m"
],
[
"ADMA Biologics",
"NASDAQ:ADMA|3m"
],
[
"Adverum Biotech",
"NASDAQ:ADVM|3m"
],
[
"Aeglea",
"NASDAQ:AGLE|3m"
],
[
"Affimed",
"NASDAQ:AFMD|3m"
],
[
"Akari Therapeutics",
"NASDAQ:AKTX|3m"
],
[
"Alcobra",
"NASDAQ:ADHD|3m"
],
Does this mean it's still scrapable with Beautiful Soup?
Thank you.
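Since that `<script>` text is already in the HTML urllib returns, one option is to pull the name/symbol pairs out of it with a regular expression instead of rendering the page. A rough sketch, assuming the pairs keep the `"Name", "EXCHANGE:SYMBOL|3m"` layout shown above (the function and pattern names are invented for this example, and a change to the widget's markup would break it):

```python
import re

# Matches pairs like:  "ADMA Biologics", "NASDAQ:ADMA|3m"
# Group 1: display name, group 2: exchange, group 3: ticker symbol.
PAIR_RE = re.compile(r'"([^"]+)",\s*"([A-Z]+):([A-Z0-9.]+)\|[^"]*"')


def extract_tickers(script_text):
    """Return (name, exchange, symbol) tuples in page order, skipping repeated symbols."""
    seen = set()
    result = []
    for name, exchange, symbol in PAIR_RE.findall(script_text):
        if symbol not in seen:
            seen.add(symbol)
            result.append((name, exchange, symbol))
    return result


sample = '''
"Micro Cap Biotech": [
    [
        "Abeona Thera",
        "NASDAQ:ABEO|3m"
    ],
    [
        "Actinium Pharma",
        "AMEX:ATNM|3m"
    ]
]
'''
print(extract_tickers(sample))
# [('Abeona Thera', 'NASDAQ', 'ABEO'), ('Actinium Pharma', 'AMEX', 'ATNM')]
```

With BeautifulSoup you could feed it `str(soup.find_all('p')[-2])`, though picking the paragraph by position is fragile if the page layout changes.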
Posts: 12,031
Threads: 485
Joined: Sep 2016
Posts: 5,151
Threads: 396
Joined: Sep 2016
(Apr-29-2017, 08:52 PM)Larz60+ Wrote: Also look at the tutorials by snippsat: https://python-forum.io/Thread-Web-Scraping-part-1
and https://python-forum.io/Thread-Web-scraping-part-2 (this one has examples with selenium)
Nice, I wasn't aware he had a Selenium explanation in his tutorials.
Posts: 39
Threads: 11
Joined: Jan 2017
Apr-30-2017, 06:46 PM
(This post was last modified: Apr-30-2017, 11:32 PM by metulburr.)
Okay, I'm using Selenium, but I'm stuck.
I can't switch to the frame I want.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get('http://investsnips.com/list-of-publicly-traded-micro-cap-diversified-biotechnology-and-pharmaceutical-companies/')
#driver.find_element_by_xpath('//*[@id="tradingview_4e896"]')
driver.switch_to.frame("tradingview_4e896")
I've tried both the commented-out line and switch_to.frame, and I end up with a NoSuchElement/NoSuchFrame error:
NoSuchElementException: Message: Unable to locate element: //*[@id="tradingview_4e896"]
This is the frame I'm trying to connect to:
<iframe id="tradingview_4e896" src="https://s.tradingview.com/miniwidgetembed/?Micro%20Cap%20Biotech=Abeona%20Thera,Actinium%20Pharma,ADMA%20Biologics,Adverum%20Biotech,Aeglea,Affimed,Akari%20Therapeutics,Alcobra,Aldeyra%20Thera,Ampio%20Pharma,Anavex%20Life%20Sci,Anthera%20Pharma,Applied%20Genetic,Arbutus%20Biopharma,Argos%20Therapeutics,ArQule,Arrowhead%20Research,Assembly%20Bio,Asterias%20Bio,Athersys,aTyr%20Pharma,Aurinia%20Pharma,AVEO%20Pharma,Aviragen,Axsome%20Thera,BioLineRx,Bio%20Path%20Holdings,Calithera%20Bio,Capricor%20Thera,Cara%20Therapeutics,Cascadian%20Thera,CASI%20Pharma,Catabasis%20Pharma,Catalyst%20Pharma,Cellular%20Bio,Celyad,Chiasma,Chimerix,Cidara%20Thera,Clearside%20Bio,Codexis,Compugen,Concert%20Pharma,Conatus%20Pharma,ContraFect,ContraVir%20Pharma,CorMedix,CTI%20BioPharma,Dicerna%20Pharma,Dimension%20Thera,Durect%20Corp,Dynavax%20Tech,Eiger%20BioPharma,Eleven%20Bio,Endocyte,Enzymotec,Fate%20Therapeutics,Flex%20Pharma,Fortress%20Biotech,Galena%20Biopharma,Gemphire%20Thera,Genocea%20Bio,GlycoMimetics,GTX%20Inc,Idera%20Pharma,Ignyta,Immune%20Design,ImmunoGen,Infinity%20Pharma,Inotek%20Pharma,Intec%20Pharma,Kadmon%20Holdings,Kamada,Kindred%20Bio,Kura%20Oncology,MediciNova,MediWound,MEI%20Pharma,Mirati%20Thera,Motif%20Bio,Nabriva%20Thera,NanoViricides,Neptune%20Tech,NovaBay%20Pharma,Nymox%20Pharma,Ocera%20Therapeutics,Ocular%20Thera,Ohr%20Pharma,Oncobiologics,Ophthotech%20Corp,Osiris%20Thera,Ovascience,Palatin%20Tech,Peregrine%20Pharma,Pfenex,PharmAthene,Pieris%20Pharma,Pluristem%20Thera,Prima%20BioMed,ProQR%20Thera,Reata%20Pharma,Redhill%20Biopharma,Regulus%20Thera,Rigel%20Pharma,Sangamo%20Bio,Sophiris%20Bio,Spring%20Bank,Stemline%20Thera,Strongbridge%20Bio,Summit%20Thera,Sunesis%20Pharma,Syndax%20Pharma,Synthetic%20Biologics,Tenax%20Thera,TG%20Therapeutics,Titan%20Pharma,Tracon%20Pharma,Trevena,Trillium%20Thera,uniQure,Vascular%20Biogenics,VBI%20Vaccines,Vericel%20Corp,Versartis,Vital%20Therapies,VIVUS,Xenon%20Pharma,Zafgen,Zynerba%20Pharma&tabs=
Micro%20Cap%20Biotech&Abeona%20Thera=NASDAQ%3AABEO%7C3m&Actinium%20Pharma=AMEX%3AATNM%7C3m&ADMA%20Biologics=NASDAQ%3AADMA%7C3m&Adverum%20Biotech=NASDAQ%3AADVM%7C3m&Aeglea=NASDAQ%3AAGLE%7C3m&Affimed=NASDAQ%3AAFMD%7C3m&Akari%20Therapeutics=NASDAQ%3AAKTX%7C3m&Alcobra=NASDAQ%3AADHD%7C3m&Aldeyra%20Thera=NASDAQ%3AALDX%7C3m&Ampio%20Pharma=AMEX%3AAMPE%7C3m&Anavex%20Life%20Sci=NASDAQ%3AAVXL%7C3m&Anthera%20Pharma=NASDAQ%3AANTH%7C3m&Applied%20Genetic=NASDAQ%3AAGTC%7C3m&Arbutus%20Biopharma=NASDAQ%3AABUS%7C3m&Argos%20Therapeutics=NASDAQ%3AARGS%7C3m&ArQule=NASDAQ%3AARQL%7C3m&Arrowhead%20Research=NASDAQ%3AARWR%7C3m&Assembly%20Bio=NASDAQ%3AASMB%7C3m&Asterias%20Bio=AMEX%3AAST%7C3m&Athersys=NASDAQ%3AATHX%7C3m&aTyr%20Pharma=NASDAQ%3ALIFE%7C3m&Aurinia%20Pharma=NASDAQ%3AAUPH%7C3m&AVEO%20Pharma=NASDAQ%3AAVEO%7C3m&Aviragen=NASDAQ%3AAVIR%7C3m&Axsome%20Thera=NASDAQ%3AAXSM%7C3m&BioLineRx=NASDAQ%3ABLRX%7C3m&Bio%20Path%20Holdings=NASDAQ%3ABPTH%7C3m&Calithera%20Bio=NASDAQ%3ACALA%7C3m&Capricor%20Thera=NASDAQ%3ACAPR%7C3m&Cara%20Therapeutics=NASDAQ%3ACARA%7C3m&Cascadian%20Thera=NASDAQ%3ACASC%7C3m&CASI%20Pharma=NASDAQ%3ACASI%7C3m&Catabasis%20Pharma=NASDAQ%3ACATB%7C3m&Catalyst%20Pharma=NASDAQ%3ACPRX%7C3m&Cellular%20Bio=NASDAQ%3ACBMG%7C3m&Celyad=NASDAQ%3ACYAD%7C3m&Chiasma=NASDAQ%3ACHMA%7C3m&Chimerix=NASDAQ%3ACMRX%7C3m&Cidara%20Thera=NASDAQ%3ACDTX%7C3m&Clearside%20Bio=NASDAQ%3ACLSD%7C3m&Codexis=NASDAQ%3ACDXS%7C3m&Compugen=NASDAQ%3ACGEN%7C3m&Concert%20Pharma=NASDAQ%3ACNCE%7C3m&Conatus%20Pharma=NASDAQ%3ACNAT%7C3m&ContraFect=NASDAQ%3ACFRX%7C3m&ContraVir%20Pharma=NASDAQ%3ACTRV%7C3m&CorMedix=AMEX%3ACRMD%7C3m&CTI%20BioPharma=NASDAQ%3ACTIC%7C3m&Dicerna%20Pharma=NASDAQ%3ADRNA%7C3m&Dimension%20Thera=NASDAQ%3ADMTX%7C3m&Durect%20Corp=NASDAQ%3ADRRX%7C3m&Dynavax%20Tech=NASDAQ%3ADVAX%7C3m&Eiger%20BioPharma=NASDAQ%3AEIGR%7C3m&Eleven%20Bio=NASDAQ%3AEBIO%7C3m&Endocyte=NASDAQ%3AECYT%7C3m&Enzymotec=NASDAQ%3AENZY%7C3m&Fate%20Therapeutics=NASDAQ%3AFATE%7C3m&Flex%20Pharma=NASDAQ%3AFLKS%7C3m&Fortress%20Biotech=NASDAQ%3AFBIO
%7C3m&Galena%20Biopharma=NASDAQ%3AGALE%7C3m&Gemphire%20Thera=NASDAQ%3AGEMP%7C3m&Genocea%20Bio=NASDAQ%3AGNCA%7C3m&GlycoMimetics=NASDAQ%3AGLYC%7C3m&GTX%20Inc=NASDAQ%3AGTXI%7C3m&Idera%20Pharma=NASDAQ%3AIDRA%7C3m&Ignyta=NASDAQ%3ARXDX%7C3m&Immune%20Design=NASDAQ%3AIMDZ%7C3m&ImmunoGen=NASDAQ%3AIMGN%7C3m&Infinity%20Pharma=NASDAQ%3AINFI%7C3m&Inotek%20Pharma=NASDAQ%3AITEK%7C3m&Intec%20Pharma=NASDAQ%3ANTEC%7C3m&Kadmon%20Holdings=NYSE%3AKDMN%7C3m&Kamada=NASDAQ%3AKMDA%7C3m&Kindred%20Bio=NASDAQ%3AKIN%7C3m&Kura%20Oncology=NASDAQ%3AKURA%7C3m&MediciNova=NASDAQ%3AMNOV%7C3m&MediWound=NASDAQ%3AMDWD%7C3m&MEI%20Pharma=NASDAQ%3AMEIP%7C3m&Mirati%20Thera=NASDAQ%3AMRTX%7C3m&Motif%20Bio=NASDAQ%3AMTFB%7C3m&Nabriva%20Thera=NASDAQ%3ANBRV%7C3m&NanoViricides=AMEX%3ANNVC%7C3m&Neptune%20Tech=NASDAQ%3ANEPT%7C3m&NovaBay%20Pharma=AMEX%3ANBY%7C3m&Nymox%20Pharma=NASDAQ%3ANYMX%7C3m&Ocera%20Therapeutics=NASDAQ%3AOCRX%7C3m&Ocular%20Thera=NASDAQ%3AOCUL%7C3m&Ohr%20Pharma=NASDAQ%3AOHRP%7C3m&Oncobiologics=NASDAQ%3AONS%7C3m&Ophthotech%20Corp=NASDAQ%3AOPHT%7C3m&Osiris%20Thera=NASDAQ%3AOSIR%7C3m&Ovascience=NASDAQ%3AOVAS%7C3m&Palatin%20Tech=AMEX%3APTN%7C3m&Peregrine%20Pharma=NASDAQ%3APPHM%7C3m&Pfenex=AMEX%3APFNX%7C3m&PharmAthene=AMEX%3APIP%7C3m&Pieris%20Pharma=NASDAQ%3APIRS%7C3m&Pluristem%20Thera=NASDAQ%3APSTI%7C3m&Prima%20BioMed=NASDAQ%3APBMD%7C3m&ProQR%20Thera=NASDAQ%3APRQR%7C3m&Reata%20Pharma=NASDAQ%3ARETA%7C3m&Redhill%20Biopharma=NASDAQ%3ARDHL%7C3m&Regulus%20Thera=NASDAQ%3ARGLS%7C3m&Rigel%20Pharma=NASDAQ%3ARIGL%7C3m&Sangamo%20Bio=NASDAQ%3ASGMO%7C3m&Sophiris%20Bio=NASDAQ%3ASPHS%7C3m&Spring%20Bank=NASDAQ%3ASBPH%7C3m&Stemline%20Thera=NASDAQ%3ASTML%7C3m&Strongbridge%20Bio=NASDAQ%3ASBBP%7C3m&Summit%20Thera=NASDAQ%3ASMMT%7C3m&Sunesis%20Pharma=NASDAQ%3ASNSS%7C3m&Syndax%20Pharma=NASDAQ%3ASNDX%7C3m&Synthetic%20Biologics=AMEX%3ASYN%7C3m&Tenax%20Thera=NASDAQ%3ATENX%7C3m&TG%20Therapeutics=NASDAQ%3ATGTX%7C3m&Titan%20Pharma=NASDAQ%3ATTNP%7C3m&Tracon%20Pharma=NASDAQ%3ATCON%7C3m&Trevena=NASDAQ%3ATRVN%7C3m&Trillium%20Thera=NAS
DAQ%3ATRIL%7C3m&uniQure=NASDAQ%3AQURE%7C3m&Vascular%20Biogenics=NASDAQ%3AVBLT%7C3m&VBI%20Vaccines=NASDAQ%3AVBIV%7C3m&Vericel%20Corp=NASDAQ%3AVCEL%7C3m&Versartis=NASDAQ%3AVSAR%7C3m&Vital%20Therapies=NASDAQ%3AVTL%7C3m&VIVUS=NASDAQ%3AVVUS%7C3m&Xenon%20Pharma=NASDAQ%3AXENE%7C3m&Zafgen=NASDAQ%3AZFGN%7C3m&Zynerba%20Pharma=NASDAQ%3AZYNE%7C3m&locale=en&activeTickerBackgroundColor=%23EDF0F3&trendLineColor=%234bafe9&underLineColor=%23dbeffb&fontColor=%2383888D&gridLineColor=%23E9E9EA&large_chart_url=https%3A%2F%2Fwww.tradingview.com%2Fchart%2F&width=310px&height=4400px&locale=en&utm_source=investsnips.com&utm_medium=widget&utm_campaign=market-overview" width="310px" height="4400px" frameborder="0" allowtransparency="true" scrolling="no" style="margin: 0 !important; padding: 0 !important;"></iframe>
I'm not sure what I'm doing wrong.
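Side note: the iframe's src above already carries every ticker as a query parameter (`Abeona%20Thera=NASDAQ%3AABEO%7C3m` decodes to `Abeona Thera=NASDAQ:ABEO|3m`). A sketch that decodes them with the standard library, assuming the `NAME=EXCHANGE:SYMBOL|3m` parameter layout stays as it is (the helper name is made up):

```python
from urllib.parse import parse_qsl, urlsplit


def tickers_from_widget_src(src):
    """Map display name -> 'EXCHANGE:SYMBOL' using the widget iframe's src URL."""
    tickers = {}
    # parse_qsl percent-decodes keys and values: %20 -> ' ', %3A -> ':', %7C -> '|'
    for key, value in parse_qsl(urlsplit(src).query):
        # Ticker parameters look like 'NASDAQ:ABEO|3m'; the other parameters lack the '|'.
        if ":" in value and "|" in value:
            tickers[key] = value.split("|", 1)[0]
    return tickers


src = (
    "https://s.tradingview.com/miniwidgetembed/"
    "?Micro%20Cap%20Biotech=Abeona%20Thera,Actinium%20Pharma"
    "&tabs=Micro%20Cap%20Biotech"
    "&Abeona%20Thera=NASDAQ%3AABEO%7C3m"
    "&Actinium%20Pharma=AMEX%3AATNM%7C3m"
    "&large_chart_url=https%3A%2F%2Fwww.tradingview.com%2Fchart%2F"
)
print(tickers_from_widget_src(src))
# {'Abeona Thera': 'NASDAQ:ABEO', 'Actinium Pharma': 'AMEX:ATNM'}
```

In Selenium the attribute can be read with `element.get_attribute("src")` once the iframe element is located. The iframe is probably created by tv.js, so it likely won't appear in the plain urllib HTML.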
Posts: 7,320
Threads: 123
Joined: Sep 2016
I can't find the id you are trying to switch to.
Try:
time.sleep(4) # let site load
driver.switch_to.frame(tradingview_03881)
# Or
driver.switch_to.frame(tradingview_0e8ff)
Posts: 39
Threads: 11
Joined: Jan 2017
Apr-30-2017, 08:02 PM
(This post was last modified: Apr-30-2017, 08:02 PM by bigmit37.)
(Apr-30-2017, 07:37 PM)snippsat Wrote: I can't find the id you are trying to switch to.
Try:
time.sleep(4) # let site load
driver.switch_to.frame(tradingview_03881)
# Or
driver.switch_to.frame(tradingview_0e8ff)
I can't connect to those either.
NoSuchFrameException: Message: tradingview_0e8ff
I even have time.sleep(20).
driver.switch_to.frame('tradingview_03881') #I added quotes around them in my code, as they are strings.
It seems the ID changes every time the page is loaded. I wonder if I can use a regex (re) to deal with that.
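A regex can work for this. A minimal sketch, assuming the id is always `tradingview_` followed by a hex suffix (the ids seen in this thread, 4e896, 03881 and 0e8ff, fit that guess) and that you search the HTML Selenium returns in driver.page_source; the helper name is hypothetical:

```python
import re

# Assumption: the widget iframe id is "tradingview_" plus a hex suffix that changes on reload.
IFRAME_ID_RE = re.compile(r'<iframe[^>]*\bid="(tradingview_[0-9a-f]+)"')


def find_widget_iframe_id(html):
    """Return the current tradingview_... iframe id found in `html`, or None."""
    match = IFRAME_ID_RE.search(html)
    return match.group(1) if match else None


print(find_widget_iframe_id('<iframe id="tradingview_4e896" src="about:blank"></iframe>'))
# tradingview_4e896
```

The result could then go to `driver.switch_to.frame(...)`, which accepts a frame's id or name as a string.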
Posts: 7,320
Threads: 123
Joined: Sep 2016
Apr-30-2017, 08:53 PM
(This post was last modified: Apr-30-2017, 08:53 PM by snippsat.)
I see that the id changes on reload.
The "tradingview" part is the same every time.
You can try:
driver.switch_to_frame(driver.find_element_by_partial_link_text("tradingview"))
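One caveat: the partial-link-text locator matches the visible text of `<a>` links, not an iframe's id, so it may not find the frame. An alternative is to match the start of the id with a CSS attribute selector and let Selenium wait for the frame. A sketch, assuming Selenium 4 (recent 4.x releases removed the `find_element_by_*` helpers in favour of `find_element(By..., ...)`) and that the id always starts with `tradingview_`; the function name is made up:

```python
def switch_to_widget_frame(driver, timeout=20):
    """Wait for the iframe whose id starts with 'tradingview_' and switch into it.

    Hypothetical helper; `driver` is an already-created Selenium WebDriver.
    """
    # Imported here so the helper can be defined without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # [id^='tradingview_'] matches any id that *starts with* tradingview_,
    # so the random suffix that changes on every reload does not matter.
    locator = (By.CSS_SELECTOR, "iframe[id^='tradingview_']")
    # This condition switches into the frame as soon as it can be found.
    WebDriverWait(driver, timeout).until(EC.frame_to_be_available_and_switch_to_it(locator))
```

After it returns, `driver.find_elements(By.CSS_SELECTOR, "td.symbol-short-name-container")` should find the symbol cells, and `driver.switch_to.default_content()` goes back to the main page.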
Posts: 39
Threads: 11
Joined: Jan 2017
(Apr-30-2017, 08:53 PM)snippsat Wrote: I see that the id changes on reload.
The "tradingview" part is the same every time.
You can try:
driver.switch_to_frame(driver.find_element_by_partial_link_text("tradingview"))
Okay, I actually got to the next step using the XPath I copied via Firebug, and I was able to find all the elements I was interested in.
driver.find_element_by_xpath('/html/body/div[1]/div/div/article/div/div[2]/div/div[2]/div[1]/div/div/div[1]/iframe')
Thank you so much for your help.
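For the last step, the title attributes from the first post can be collected from the frame's HTML. A minimal standard-library sketch, assuming you have switched into the frame and pass it driver.page_source (which should then hold the frame's document). The class and function names are invented here, and BeautifulSoup's `find_all('td', class_='symbol-short-name-container')` would do the same job:

```python
from html.parser import HTMLParser


class SymbolCellParser(HTMLParser):
    """Collect the title attribute of <td class="symbol-short-name-container"> cells."""

    def __init__(self):
        super().__init__()
        self.symbols = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An attribute without a value comes through as None, hence the `or ""`.
        classes = (attrs.get("class") or "").split()
        if tag == "td" and "symbol-short-name-container" in classes and attrs.get("title"):
            self.symbols.append(attrs["title"])


def symbols_from_frame_html(html):
    """Return the 'EXCHANGE:SYMBOL' titles in document order."""
    parser = SymbolCellParser()
    parser.feed(html)
    parser.close()
    return parser.symbols


cell = ('<td class="symbol-short-name-container" title="NASDAQ:ADMA" style="cursor: pointer;">'
        '<a href="https://www.tradingview.com/chart/?symbol=NASDAQ%3AADMA" target="_blank">'
        'ADMA Biologics</a></td>')
print(symbols_from_frame_html(cell))
# ['NASDAQ:ADMA']
```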