 scraping from a website that hides source code
#1
Hello Pythoners,

I want Python to get some data from this website: https://tritrypdb.org/


The idea is to then feed that into Excel, so that it can look up the same piece of information for hundreds of genes and put the results in my Excel sheet.
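
For the batch step described above, a minimal sketch might look like the following. It assumes the URL pattern used later in this post, and it writes a CSV file, which Excel can open directly. The names `gene_url`, `write_rows` and `lookup` are made up for illustration; `lookup` stands in for whatever function ends up extracting the value for one gene.

```python
import csv
import io
from urllib.parse import quote

# URL prefix taken from the code in this post
BASE_URL = "https://tritrypdb.org/tritrypdb/app/record/gene/"


def gene_url(gene_id):
    """Build the record URL for one gene ID, escaping unsafe characters."""
    return BASE_URL + quote(gene_id, safe="")


def write_rows(gene_ids, lookup, out):
    """Write one CSV row per gene ID to the file-like object `out`.

    `lookup` is a hypothetical callable that returns the wanted value
    for a gene ID; how it obtains that value is left open here.
    """
    writer = csv.writer(out)
    writer.writerow(["gene_id", "url", "value"])
    for gene_id in gene_ids:
        writer.writerow([gene_id, gene_url(gene_id), lookup(gene_id)])


# Usage with a placeholder lookup that does no network access
buffer = io.StringIO()
write_rows(["Tb927.7.2390"], lambda gene_id: "n/a", buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, `open("genes.csv", "w", newline="")` can be passed as `out`.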


So far the script looks up a gene on the website (it simply appends the gene ID to the URL) and reads the page source:
import urllib.request
import re
# This is just an example gene ID. It will be a list of gene IDs.
geneID = "Tb927.7.2390"
# Open the web page, going directly to the record for this gene ID
page = urllib.request.urlopen("https://tritrypdb.org/tritrypdb/app/record/gene/" + geneID)
# Read the entire page source (returned as bytes)
scode = page.read()
The plan was then to search the source code for the information I need and return it. But it seems the source code doesn't contain any of the actual text that is visible when viewing the page in a browser. Instead there are huge blank spaces.


Is this webpage somehow hiding that information? And is there a way to still get the information out of there?

Thank you for your help, people!
Larz60+ wrote Mar-27-2020, 05:06 PM:
Please post all code, output and errors (in their entirety) between their respective tags. I did it for you this time. Here are instructions on how to do it yourself next time.
#2
You need to learn a bit more about scraping.
There's a good two part tutorial on this forum.
see:
web scraping part 1
https://python-forum.io/Thread-Web-scraping-part-2
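
A likely explanation, though not verified for this particular site, is that the page fills in its content with JavaScript after it loads, so urllib only receives an almost empty shell. One rough way to check is to strip out the scripts and styles and see how much readable text is left in the downloaded HTML. The sketch below uses only the standard library; the class name `VisibleText`, the helper `visible_text` and both sample pages are made up for illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes that are not inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text outside script/style blocks
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the readable text of an HTML string, joined by spaces."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample pages: one with text in the HTML, one built by a script
static_page = "<html><body><h1>Example gene</h1><p>Some description</p></body></html>"
script_page = '<html><body><div id="root"></div><script>render();</script></body></html>'

print(repr(visible_text(static_page)))
print(repr(visible_text(script_page)))
```

If the real page source (decoded with `scode.decode("utf-8", errors="replace")`) gives little or no text here, the data is probably loaded separately, and the tutorials linked above cover the usual ways of dealing with that.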