Python Forum

scraping from a website that hides source code
Hello Pythoners,

I want to make python get some data from this website: https://tritrypdb.org/


The idea is to then build this into Excel, so that it can look up the same piece of information for hundreds of genes and put it into my Excel sheet.
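For the "hundreds of genes into Excel" part, one low-effort route is to write the results to a CSV file, which Excel opens directly. This is a minimal sketch under assumptions: the URL pattern is the one used in the script in this post, while `record_url`, `write_gene_table` and the `product` column are hypothetical names for illustration. The actual values would come from whatever scraping step works for this site.

```python
import csv
import io
import urllib.parse
from typing import Iterable, Mapping, Sequence, TextIO

# Record-page URL pattern taken from the script in this post.
RECORD_URL = "https://tritrypdb.org/tritrypdb/app/record/gene/"


def record_url(gene_id: str) -> str:
    """Build the record-page URL for one gene ID, escaping any unsafe characters."""
    return RECORD_URL + urllib.parse.quote(gene_id)


def write_gene_table(rows: Iterable[Mapping[str, str]], out: TextIO,
                     fieldnames: Sequence[str]) -> None:
    """Write a header plus one CSV row per gene; Excel can open the result directly."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    gene_ids = ["Tb927.7.2390"]  # in practice, the full list of gene IDs
    # "product" is a hypothetical column; its value would come from scraping each page.
    rows = [{"gene_id": g, "url": record_url(g), "product": ""} for g in gene_ids]
    buffer = io.StringIO()  # stands in for a real file in this demo
    write_gene_table(rows, buffer, fieldnames=["gene_id", "url", "product"])
    print(buffer.getvalue())
```

To write a real file instead of the in-memory buffer, open it with `open("genes.csv", "w", newline="", encoding="utf-8")`; the `newline=""` argument is what the `csv` module documentation recommends so that rows are not separated by blank lines on Windows.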


So far, the script looks up a gene on the website (I simply append the gene ID to the URL) and reads the source code:
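import urllib.request
import re  # to search the page text later
# This is just an example gene ID; it will become a list of gene IDs.
geneID = "Tb927.7.2390"
# Open the webpage, going directly to the record for this gene ID
url = "https://tritrypdb.org/tritrypdb/app/record/gene/" + geneID
with urllib.request.urlopen(url) as page:
    # read() returns bytes; decode with the charset the server reports (or UTF-8)
    charset = page.headers.get_content_charset() or "utf-8"
    scode = page.read().decode(charset, errors="replace")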
The plan was to search the source code for the information I need and return it. However, the source code does not seem to contain any of the text that appears when the page is displayed normally in the browser; instead, there are large blank areas.
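One way to check what is going on is to strip the tags from the downloaded HTML and see how much visible text is left, and which JavaScript files the page loads. This is a diagnostic sketch; `VisibleTextParser` and `summarize_html` are made-up names built on the standard library's `html.parser`. If almost no text remains but the page pulls in several scripts, the content is most likely built in the browser by JavaScript after the page loads, and `urllib` does not run JavaScript.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class VisibleTextParser(HTMLParser):
    """Collect text outside <script>/<style> and the src of each external <script>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks: List[str] = []
        self.script_srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            src = dict(attrs).get("src")
            if tag == "script" and src:
                self.script_srcs.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style source code
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def summarize_html(html: str) -> Tuple[str, List[str]]:
    """Return (visible text, list of external script URLs) for an HTML string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks), parser.script_srcs
```

With the `scode` string from the script above, `text, scripts = summarize_html(scode)` followed by `print(len(text), scripts)` gives a quick picture: a short `text` and a non-empty `scripts` list point to a JavaScript-rendered page.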


Is this webpage somehow hiding that information, and is there a way to get it out anyway?
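When a page's content is loaded by JavaScript, a common workaround is to open the browser's developer tools, watch the Network tab while the gene page loads, and look for the request that returns the gene data, often as JSON. That URL can then usually be requested directly. This is a general technique; the thread does not confirm how this particular site loads its data. In the sketch below, `fetch_json` and `get_path` are hypothetical helper names, and any endpoint URL or field names you pass in are placeholders to replace with what the Network tab actually shows. The other usual option is to drive a real browser (for example with Selenium) so that the JavaScript runs.

```python
import json
import urllib.request
from typing import Any, Callable, Sequence, Union


def fetch_json(url: str,
               opener: Callable[..., Any] = urllib.request.urlopen,
               timeout: float = 30.0) -> Any:
    """Download a URL and parse the response body as JSON.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be tried without network access.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def get_path(data: Any, path: Sequence[Union[str, int]]) -> Any:
    """Follow dict keys / list indexes into parsed JSON; return None if any step is missing."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return None
    return current
```

Returning `None` for a missing field, rather than raising, keeps a loop over hundreds of gene IDs running when one record lacks a value; the empty cell then shows up in the spreadsheet instead of stopping the whole run.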

Thank you for your help, people!
You need to learn a bit more about scraping.
There's a good two-part tutorial on this forum.
See:
Web scraping part 1
Web scraping part 2: https://python-forum.io/Thread-Web-scraping-part-2