Aug-12-2024, 05:53 PM
I'm writing a Python script to pull data from some APIs, such as the TCGA API, and import it into my SQL database, which tracks types of cancer along with the cancer patients' data (e.g. age, gender, ethnicity, type of cancer, etc.).
The problem lies in extracting the data from all of the pages of the API: so far I've only managed to extract data from one page. Does anyone have any advice on how to improve my code? It would be very helpful.
Here's a piece of my code:
import requests
import json
import re

file_id = "b658d635-258a-4f6f-8377-767a43771fe4"
data_endpt = "https://api.gdc.cancer.gov/data/{}".format(file_id)

response = requests.get(data_endpt, headers = {"Content-Type": "application/json"})

response_head_cd = response.headers["Content-Disposition"]
file_name = re.findall("filename=(.+)", response_head_cd)[0]

with open(file_name, "wb") as output_file:
    output_file.write(response.content)
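For what it's worth, here is a rough sketch of how paging through results could look. The /data/ endpoint above downloads a single file by its ID, so as far as I can tell there is nothing to page through there. The sketch assumes that the patient-level records come from the GDC /cases endpoint, that this endpoint accepts "from" and "size" query parameters, and that it reports the total under data.pagination.total. The field names, and the function names fetch_cases_page and iter_hits, are my own placeholders, so please check them against the GDC documentation. It uses only the standard library; requests.get(url, params=params) should work the same way.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for patient-level records (not the /data/ file download).
CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def fetch_cases_page(offset, size):
    """Fetch one page of case records starting at `offset` (network call)."""
    params = {
        "from": offset,
        "size": size,
        "format": "json",
        # Assumed field names; verify them against the GDC field list.
        "fields": "case_id,demographic.gender,demographic.ethnicity,project.project_id",
    }
    url = CASES_ENDPOINT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=60) as response:
        return json.load(response)


def iter_hits(fetch_page, page_size=100):
    """Yield every record, advancing the offset until the reported total is reached."""
    offset = 0
    while True:
        payload = fetch_page(offset, page_size)
        hits = payload["data"]["hits"]
        total = payload["data"]["pagination"]["total"]
        yield from hits
        offset += len(hits)
        # Stop on an empty page too, so a wrong total cannot cause an endless loop.
        if not hits or offset >= total:
            break


# Example use (performs real HTTP requests):
# for case in iter_hits(fetch_cases_page, page_size=500):
#     print(case["case_id"])
```

Keeping the paging loop separate from the HTTP call means the loop can be tried with a fake fetch function first, and each yielded record can then be handed to whatever code inserts rows into the SQL database.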