Python Forum
download pubmed PDFs using pubmed2pdf in python
#1
Hello,
Can anyone help me with these problems?
I want to download papers from PubMed and save them as PDFs.
1. I downloaded the paper with PubMed ID 28019091 using the pubmed2pdf package. The command, run in a Windows 10 terminal, was:
python -m pubmed2pdf pdf --pmids="28019091"
The download reported success, but the PDF file can't be opened. This paper has a free PDF version on the PubMed website. I downloaded two other papers with the same command and couldn't open those PDFs either. So what's wrong with my command? It was copied from https://pypi.org/project/pubmed2pdf/
2. How can I download all the chosen papers at once using pubmed2pdf? The only relevant command I found at the link above is $ python3 -m pubmed2pdf pdf --pmidsfile="/my/path/to/the/file", but I don't know how to generate a file containing a batch of PubMed IDs, or what type of file it should be.
I'd greatly appreciate it if anyone could help me.
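
For question 2, the file is most likely just a plain-text file. Here is a minimal sketch that writes one, assuming (not verified against the pubmed2pdf source) that --pmidsfile expects one PMID per line. The function name write_pmids_file and the file name pmids.txt are made up for illustration.

```python
from pathlib import Path


def write_pmids_file(pmids, path):
    """Write the given PMIDs to a plain-text file, one per line.

    Assumption: pubmed2pdf's --pmidsfile option reads one PMID per line.
    """
    path = Path(path)
    # Convert to text, trim whitespace, and drop empty entries.
    cleaned = [str(p).strip() for p in pmids if str(p).strip()]
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path
```

Calling write_pmids_file(["28019091"], "pmids.txt") with your own list of IDs would create a file that you could then try passing as --pmidsfile="pmids.txt". A text editor such as Notepad would produce the same kind of file.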
Reply
#2
For 1, can you give any more info? What happens when you run the command? Is there any output? What happens when you try to open the PDF?

For 2, you might just want to write a PowerShell script that iterates over the IDs and downloads them. I'm not a Windows user, but I'd do the equivalent on Linux and am assuming the shell on Windows has scripting capabilities.
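
The same loop could also be written in Python instead of PowerShell, which avoids depending on a particular shell. This is only a sketch: it reuses the exact command from post #1, and the helper names build_command and download_all are made up for illustration.

```python
import subprocess
import sys


def build_command(pmid):
    """Build the pubmed2pdf command line for a single PMID."""
    # sys.executable is the Python interpreter running this script.
    return [sys.executable, "-m", "pubmed2pdf", "pdf", f"--pmids={pmid}"]


def download_all(pmids):
    """Run pubmed2pdf once per PMID; return the PMIDs whose command failed."""
    failed = []
    for pmid in pmids:
        result = subprocess.run(build_command(pmid))
        if result.returncode != 0:
            failed.append(pmid)
    return failed
```

Calling download_all(["28019091"]) runs the command once per ID. A zero exit code only means the command ran, so the downloaded files still need to be checked by opening them.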
Reply
#3
Familiar with PubMed, but not with this app. How would it get the list of selected articles?
What is the size of the downloaded pdf?
Reply
#4
(Oct-10-2020, 11:31 AM)ndc85430 Wrote: For 1, can you give any more info? What happens when you run the command? Is there any output? What happens when you try to open the PDF?

Yes, everything seemed to go well when I ran the command. Here is the full output:
C:\Users\lenovo>python -m pubmed2pdf pdf --pmids="28019091"
2020-10-12 08:08:46,450 - INFO - pubmed2pdf.utils - Trying to fetch pmid 28019091
Done downloading. All downloaded can be found in C:\Users\lenovo\pubmed2pdf


The PDF file was then found in the default path, and when I tried to open it in Acrobat Reader, a pop-up window appeared: "Acrobat Reader failed to open "28019091.pdf" because this kind of file is not supported or the file is corrupted (e.g. the file was sent as an email attachment and not decoded correctly)". Following jefsummers's hint, I noticed that the downloaded PDF was only 30 KB, whereas the copy of 28019091 downloaded directly from PubMed was 408 KB.

ndc85430 Wrote: For 2, you might just want to write a PowerShell script that iterates over the IDs and downloads them. I'm not a Windows user, but I'd do the equivalent on Linux and am assuming the shell on Windows has scripting capabilities.

That sounds difficult for me, but I'll try it.
Reply
#5
I played with this some, including the code that it was cloned from. PubMed has changed its interface recently. The 30 KB file you get is HTML and JavaScript rather than a PDF, and it is not the article.

Sorry, this isn't the way to do it.
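
For anyone who wants to check their own downloads: a real PDF starts with the bytes %PDF-, so a saved HTML page can be detected without opening a viewer. A minimal sketch, with looks_like_pdf as a made-up helper name:

```python
def looks_like_pdf(path):
    """Return True if the file starts with the PDF magic bytes '%PDF-'."""
    with open(path, "rb") as fh:
        head = fh.read(5)
    return head == b"%PDF-"
```

An HTML page saved with a .pdf extension would typically start with something like <!DOCTYPE html> or <html, so this check returns False for it.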

J
Reply
#6
(Oct-12-2020, 09:12 PM)jefsummers Wrote: Played with this some, including the code that it was cloned from. Pubmed has changed its interface recently. What you get in your 30K is html and javascript rather than a pdf, and it is not the article.

Sorry, this isn't the way to do it.

J

Thank you so much for your answer. By the way, how do you get those PDFs? Can you share some methods? Thanks in advance.
Reply
#7
I click on the link on the page for the article. Locate the article by search (or if you have it, by PMID) and if the full text is available there will be a link on the right side of the page.

For example, on the PubMed page, search for Coronavirus Covid-19. The top article (today anyway) is "recent trends". Beside the PMID it says Free Article. Click the title of the article and you get the abstract and related articles. On the right side it says "Free Full Text". Click that and you get the full article as a webpage, and at the bottom of that page is a link to get the PDF. OK, that's a lot of steps, but it works for the articles that are available for free.

A word of explanation for anyone else who may be reading this and is curious. The (US) National Institutes of Health runs the National Library of Medicine, which indexes the medical literature (not all of it, but pretty much all that is significant). PubMed is a search engine designed to search that index. You can do a plain word search, or you can use special headings to limit the search, such as hydroxychloroquine with a MeSH (Medical Subject Headings) subheading of therapeutic use.
Reply
#8
(Oct-13-2020, 04:05 PM)jefsummers Wrote: I click on the link on the page for the article. Locate the article by search (or if you have it, by PMID) and if the full text is available there will be a link on the right side of the page.

For example, on the PubMed page, search for Coronavirus Covid-19. The top article (today anyway) is "recent trends". Beside the PMID it says Free Article. Click the title of the article and you get the abstract and related articles. On the right side it says "Free Full Text". Click that and you get the full article as a webpage, and at the bottom of that page is a link to get the PDF. OK, that's a lot of steps, but it works for the articles that are available for free.

A word of explanation for anyone else who may be reading this and is curious. The (US) National Institutes of Health runs the National Library of Medicine, which indexes the medical literature (not all of it, but pretty much all that is significant). PubMed is a search engine designed to search that index. You can do a plain word search, or you can use special headings to limit the search, such as hydroxychloroquine with a MeSH (Medical Subject Headings) subheading of therapeutic use.

Thank you, jefsummers. I know this method, but it is inefficient when there are many articles to download, so I want to do the downloading with Python.
Reply
#9
Since the publishers store the articles on their own sites rather than at PubMed (NLM), you will need to scrape the publisher's address from the PubMed page and then follow that link to the publisher (Elsevier, for example).
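
As an alternative to scraping the PubMed page itself, NCBI's E-utilities ELink service can return the provider (publisher) link for a PMID. The sketch below only builds the request URL; the cmd=prlinks and retmode=ref parameters are taken from the E-utilities documentation as I remember it, so treat them as assumptions to verify. The name publisher_link_url is made up for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ELink service.
EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def publisher_link_url(pmid):
    """Build an ELink URL that should redirect to the provider's page for a PMID."""
    params = {
        "dbfrom": "pubmed",
        "id": str(pmid),
        "cmd": "prlinks",   # assumption: asks for the primary provider link
        "retmode": "ref",   # assumption: redirect instead of returning XML
    }
    return f"{EUTILS_ELINK}?{urlencode(params)}"
```

Opening publisher_link_url("28019091") in a browser should land on the publisher's page for that article. Whether that page offers a PDF, and whether you are allowed to download it in bulk, depends on the publisher.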
Reply

