I get one free ebook a day from Packt Publishing through their "Free Learning - Free Technology Ebooks" promo, and I'm trying to automate the process. I POST to the site's root path to log in, then GET the promo URL and use BeautifulSoup 4 to extract the href of the "claim your free ebook" link. That's where I'm stuck. Here's the code:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import requests
    from bs4 import BeautifulSoup

    USERNAME = '[email protected]'
    PASSWORD = 'secret'
    BASE_URL = 'https://www.packtpub.com'
    PROMO_URL = 'https://www.packtpub.com/packt/offers/free-learning'

    session = requests.session()
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2810.1 Safari/537.36'}
    session.post(BASE_URL, {"username": USERNAME, "password": PASSWORD}, headers=headers)
    response = session.get(PROMO_URL, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    current_offer_href = BASE_URL + soup.find("div", {"class": "free-ebook"}).a['href']
    print(current_offer_href)
    print(session.get(current_offer_href, headers=headers))

current_offer_href holds the correct value: if you go to the site today (13/NOV/2016) and inspect the button, you will find the same link. In this case it is https://www.packtpub.com/freelearning-claim/17276/21478. If I do a GET against current_offer_href, I receive <Response [404]>. What I should be getting instead is a redirect to https://www.packtpub.com/account/my-ebooks, because that's what happens when I click the button manually on the site. What's wrong here?
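To narrow this down, one check I can add is to print every hop the claim request goes through. This is only a debugging sketch: `describe_chain` and `fetch_and_describe` are names I made up for illustration, and I am assuming (without having confirmed it) that the 404 might come from the login POST not actually authenticating the session. It relies on the `history`, `status_code` and `url` attributes of a requests response.

```python
def describe_chain(response):
    """Return (status_code, url) for each redirect hop plus the final response.

    Works with any object exposing .history, .status_code and .url,
    which is the shape of a requests.Response.
    """
    hops = list(response.history) + [response]
    return [(hop.status_code, hop.url) for hop in hops]


def fetch_and_describe(session, url, headers=None):
    """GET url with an existing requests session and print every hop."""
    response = session.get(url, headers=headers)
    for status, hop_url in describe_chain(response):
        print(status, hop_url)
    return response
```

Calling `fetch_and_describe(session, current_offer_href, headers)` after the login POST would show whether the server answers 404 straight away or only after one or more redirects. Running the same helper on the login request itself would need a small variant that uses `session.post`.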