Python Forum

Session Persistence
Hi,

I'm trying to scrape a remote batch server's jobs from its XML webpage.

Running the code line by line in the interpreter fails with traceback at
tree = ET.parse(r.content)
It appears the second call (get) has not been able to use the session cookie as per the traceback (Please log in to THE Batch Server with valid user...)

Can anyone see what I am doing wrong? TIA

This is my code:
import requests
import requests.packages.urllib3
from lxml import html
from lxml import etree
import xml.etree.ElementTree as ET

requests.packages.urllib3.disable_warnings()

# create a session
s = requests.Session()

# make a login POST request, using the session
s.post("https://server01/login.jsp", data=dict(username="UserA", password="PasswordA"), verify=False)

# subsequent requests that use the session will automatically handle cookies
r = s.get("https://serverA/admin?action=xmlStatus", cookies=s.cookies)

# if [print(s.cookies)] a JSESSIONID is returned

# print Batch Jobs
tree = ET.parse(r.content)
batch_jobs = tree.xpath("//div[@id='collapsible2']/div[1]/div[2]/div[1]/span[2]/text()")
print (batch_jobs)
This is the Traceback I get:
Error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/usr/lib64/python3.6/xml/etree/ElementTree.py", line 586, in parse
    source = open(source, "rb")
FileNotFoundError: [Errno 2] No such file or directory: b'\r\n\r\n<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\r\n<html>\r\n <head>\r\n\t<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\r\n <title>THE Batch Server</title>\r\n <link rel="stylesheet" type="text/css" href="css/page.css"/>\r\n </head>\r\n <body>\r\n <h1>&nbsp;<img height="22" width="22" src="images/batchdefault.png"/>&nbsp;&nbsp;THE Batch Server Login</h1>\r\n \r\n \r\n Please log in to THE Batch Server with valid user which has access to use case \'Scheduled Jobs\'.\r\n \r\n <p/>\r\n <form method="post" action="login">\r\n <table border="0">\r\n <tbody>\r\n <tr>\r\n <td>User ID</td>\r\n <td>\r\n \r\n <input type="text" name="username">\r\n \r\n </td>\r\n </tr>\r\n <tr>\r\n <td>Password</td>\r\n <td><input type="password" name="password"></td>\r\n </tr>\r\n </tbody>\r\n </table>\r\n <input type="submit" value="Submit">\r\n </form>\r\n </body>\r\n</html>'
ElementTree.parse() accepts only a filename or a file object. The data passed in is a byte string holding the document itself, not a path or a file. Try ElementTree.fromstring() instead. It accepts bytes as well as str, so decoding is usually not required.
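To illustrate the difference, here is a minimal sketch. The SAMPLE bytes and the "job" element names are made up for the example and are not the batch server's real format. It also uses findall(), since the standard-library ElementTree has no .xpath() method (that one belongs to lxml).

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical payload standing in for r.content; the real server's XML will differ.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<jobs><job name="nightly">OK</job><job name="hourly">FAILED</job></jobs>'
)

# fromstring() takes the document itself (str or bytes) and returns the root Element.
root = ET.fromstring(SAMPLE)

# parse() takes a filename or file object, so the bytes must be wrapped in BytesIO.
tree = ET.parse(io.BytesIO(SAMPLE))

# findall() supports a limited XPath subset, which is enough for simple lookups.
job_names = [job.get("name") for job in root.findall("./job")]
same_names = [job.get("name") for job in tree.getroot().findall("./job")]
print(job_names)
```

Either call works on the response body. Passing r.content straight to parse() is what makes Python try to open the whole document as a file name.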
But that doesn't explain why the traceback highlighted that there was no persistent session (Please log in to THE Batch Server with valid user)
(Jan-15-2019, 04:31 PM) portuguesedanny wrote: Please log in to THE Batch Server with valid user which has access to use case 'Scheduled Jobs'
Are you maybe logging in with a user that doesn't have access to do what you're trying to do?
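One way to check whether a response is really the login page, and where that page expects credentials to be sent, is to inspect its form before parsing anything else. The sketch below uses only the standard library. FormFinder and inspect_login_page are hypothetical helper names, and LOGIN_PAGE is a trimmed copy of the form shown in the traceback. Note that this form posts to "login", while the code above posts to "login.jsp"; that may or may not matter on this server, but it is worth checking.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect the first form's action and the names of the page's <input> fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []
        self.has_password = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action", "")
        elif tag == "input":
            if attrs.get("name"):
                self.fields.append(attrs["name"])
            if attrs.get("type") == "password":
                self.has_password = True


def inspect_login_page(body: bytes) -> FormFinder:
    """Parse an HTML response body and report what its form looks like."""
    finder = FormFinder()
    finder.feed(body.decode("utf-8", errors="replace"))
    return finder


# Trimmed copy of the login form from the traceback.
LOGIN_PAGE = b"""<html><body>
<form method="post" action="login">
<input type="text" name="username">
<input type="password" name="password">
<input type="submit" value="Submit">
</form></body></html>"""

page = inspect_login_page(LOGIN_PAGE)
print(page.has_password, page.action, page.fields)
```

If has_password is true for the body returned by the second request, the session was not authenticated and the XML parsing step should be skipped.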
The traceback states "FileNotFoundError: [Errno 2] No such file or directory" and then lists a byte string that looks like the contents of an XML-style document rather than a file path:

Quote:b'\r\n\r\n<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\r\n<html>\r\n <head>\r\n\t<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\r\n <title>THE Batch Server</title>\r\n <link rel="stylesheet" type="text/css" href="css/page.css"/>\r\n </head>\r\n <body>\r\n <h1>&nbsp;<img height="22" width="22" src="images/batchdefault.png"/>&nbsp;&nbsp;THE Batch Server Login</h1>\r\n \r\n \r\n Please log in to THE Batch Server with valid user which has access to use case \'Scheduled Jobs\'.\r\n \r\n <p/>\r\n <form method="post" action="login">\r\n <table border="0">\r\n <tbody>\r\n <tr>\r\n <td>User ID</td>\r\n <td>\r\n \r\n <input type="text" name="username">\r\n \r\n </td>\r\n </tr>\r\n <tr>\r\n <td>Password</td>\r\n <td><input type="password" name="password"></td>\r\n </tr>\r\n </tbody>\r\n </table>\r\n <input type="submit" value="Submit">\r\n </form>\r\n </body>\r\n</html>'

The core problem is that ET.parse() expects a file path, but it received an XML-formatted byte string instead.



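Never mind. I see what you're on about now. The documentation for requests.Session shows the username and password being set on the session object itself, not just passed into the post() method. Note that s.auth with a tuple sends HTTP Basic credentials, so it only helps if the server accepts that scheme: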

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})

# both 'x-test' and 'x-test2' are sent
s.get('https://httpbin.org/headers', headers={'x-test2': 'true'})
Try setting s.auth with the username and password.
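To see what setting s.auth actually does, the request can be prepared without sending it. This is a small sketch with placeholder credentials; it needs no network access. A (user, pass) tuple becomes an HTTP Basic Authorization header, which a form-based login page such as the one in the traceback may simply ignore.

```python
import requests

s = requests.Session()
s.auth = ("user", "pass")  # placeholder credentials, as in the snippet above

# Build the request through the session without sending it,
# so the headers the session would add can be inspected.
prepared = s.prepare_request(requests.Request("GET", "https://httpbin.org/headers"))
auth_header = prepared.headers.get("Authorization")
print(auth_header)
```

If the server only accepts the HTML login form, posting the form fields through the same session and reusing that session for later requests is still the approach to debug.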