Python Forum

Full Version: working with lxml and requests
As mentioned by @nilamo, try the login as part of the URL and without the \.
In Perl, \@10.10.10.1 is a reference to @10.10.10.1; the backslash has no such meaning in a Python string, so leave it out.
>>> password = 'bar'
>>> user = 'foo'
>>> page = 1
>>> url = f"https://{user}:{password}@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber=${page}"
>>> print(url)
https://foo:[email protected]/vmrest/users?rowsPerPage=2000&pageNumber=$1
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
user = 'foo'
password = 'bar'
page = 1
url = f"https://{user}:{password}\@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber=${page}"
response = requests.get(url, verify=False)
xml = response.content
print(xml)
Quote:
&pageNumber=${page}

In Perl, $ also has special meaning. That should just be:
url = f"https://{user}:{password}@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber={page}"
I'm really confused now. I tried using the Perl-style URL and it just errors out. I think that URL layout in the Perl script is how LWP::UserAgent handles the username and password authentication; it's that module's format.

I just found some examples with urllib3, and it seems to run a bit better; it looks as if I need to do some kind of decoding.

I appreciate the help, nilamo.
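For what it's worth, Requests can also take the credentials separately through its auth parameter instead of embedding them in the URL, which sidesteps the escaping questions entirely. A minimal sketch, with placeholder host and credentials:

import requests

# Credentials go in the auth tuple (HTTP Basic auth) instead of the URL;
# the host, user, and password here are placeholders.
response = requests.get(
    'https://10.10.10.1/vmrest/users',
    params={'rowsPerPage': 2000, 'pageNumber': 1},
    auth=('foo', 'bar'),
    verify=False,
)
print(response.content)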
(Apr-18-2018, 08:15 PM)nilamo Wrote: [ -> ]In Perl, $ also has special meaning. That should just be
Yes, I forgot to remove the $, thanks.
(Apr-18-2018, 08:21 PM)gentoobob Wrote: [ -> ]I just found some examples with urllib3, and it seems to run a bit better; it looks as if I need to do some kind of decoding.
Requests is powered by urllib3, so it has those features and more.
Here is the corrected version; it's Python 3.6 code, as it uses an f-string.
For Python versions before that, use .format() (see the sketch after the code).
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
user = 'foo'
password = 'bar'
page = 1
url = f"https://{user}:{password}@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber={page}"
response = requests.get(url, verify=False)
xml = response.content
print(xml)
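On an older Python, the f-string line above could be written with .format() instead; a one-line sketch:

url = "https://{0}:{1}@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber={2}".format(user, password, page)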
Thanks for the help, but that didn't work. It doesn't even output anything; it locks up my Python shell and I have to kill it.

OK... so this works. It gives me a spit-out of XML data. Now I need to get it to do a while loop through all the pages on the website, and then I only need to capture the tags "alias" and "dtmfaccessid" and write them one per line into a CSV file. For example:

janeDoe, 95433
bobDoe, 95444



import requests
import lxml
from bs4 import BeautifulSoup
url = 'https://10.10.10.0/vmrest/users?rowsPerPage=2000&pageNumber=1'
request_page = requests.get(url, verify=False, auth=('user', 'pass'))
soup = BeautifulSoup(request_page.text, 'lxml')
print(soup)
Thanks for the help guys!
(Apr-19-2018, 01:08 PM)gentoobob Wrote: [ -> ]ok...so this works. It gives me a spit out of XML data
Good. One tip: use content; BeautifulSoup will convert it to Unicode itself, as it always does with all HTML/XML it takes in.
soup = BeautifulSoup(request_page.content, 'lxml')
request_page.text is the content of the response in Unicode.
request_page.content is the content of the response in bytes.
So there is no point in converting to Unicode twice.
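A quick way to see the difference, assuming the request_page response object from the code above:

# .text decodes the body to str (Unicode); .content is the raw bytes
print(type(request_page.text))     # <class 'str'>
print(type(request_page.content))  # <class 'bytes'>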
So this is what the XML looks like when it spits it out, except it's a lot of users, 5820 users on one page, and I need to create a "while" loop that will go through every page. The only two tags I need are <alias> and <dtmfaccessid>.

Output:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html>
  <body>
    <users total="5820">
      <user>
        <firstname>Jane</firstname>
        <lastname>Doe</lastname>
        <alias>jDoe</alias>
        <city></city>
        <department>Accounting</department>
        <employeeid></employeeid>
        <displayname>Jane Doe</displayname>
        <emailaddress>[email protected]</emailaddress>
        <timezone>40</timezone>
        <creationtime>2015-10-23T16:38:23Z</creationtime>
        <listindirectory>true</listindirectory>
        <isvmenrolled>false</isvmenrolled>
        <dtmfaccessid>14734</dtmfaccessid>
        <voicenamerequired>false</voicenamerequired>
      </user>
    </users>
  </body>
</html>

(Apr-19-2018, 01:36 PM)snippsat Wrote: [ -> ]Good. One tip: use content; BeautifulSoup will convert it to Unicode itself, as it always does with all HTML/XML it takes in.


Ok. I will look at that. I appreciate it!

So I got the filtering of XML tags sorted out with the following code...

alias = soup.find_all('alias')
dtmfaccessid = soup.find_all('dtmfaccessid')

for a, d in zip(alias, dtmfaccessid):
    print(a.get_text(), d.get_text())
That spits out the two columns I need. Now I just need a while loop so it gets all 2000 users on each page, and then to put the list into a CSV file with a timestamp.
(Apr-19-2018, 01:39 PM)gentoobob Wrote: [ -> ]That spits out the two columns I need. Now I just need a while loop
Why do you need a while loop? Don't you already have the data you need?
Because the URL has a page number at the end. I need a loop that starts at page one, then goes to page two, three, etc., until no more pages are left. There are over 5000 users total, but I can only see 2000 users per page.
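A minimal sketch of that loop, assuming the server simply returns no <user> elements past the last page (the host, credentials, and page size are placeholders from earlier in the thread):

import csv
import time

import requests
from bs4 import BeautifulSoup

base_url = 'https://10.10.10.0/vmrest/users?rowsPerPage=2000&pageNumber={}'
rows = []
page = 1
while True:
    response = requests.get(base_url.format(page), verify=False, auth=('user', 'pass'))
    soup = BeautifulSoup(response.content, 'lxml')
    users = soup.find_all('user')
    if not users:  # assume an empty page means we are past the end
        break
    for u in users:
        # u.alias / u.dtmfaccessid are the first matching child tags of each <user>
        rows.append((u.alias.get_text(), u.dtmfaccessid.get_text()))
    page += 1

# Write the alias/dtmfaccessid pairs to a timestamped CSV file
filename = 'users_{}.csv'.format(time.strftime('%Y%m%d-%H%M%S'))
with open(filename, 'w', newline='') as f:
    csv.writer(f).writerows(rows)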