Python Forum
working with lxml and requests
#11
As mentioned by @nilamo, try the login as part of the URL, and without the \.
In Perl, \@10.10.10.1 is a reference to @10.10.10.1.
>>> password = 'bar'
>>> user = 'foo'
>>> page = 1
>>> url = f"https://{user}:{password}@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber=${page}"
>>> print(url)
https://foo:bar@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber=$1
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
user = 'foo'
password = 'bar'
page = 1
url = f"https://{user}:{password}\@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber=${page}"
response = requests.get(url, verify=False)
xml = response.content
print(xml)
#12
Quote:
&pageNumber=${page}

In Perl, $ also has special meaning. That should just be:
url = f"https://{user}:{password}@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber={page}"
#13
I'm really confused now. I tried using the Perl URL and it just errors out. I think that URL layout in the Perl script is just how LWP::UserAgent handles the username and password authentication.

I just found some examples with urllib3, and it seems to run a bit better; it looks as if I need to do some kind of decoding.

I appreciate the help, nilamo.
#14
(Apr-18-2018, 08:15 PM)nilamo Wrote: In Perl, $ also has special meaning. That should just be
Yes, I forgot to remove the $, thanks.
#15
(Apr-18-2018, 08:21 PM)gentoobob Wrote: I just found some examples with urllib3, and it seems to run a bit better; it looks as if I need to do some kind of decoding.
Requests is powered by urllib3, so it has those features and more.
Here is the corrected version; this is Python 3.6, as it has f-strings.
For Python versions before that, use .format() (a sketch follows after the code below).
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
user = 'foo'
password = 'bar'
page = 1
url = f"https://{user}:{password}@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber={page}"
response = requests.get(url, verify=False)
xml = response.content
print(xml)
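A minimal sketch of the .format() alternative for older Python, assuming the same placeholder credentials and host as above:
>>> user = 'foo'
>>> password = 'bar'
>>> page = 1
>>> url = "https://{}:{}@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber={}".format(user, password, page)
>>> print(url)
https://foo:bar@10.10.10.1/vmrest/users?rowsPerPage=2000&pageNumber=1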
#16
Thanks for the help, but that didn't work. It doesn't even output anything; it locks up my Python shell and I have to kill it.

OK... so this works. It gives me a dump of XML data. Now I need a while loop that goes through all the pages on the website, and then I just need to capture only the tags that have "alias" and "dtmfaccessid" and write them one per line into a CSV file. For example:

janeDoe, 95433
bobDoe, 95444



import requests
import lxml
from bs4 import BeautifulSoup

url = 'https://10.10.10.0/vmrest/users?rowsPerPage=2000&pageNumber=1'
# Basic-auth credentials passed to requests instead of being embedded in the URL
request_page = requests.get(url, verify=False, auth=('user', 'pass'))
soup = BeautifulSoup(request_page.text, 'lxml')
print(soup)
Thanks for the help guys!
#17
(Apr-19-2018, 01:08 PM)gentoobob Wrote: OK... so this works. It gives me a dump of XML data
Good. One tip: use content, then BeautifulSoup will convert it to Unicode, as it always does with all the HTML/XML it takes in:
soup = BeautifulSoup(request_page.content, 'lxml')
request_page.text is the content of the response in Unicode.
request_page.content is the content of the response in bytes.
So there is no point in converting to Unicode twice.
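For example, a quick check of the types (using the request_page object from the code above):
>>> type(request_page.text)
<class 'str'>
>>> type(request_page.content)
<class 'bytes'>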
#18
So this is what the XML looks like when it spits it out, except it's a lot of users... 5820 users on one page, and I need to create a "while" loop that will go through every page. The only two tags I need are <alias> and <dtmfaccessid>.

Output:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html>
  <body>
    <users total="5820">
      <user>
        <firstname>Jane</firstname>
        <lastname>Doe</lastname>
        <alias>jDoe</alias>
        <city></city>
        <department>Accounting</department>
        <employeeid></employeeid>
        <displayname>Jane Doe</displayname>
        <emailaddress>[email protected]</emailaddress>
        <timezone>40</timezone>
        <creationtime>2015-10-23T16:38:23Z</creationtime>
        <listindirectory>true</listindirectory>
        <isvmenrolled>false</isvmenrolled>
        <dtmfaccessid>14734</dtmfaccessid>
        <voicenamerequired>false</voicenamerequired>
      </user>
    </users>
  </body>
</html>

(Apr-19-2018, 01:36 PM)snippsat Wrote: One tip: use content, then BeautifulSoup will convert it to Unicode


Ok. I will look at that. I appreciate it!

So I got the filtering of XML tags sorted out with the following code...

# Collect the <alias> and <dtmfaccessid> tags in document order and print them side by side
alias = soup.find_all('alias')
dtmfaccessid = soup.find_all('dtmfaccessid')

for i in range(0, len(alias)):
    print(alias[i].get_text(), end=' ')
    print(dtmfaccessid[i].get_text())
That spits out the two columns I need. Now I just need a while loop so it gets all 2000 users from each page, and then writes this list into a CSV file with a timestamp (a sketch of the CSV part is below).
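A minimal sketch of that CSV step, assuming the alias and dtmfaccessid lists from the code above and a hypothetical output filename:
import csv
from datetime import datetime

# Hypothetical timestamped filename, e.g. users_20180419-1339.csv
filename = 'users_{}.csv'.format(datetime.now().strftime('%Y%m%d-%H%M'))

with open(filename, 'w', newline='') as f:
    writer = csv.writer(f)
    # Pair each <alias> with its matching <dtmfaccessid> and write one row per user
    for a, d in zip(alias, dtmfaccessid):
        writer.writerow([a.get_text(), d.get_text()])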
#19
(Apr-19-2018, 01:39 PM)gentoobob Wrote: spits me out the two columns I need. Now just to do a while loop
Why do you need a while loop? Don't you already have the data you need?
#20
Because the URL has a page number at the end. I need a loop that starts at page one, then goes to page two, three, etc. until no more pages are left. There are 5000 users total, but I can only see 2000 users per page (a rough sketch of such a loop follows below).
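A rough sketch of such a loop, assuming the same host and credentials as the earlier example, and stopping when a page comes back with no <user> elements:
import requests
from bs4 import BeautifulSoup
from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

base_url = 'https://10.10.10.0/vmrest/users?rowsPerPage=2000&pageNumber={}'
page = 1
rows = []

while True:
    response = requests.get(base_url.format(page), verify=False, auth=('user', 'pass'))
    soup = BeautifulSoup(response.content, 'lxml')
    users = soup.find_all('user')
    if not users:  # no <user> elements on this page, so the last page has been passed
        break
    for user in users:
        alias = user.find('alias')
        dtmf = user.find('dtmfaccessid')
        if alias and dtmf:
            rows.append([alias.get_text(), dtmf.get_text()])
    page += 1
The collected rows can then be written out with the timestamped CSV snippet from the previous post.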