Python Forum

Full Version: working with lxml and requests
Hello. I am trying to pull XML data from a webpage, extract only the two XML tags I need, and then put them in a CSV file. The webpage is a Cisco Call Manager Unity Voicemail server. I want to pull each user's "Alias" and their phone extension, "DtmfAccessId". Each page shows up to 2000 users, and eventually I'd like to build a while loop that goes through each page until there are no more.

Below is all the code I've gotten so far. I can't even see the xml data. I'm a beginner in Python so be patient.

from lxml import html
import requests

page = 1

# page is an int, so convert it before concatenating it into the URL string
response = requests.get('\&pageNumber=' + str(page), verify=False, auth=('user', 'pass'))

xml = response.content

data = html.document_fromstring(xml)

Any help is greatly appreciated.
If it's an HTML page, then using BeautifulSoup would probably be the easiest way. If you can identify what the tags are (a class, or how they're nested, or something similar), then you can use one line to get all the users as a list.
That's the issue I'm also having. How can I get the XML code of the page I'm trying to access? I can't just use a web browser and go to that link I gave you. If I do, I get an error 405 (method not supported). So I believe I have to use some sort of GET method. I just don't know where to look to find what I need. Everything I've found online is a dead end. I've been at this for 3 weeks and I'm running out of steam. Haha.
Quote: That's the issue I'm also having. How can I get the XML code of the page I'm trying to access?
As nilamo points out, it's most likely HTML and not XML.
You should be using BeautifulSoup.
There's an excellent two-part tutorial on this forum by snippsat on scraping with BeautifulSoup.
part1 here:
part2 here:
OK, I will give it a try! Thanks, you guys! I appreciate it!
If it doesn't support GET, then you can use something else to craft a different type of request. On Windows, Fiddler is very good. Otherwise, curl is very good.
So after looking over your links here is what I have now...

import requests
from bs4 import BeautifulSoup

response = requests.get('\&pageNumber=1', verify=False, auth=('user', 'pass'))

soup = BeautifulSoup(response.content, 'html.parser')
and I get the following output...

Warning (from warnings module):
  File "/usr/lib/python3/dist-packages/urllib3/", line 845
    InsecureRequestWarning)
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See:
Cisco System - Error report
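The InsecureRequestWarning in that output is only a consequence of verify=False. The final line, "Cisco System - Error report", suggests the server sent back an error page rather than user data. Below is a minimal sketch of a check to run before parsing. It assumes the server signals failures through the HTTP status code and labels successful replies with an XML content type, neither of which is confirmed in this thread. `describe_response` is a hypothetical helper name.

```python
def describe_response(status_code, content_type, body):
    """Return a short diagnosis of an HTTP response before parsing it.

    Hypothetical helper: with requests, pass in response.status_code,
    response.headers.get('Content-Type', '') and response.content.
    """
    if status_code >= 400:
        return "HTTP error {}: do not parse the body as data".format(status_code)
    if "xml" in content_type.lower():
        return "XML body, {} bytes".format(len(body))
    if "html" in content_type.lower():
        return "HTML body, {} bytes".format(len(body))
    return "unexpected content type: {!r}".format(content_type)


# Stand-in values for illustration only; no request is made here.
print(describe_response(405, "text/html", b"<html>Error report</html>"))
print(describe_response(200, "application/xml", b"<Users/>"))
```

Printing the status code and content type first makes it clear whether the parser is being handed real data or an error report.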
It does support GET. There is an old Perl script that uses GET to do the same thing. However, the old script doesn't work on a new Debian server, and none of us know Perl, just basic Python for scripting access to routers and switches. The Perl script uses the following modules...

The code in Perl looks like this...

use LWP::Simple;
use XML::Simple;

my $xml = new XML::Simple;
my @userdata;

$page = 1;

  my $url = "https://USER:pass\@\&pageNumber=$page";
  my $content = get($url);
  die "error getting $url" unless defined $content;

  my $data = $xml->XMLin($content);

# if we dont get at least one user end loop

  if(@{$data->{User}} < 1)
# build the userdata array, each entry contains "username,extension"

$start =(($page-1) * 2000);
for ($i=$start;$i<=$start + @{$data->{User}} - 1;$i++)

# Dump the results to a file

 print UNITY "$_\n";
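The logic of the Perl fragments above (fetch a page, stop when it contains no User entries, collect "username,extension" pairs, dump them to a file) could be sketched in Python roughly as follows. This is an illustration under assumptions, not a port of the original script. The root element name `Users`, the absence of XML namespaces, and the helper names `extract_users`, `collect_all_users`, `write_csv` and `fake_fetch` are all hypothetical. The canned pages stand in for the server so the sketch runs without network access.

```python
import csv
import io
import xml.etree.ElementTree as ET


def extract_users(xml_text):
    """Return (Alias, DtmfAccessId) pairs for every User element."""
    root = ET.fromstring(xml_text)
    return [
        (user.findtext("Alias", default=""),
         user.findtext("DtmfAccessId", default=""))
        for user in root.iter("User")
    ]


def collect_all_users(fetch_page):
    """Call fetch_page(1), fetch_page(2), ... until a page has no users."""
    rows = []
    page = 1
    while True:
        users = extract_users(fetch_page(page))
        if not users:
            break
        rows.extend(users)
        page += 1
    return rows


def write_csv(rows, out_file):
    """Write a header row followed by one row per user."""
    writer = csv.writer(out_file)
    writer.writerow(["Alias", "DtmfAccessId"])
    writer.writerows(rows)


# Canned pages standing in for the server; the real XML layout is assumed.
FAKE_PAGES = {
    1: "<Users>"
       "<User><Alias>jdoe</Alias><DtmfAccessId>1001</DtmfAccessId></User>"
       "<User><Alias>asmith</Alias><DtmfAccessId>1002</DtmfAccessId></User>"
       "</Users>",
    2: "<Users>"
       "<User><Alias>bjones</Alias><DtmfAccessId>1003</DtmfAccessId></User>"
       "</Users>",
}


def fake_fetch(page):
    """Return canned XML for a page number, or an empty page past the end."""
    return FAKE_PAGES.get(page, "<Users/>")


buffer = io.StringIO()
write_csv(collect_all_users(fake_fetch), buffer)
print(buffer.getvalue())
```

In real use, `fake_fetch` would be replaced by a function that calls requests.get for the given page number and returns response.content, and `buffer` by a file opened with newline=''.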
(Apr-18-2018, 06:16 PM) gentoobob wrote: my $url = "https://USER:pass\@
That's different from the URL you're using. Maybe the switch doesn't understand authentication headers, and the credentials need to be sent as part of the URL?
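For comparison, here is a small sketch of the two ways the credentials can travel. With requests, auth=('user', 'pass') sends them as an HTTP Basic Authorization header, while the Perl string embeds them in the URL itself. The host and path below are placeholders, and `basic_auth_header` is a hypothetical helper that only shows what such a header contains.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_header(user, password):
    """Build the Authorization header value used by HTTP Basic auth."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Placeholder host and path; only the USER:pass@ part mirrors the Perl string.
parts = urlsplit("https://USER:pass@example.invalid/users")
print(parts.username, parts.password, parts.hostname)
print(basic_auth_header(parts.username, parts.password))
```

Either form carries the same username and password. Whether this particular server accepts both is something only a test against it can confirm.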

Also, backslashes (this: \) are almost never used in URLs, so my guess is that they're in that Perl string to stop Perl from interpreting the character that follows. So maybe this: ?rowsPerPage=2000\&pageNumber=1 should be this: ?rowsPerPage=2000&pageNumber=1
Well, that’s the Perl version of what needs to be done. That’s just an example. I’m trying to do that in Python.
A URL is a URL. They're not different in Perl or Python. So changing the URL could be why you're getting different results.
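One way to avoid hand-typing the separators at all is to let the standard library build the query string. This is a sketch only: `BASE_URL` is a placeholder rather than the real server address, and `build_page_url` is a hypothetical helper. The parameter names come from the query string quoted above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://example.invalid/users"  # placeholder, not the real server


def build_page_url(page, rows_per_page=2000):
    """Return the URL for one page, joining parameters with a plain '&'."""
    query = urlencode({"rowsPerPage": rows_per_page, "pageNumber": page})
    return "{}?{}".format(BASE_URL, query)


url = build_page_url(1)
print(url)
print(parse_qs(urlsplit(url).query))
```

requests can do the same job if the dictionary is passed through the params argument of requests.get, which keeps the base URL free of any query string.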