Python Forum

[SOLVED] [ElementTree] Grab text in attributes?
Hello,

I'm not very good at XPath, and I'm a bit lost on the syntax to find an element based on the value of its first attribute and then grab the text of its second attribute in an HTML file:

<meta name="description" content="Blah"/>
<meta name="keywords" content="blah"/>
<meta name="classification" content="other"/>

description = root.find('./head/meta[@description]')
print(description.text)
Thank you.

--
Edit: Getting closer

description=root.xpath("//meta[@name='description' and @content]")
#BAD print(description.text) #'list' object has no attribute 'text'
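
For reference, here is a minimal sketch of what the first attempt seems to be reaching for, using only the standard-library ElementTree. It assumes the input is well-formed XML with the meta tags inside a head element (the sample document below is made up for illustration). The predicate [@description] tests for an attribute named "description", whereas [@name='description'] tests the value of the name attribute. The value of content is read with .get(), because a meta element has no text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; ElementTree needs well-formed XML,
# so every tag is explicitly closed.
data = """\
<html>
  <head>
    <meta name="description" content="Blah"/>
    <meta name="keywords" content="blah"/>
    <meta name="classification" content="other"/>
  </head>
</html>"""

root = ET.fromstring(data)

# [@name='description'] matches on the attribute's value;
# [@description] would look for an attribute *named* description.
meta = root.find("./head/meta[@name='description']")

# find() returns None when nothing matches, so guard before reading.
description = meta.get("content") if meta is not None else None
print(description)
```

With lxml the same idea applies, except that xpath() always returns a list, which is why description.text fails in the edit above: the list has to be indexed first.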
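You didn't say what method you are using to find description.
With selenium, use:
from selenium.webdriver.common.by import By
description = browser.find_element(by=By.XPATH, value="//meta[@name='description']").get_attribute("content")
Note that you may have to replace browser with 'driver' or whatever name you opened selenium with. The value is read with get_attribute("content") rather than .text, because a meta tag has no visible text.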
I would not use ElementTree for parsing HTML; look at Web-Scraping part-1.
from bs4 import BeautifulSoup

data = '''\
<html>
  <meta name="description" content="Blah"/>
  <meta name="keywords" content="blah"/>
  <meta name="classification" content="other"/>
</html>'''

soup = BeautifulSoup(data, 'lxml')
tag = soup.find('meta', {'name': 'keywords'})
>>> tag
<meta content="blah" name="keywords"/>
>>> tag.attrs
{'content': 'blah', 'name': 'keywords'}
>>> tag.attrs.get('content')
'blah'
If you want to use XPath, I would use lxml.
from lxml import etree

data = '''\
<html>
  <meta name="description" content="Blah"/>
  <meta name="keywords" content="blah"/>
  <meta name="classification" content="other"/>
</html>'''

tree = etree.fromstring(data)
tag = tree.xpath("//meta[@name='classification']/@content")
print(tag[0])
Output:
other
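
Since xpath() returns a list (empty when nothing matches), tag[0] raises IndexError for a missing tag. A small sketch of one way to guard against that follows. It assumes real-world HTML that may not be well-formed XML, so it swaps in lxml.html.fromstring for etree.fromstring. The helper name meta_content and the sample document are made up for illustration.

```python
from lxml import html

# Hypothetical sample page; the meta tags are not self-closed,
# which the HTML parser accepts but a strict XML parser would not.
data = """\
<html>
  <head>
    <meta name="description" content="Blah">
    <meta name="keywords" content="blah">
  </head>
  <body><p>Hello</p></body>
</html>"""

tree = html.fromstring(data)


def meta_content(tree, name, default=None):
    """Return the content attribute of <meta name=...>, or default."""
    # $name is an XPath variable, filled in from the keyword argument.
    matches = tree.xpath("//meta[@name=$name]/@content", name=name)
    # xpath() returns a list; take the first hit if there is one.
    return matches[0] if matches else default


print(meta_content(tree, "description"))
print(meta_content(tree, "author", default="(none)"))
```

Passing the name as an XPath variable avoids building the expression with string formatting, which would break on names that contain quotes.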
Thanks much. I forgot to say I'm actually using lxml, and the code above solved it.