(Aug-19-2017, 08:30 PM)Fran_3 Wrote: 1 - If I'm using bs to capture the contents of a pre tag... then that is a hierarchical bs object... or some such... right? And as such I can't use regx to search it... right?

By calling .text you get just a string, which can be used with Python string tools or regex.

(Aug-19-2017, 08:30 PM)Fran_3 Wrote: 2 - Your earlier code in this thread seems to be a valid solution for dealing with \n issue when bs 'finds' pre tag contents... right?

There is no \n issue. If there is only one line, there is no \n. Multiple lines are separated by \n, just like any multi-line text in Python.

Fran_3 Wrote: 3 - But since I invested a bunch of time in learning regx it would be nice to know that when bs does not provide an obvious (to me) way to drill down and get my target text... how do I convert the thing bs returns via using the find, find_all or select method to a string upon which a regx search will work?

Often, the way HTML/XML is structured, there is no need to search further with regex.
If you need to search more specifically then, as mentioned before, you call .text (and use your tool on that text).

For example, this is a typical layout, where the text and the value are in separate tags.
from bs4 import BeautifulSoup

html = '''\
<html>
<head>
  <meta charset="UTF-8">
  <title>Title of the document</title>
</head>
<body>
  <p id="calc_text">Calculation Results is</p>
  <span class="BMIScore">158</span>
</body>
</html>
'''

soup = BeautifulSoup(html, 'lxml')

Use it:
>>> p = soup.find('p')
>>> p
<p id="calc_text">Calculation Results is</p>
>>> type(p)
<class 'bs4.element.Tag'>
>>> # Calling .text takes it out of BS and gives a plain string
>>> p = soup.find('p').text
>>> p
'Calculation Results is'
>>> type(p)
<class 'str'>

The value is in a separate tag.
>>> bmi = soup.select_one('.BMIScore')
>>> bmi
<span class="BMIScore">158</span>
>>> bmi.text
'158'
>>> # Or, if an integer is needed
>>> int(bmi.text)
158
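To tie this back to the pre tag questions, here is a rough sketch of going from a Tag to a string and then using regex on it. The HTML, the id "report" and the variable names are made up for illustration, and I use the built-in 'html.parser' here instead of 'lxml' only so that nothing extra has to be installed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML, invented for this example only
html = '''\
<pre id="report">
Name: Fran
BMI: 158
</pre>
'''

soup = BeautifulSoup(html, 'html.parser')

# find() returns a bs4 Tag object, .text turns it into a plain str
pre_text = soup.find('pre').text

# Multiple lines are just separated by \n, so normal string tools work
lines = pre_text.strip().splitlines()

# And regex works too, because pre_text is an ordinary string
match = re.search(r'BMI:\s*(\d+)', pre_text)
bmi = int(match.group(1)) if match else None

print(type(pre_text))
print(lines)
print(bmi)
```

The same idea should work with whatever find, find_all or select gives back: take each Tag, call .text on it, and search the resulting string.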