May-15-2020, 01:37 PM
There may be a better way if I could have looked at the source, or maybe not.
I can take a quick run at that string data, as I have cleaned up much worse stuff than this.
>>> s = '\"text_with_blanks\":\"<b>Tagesmen\\u00fc im Restaurant<\\\/b><br\\\/>\\u00a0Samstag, 12. August<br\\\/>\\u00a0<b>Suppen<\\\/b><br\\\/>Tomatensuppe \\u00a0 \\u00a0'
>>> # Drop the literal \u00a0 escapes and the doubled backslashes
>>> ss = s.replace('\\u00a0', '').replace('\\\\', '').strip()
>>> # Turn the literal \u00fc escape into the real character
>>> ss = ss.replace('\\u00fc', '\u00fc')
>>> print(ss)
"text_with_blanks":"<b>Tagesmenü im Restaurant</b><br/>Samstag, 12. August<br/><b>Suppen</b><br/>Tomatensuppe

# Now we need a parser
>>> from bs4 import BeautifulSoup
>>>
>>> soup = BeautifulSoup(ss, 'lxml')
>>> print(soup.prettify())
<html>
 <body>
  <p>
   "text_with_blanks":"
   <b>
    Tagesmenü im Restaurant
   </b>
   <br/>
   Samstag, 12. August
   <br/>
   <b>
    Suppen
   </b>
   <br/>
   Tomatensuppe
  </p>
 </body>
</html>

>>> soup.select_one('p > b')
<b>Tagesmenü im Restaurant</b>
>>> print(soup.select_one('p > b').text)
Tagesmenü im Restaurant
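If the source turns out to be JSON (I have not seen it, so this is a guess), the escapes could probably be decoded in one go instead of with chained replace() calls. A rough sketch, using only the standard library: the `raw` string is a made-up, completed version of the fragment (braces and closing quote added, single-level escaping assumed), and `BoldCollector` is just an illustrative helper name.

```python
import json
from html.parser import HTMLParser

# ASSUMPTION: hypothetical complete JSON object; the real source was not available.
raw = (r'{"text_with_blanks":"<b>Tagesmen\u00fc im Restaurant<\/b><br\/>'
       r'\u00a0Samstag, 12. August<br\/>\u00a0<b>Suppen<\/b><br\/>'
       r'Tomatensuppe \u00a0 \u00a0"}')

# json.loads decodes \u00fc, \u00a0 and \/ for us.
data = json.loads(raw)

# Remove the non-breaking spaces and surrounding whitespace.
html_text = data["text_with_blanks"].replace("\xa0", "").strip()


class BoldCollector(HTMLParser):
    """Illustrative helper: collect the text inside every <b> tag."""

    def __init__(self):
        super().__init__()
        self.in_bold = False
        self.bold_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.in_bold = True
            self.bold_texts.append("")

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        if self.in_bold:
            self.bold_texts[-1] += data


collector = BoldCollector()
collector.feed(html_text)
collector.close()

print(html_text)
print(collector.bold_texts)
```

The decoded `html_text` can also be passed to BeautifulSoup exactly as in the session above; the small parser class is only there so the sketch runs without extra installs.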