Aug-04-2019, 09:36 AM
Here is a test. Also note that if the code can be run, it is easier for people to help.
from bs4 import BeautifulSoup
import re

html = '''\
<span class="annonce_get_description" itemprop="description">
Smartphones<br>
<b>Double puces</b>
<br>
Mémoire : 64 GO
<br>
Bluetooth Wifi <b>4G</b>
<br>
Ecran 5.8 pouces
<br>
Appareil photo 12 MP
<br>
Bon état
<br>
<span class="annonce_description_preview "> </span></span>'''

soup = BeautifulSoup(html, 'lxml')
Use:
>>> tags = soup.find(class_="annonce_get_description")
>>> print(tags.text)

Smartphones
Double puces

Mémoire : 64 GO

Bluetooth Wifi 4G

Ecran 5.8 pouces

Appareil photo 12 MP

Bon état

>>> print(repr(tags.text.strip()))
'Smartphones\nDouble puces\n\nMémoire : 64 GO\n\nBluetooth Wifi 4G\n\nEcran 5.8 pouces\n\nAppareil photo 12 MP\n\nBon état'
With .text you get the content around all the br tags. The repr() output shows that if you split on \n\n, the structure should be kept.
>>> br_tags = tags.text.strip().split('\n\n')
>>> br_tags
['Smartphones\nDouble puces', 'Mémoire : 64 GO', 'Bluetooth Wifi 4G', 'Ecran 5.8 pouces', 'Appareil photo 12 MP', 'Bon état']
>>> print(br_tags[0])
Smartphones
Double puces
>>> print(br_tags[2])
Bluetooth Wifi 4G
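If you also want the single lines inside each group, here is a rough sketch that goes one step further. It works on the stripped text pasted in as a plain string, so it runs without bs4. The name split_blocks is my own and not part of any library, and the sketch assumes the real page follows the same \n\n pattern seen in the repr() output.

```python
# Stripped text copied from the repr() output in this post.
text = ('Smartphones\nDouble puces\n\nMémoire : 64 GO\n\n'
        'Bluetooth Wifi 4G\n\nEcran 5.8 pouces\n\n'
        'Appareil photo 12 MP\n\nBon état')


def split_blocks(text):
    """Hypothetical helper: split on blank lines, then on single newlines."""
    return [block.split('\n') for block in text.strip().split('\n\n')]


blocks = split_blocks(text)
for block in blocks:
    # Each block is a list of the lines that belong together.
    print(block)
```

The first group keeps two lines because there is only a single \n between Smartphones and Double puces in the text.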