spliting html code with br tag - Python Forum (https://python-forum.io), Forum: Python Coding > Web Scraping & Web Development
spliting html code with br tag - yokaso - Aug-04-2019

Hi, I am new to Python scraping, and I apologize for any mistakes. I would like to get text from HTML code where the target text is between <br> tags. I tried the following code, but it gives me the whole text. Any idea?

    item_phone_type = items.find('span', class_='annonce_get_description', itemprop="description").text.split('<br>')
    print(item_phone_type)

Quote: <span class="annonce_get_description" itemprop="description">

RE: spliting html code with br tag - snippsat - Aug-04-2019

Here is a test. Also note that posting code that can be run makes it easier for people to help.

    from bs4 import BeautifulSoup

    html = '''\
    <span class="annonce_get_description" itemprop="description">
    Smartphones<br>
    <b>Double puces</b>
    <br>
    Mémoire : 64 GO
    <br>
    Bluetooth Wifi <b>4G</b>
    <br>
    Ecran 5.8 pouces
    <br>
    Appareil photo 12 MP
    <br>
    Bon état
    <br>
    <span class="annonce_description_preview "> </span></span>'''

    soup = BeautifulSoup(html, 'lxml')

Use:

    >>> tags = soup.find(class_="annonce_get_description")
    >>> print(tags.text)
    Smartphones
    Double puces

    Mémoire : 64 GO

    Bluetooth Wifi 4G

    Ecran 5.8 pouces

    Appareil photo 12 MP

    Bon état
    >>> print(repr(tags.text.strip()))
    'Smartphones\nDouble puces\n\nMémoire : 64 GO\n\nBluetooth Wifi 4G\n\nEcran 5.8 pouces\n\nAppareil photo 12 MP\n\nBon état'

.text returns the text with all the <br> tags stripped out. Looking at it with repr() shows that splitting on \n\n should keep the structure.

    >>> br_tags = tags.text.strip().split('\n\n')
    >>> br_tags
    ['Smartphones\nDouble puces', 'Mémoire : 64 GO', 'Bluetooth Wifi 4G', 'Ecran 5.8 pouces', 'Appareil photo 12 MP', 'Bon état']
    >>> print(br_tags[0])
    Smartphones
    Double puces
    >>> print(br_tags[2])
    Bluetooth Wifi 4G

RE: spliting html code with br tag - yokaso - Aug-05-2019

I tried your code, but it doesn't give me the same result as yours.
Quote: IndexError Traceback (most recent call last)

RE: spliting html code with br tag - snippsat - Aug-05-2019

You are getting output that my code can never give, so you must be testing against more HTML than I am. Here it is as one script, also matching on itemprop="description", which may be needed if you test the code on the site you use.

    from bs4 import BeautifulSoup

    # Simulate the HTML of the website
    html = '''\
    <span class="annonce_get_description" itemprop="description">
    Smartphones<br>
    <b>Double puces</b>
    <br>
    Mémoire : 64 GO
    <br>
    Bluetooth Wifi <b>4G</b>
    <br>
    Ecran 5.8 pouces
    <br>
    Appareil photo 12 MP
    <br>
    Bon état
    <br>
    <span class="annonce_description_preview "> </span></span>'''

    soup = BeautifulSoup(html, 'lxml')
    tags = soup.find(class_="annonce_get_description", itemprop="description")
    br_tags = tags.text.strip().split('\n\n')
    print(br_tags)
    print('-' * 15)
    print(br_tags[0])
    print('-' * 15)
    print(br_tags[2])

Just to make it clear: the code above is standalone; it does not need a URL.
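Splitting on '\n\n' only works when the page has exactly that whitespace around each <br>, which may be why the same code behaves differently on other HTML. A possible alternative, sketched here under the assumption that the markup uses the same span, class and itemprop as in the thread, is to walk the tag's contents and start a new piece at every <br> element. The helper names `split_on_br` and `description_lines` are hypothetical, and the built-in 'html.parser' is used only so the sketch needs no extra install (the thread's 'lxml' should behave the same for this snippet). It also returns an empty list instead of failing later when `find()` returns None.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag


def split_on_br(tag):
    """Return the text pieces between the <br> elements inside ``tag``.

    Walks every node under ``tag`` in document order, starts a new piece at
    each <br>, and collapses the whitespace inside each piece.
    """
    pieces = []
    current = []
    for node in tag.descendants:
        if isinstance(node, Tag) and node.name == 'br':
            pieces.append(' '.join(''.join(current).split()))
            current = []
        elif isinstance(node, NavigableString) and not isinstance(node, Comment):
            current.append(str(node))
    pieces.append(' '.join(''.join(current).split()))
    # Drop pieces that were only whitespace (e.g. the empty preview span at the end)
    return [piece for piece in pieces if piece]


def description_lines(html):
    """Hypothetical helper: find the description span and split it on <br>."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find('span', class_='annonce_get_description', itemprop='description')
    if tag is None:  # find() returns None when nothing matches
        return []
    return split_on_br(tag)


# The markup from the thread
html = '''\
<span class="annonce_get_description" itemprop="description">
Smartphones<br>
<b>Double puces</b>
<br>
Mémoire : 64 GO
<br>
Bluetooth Wifi <b>4G</b>
<br>
Ecran 5.8 pouces
<br>
Appareil photo 12 MP
<br>
Bon état
<br>
<span class="annonce_description_preview "> </span></span>'''

parts = description_lines(html)
print(parts)
```

Unlike the split on '\n\n', this keeps "Smartphones" and "Double puces" apart, because the markup has a <br> between them.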
RE: spliting html code with br tag - yokaso - Aug-06-2019

Thank you, and you're right. When I try your code on the HTML, it works well, but when I try it on the website, I get errors. I don't know why, but I want to know. Here is the website; maybe you can enlighten me? https://www.ouedkniss.com/telephones Thank you again.

RE: spliting html code with br tag - snippsat - Aug-06-2019

Something like this. Note that id="ann-20047560" changes all the time on this site. This is a general way to split, as shown; you may need to adjust it to get what you want, since not all advertisement texts are the same.

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.ouedkniss.com/telephones'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'lxml')
    # Looks up one advert by id (not used below; the id changes all the time)
    id_find = soup.find('div', id="ann-20047560")
    # First description span on the page
    text_tag = soup.find('span', class_="annonce_get_description", itemprop="description")
    print(text_tag.text)

    # split
    print('-' * 20)
    # .text contains no tags, so this normally leaves the whole text in tag_spilt[0]
    tag_spilt = text_tag.text.split('<br/>')
    # The actual split is on the \r\n line breaks in the text
    lst = tag_spilt[0].split('\r\n')
    print(lst)

RE: spliting html code with br tag - yokaso - Aug-06-2019

But this code doesn't split the content?

RE: spliting html code with br tag - snippsat - Aug-06-2019

The output is now this; it is always changing.
lst is a list with the split content:

    >>> lst[0]
    'SmartphonesMémoire : 64 GO Bon état Je vends 2 iphone se produit européen :'
    >>> lst[1]
    '>> capacité : 64 go'
    >>> lst[2]
    '>> couleur : rose gold'
    >>> lst[3]
    '>> État : 10/10 '

RE: spliting html code with br tag - yokaso - Aug-06-2019

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.ouedkniss.com/telephones'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'lxml')
    id_find = soup.find('div', id="ann-20047560")
    text_tag = soup.find('span', class_="annonce_get_description", itemprop="description")
    print(text_tag.text)

    # split
    print('-' * 20)
    tag_spilt = text_tag.text.split('<br/>')
    lst = tag_spilt[0].split('\r\n')
    print(lst)
    print('-' * 20)
    print(lst[0])

Quote: SmartphonesBluetooth Wifi 4G Produit neuf jamais utilisé Paiement à la livraisonCaractéristiques de la série 4 (gps)

I will try again. If you look at the picture, we can split it at the <br>.

RE: spliting html code with br tag - snippsat - Aug-06-2019

(Aug-06-2019, 12:39 PM) yokaso wrote: if you see the picture we can split it from the <br>

Yes, that's what I do; look at the code again. The text you get back contains \r\n at each new line, and that is what I split on. You have to look at the text you get back with repr(); you cannot go only by the website's HTML source.
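On the live page there are many adverts and the ann-<number> id keeps changing, and lst[0] above shows text glued together where a <br> had no line break next to it. A possible sketch, assuming each advert is a <div> whose id starts with "ann-" and contains the description span (this markup is not verified against the current site), selects all adverts with the CSS attribute-prefix selector div[id^="ann-"], replaces every <br> with a newline, and then splits with str.splitlines(), which handles both '\r\n' and '\n'. The function name `advert_lines` and the sample HTML are made up for illustration; on the real site you would pass it `response.content` from the `requests.get` call in the thread.

```python
from bs4 import BeautifulSoup


def advert_lines(html):
    """Map each advert's id to the lines of its description.

    Assumption (not checked against the live site): every advert is a
    <div id="ann-<number>"> that contains the description <span>.
    """
    soup = BeautifulSoup(html, 'html.parser')
    result = {}
    # [id^="ann-"] matches any id starting with "ann-", whatever the number is
    for div in soup.select('div[id^="ann-"]'):
        span = div.find('span', class_='annonce_get_description', itemprop='description')
        if span is None:
            continue
        # Turn every <br> into a newline so text on either side is not glued together
        for br in span.find_all('br'):
            br.replace_with('\n')
        # splitlines() breaks on '\r\n' as well as '\n'; strip and drop empty lines
        lines = [line.strip() for line in span.get_text().splitlines()]
        result[div['id']] = [line for line in lines if line]
    return result


# Made-up sample markup for illustration only
sample_html = (
    '<div id="ann-1">'
    '<span class="annonce_get_description" itemprop="description">'
    'Smartphones<br/>Mémoire : 64 GO<br/>\r\n'
    'capacité : 64 go\r\ncouleur : rose gold'
    '</span></div>'
    '<div id="ann-2">'
    '<span class="annonce_get_description" itemprop="description">'
    'Bon état<br>Produit neuf'
    '</span></div>'
    '<div id="other">'
    '<span class="annonce_get_description" itemprop="description">'
    'not an advert'
    '</span></div>'
)

for advert_id, lines in advert_lines(sample_html).items():
    print(advert_id, lines)
```

Replacing <br> before splitting means it no longer matters whether the site puts \r\n, \n, or nothing at all next to each <br>; the div with id "other" is skipped because its id does not start with "ann-".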