Python Forum
strange characters after printing soup
#1
Hi guys,

Have you ever run into something like what I'm seeing right now?

I just wanted to print the soup and wow... Is there any way to do something with it, or is the site not scrapable?

The page's elements are written normally, with divs and class names in English, and the text is in English.

[Image: SEgxLdI.png]
#2
Please show the code you are using, this looks like a binary dump.
#3
(Jan-27-2020, 11:59 AM)Larz60+ Wrote: Please show the code you are using, this looks like a binary dump.

from lxml import html
from bs4 import BeautifulSoup
import requests
import pandas as pd
import json
import argparse
import os
import re
import xlrd

url = 'https://www.zameen.com/Rentals/Islamabad_Green_Avenue-8566-1.html'

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, sdch, br',
    'accept-language': 'en-GB,en;q=0.8,en-US;q=0.6,ml;q=0.4',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}
response = requests.get(url, headers=headers)
parser = response.text
soup = BeautifulSoup(parser, "html.parser")
print(soup.text)
#4
The site uses JavaScript heavily and also does language detection.
Use Selenium; here is a test.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
#options.add_argument("--window-size=1980,1020")
#options.add_argument("--headless")
# Selenium 4 syntax: the driver path is passed through Service
browser = webdriver.Chrome(service=Service(r'chromedriver.exe'), options=options)
#--| Parse or automation
browser.get('https://www.zameen.com/Rentals/Islamabad_Green_Avenue-8566-1.html')
time.sleep(2)  # crude wait so the JavaScript-rendered content can load
soup = BeautifulSoup(browser.page_source, 'lxml')
# Example of using both Selenium and BS (by giving it page_source)
use_bs4 = soup.find('span', class_="f343d9ce")
# Selenium 4 syntax: find_elements(By.XPATH, ...) replaces find_elements_by_xpath
use_sel = browser.find_elements(By.XPATH, '//*[@id="body-wrapper"]/main/div[2]/div[2]/div[4]/div[1]/ul/li[1]/article/div[3]/div[1]/div/div[1]/div/span[2]')
print(use_bs4.text)
print(use_sel[0].text)
browser.quit()
Output:
95 Thousand
95 Thousand
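Another thing that may be worth ruling out in the requests script from post #3: it advertises `br` (Brotli) and `sdch` in `accept-encoding`. As far as I know, requests only decodes Brotli when the optional `brotli` or `brotlicffi` package is installed, so a Brotli-compressed reply would come out of `response.text` as raw bytes, which would look a lot like a binary dump. Here is a minimal sketch, assuming that is what happens here (I have not confirmed it against this site). `keep_decodable_encodings` and `fetch_text` are helper names made up for this example.

```python
def keep_decodable_encodings(headers, decodable=("gzip", "deflate")):
    """Return a copy of headers whose accept-encoding lists only `decodable` codings."""
    cleaned = dict(headers)
    for name in list(cleaned):
        if name.lower() == "accept-encoding":
            tokens = [t.strip() for t in cleaned[name].split(",")]
            # Compare only the coding name, ignoring any ";q=..." weight
            kept = [t for t in tokens if t.split(";")[0].strip().lower() in decodable]
            # Fall back to "identity" (no compression) if nothing is left
            cleaned[name] = ", ".join(kept) or "identity"
    return cleaned


def fetch_text(url, headers):
    """Fetch url with the cleaned headers and return the decoded body."""
    import requests  # third-party; imported here so the helper above has no dependencies

    response = requests.get(url, headers=keep_decodable_encodings(headers), timeout=30)
    response.raise_for_status()
    return response.text


headers = {
    "accept-encoding": "gzip, deflate, sdch, br",
    "user-agent": "Mozilla/5.0",
}
print(keep_decodable_encodings(headers)["accept-encoding"])  # gzip, deflate
```

If the text is readable after that but the listings are still missing, the page is filling them in with JavaScript and Selenium remains the way to go.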
#5
Thank you!!
Awesome to learn something new :)

I've written about 20 scripts already and I still run into something new :D