strange characters after printing soup

zarize · Jan-27-2020, 10:22 AM

Hi guys,

Have you ever run into something like what I have right now?

I just wanted to print the soup and wow... Is there any way to do something with it, or is the site not scrapable?

The elements/code are written normally, with divs and classes in English, and the text is in English.

[Image: SEgxLdI.png]

**Larz60+** · Jan-27-2020, 11:59 AM

Please show the code you are using, this looks like a binary dump.

zarize · (This post was last modified: Jan-27-2020, 12:28 PM by zarize.)

(Jan-27-2020, 11:59 AM)Larz60+ Wrote: Please show the code you are using, this looks like a binary dump.

from lxml import html
from bs4 import BeautifulSoup
import requests
import pandas as pd
import json
import argparse
import os
import re
import xlrd

url = 'https://www.zameen.com/Rentals/Islamabad_Green_Avenue-8566-1.html'

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, sdch, br',
    'accept-language': 'en-GB,en;q=0.8,en-US;q=0.6,ml;q=0.4',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}
response = requests.get(url, headers=headers)
parser = response.text
soup = BeautifulSoup(parser, "html.parser")
print(soup.text)

***snippsat*** · (This post was last modified: Jan-27-2020, 06:32 PM by snippsat.)

The site uses JavaScript heavily, and also does language detection.
Use Selenium; here is a test.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
#options.add_argument("--window-size=1980,1020")
#options.add_argument("--headless")
# Selenium 4 style: the driver path is passed through Service, not executable_path
browser = webdriver.Chrome(service=Service(executable_path=r'chromedriver.exe'), options=options)
#--| Parse or automation
browser.get('https://www.zameen.com/Rentals/Islamabad_Green_Avenue-8566-1.html')
time.sleep(2)  # crude wait so the JavaScript has time to render the listings
soup = BeautifulSoup(browser.page_source, 'lxml')
# Example of using both Selenium and BS (by giving it page_source)
use_bs4 = soup.find('span', class_="f343d9ce")
# find_elements(By.XPATH, ...) replaces the removed find_elements_by_xpath
use_sel = browser.find_elements(By.XPATH, '//*[@id="body-wrapper"]/main/div[2]/div[2]/div[4]/div[1]/ul/li[1]/article/div[3]/div[1]/div/div[1]/div/span[2]')
print(use_bs4.text)
print(use_sel[0].text)
browser.quit()

Output:
95 Thousand
95 Thousand
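
A side note on the original requests code, offered as a possibility rather than a confirmed diagnosis: the headers advertise `br` (Brotli) and `sdch` in `accept-encoding`. If the server answers with Brotli and no Brotli decoder is installed alongside requests, `response.text` can come back as undecoded compressed bytes, which looks a lot like a binary dump. A minimal sketch, assuming that is what happened here: drop the hand-written `accept-encoding` so requests falls back to its own default. The helper names `without_accept_encoding` and `fetch_text` are made up for illustration.

```python
def without_accept_encoding(headers):
    """Return a copy of headers with any Accept-Encoding entry removed.

    Header names are case-insensitive, so the comparison ignores case.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() != "accept-encoding"
    }


def fetch_text(url, headers):
    """Fetch url and return the decoded body (needs network access)."""
    import requests  # imported here so the helper above works without it

    # With no explicit Accept-Encoding, requests sends its own default,
    # which lists only encodings it is able to decode.
    response = requests.get(url, headers=without_accept_encoding(headers), timeout=30)
    response.raise_for_status()
    return response.text


# Same kind of headers as in the question, shortened for the example.
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-encoding": "gzip, deflate, sdch, br",
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64)",
}
cleaned = without_accept_encoding(headers)
print(sorted(cleaned))  # ['accept', 'user-agent']
```

The result of `fetch_text(url, headers)` can then go into `BeautifulSoup(..., "html.parser")` as before. If the text is readable but the listings are still missing, the JavaScript rendering point above applies and Selenium is the better tool.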

zarize · (This post was last modified: Jan-28-2020, 08:58 AM by zarize.)

Thank you!!
Awesome to learn something new :)

I've done like 20 scripts already and I still run into something new :D
