Python Forum
How to convert Python crawled Bing web page content to human readable?
Hi, I'm experimenting with crawling the Bing web search page using Python 3.
The raw content I receive is a bytes object, but it looks stranger than usual, and my attempt to decompress it failed.
So now I have no idea what data format this content is in or what I should do with it.
Does anyone have a clue what kind of data this is, and how I can extract readable text from this raw content? Thanks!

The code below prints the raw content and then tries to gunzip it, so you can see both the raw content and the error from the decompression.
Because the raw content is very long, I have pasted only the first few lines.

import urllib.request as Request
import gzip

# urllib needs a full URL including the scheme; a bare host name raises ValueError
req = Request.Request('https://www.bing.com')
# header values must be strings
req.add_header('upgrade-insecure-requests', '1')
res = Request.urlopen(req).read()  # raw response body as bytes
print("RAW Content: %s" % res)     # show raw content of web
print("Try decompression:")
print(gzip.decompress(res))        # try decompression
Output:
RAW Content: b'+p\xe70\x0bi{)\xee!\xea\x88\x9c\xd4z\x00Tgb\x8c\x1b\xfa\xe3\xd7\x9f\x7f\x7f\x1d8\xb8\xfeaZ\xb6\xe3z\xbe\'\x7fj\xfd\xff+\x1f\xff\x1a\xbc\xc5N\x00\xab\x00\xa6l\xb2\xc5N\xb2\xdek\xb9V5\x02\t\xd0D \x1d\x92m%\x0c#\xb9>\xfbN\xd7\xa7\x9d\xa5\xa8\x926\xf0\xcc\'\x13\x97\x01/-\x03... ...
Try decompression:
Traceback (most recent call last):
OSError: Not a gzipped file (b'+p')
Process finished with exit code 1
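
For context on the error above: gzip data always begins with the two bytes 0x1f 0x8b, and the "Not a gzipped file (b'+p')" message reports that the body starts with something else. A minimal sketch of how the format could be narrowed down is to look at the Content-Encoding response header together with those first bytes. The helper name decode_body is hypothetical, and whether Bing actually sent Brotli ("br") here is an assumption that only the real response header can confirm.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str = "") -> bytes:
    """Decompress a response body based on its Content-Encoding value
    and, as a fallback, the gzip magic bytes. (Hypothetical helper.)"""
    encoding = content_encoding.strip().lower()

    # gzip streams always start with the magic bytes 0x1f 0x8b
    if encoding == "gzip" or body[:2] == b"\x1f\x8b":
        return gzip.decompress(body)

    if encoding == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped deflate
        except zlib.error:
            # some servers send a raw deflate stream without the zlib header
            return zlib.decompress(body, -zlib.MAX_WBITS)

    if encoding == "br":
        # Brotli is not in the standard library; a third-party decoder is needed
        raise ValueError("Brotli-encoded body: install a Brotli decoder")

    # no known compression detected: return the bytes unchanged
    return body
```

With urllib, the header value should be available as res.headers.get("Content-Encoding", "") on the object returned by urlopen, before calling read(). The decompressed bytes can then be turned into text with .decode("utf-8"), assuming the page is UTF-8.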


Messages In This Thread
How to convert Python crawled Bing web page content to human readable? - by dalaludidu - Sep-02-2018, 05:09 AM

