Posts: 33
Threads: 9
Joined: Oct 2016
(Oct-21-2016, 10:37 PM)snippsat Wrote: Now the source is bytes; add .decode('utf-8').
You will see a cleaner result.
Python 3.x needs this to turn it into a normal string.
In Python 3.x all strings are sequences of Unicode characters; data that has not been decoded is bytes.
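It doesn't work. See:
from urllib.request import urlopen
connect_to = urlopen("https://instagram.com/p/BL1rrSQDu48")
reading = connect_to.read()
# find is a compiled regex defined elsewhere in the script (not shown)
filter1 = find.findall(reading.decode('utf-8'))
print(filter1)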
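A minimal sketch of decoding the whole response once, right after reading it, using the charset the server declares in its Content-Type header. Falling back to UTF-8 when no charset is sent is an assumption, not something the server guarantees. fetch_text is a hypothetical helper name, and the example uses a data: URL so it runs without network access.

```python
from urllib.request import urlopen


def fetch_text(url, fallback="utf-8"):
    """Download url and return the body as str (hypothetical helper)."""
    with urlopen(url) as response:
        # get_content_charset() reads the charset parameter of Content-Type,
        # or returns None when the server did not send one.
        charset = response.headers.get_content_charset() or fallback
        # Decode the whole body once; regexes can then work on a normal str.
        return response.read().decode(charset)


# A data: URL stands in for a real page so this runs offline.
print(fetch_text("data:text/html;charset=utf-8,caf%C3%A9"))
```

With the real page, the call would be fetch_text("https://instagram.com/p/BL1rrSQDu48"), and the regex would run on the returned string.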
Posts: 7,320
Threads: 123
Joined: Sep 2016
Oct-21-2016, 11:13 PM
Not on the regex stuff; decode the whole source:
reading = connect_to.read().decode('utf-8')
But you should use Requests; then Requests works out the encoding from the response headers for you.
import requests
url = 'https://instagram.com/p/BL1rrSQDu48'
url_get = requests.get(url)
#print(url_get.text) # All source
print(url_get.encoding) # ISO-8859-1
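Why ISO-8859-1? As far as I know, Requests takes the charset from the Content-Type header and, for text/* types that declare no charset, falls back to ISO-8859-1, the default in the old HTTP/1.1 spec (RFC 2616). The stdlib-only sketch below roughly mirrors that rule; guess_header_encoding is a hypothetical name, not part of the Requests API, and newer Requests versions add extra special cases. If the page is really UTF-8, setting url_get.encoding = 'utf-8' (or url_get.apparent_encoding) before reading url_get.text should give the right text.

```python
from email.message import Message


def guess_header_encoding(content_type):
    """Roughly mirror how Requests picks Response.encoding from headers."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased charset param, or None
    if charset:
        return charset
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"  # RFC 2616 default for text/* without a charset
    return None  # no guess from headers for non-text types


print(guess_header_encoding("text/html; charset=utf-8"))
print(guess_header_encoding("text/html"))
```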
Posts: 33
Threads: 9
Joined: Oct 2016
(Oct-21-2016, 11:13 PM)snippsat Wrote: Not on the regex stuff; decode the whole source:
reading = connect_to.read().decode('utf-8')
But you should use Requests; then Requests works out the encoding from the response headers for you.
import requests
url = 'https://instagram.com/p/BL1rrSQDu48'
url_get = requests.get(url)
#print(url_get.text) # All source
print(url_get.encoding) # ISO-8859-1
I tried it, and I get an error:
Traceback (most recent call last):
File "instagram.py", line 89, in <module>
connect()
File "instagram.py", line 69, in connect
print(url_get.text) # All source
File "E:\Programs\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x80' in position 9364: character maps to <undefined>
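One plausible way a '\x80' character ends up in the text (an assumption about this page, not verified): UTF-8 bytes decoded as ISO-8859-1 never raise an error, but characters such as a curly apostrophe turn into mojibake containing C1 control characters like '\x80'. The cmd code page (cp850, per the traceback) has no mapping for those, so print() fails. This offline sketch reproduces the same kind of error:

```python
# A right single quotation mark (U+2019) is b'\xe2\x80\x99' in UTF-8.
raw = "It\u2019s".encode("utf-8")

# Decoding UTF-8 bytes as ISO-8859-1 never fails, but yields mojibake
# containing the C1 control characters U+0080 and U+0099.
garbled = raw.decode("iso-8859-1")
print(repr(garbled))

# cp850 (the cmd code page in the traceback) cannot represent U+0080,
# so encoding for the console raises the same UnicodeEncodeError.
try:
    garbled.encode("cp850")
except UnicodeEncodeError as exc:
    print(exc.reason, "at position", exc.start)
```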
Posts: 7,320
Threads: 123
Joined: Sep 2016
Use Requests; then you will never use urllib again.
I have tested the code I posted above, and it works.
Posts: 33
Threads: 9
Joined: Oct 2016
Oct-21-2016, 11:48 PM
(Oct-21-2016, 11:31 PM)snippsat Wrote: Use Requests; then you will never use urllib again.
I have tested the code I posted above, and it works.
I have tested it again, and it doesn't work...
print(url_get.encoding) # ISO-8859-1
This works, but the next line doesn't:
print(url_get.text) # All source
I keep getting the same error:
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x80' in position 9364: character maps to <undefined>
Hey! That works, but only after running a command in Windows.
In cmd:
chcp 65001
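Besides switching the code page with chcp 65001, another option is to make the output fit whatever encoding the console uses, showing characters it cannot represent as escape sequences. A minimal sketch; console_safe is a hypothetical helper, while backslashreplace is a standard codec error handler:

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the console can't encode shown as escapes."""
    # Fall back to UTF-8 if stdout has no usable encoding (e.g. redirected).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# For a cp850 console this gives It\u2019s instead of raising an error.
print(console_safe("It\u2019s", "cp850"))
```

The trade-off is that unsupported characters appear as escapes rather than glyphs, but the script no longer crashes.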
Posts: 7,320
Threads: 123
Joined: Sep 2016
Strange, this is my output.
I'm using Python 3.4 and:
>>> requests.__version__
'2.9.1'
Try:
print(url_get.content)
But then it's back to bytes again, I guess?
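Response.content holds the raw body as bytes, while Response.text is that body decoded to str with Response.encoding. Printing bytes shows escape sequences, which is why it never trips over the console encoding, but the result isn't readable text. A small offline illustration, with a hard-coded byte string standing in for a response body:

```python
# Stand-in for Response.content: the raw, undecoded body.
content = "café".encode("utf-8")

# Roughly what Response.text does: decode the bytes with Response.encoding.
text = content.decode("utf-8")

print(content)  # bytes repr with \x escapes, safe on any console
print(text)     # the decoded str
```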
Posts: 33
Threads: 9
Joined: Oct 2016
(Oct-21-2016, 11:59 PM)snippsat Wrote: Strange, this is my output.
I'm using Python 3.4 and:
>>> requests.__version__
'2.9.1'
Try:
print(url_get.content)
But then it's back to bytes again, I guess?
The code works, but I still have the problem of getting the text and caption.
See:
http://pastebin /XxqbzBAQ (add .com)
Posts: 7,320
Threads: 123
Joined: Sep 2016
Oct-22-2016, 12:07 AM
(Oct-21-2016, 11:48 PM)Kalet Wrote: Hey! That works, but only after running a command in Windows.
In cmd:
Yes, I get the same in cmd, which has always been broken when it comes to Unicode.
So don't use it for output in cases like this.
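One way to follow that advice is to write the page to a file with an explicit UTF-8 encoding and open it in an editor, instead of printing it to cmd. A minimal sketch; save_text and the default file name are hypothetical:

```python
from pathlib import Path


def save_text(text, path="page_source.html"):
    """Write text to path as UTF-8 and return the Path (hypothetical helper)."""
    target = Path(path)
    # An explicit encoding avoids depending on the Windows default code page.
    target.write_text(text, encoding="utf-8")
    return target


# With Requests this would be called as: save_text(url_get.text)
```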
Posts: 33
Threads: 9
Joined: Oct 2016
(Oct-22-2016, 12:07 AM)snippsat Wrote: (Oct-21-2016, 11:48 PM)Kalet Wrote: Hey! That works, but only after running a command in Windows.
In cmd:
Yes, I get the same in cmd, which has always been broken when it comes to Unicode.
So don't use it for output in cases like this.
Oh, that's great!
But now I still have the problem of getting the text and caption.
See:
http://pastebin /XxqbzBAQ (add .com)
Posts: 7,320
Threads: 123
Joined: Sep 2016
(Oct-22-2016, 12:15 AM)Kalet Wrote: See:
http://pastebin /XxqbzBAQ (add .com)
You should be able to post links now; the restriction should only apply to your first post.
Look through the source, because the data may have changed by now.
So the regex may no longer be valid. And what do you want to get out?
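If what you want is the caption, a parser is usually sturdier than a regex when the markup changes. The sketch below assumes, without having checked the current page, that the caption appears in an Open Graph meta tag of the form <meta property="og:description" content="...">. MetaContentParser and meta_content are hypothetical names, built on the stdlib html.parser module:

```python
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Collect the content attribute of <meta property=...> tags (hypothetical)."""

    def __init__(self, wanted_property):
        super().__init__()
        self.wanted_property = wanted_property
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values are already unescaped.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("property") == self.wanted_property:
            self.values.append(attributes.get("content") or "")


def meta_content(html, wanted_property="og:description"):
    """Return every content value for the given meta property (hypothetical)."""
    parser = MetaContentParser(wanted_property)
    parser.feed(html)
    parser.close()
    return parser.values


sample = '<head><meta property="og:description" content="Nice photo &amp; caption"/></head>'
print(meta_content(sample))
```

With Requests, the input would be url_get.text; if the tag is missing, the function simply returns an empty list.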