Posts: 33
Threads: 9
Joined: Oct 2016
(Oct-22-2016, 12:22 AM)snippsat Wrote: (Oct-22-2016, 12:15 AM)Kalet Wrote: See:
http://pastebin /XxqbzBAQ (add .com)
You should be able to post links now; the restriction should apply only to your first post.
Look through the source, because the data may have changed by now.
So the regex may no longer be valid. And what do you want to get out of it?
Thanks...
Then:
"caption": "#Plebiscito 2016"
{"text": "Por el simple hecho de dar tu nombre completo, es sencillo buscar tu n\u00famero de c\u00e9dula."
I need to extract every text value and the caption (the ones marked in red).
Posts: 7,320
Threads: 123
Joined: Sep 2016
Oct-22-2016, 12:56 AM
(This post was last modified: Oct-22-2016, 01:02 AM by snippsat.)
You should try it yourself; here are some hints.
Output: <script type="text/javascript">window._sharedData = all data inside here in json </script>
Regex:
print(re.findall(r'<script type="text/javascript">window._sharedData = (.*);</script>', data)[0])
Parse the result with Python's built-in json module (it becomes a Python dictionary).
Then take out what you want.
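A minimal sketch of these hints, run against a small inline HTML string instead of the live page. The `html` sample and the `extract_shared_data` helper name are assumptions made for illustration, and the real page markup may have changed since this thread.

```python
import json
import re

# Hypothetical stand-in for the page source; the real page is much larger.
html = (
    '<html><body>'
    '<script type="text/javascript">window._sharedData = '
    '{"entry_data": {"PostPage": [{"media": {"caption": "#Plebiscito 2016"}}]}};'
    '</script></body></html>'
)

# Same pattern as in the hint: capture everything assigned to window._sharedData.
PATTERN = r'<script type="text/javascript">window._sharedData = (.*);</script>'


def extract_shared_data(source):
    """Return the parsed window._sharedData object, or None if the tag is missing."""
    match = re.search(PATTERN, source)
    if match is None:
        return None
    # json.loads turns the captured JSON text into Python dicts and lists.
    return json.loads(match.group(1))


shared = extract_shared_data(html)
print(type(shared).__name__)
```

Using `re.search` and checking for `None` avoids the `IndexError` that `re.findall(...)[0]` would raise if the script tag were not found.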
Posts: 33
Threads: 9
Joined: Oct 2016
Thank you very much!
When you have the solution I'll post here... :D
Posts: 7,320
Threads: 123
Joined: Sep 2016
(Oct-22-2016, 01:04 AM)Kalet Wrote: When you have the solution I'll post here... :D
I know the solution; I don't need to test it out.
The point was for you to try to figure it out.
One more hint: post the JSON data in here.
Then you can see whether it is valid and understand better how it is structured.
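One way to check validity and structure locally, as a rough sketch: parse the text and print it back with indentation. The `raw` string and the `pretty` helper are assumptions for illustration, not the real page data.

```python
import json

# Hypothetical, heavily trimmed sample of the JSON text.
raw = (
    '{"entry_data": {"PostPage": [{"media": '
    '{"caption": "#Plebiscito 2016", "comments": {"count": 6}}}]}}'
)


def pretty(raw_json):
    """Parse a JSON string and return it re-serialized with indentation.

    json.loads raises json.JSONDecodeError (a subclass of ValueError)
    when the text is not valid JSON, so a successful call doubles as a
    validity check.
    """
    return json.dumps(json.loads(raw_json), indent=2, ensure_ascii=False)


print(pretty(raw))
```

The indented output makes it easier to see where dictionaries (curly braces) and lists (square brackets) are nested.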
Posts: 7,320
Threads: 123
Joined: Sep 2016
Quote:You're the one who created this forum?
We were a few people from the old forum who decided on this forum.
metulburr created this forum and ran it before we decided to move.
I liked NodeBB, which I made a demo version of,
but I am really pleased with how this forum has turned out.
Posts: 33
Threads: 9
Joined: Oct 2016
I found this forum by chance, but the users help a lot. Once I have all of this solved, I will do my part on this forum, because it is clear that a lot of effort went into it .. :D
PS: Sorry for my English, it is very bad; I speak Spanish...
Posts: 33
Threads: 9
Joined: Oct 2016
Oct-22-2016, 04:18 PM
(This post was last modified: Oct-22-2016, 04:20 PM by Kalet.)
I tried:
import json
import re
import requests
url = "https://www.instagram.com/p/BLExlG_gs9M/"
url_get = requests.get(url)
#print(url_get.text) # All source
# Capture the JSON text assigned to window._sharedData
a = re.findall(r'<script type="text/javascript">window._sharedData = (.*);</script>', url_get.text)[0]
data = json.loads(a)
#print(data["entry_data"]["PostPage"])
for a in data["entry_data"]["PostPage"]:
print(a[0]) Output: [{'media': {'comments_disabled': False, 'location': None, 'is_video': False, 'likes': {'count': 735, 'nodes': [{'user': {'username': 'luisfey_tm', 'id': '1467220529', 'profile_pic_url': 'https://igcdn-photos-f-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14723629_1706527029667493_2750772870568214528_a.jpg'}}, {'user': {'username': 'wendylineth21', 'id': '3901636916', 'profile_pic_url': 'https://igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14727520_1371614739517187_4451579541827092480_a.jpg'}}, {'user': {'username': 'vaneyiseth', 'id': '905640633', 'profile_pic_url': 'https://igcdn-photos-e-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/12317947_444661005731956_146114981_a.jpg'}}, {'user': {'username': 'kesofiia', 'id': '1206442330', 'profile_pic_url': 'https://igcdn-photos-g-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14701167_1301507009894718_7730435821307691008_a.jpg'}}, {'user': {'username': 'fergi130885', 'id': '3829079506', 'profile_pic_url': 'https://igcdn-photos-h-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14052264_564103313776023_453051203_a.jpg'}}, {'user': {'username': 'astridjasil', 'id': '3661209399', 'profile_pic_url': 'https://igcdn-photos-f-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/13671943_312970385717957_289304673_a.jpg'}}, {'user': {'username': 'laurubio_29', 'id': '1419151225', 'profile_pic_url': 'https://igcdn-photos-c-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/13734527_302746513408498_1533830342_a.jpg'}}, {'user': {'username': 'obras_blancas', 'id': '1697214155', 'profile_pic_url': 'https://igcdn-photos-a-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14448303_1034705766647048_5355164386581282816_a.jpg'}}, {'user': {'username': 'rebellious_oficial', 'id': '1405468900', 'profile_pic_url': 'https://igcdn-photos-b-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14693809_1144934862267689_1402297549309607936_a.jpg'}}, {'user': {'username': 
'roberconsul', 'id': '2127270975', 'profile_pic_url': 'https://igcdn-photos-f-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14294751_1909360602624925_1191931093_a.jpg'}}], 'viewer_has_liked': False}, 'display_src': 'https://igcdn-photos-a-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-15/e35/14474448_381736542214624_4830854127913271296_n.jpg?ig_cache_key=MTM1MjQyMzg0MjUyNTY2MzA1Mg%3D%3D.2', 'dimensions': {'width': 1080, 'height': 1080}, 'caption_is_edited': False, 'usertags': {'nodes': []}, 'is_ad': False, 'code': 'BLExlG_gs9M', 'owner': {'username': 'youngfelprefe', 'is_private': False, 'blocked_by_viewer': False, 'followed_by_viewer': False, 'requested_by_viewer': False, 'id': '331844759', 'is_unpublished': False, 'has_blocked_viewer': False, 'full_name': 'Young F El Efecto F', 'profile_pic_url': 'https://igcdn-photos-b-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14482775_197434037351945_7374535478038495232_a.jpg'}, 'caption': '#Plebiscito 2016', 'id': '1352423842525663052', 'comments': {'count': 6, 'nodes': [{'created_at': 1475442983.0, 'id': '17862897478024941', 'user': {'username': 'luisfelipetv', 'id': '2298791058', 'profile_pic_url': 'http://scontent-icn1-1.cdninstagram.com/t51.2885-19/11906329_960233084022564_1448528159_a.jpg'}, 'text': '@youngfelprefe listo ya voto tambn bien mijo★'}, {'created_at': 1475443748.0, 'id': '17862897862024941', 'user': {'username': 'omeganr', 'id': '202000611', 'profile_pic_url': 'https://igcdn-photos-h-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14156414_1079735282112695_1636583007_a.jpg'}, 'text': 'Si'}, {'created_at': 1475446598.0, 'id': '17862899284024941', 'user': {'username': 'nandocolombia', 'id': '479496344', 'profile_pic_url': 'https://igcdn-photos-f-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14676778_1715076945484909_6612390138339131392_a.jpg'}, 'text': '?\U0001f3fb'}, {'created_at': 1475447612.0, 'id': '17862899803024941', 'user': {'username': 'lauspath', 'id': '261070560', 'profile_pic_url': 
'https://igcdn-photos-e-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14727396_1693767414179196_7348884820950253568_a.jpg'}, 'text': 'Por el NOO te conozco tanto jajajajaaj'}, {'created_at': 1475448945.0, 'id': '17862900988024941', 'user': {'username': 'jhonny_reyes23', 'id': '3021395284', 'profile_pic_url': 'https://igcdn-photos-h-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14553081_161587474301255_7594124386945204224_a.jpg'}, 'text': 'Primo Jaja ??'}, {'created_at': 1475519921.0, 'id': '17862953830024941', 'user': {'username': 'jjargel', 'id': '646190648', 'profile_pic_url': 'https://igcdn-photos-h-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/13381051_2033323503560343_1374158480_a.jpg'}, 'text': 'Por el simple hecho de dar tu nombre completo, es sencillo buscar tu número de cédula.'}], 'page_info': {'has_next_page': False, 'start_cursor': None, 'has_previous_page': False, 'end_cursor': None}}, 'date': 1475441507}}]
How could I filter this last block?
Posts: 7,320
Threads: 123
Joined: Sep 2016
import json
import re
import requests
url = "https://www.instagram.com/p/BLExlG_gs9M/"
url_get = requests.get(url)
source = url_get.text
# Capture the JSON text assigned to window._sharedData inside the <script> tag
data_json = re.findall(r'<script type="text/javascript">window._sharedData = (.*);</script>', source)[0]
data = json.loads(data_json)
Use it:
>>> data['entry_data']['PostPage'][0]['media']['caption']
'#Plebiscito 2016'
json.loads() gives back a Python dictionary.
In this dictionary there is a mix of dictionaries and lists.
Here ['PostPage'] contains a list,
hence the [0] to get at the content inside this list and continue navigating.
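To illustrate the mix of dictionaries and lists, here is a small sketch on a trimmed-down copy of the structure. The `data` literal and the `walk` helper are assumptions for illustration; only the key names come from the output shown in this thread.

```python
# Hypothetical, trimmed-down version of the parsed data.
data = {
    "entry_data": {
        "PostPage": [
            {"media": {"caption": "#Plebiscito 2016", "likes": {"count": 735}}}
        ]
    }
}


def walk(obj, path):
    """Follow a sequence of dict keys (str) and list indexes (int)."""
    for step in path:
        obj = obj[step]
    return obj


# 'PostPage' holds a list, so an integer index is needed before 'media'.
post_pages = data["entry_data"]["PostPage"]
print(type(post_pages).__name__)  # list

caption = walk(data, ["entry_data", "PostPage", 0, "media", "caption"])
print(caption)
```

Checking `type(...)` at each level is a quick way to tell whether the next step should be a string key or an integer index.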
Posts: 33
Threads: 9
Joined: Oct 2016
Oh, I understand.
I will try with the comments (text):
print(data['entry_data']['PostPage'][0]['media']['comments'])
#print(data['entry_data']['PostPage'][0]['media']['comments']['nodes']['text']) #Error **sad** Output: {'page_info': {'has_next_page': False, 'start_cursor': None, 'end_cursor': None, 'has_previous_page': False}, 'nodes': [{'text': '@youngfelprefe listo ya voto tambn bien mijo★', 'user': {'profile_pic_url': 'http://scontent-lax3-1.cdninstagram.com/t51.2885-19/11906329_960233084022564_1448528159_a.jpg', 'id': '2298791058', 'username': 'luisfelipetv'}, 'id': '17862897478024941', 'created_at': 1475442983.0}, {'text': 'Si', 'user': {'profile_pic_url': 'https://igcdn-photos-h-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14156414_1079735282112695_1636583007_a.jpg', 'id': '202000611', 'username': 'omeganr'}, 'id': '17862897862024941', 'created_at': 1475443748.0}, {'text': '?\U0001f3fb', 'user': {'profile_pic_url': 'https://igcdn-photos-f-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14676778_1715076945484909_6612390138339131392_a.jpg', 'id': '479496344', 'username': 'nandocolombia'}, 'id': '17862899284024941', 'created_at': 1475446598.0}, {'text': 'Por el NOO te conozco tanto jajajajaaj', 'user': {'profile_pic_url': 'https://igcdn-photos-f-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14736230_1778075029077069_7479698963461832704_a.jpg', 'id': '261070560', 'username': 'lauspath'}, 'id': '17862899803024941', 'created_at': 1475447612.0}, {'text': 'Primo Jaja ??', 'user': {'profile_pic_url': 'https://igcdn-photos-h-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/14553081_161587474301255_7594124386945204224_a.jpg', 'id': '3021395284', 'username': 'jhonny_reyes23'}, 'id': '17862900988024941', 'created_at': 1475448945.0}, {'text': 'Por el simple hecho de dar tu nombre completo, es sencillo buscar tu número de cédula.', 'user': {'profile_pic_url': 'https://igcdn-photos-h-a.akamaihd.net/hphotos-ak-xpa1/t51.2885-19/s150x150/13381051_2033323503560343_1374158480_a.jpg', 'id': '646190648', 'username': 'jjargel'}, 'id': '17862953830024941', 'created_at': 
1475519921.0}], 'count': 6}
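The commented-out line most likely fails because 'nodes' is a list, and a list cannot be indexed with the string 'text'. A possible way to collect every comment text is to loop over the list. The sketch below uses a trimmed copy of the output above; the `comments` literal stands in for data['entry_data']['PostPage'][0]['media']['comments'] and is an assumption for illustration.

```python
# Hypothetical, trimmed copy of the 'comments' dictionary printed above.
comments = {
    "count": 2,
    "nodes": [
        {"text": "Si", "user": {"username": "omeganr"}},
        {"text": "Primo Jaja ??", "user": {"username": "jhonny_reyes23"}},
    ],
}

# 'nodes' is a list of dictionaries, so take 'text' from each element in turn.
texts = [node["text"] for node in comments["nodes"]]

for text in texts:
    print(text)
```

A single comment can also be reached with an integer index, for example comments['nodes'][0]['text'].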