Aug-20-2017, 08:50 PM
Hey guys,
So recently I started getting into Python again and I was thinking about taking on a bigger challenge: a Facebook friends list crawler.
I've built a crawler before using the 'requests' and 'beautifulsoup' modules, and it was OK but nothing special (you can find it in my previous posts).
I've reused some of that code to get started, but I've hit a sticking point.
First I looked around someone's fb main page just to see what it looks like, and the home page URL is pretty straightforward:
facebook.com/[id] (where id is the person's Facebook id; most of them start with 1000...)
Then I went over to the 'friends' section and noticed its link has the following format:
facebook.com/friends = [your id] & [an id you can find in the person's homepage source code] & [an id I'm assuming fb sends back to create a session] = friends
This is obviously grossly oversimplified, but it's just to give an idea.
I also noticed that this session link (the whole link) stays valid for the entire session. Meaning, if I go to a friend's page, open 'friends' to generate the session link, then open a new tab and paste the link, it takes me to the same page.
If I close all tabs and paste the link later, it shows a blank page.
Of those three numbers, the first two stay constant for a given friend, and the third (the fb session id) changes every time.
So I took the link and kept the session open so I could access it with requests and bs4, and this is where I'm stuck right now:
Every friend's name appears inside a <div> with class="fsl fwb fcb". Inside that there's an <a> tag with href="friend's fb homepage" and a data-gt attribute that holds the fb id, which is what we're after in our crawler.
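A minimal sketch of pulling those fields out of the HTML, assuming the markup looks the way it's described above. It uses the standard library's html.parser instead of bs4 so it runs with no extra installs. The SAMPLE_HTML snippet, the names and the id values are made up, and the real data-gt attribute may hold a bigger blob (e.g. JSON) that needs more parsing before you get a clean id.

```python
from html.parser import HTMLParser


class FriendListParser(HTMLParser):
    """Collects name, href and data-gt for each <a> inside a <div class="fsl fwb fcb">.

    The expected markup is an assumption based on the description in this
    post, not a documented Facebook format.
    """

    TARGET_CLASSES = {"fsl", "fwb", "fcb"}

    def __init__(self):
        super().__init__()
        self.friends = []     # one dict per friend: name, href, data_gt
        self._div_depth = 0   # > 0 while inside a matching <div>
        self._current = None  # entry whose link text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._div_depth:
                self._div_depth += 1  # nested <div> inside a matching one
            elif self.TARGET_CLASSES <= set((attrs.get("class") or "").split()):
                self._div_depth = 1
        elif tag == "a" and self._div_depth:
            self._current = {
                "name": "",
                "href": attrs.get("href"),
                "data_gt": attrs.get("data-gt"),
            }

    def handle_data(self, data):
        # Only collect text that sits inside a friend link.
        if self._current is not None:
            self._current["name"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["name"] = self._current["name"].strip()
            self.friends.append(self._current)
            self._current = None
        elif tag == "div" and self._div_depth:
            self._div_depth -= 1


# Hypothetical markup shaped like the description above.
SAMPLE_HTML = """
<div class="friend-list">
  <div class="fsl fwb fcb">
    <a href="/jane.doe" data-gt="100000000000001">Jane Doe</a>
  </div>
  <div class="fsl fwb fcb">
    <a href="/john.roe" data-gt="100000000000002">John Roe</a>
  </div>
  <a href="/not-a-friend">Unrelated link</a>
</div>
"""

parser = FriendListParser()
parser.feed(SAMPLE_HTML)
for friend in parser.friends:
    print(friend["name"], friend["href"], friend["data_gt"])
```

With bs4 the same selection is roughly `soup.select("div.fsl.fwb.fcb a")` followed by `a.get("data-gt")` on each match. Either way, it only finds something if the HTML you fetched actually contains the friends list.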
The problem is that when Python makes the request, it looks like an anonymous request: it only takes me to the person's homepage, which says "log into fb to continue".
I found this out by printing the page returned by requests.get(), because I kept getting nothing back when I searched for 'data-gt'.
So my question is: does anyone have an idea how to make a request from Python that is not anonymous? Meaning I'm logged in with my own account, as if I were browsing from my homepage.
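The usual reason for a "log in to continue" page: a site typically recognizes a logged-in browser by the session cookies the browser sends with each request, and a bare requests.get() sends none, so the server sees an anonymous visitor. Here is a minimal sketch of the cookie-jar idea using the standard library's http.cookiejar and urllib.request (a requests.Session keeps a jar the same way in its .cookies attribute). The cookie name session_id, its value and example.com are hypothetical placeholders; which cookies a real site checks is not something this sketch can tell you, and it never touches the network.

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain):
    """Build a plain (Netscape-style, version 0) cookie valid for `domain`."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Hypothetical cookie: in practice this would be whatever the site set
# in your session after you logged in.
jar = http.cookiejar.CookieJar()
jar.set_cookie(make_cookie("session_id", "abc123", "example.com"))

# An opener built on the jar attaches matching cookies to every request it
# sends and stores new cookies from responses. Not called here, to stay offline.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Offline check: which Cookie header would each request carry?
anonymous = urllib.request.Request("https://example.com/friends")
with_session = urllib.request.Request("https://example.com/friends")
jar.add_cookie_header(with_session)

print(anonymous.get_header("Cookie"))     # None: nothing identifies the user
print(with_session.get_header("Cookie"))  # session_id=abc123
```

In a real script you would call opener.open(url), and the jar would send the cookie along and keep any new ones the server sets, which is the same bookkeeping a requests.Session does for you.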
If you want I can post the source code, though there's next to nothing in it at the moment.
Thanks