Python Forum

Web scraping User Generated Content
Hello everyone,

I am new to this forum and know very little/nothing about Python/Coding etc.

I am here to get some advice/suggestions on any software I can use to collect data for my PhD research.

Being based in the social sciences/humanities, I am not well versed in technical jargon, so please forgive my primitive explanation, but here goes:

As part of my PhD research, I am investigating political activists' language use across different social media sites: various political fora and selected pages on Twitter and Facebook. To ensure that the sample of data I analyse is representative, I plan on using a quantitative/corpus approach to downsample production. To do this, I need to collect/scrape all of the user generated content posted within a given time frame on these fora/pages into one large .txt file, and it is this last part that has me in a bit of a pickle.

At the moment, I am copying and pasting content, which is, logistically speaking, a nightmare! It's time-consuming and prone to human error.

I am curious whether there is any software or program available that can isolate only the user generated content (i.e., not usernames, post dates/times, or other structural content from the web page) and compile it into a simple text file?

Any advice would be greatly appreciated!
To get a feel for it in just a few minutes, take the following two tutorials:
https://python-forum.io/Thread-Web-Scraping-part-1
https://python-forum.io/Thread-Web-scraping-part-2
You can also search the forum, search for BeautifulSoup, lxml, selenium, requests and scrapy, as well as the names of social media sites. There's a lot of content here.
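To give a rough idea of what "isolating only the user generated content" looks like in practice, here is a minimal sketch. It uses only Python's standard library html.parser so that it runs without installing anything; BeautifulSoup or lxml would make the same job shorter. Everything specific in it is an assumption for illustration: the sample page, the "post_body" class name, and the function names are made up, and a real forum will use its own markup, which you would find by inspecting the page source. Fetching the pages themselves (for example with requests) is left out here.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PostTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, post_class):
        super().__init__()
        self.post_class = post_class
        self.depth = 0       # > 0 while we are inside a post body
        self.current = []    # text fragments of the post being read
        self.posts = []      # finished posts, one string each

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # Treat a line break inside a post as a space between words.
            if tag == "br" and self.depth > 0:
                self.current.append(" ")
            return
        if self.depth > 0:
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.post_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Post body just closed: normalise whitespace and store it.
            text = " ".join("".join(self.current).split())
            if text:
                self.posts.append(text)
            self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def extract_posts(html, post_class="post_body"):
    """Return the text of each post body found in an HTML string."""
    parser = PostTextExtractor(post_class)
    parser.feed(html)
    parser.close()
    return parser.posts


# Made-up page: usernames and dates sit outside the "post_body" elements.
SAMPLE_PAGE = """
<div class="post">
  <span class="author">alice</span> <span class="date">Oct-09-2018</span>
  <div class="post_body">We need <b>change</b> now!<br>Join us.</div>
</div>
<div class="post">
  <span class="author">bob</span>
  <div class="post_body">I disagree.</div>
</div>
"""

posts = extract_posts(SAMPLE_PAGE)
corpus = "\n".join(posts)   # one post per line, ready to write to a .txt file
print(corpus)
```

The usernames and dates are dropped simply because they sit outside the elements being collected. To save the result, something like open("corpus.txt", "w", encoding="utf-8") followed by a write of corpus would do. Before scraping Twitter, Facebook, or any forum, it is worth checking the site's terms of service and whether an official API is available.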
(Oct-09-2018, 09:56 PM) Larz60+ Wrote: To get a feel for it in just a few minutes, take the following two tutorials:
https://python-forum.io/Thread-Web-Scraping-part-1
https://python-forum.io/Thread-Web-scraping-part-2
You can also search the forum, search for BeautifulSoup, lxml, selenium, requests and scrapy, as well as the names of social media sites. There's a lot of content here.

Thank you :)