Oct-09-2018, 06:38 PM
Hello everyone,
I am new to this forum and know little to nothing about Python or coding.
I am here to get some advice/suggestions on any software I can use to collect data for my PhD research.
Being based in the social sciences/humanities, I am not well versed in technical jargon, so please forgive my primitive explanation, but here goes:
As part of my PhD research, I am investigating political activists' language use across different social media sites: various political fora and selected pages on Twitter and Facebook. To ensure that the sample of data I analyse is representative, I plan to use a quantitative/corpus approach to down-sample the content produced. To do this, I need to collect/scrape all of the user-generated content posted within a given time frame on these fora/pages into one large .txt file, and it is this last part that has me in a bit of a pickle.
At the moment, I am copying and pasting content, which is, logistically speaking, a nightmare! It's time-consuming and prone to human error.
I am curious whether there is software or a program available that can isolate and compile only the user-generated content (i.e., without usernames, post dates/times, or other structural content from the web page) into a simple text file?
Any advice would be greatly appreciated!