Thread: Web scraping User Generated Content (Python Forum, Web Scraping & Web Development, /thread-13315.html)
Web scraping User Generated Content - StephenG93 - Oct-09-2018

Hello everyone,

I am new to this forum and know very little about Python or coding. I am here to get some advice/suggestions on any software I can use to collect data for my PhD research. Being based in the social sciences/humanities, I am not well versed in technical jargon, so please forgive my primitive explanation, but here goes:

As part of my PhD research, I am investigating political activists' language use across different social media sites: various political fora and selected pages on Twitter and Facebook. To ensure that the sample of data that I analyse is representative, I plan on using a quantitative/corpus approach to down-sample production. In order to do this, I need to collect/scrape all of the user generated content within a given time frame from these fora/pages as one large .txt file, and it is this last part that has me in a bit of a pickle.

At the moment, I am copying and pasting content, which is, logistically speaking, a nightmare! It's time-consuming and prone to human error. I am curious as to whether there is software or a program available that can isolate and compile only user generated content (i.e., not usernames, post dates/times, or other structural content from the web page) as a simple text file.

Any advice would be greatly appreciated!

RE: Web scraping User Generated Content - Larz60+ - Oct-09-2018

To get a feel for it in just a few minutes, take the following two tutorials:
https://python-forum.io/Thread-Web-Scraping-part-1
https://python-forum.io/Thread-Web-scraping-part-2

You can also search the forum for BeautifulSoup, lxml, selenium, requests and scrapy, as well as the names of social media sites. There's a lot of content here.

RE: Web scraping User Generated Content - StephenG93 - Oct-10-2018

(Oct-09-2018, 09:56 PM)Larz60+ Wrote: To get a feel for it in just a few minutes, take the following two tutorials:

Thank you :)
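
For anyone landing here with the same question, the task in the first post (keep only the text of each post, drop usernames, dates and other page structure) can be sketched roughly as below. This is only an illustration under stated assumptions, not a recipe for any particular site: it assumes the page HTML has already been downloaded into a string, and that each post body sits in an element with the class `post_body`. That class name, the sample markup, and the helper names `PostTextExtractor`, `extract_posts` and `save_corpus` are all made up for the example. It uses only Python's standard library so it runs as-is; the libraries mentioned above (BeautifulSoup, lxml) would likely make the same job shorter.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PostTextExtractor(HTMLParser):
    """Collect the text inside elements that carry a given CSS class."""

    def __init__(self, post_class="post_body"):  # hypothetical class name
        super().__init__()
        self.post_class = post_class
        self.depth = 0      # > 0 while the parser is inside a post body
        self.current = []   # text fragments of the post being read
        self.posts = []     # finished posts, one string per post

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # Treat a line break inside a post as whitespace.
            if tag == "br" and self.depth > 0:
                self.current.append(" ")
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1          # nested tag inside a post body
        elif self.post_class in classes:
            self.depth = 1           # entering a post body

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:          # leaving the post body
            text = " ".join("".join(self.current).split())
            self.posts.append(text)
            self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def extract_posts(html, post_class="post_body"):
    """Return the text of every post body found in an HTML string."""
    parser = PostTextExtractor(post_class)
    parser.feed(html)
    parser.close()
    return parser.posts


def save_corpus(posts, path):
    """Write one post per line to a plain UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as fh:
        for post in posts:
            fh.write(post + "\n")


# Made-up markup standing in for a downloaded forum page.
SAMPLE_HTML = """
<div class="post">
  <span class="author">alice</span> <span class="date">Oct-09-2018</span>
  <div class="post_body">First post, with <b>bold</b> text.<br>Second line.</div>
</div>
<div class="post">
  <span class="author">bob</span> <span class="date">Oct-10-2018</span>
  <div class="post_body">A reply.</div>
</div>
"""

if __name__ == "__main__":
    for post in extract_posts(SAMPLE_HTML):
        print(post)
```

Calling `save_corpus(extract_posts(html), "corpus.txt")` would then give the single text file asked for. On a real site the class name has to be read off the page source first, and the markup is often less regular than this sample.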