Python Forum
Web scraping User Generated Content
#1
Hello everyone,

I am new to this forum and know little to nothing about Python or coding.

I am here to get some advice/suggestions on any software I can use to collect data for my PhD research.

Being based in the social sciences/humanities, I am not well versed in technical jargon, so please forgive my primitive explanation, but here goes:

As part of my PhD research, I am investigating political activists' language use across different social media sites: various political fora and selected pages on Twitter and Facebook. To ensure that the sample of data that I analyse is representative, I plan on using a quantitative/corpus approach to down-sample the data. In order to do this, I need to collect/scrape all of the user-generated content within a given time frame from these fora/pages as one large .txt file, and it is this last part that has me in a bit of a pickle.

At the moment, I am copying and pasting content, which is, logistically speaking, a nightmare! It's time-consuming and prone to human error.

I am curious as to whether there is any software available that can isolate and compile only user-generated content (i.e. not usernames, post dates/times, or other structural content from the web page) as a simple text file?

Any advice would be greatly appreciated!
#2
To get a feel for it in just a few minutes, take the following two tutorials:
https://python-forum.io/Thread-Web-Scraping-part-1
https://python-forum.io/Thread-Web-scraping-part-2
You can also search the forum for BeautifulSoup, lxml, selenium, requests and scrapy, as well as the names of social media sites. There's a lot of content here.
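
To make the idea concrete, here is a minimal sketch of "keep only the message text, drop usernames and dates". It deliberately uses only the standard library's html.parser rather than BeautifulSoup, so it runs without installing anything. The sample HTML, the class name post_body, and the helper names extract_posts and save_corpus are all made up for illustration; a real forum page will use different markup, which you would have to inspect in your browser first. Downloading the page (for example with requests) is not shown.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Hypothetical page: the class names below are assumptions, not real forum markup.
SAMPLE_HTML = """
<div class="post">
  <span class="author">user_a</span>
  <span class="post_date">Oct-09-2018</span>
  <div class="post_body">First message,<br>with a <b>bold</b> word.</div>
</div>
<div class="post">
  <span class="author">user_b</span>
  <div class="post_body">Second message.</div>
</div>
"""


class PostBodyExtractor(HTMLParser):
    """Collect the text inside every element that carries a given class.

    Assumes reasonably well-formed HTML (every non-void tag is closed).
    """

    def __init__(self, body_class):
        super().__init__()
        self.body_class = body_class
        self.depth = 0    # greater than 0 while we are inside a post body
        self.chunks = []  # text fragments of the post being read
        self.posts = []   # finished posts, one string each

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.depth:
                self.chunks.append(" ")  # e.g. <br> becomes a space
            return
        if self.depth:
            self.depth += 1  # nested tag inside a post body
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.body_class in classes:
                self.depth = 1  # entering a post body

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # the post body element just closed
            text = " ".join("".join(self.chunks).split())
            if text:
                self.posts.append(text)
            self.chunks = []

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_posts(html, body_class="post_body"):
    """Return the text of each post body found in the HTML string."""
    parser = PostBodyExtractor(body_class)
    parser.feed(html)
    parser.close()
    return parser.posts


def save_corpus(posts, path):
    """Write all posts to one UTF-8 text file, separated by blank lines."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(posts))


if __name__ == "__main__":
    posts = extract_posts(SAMPLE_HTML)
    save_corpus(posts, "corpus.txt")
    print(len(posts), "posts written to corpus.txt")
```

With BeautifulSoup the extraction step gets shorter, but the principle is the same: find the element that wraps each message and keep only its text.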
#3
(Oct-09-2018, 09:56 PM)Larz60+ Wrote: To get a feel for it in just a few minutes, take the following two tutorials:
https://python-forum.io/Thread-Web-Scraping-part-1
https://python-forum.io/Thread-Web-scraping-part-2
You can also search the forum for BeautifulSoup, lxml, selenium, requests and scrapy, as well as the names of social media sites. There's a lot of content here.

Thank you :)

