Python Forum
Datasets of grammatically uncommon sentences?
#1
Hi,
I'm trying to train a model that replaces grammatically correct but uncommon sentences with their more common counterparts, so I'm looking for datasets of grammatically uncommon sentences paired with their more common versions.

For example:
1. Already, enough punishment had been given.
2. Enough punishment had been given already.
3. Enough punishment had already been given.
All three sentences say the same thing, but no. 3 is the version you would most likely encounter.

Another example:
1. Either by you or someone else, the bill must be paid.
2. Either you or someone else must pay the bill.
3. The bill must be paid either by you or someone else.
No. 1 is the unlikely version; nos. 2 and 3 are more likely.

Any suggestions on where I can find these? Or are there other mechanisms I could use to achieve the end goal of replacing uncommon grammar with "common grammar"?

Thanks
#2
Here's a good place to look: https://en.wikipedia.org/wiki/List_of_text_corpora
#3
Thanks. There was some pretty interesting stuff there, but it's not exactly what I was looking for. Maybe some English-as-a-second-language datasets would do the trick?
#4
The only package I know of that will help with grammar (for any language, actually) is NLTK.
This chapter of the NLTK book is written specifically to that end: https://www.nltk.org/book/ch08.html
Also, try googling 'grammatically analyse a sentence with python'.
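
On the "other mechanisms" part of the question: one option that may not need a paired dataset at all is to generate the candidate word orders yourself and rank them with a language model trained on ordinary text, keeping the highest-scoring one. Below is a minimal sketch of that idea using a tiny add-one-smoothed bigram model in plain Python. The corpus here is a made-up toy (an assumption for illustration only), and the function names are hypothetical; in practice you would train on one of the large corpora linked above or use a pretrained model.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Lowercase and strip basic punctuation; a real tokenizer would do better.
    words = (w.strip(".,!?").lower() for w in sentence.split())
    return [w for w in words if w]


def train_bigram_counts(corpus):
    # Count context words and bigrams, with sentence boundary markers.
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + tokenize(sentence) + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])  # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(vocab)


def log_prob(sentence, unigrams, bigrams, vocab_size):
    # Add-one smoothed bigram log-score. Unseen words are not in the
    # vocabulary, so this is only a rough score for ranking candidates.
    tokens = ["<s>"] + tokenize(sentence) + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def most_common_variant(candidates, unigrams, bigrams, vocab_size):
    # Pick the candidate the model considers most likely.
    return max(candidates, key=lambda s: log_prob(s, unigrams, bigrams, vocab_size))


# Toy corpus (assumption): stands in for a large body of everyday text.
corpus = [
    "Enough time had already been spent.",
    "The work had already been done.",
    "Enough money had been saved.",
    "The letter had already been sent.",
]

candidates = [
    "Already, enough punishment had been given.",
    "Enough punishment had been given already.",
    "Enough punishment had already been given.",
]

unigrams, bigrams, vocab_size = train_bigram_counts(corpus)
best = most_common_variant(candidates, unigrams, bigrams, vocab_size)
print(best)
```

With this toy corpus the sketch picks no. 3 from the first example. Comparing raw scores is only fair here because all candidates contain the same words; for candidates of different lengths you would probably want to normalise by token count.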