Python Forum
Negative lookahead not working, help please
Hi everyone,

Here is my problem. I have a collection of tweets with various metadata that I want to analyze for sociolinguistic purposes. To do this, I'm trying to infer users' ages from, among other things, the information they provide in their bios. I'm using regular expressions to match a few recurring patterns in those bios, such as a number followed by one of the various spellings of "years old", as in:

"John, 30 years old, engineer."

I'm using regexes because there are actually very few ways people mention their age on Twitter, so three or four regexes would let me infer the age of most users in my dataset. In this case, however, I also want to check what comes after "years old": many people mention their children's ages, and I don't want those to be mistaken for the user's own age, as in:

"John, father of a 12 year old kid, engineer"


Cases like the one above should be ignored, so that I only keep users for whom a valid age can be inferred.
My program looks like this:
[code]import csv
import re

with open("test_corpus.csv") as corpus:
    corpus_read = csv.reader(corpus, delimiter=",")
    for row in corpus_read:
        if re.findall(r"\d{2}\s?(?=years old\s?|yo\s?|yr old\s?|y o\s?|yrs old\s?|year old\s?(?!son|daughter|kid|child))",row[5].lower()):
            age = re.findall(r"\d{2}\s?",row[5].lower())
            for i in age:
                print(i)[/code]
The program seems to work in some cases, but with the small test file I created to try it out, it incorrectly matches the age in the string "I have a 12 yo son" and returns 12 as a matched age, which I don't want. I'm guessing this has something to do with brackets or delimiters somewhere in the pattern, but I have spent a few days on it and could not find anything helpful on the forum, so any help would be appreciated.
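For anyone who wants to reproduce this without the CSV file, here is a minimal sketch that runs the same pattern on the example strings from this post. The names `PATTERN`, `samples` and `results` are only for illustration. Two things appear to be at play: `|` has the lowest precedence in a regex, so the `(?!...)` only belongs to the last alternative (`year old`), and the optional `\s?` in front of it can backtrack to match nothing, which leaves the lookahead looking at a space instead of at "kid".

```python
import re

# Same pattern as in the program above, unchanged.
PATTERN = (
    r"\d{2}\s?(?=years old\s?|yo\s?|yr old\s?|y o\s?|yrs old\s?"
    r"|year old\s?(?!son|daughter|kid|child))"
)

samples = [
    "John, 30 years old, engineer.",
    "I have a 12 yo son",
    "John, father of a 12 year old kid, engineer",
]

# The pattern has no capturing groups, so findall returns the text
# consumed by \d{2}\s? (the lookahead itself consumes nothing).
results = {s: re.findall(PATTERN, s.lower()) for s in samples}

for text, found in results.items():
    print(repr(text), "->", found)
```

All three strings produce a match, including the two that mention a child's age.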

So the actual question is: based on the program I already have, how can I make it not recognize 12 in "John, father of a 12 year old kid, engineer" as the user's age?
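One possible direction, as a sketch rather than a definitive answer: wrap all the spellings in a single non-capturing group, put the negative lookahead after that group so it applies to every spelling, and move the optional whitespace inside the lookahead so backtracking cannot skip it. The names `AGE_RE` and `infer_ages` are hypothetical, and the `\b` word boundaries are an extra assumption that is not in the original pattern (they stop "yo" from matching inside a longer word and stop the pattern from picking up the tail of a longer number).

```python
import re

# Sketch only: the spellings and the excluded words are copied from the
# pattern in the question; the \b boundaries are an added assumption.
AGE_RE = re.compile(
    r"\b(\d{2})\s?"                           # two-digit number, captured
    r"(?:years old|year old|yrs old|yr old|yo|y o)\b"
    r"(?!\s*(?:son|daughter|kid|child))"      # reject if a child word follows
)


def infer_ages(bio):
    """Return the two-digit ages found in a bio, as integers."""
    return [int(m.group(1)) for m in AGE_RE.finditer(bio.lower())]


print(infer_ages("John, 30 years old, engineer."))
print(infer_ages("I have a 12 yo son"))
print(infer_ages("John, father of a 12 year old kid, engineer"))
```

Capturing the number in the same match also avoids the second `re.findall(r"\d{2}\s?", ...)` call, which returns every two-digit number in the bio whether or not it is followed by "years old".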


I am somewhat new to programming, so apologies if I forgot to mention something important. Do not hesitate to tell me if you need more details.

Thanks in advance for any help you could provide!


Messages In This Thread
Negative lookahead not working, help please - by MitchBuchanon - Feb-26-2017, 01:48 PM
