Dec-12-2018, 11:40 PM
(This post was last modified: Dec-12-2018, 11:40 PM by pythoncrazy1.)
Hello everyone,
I have a tokenizing exercise and I'm not allowed to use NLTK. I'm kind of stuck on the regex. I am having two problems: the quotation marks "" are not recognized as tokens, and titles such as "Mr." and "Ms." should each be a single token, whereas in my output Mr. appears as 'Mr', '.'. The rest seems to be fine.
text = re.compile(r"[n]'[\w]+|[\w]+(?!')(?:[A-Za-mo-z](?='))?|(?<=\s)[\w](?=)|[^\s\w'][A-Z]?\w+|[;.,!?:]|\'")
As an example, a text like:
'Mr. Brown opened the door and said with a smile, "I can\'t believe it! It\'s such a pleasure to see you!"'
should give an output like
Output:
['Mr.', 'Brown', 'opened', 'the', 'door', 'and', 'said', 'with', 'a',
 'smile', ',', '"', 'I', 'ca', "n't", 'believe', 'it', '!', 'It', "'s",
 'a', 'pleasure', 'to', 'see', 'you', '!', '"']
I hope my problem is clear and that you can help me out. Best regards and thanks in advance,
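For reference, here is a minimal sketch of one way the two problem cases could be handled with the standard `re` module only. It is not the exercise's official solution. The names `TOKEN_RE` and `tokenize` and the short list of titles (Mr, Mrs, Ms, Dr) are assumptions made for illustration. The idea is to order the alternatives from most specific to most general: titles first, then the "n't" split, then clitics such as 's, then plain words, and finally any single punctuation character, which also covers the double quotes.

```python
import re

# Alternatives are tried left to right, so the specific cases come first.
# The title list below is an illustrative assumption, not a complete one.
TOKEN_RE = re.compile(
    r"""
      \b(?:Mr|Mrs|Ms|Dr)\.   # title plus its period, kept as one token
    | \w+(?=n't)             # stem before a negation: ca + n't
    | n't                    # the negation itself
    | (?<=\w)'\w+            # clitic attached to a word: 's, 're, 'll
    | \w+                    # ordinary word
    | [^\w\s]                # any single punctuation mark, quotes included
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the list of tokens found in text, in order."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    sample = ('Mr. Brown opened the door and said with a smile, '
              '"I can\'t believe it! It\'s such a pleasure to see you!"')
    print(tokenize(sample))
```

On the sample sentence this should keep 'Mr.' as one token, emit each '"' as its own token, and split "can't" into 'ca' and "n't". It also emits 'such', which the expected list above leaves out. Abbreviations outside the assumed title list would still be split at the period.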