Python Forum
Copy same doubled matched words
#1
I've been trying to figure this one out, but all I could find were people trying to get rid of duplicates rather than keep them as a feature.
The code below copies the words that match between two text files, but if a matched word occurs two or more times, it is copied only once. I need it to copy every occurrence of a matched word.
Example: if text1 contains (Some Random Text About Value Value Or More) and text2 contains (Value Add More), the code copies 'Value' only once. I need it to copy both, so the result says 'Value, Value'.
Thanks to Larz60 once more for helping me with the code below.

import os
 
def get_words(filename):
    wordlist = []
    with open(filename) as fp:
        for line in fp:
            wordsinline = line.strip().split()
            for item in wordsinline:
                if item not in wordlist:
                    wordlist.append(item)
    return wordlist
 
def find_common_words(filename1, filename2):
    wordlist1 = []
    wordlist2 = []
    matching_words = []
 
    wordlist1 = get_words(filename1)
    wordlist2 = get_words(filename2)
 
    matching_words = set(wordlist1) & set(wordlist2)
    print(matching_words)
 
def testit():
    # Assert in same directory as code
    os.chdir(os.path.abspath(os.path.dirname(__file__)))
    filename1 = 'words1.txt'
    filename2 = 'words2.txt'
    find_common_words(filename1, filename2)
 
if __name__ == '__main__':
    testit()
#2
What should happen if text2 contains words which occur more than once?
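The two most obvious answers to that question can be compared in a short sketch. The sample strings and the names counts_1, counts_2, by_text_1 and by_minimum are made up for illustration; text_2 deliberately repeats 'More' so that the two options give different results.

```python
from collections import Counter

# Illustrative inputs only; text_2 repeats 'More' on purpose.
text_1 = 'Some Random Text About Value Value Or More'
text_2 = 'Value Add More More'

counts_1 = Counter(text_1.lower().split())
counts_2 = Counter(text_2.lower().split())

# Option A: a shared word is repeated as often as it occurs in text_1,
# no matter how often it occurs in text_2.
by_text_1 = {word: counts_1[word] for word in counts_1 if word in counts_2}

# Option B: multiset intersection, which keeps the smaller of the two counts.
by_minimum = counts_1 & counts_2

print(by_text_1)         # {'value': 2, 'more': 1}
print(dict(by_minimum))  # {'value': 1, 'more': 1}
```

The example in the first post ('Value, Value' although text2 has 'Value' only once) seems to match option A.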

My understanding is that this is a counting problem. One could approach it like this:

- make a set of the words in text2 (cleaning the words beforehand, i.e. giving them the same capitalization, removing punctuation and so on)
- use collections.Counter to count the occurrences of words in text1 (after the same cleaning)
- use a set operation to find the overlap between the Counter keys and the set, and build a new dictionary from it in which each value is the key repeated as many times as it was counted
- use this dictionary in whichever way is needed

EDIT:

A quick expression of this idea in Python (it would be better to use string.punctuation for removing punctuation):

>>> text_1 = 'Some Random Text About Value Value Or More'
>>> text_2 = 'Value Add More'
>>> words = {word.lower().strip('.,;:') for word in text_2.split()}
>>> words
{'value', 'add', 'more'}
>>> from collections import Counter
>>> occurrences = Counter(word.lower().strip('.,;:') for word in text_1.split())
>>> occurrences
Counter({'value': 2, 'some': 1, 'random': 1, 'text': 1, 'about': 1, 'or': 1, 'more': 1})
>>> result = {k: [k for occur in range(occurrences[k])] for k in words.intersection(occurrences)}
>>> result
{'value': ['value', 'value'], 'more': ['more']}
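The same idea could be wrapped in a function that uses string.punctuation for the cleaning step, as suggested above. This is only a sketch: the helper names clean and repeated_matches are made up here, and the cleaning strips punctuation from the ends of each word only, not from the middle.

```python
import string
from collections import Counter


def clean(word):
    """Lower-case a word and strip punctuation from both ends."""
    return word.lower().strip(string.punctuation)


def repeated_matches(text_1, text_2):
    """Map each word shared by both texts to a list that repeats it
    as many times as it occurs in text_1."""
    words = {clean(word) for word in text_2.split()}
    occurrences = Counter(clean(word) for word in text_1.split())
    return {
        word: [word] * occurrences[word]
        for word in words.intersection(occurrences)
        if word  # skip tokens that were nothing but punctuation
    }


result = repeated_matches('Some Random Text About Value, Value. Or More',
                          'Value Add More!')
for word, copies in result.items():
    print(', '.join(copies))
```

The order in which the shared words are printed is not guaranteed, because it comes from a set.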
#3
Thanks for helping out. I found a solution that doesn't use set, since a set removes duplicates by design. I forgot to mark the thread as solved. Thanks again.
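The solution itself was not posted in the thread. For later readers, one way a set-free version might look is sketched below. It is an assumption, not the poster's actual code: it reuses the function names from the first post, keeps every word in get_words, and filters the first list against the second.

```python
def get_words(filename):
    """Return every word in the file, in order, duplicates included."""
    with open(filename) as fp:
        return [word for line in fp for word in line.split()]


def find_common_words(filename1, filename2):
    """Return the words of filename1 that also appear in filename2,
    keeping repeats and the original order."""
    wordlist1 = get_words(filename1)
    wordlist2 = get_words(filename2)
    return [word for word in wordlist1 if word in wordlist2]
```

With the example texts from the first post this would give ['Value', 'Value', 'More']. The lookup in a plain list is slow for large files, so this is only meant for small inputs.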