Python Forum

Full Version: How to find unique and common words per line from a txt file?
I have just started with Python and ran into this task:

1. Find words that exist on both even and odd lines.
2. Find words that exist only on even lines.
3. Find words that exist only on odd lines.

All punctuation and uppercase letters have been removed, so we don't need to worry about that. However, there are several words on each line.

The output should look something like this

Common words on both lines: ['I', 'the', 'am', 'all', 'as', ...]
Only even lines : ['yellow', 'christmas', 'smell', ...]
Only odd lines: ['yours', 'war', 'may', 'remote', ...]

I started like

evens, odds = set(), set()
with open('textfile.txt') as f:
    for index, row in enumerate(f):
        if index % 2 == 0:
            evens.add(row.strip())
        else:
            odds.add(row.strip())
How should I continue from here? Is this part correct?

I would love it if someone could finish the task so I could use it as a template for other tasks.
I made some progress, I think, and added the split and sorted functions.

evens, odds = set(), set()
with open('textfile.txt') as f:
    for index, row in enumerate(f):
        if index % 2 == 0:
            evens.update(row.split())
        else:
            odds.update(row.split())
commons = sorted(evens & odds)
Any tips on how I can find the unique words on the even and odd lines?
Look at set operations. That would be the easiest way.
You can also loop over the words in one set and check whether each word is or is not in the other one.
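Both suggestions can be sketched on two small hand-made sets. The words below are made up for illustration and are not taken from the file, and loop_unique_evens is just a name chosen for this example:

```python
# Hypothetical sample data standing in for the words collected from the file.
evens = {"the", "yellow", "smell", "as"}
odds = {"the", "war", "may", "as"}

# Set operators: & is intersection, - is difference.
commons = evens & odds
unique_evens = evens - odds
unique_odds = odds - evens

# The same "only on even lines" result with an explicit loop and a membership test.
loop_unique_evens = set()
for word in evens:
    if word not in odds:
        loop_unique_evens.add(word)

print(sorted(commons))                    # ['as', 'the']
print(sorted(unique_evens))               # ['smell', 'yellow']
print(sorted(unique_odds))                # ['may', 'war']
print(loop_unique_evens == unique_evens)  # True
```

The operator form is shorter, and the loop form mostly helps to see what the difference operator does.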
Thanks a lot.

Right now it looks like this:
evens, odds = set(), set()
with open('shakespeare.txt') as f:
    for index, row in enumerate(f):
        if index % 2:  # odd 0-based index = even line when counting from 1
            evens.update(row.split())
        else:
            odds.update(row.split())


# In[2]:

commons = sorted(evens & odds)
unique_odds = odds - evens
unique_evens = evens - odds

# In[4]:

commons

# In[5]:

unique_odds

# In[6]:

unique_evens
How can I print each list on one row? How can I show the values when I run the code?

The output should look something like this

commons: ['I', 'the', 'am', 'all', 'as', ...]
unique_evens : ['yellow', 'christmas', 'smell', ...]
unique_odds: ['yours', 'war', 'may', 'remote', ...]
print('commons: {}'.format(list(commons)))
That is if you want square brackets. If curly braces are OK, you can skip converting the set to a list.
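To get all three labelled rows at once, something along these lines might work. It is only a sketch: the sample sets are made up, and results and lines are names invented for this example. sorted() is used because it returns a list (square brackets) in a stable order, whereas a set prints in arbitrary order:

```python
# Hypothetical sample sets; in the thread these are filled from the file.
evens = {"the", "yellow", "christmas"}
odds = {"the", "war", "may"}

# sorted() turns each set into an alphabetically ordered list.
results = {
    "commons": sorted(evens & odds),
    "unique_evens": sorted(evens - odds),
    "unique_odds": sorted(odds - evens),
}

# One "label: [words]" row per result, in insertion order.
lines = ["{}: {}".format(label, words) for label, words in results.items()]
print("\n".join(lines))
# commons: ['the']
# unique_evens: ['christmas', 'yellow']
# unique_odds: ['may', 'war']
```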
Thanks a lot.
You can make this a little shorter:
odds, evens = eo = [set(), set()]
with open('shakespeare.txt') as f:
    for index, row in enumerate(f):
        eo[index % 2].update(row.strip().split())
Note that enumerate() numbers the lines from 0, while most CS and programming literature and software start line counts at 1. That is why index 0, the first line, goes into odds here.
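If 1-based line numbers are preferred, enumerate() accepts a start argument, so the parity test can match the usual convention directly. A minimal sketch, using an in-memory io.StringIO with made-up content as a stand-in for the text file (line_number and target are names chosen for this example):

```python
import io

# A small in-memory stand-in for the text file (hypothetical content).
f = io.StringIO("one two\nthree four\nfive two\n")

evens, odds = set(), set()
for line_number, row in enumerate(f, start=1):
    # line_number is 1 for the first line, so even numbers are even lines.
    target = evens if line_number % 2 == 0 else odds
    target.update(row.split())

print(sorted(odds))   # ['five', 'one', 'two']  (lines 1 and 3)
print(sorted(evens))  # ['four', 'three']       (line 2)
```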