Python Forum
Count variations
#1
Hi all... will admit right up front that Python has been a horrible nightmare for me. I can't seem to get anything to work, can't figure out the syntax, and struggle with literally every line I try to write. I believe I have version 3.5.1, and am using PyCharm for editing.

Hoping somebody can help me solve what I would think is a relatively basic problem (but I can't say how difficult it is in Python). Normally I'd just use vba, but with 800,000+ rows, it's killing my CPU and crashing. Plus this may be a good way for me to finally get my feet wet, and if I can actually get it to do something useful, I will finally see some good in it :-)

I have a text file with two columns. Column 1 has a code, and Column 2 has a description of that code. The problem is, some of the descriptions are truncated, or have changed. So I want to find every instance of each code with varying descriptions. For example my file has:

MSFT Microsoft Corp
MSFT Microsft Co
MSFT Microsoft Corporation
MSFT Microsoft Corp

I would like to output

MSFT Microsoft Corp
MSFT Microsft Co
MSFT Microsoft Corporation

With that output, I can write a sql script to change all "MSFT" descriptors to any one of them.

So far, this is the only thing I can get to work

import pandas as pd
file = open("f://HITS//fixnames.txt", "r")
Thank you for any help!
#2
You say that's the only thing you can get to work. What have you tried that isn't working? How exactly is it not working?

Start small. You haven't even read the file in yet. How have you tried to read the file in? What have you gotten? And how exactly is the file formatted?
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#3
Thanks for the reply!
I don't have any of the code I've tried, because after spending 3-4 hours trying to figure out syntax, structure, and deciphering error codes (when I get them), I just delete the code/files and try to start fresh. Then I spend another 3-4 hours 'playing around' until I'm on the verge of a stroke. I've done tutorials like they have at Codecademy and Learning Tree, copied code from places like Stack Overflow, and watched at least a dozen videos on YouTube and other learning sites.

This can be done fairly easily in VBA, and with a bit more effort, SQL - but I really want to try to get Python to do something.

Doesn't
file = open("f://HITS//fixnames.txt", "r") read the file in? Last night I added "print(file)" and it was giving me back the contents of the file (that was a moment of sheer ecstasy for me).

But this morning, having changed nothing, I get this error back:
<_io.TextIOWrapper name='f://HITS//fixnames.txt' mode='r' encoding='cp1252'>

The structure of the file:
I've played around with this quite a bit as well and have two versions
1) text file, tab delimited, with 2 columns
2) csv with 2 columns

I've tried importing pandas and reading the file in (also tried reading it in as a table).
#4
Playing around is nice, but one should have at least a limited understanding of what one is trying to do. Start by reading some tutorials:
1. working with files - https://python-forum.io/Thread-Basic-Files
2. lists and dicts - https://python-forum.io/Thread-Basic-Lists and https://python-forum.io/Thread-Basic-Dictionaries
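As a minimal sketch of the file-reading step (the file name `fixnames_demo.txt` and its two tab-separated rows are made up for illustration): `open()` returns a file object, so printing that object shows something like `<_io.TextIOWrapper ...>` rather than the contents. That is not an error message. The lines still have to be read from the object, for example by looping over it.

```python
import os
import tempfile

# Hypothetical sample file, created here only so the sketch is self-contained.
path = os.path.join(tempfile.gettempdir(), "fixnames_demo.txt")
with open(path, "w") as out_file:
    out_file.write("MSFT\tMicrosoft Corp\nMSFT\tMicrosft Co\n")

with open(path, "r") as in_file:
    # in_file is a file object; iterating over it yields one line at a time.
    # Each line is stripped of its newline and split on the tab character.
    rows = [line.rstrip("\n").split("\t") for line in in_file]

print(rows)
```

Using `with` also closes the file automatically once the block ends.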
#5
Buran - I've done dozens of tutorials, but I don't think I've seen those, specifically. I'll certainly give them a read through after I get this project done (I guess I just have to do it the easy way and put Python on the back burner again).

Thank you so much for your time and the links!
#6
Well, it takes only a few lines of Python code (between 2 and 7) to produce the desired output.
#7
Thanks Buran - I'm at least 12 hours into this project and so far, have got nothing other than reading in a file, which is no longer working. I already have the script done in VBA to do it, and have it running on a virtual machine, so hopefully it won't crash.

I would love to spend more time trying to figure it out (even if it's 50-100 lines), but right now, my manager just wants the output and I can't keep trying to piece this puzzle together.

I will absolutely take a look at the links you so kindly provided - I'm sure they'll be a lot more helpful than all the other random stuff I've been reading/trying. Thank you again!
#8
msft.txt:
MSFT,Microsoft Corp
MSFT,Microsft Co
MSFT,Microsoft Corporation
MSFT,Microsoft Corp
with open ('msft.txt', 'r') as in_file:
   print set(tuple(line.strip().split(',')) for line in in_file)
Output:
set([('MSFT', 'Microsoft Corporation'), ('MSFT', 'Microsoft Corp'), ('MSFT', 'Microsft Co')])
#9
Thank you...
I get back
SyntaxError: invalid syntax

And when I put the text after PRINT in parens, I get back
),...

I appreciate your time, honestly, but you've wasted enough of it on me. This is just not going to work - I may have to consider a different project to get my feet wet in this language!
#10
My example was Python 2. You are obviously using Python 3, where print is a function, so it should be:

with open ('msft.txt', 'r') as in_file:
   print(set(tuple(line.strip().split(',')) for line in in_file))
and the output would be

Output:
{('MSFT', 'Microsoft Corp'), ('MSFT', 'Microsoft Corporation'), ('MSFT', 'Microsft Co')}
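A set does not keep the rows in their original order, and it also keeps codes that only ever have one description. If that matters, something along these lines might help. It is only a sketch: the helper names `unique_pairs` and `varying_only` are made up here, the AAPL row is an invented extra to show the filtering, and the comma separator is taken from the msft.txt example (a tab-delimited file would need `sep="\t"`).

```python
from collections import Counter, OrderedDict


def unique_pairs(lines, sep=","):
    """Return distinct (code, description) pairs in first-seen order."""
    seen = OrderedDict()  # used as an ordered set; a plain set loses the order
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first separator only, so descriptions may contain it.
        code, _, description = line.partition(sep)
        seen[(code.strip(), description.strip())] = None
    return list(seen)


def varying_only(pairs):
    """Keep only codes that occur with more than one distinct description."""
    per_code = Counter(code for code, _ in pairs)
    return [(code, desc) for code, desc in pairs if per_code[code] > 1]


# The MSFT rows come from the thread; the AAPL row is a made-up extra.
sample = [
    "MSFT,Microsoft Corp",
    "MSFT,Microsft Co",
    "MSFT,Microsoft Corporation",
    "MSFT,Microsoft Corp",
    "AAPL,Apple Inc",
]

for code, description in varying_only(unique_pairs(sample)):
    print(code, description)
```

With a real file, the open file object can be passed in place of `sample`, since looping over a file yields its lines one at a time.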

