Posts: 58
Threads: 15
Joined: Jun 2017
Hi all... will admit right up front that Python has been a horrible nightmare for me. I can't seem to get anything to work, can't figure out the syntax, and struggle with literally every line I try to write. I believe I have version 3.5.1, and am using PyCharm for editing.
Hoping somebody can help me solve what I would think is a relatively basic problem (but I can't say how difficult it is in Python). Normally I'd just use vba, but with 800,000+ rows, it's killing my CPU and crashing. Plus this may be a good way for me to finally get my feet wet, and if I can actually get it to do something useful, I will finally see some good in it :-)
I have a text file with two columns. Column 1 has a code, and Column 2 has a description of that code. The problem is, some of the descriptions are truncated, or have changed. So I want to find every instance of each code with varying descriptions. For example my file has:
MSFT Microsoft Corp
MSFT Microsft Co
MSFT Microsoft Corporation
MSFT Microsoft Corp
I would like to output
MSFT Microsoft Corp
MSFT Microsft Co
MSFT Microsoft Corporation
With that output, I can write a sql script to change all "MSFT" descriptors to any one of them.
So far, this is the only thing I can get to work
import pandas as pd
file = open("f://HITS//fixnames.txt", "r")
Thank you for any help!
Posts: 4,220
Threads: 97
Joined: Sep 2016
You say that's the only thing you can get to work. What have you tried that isn't working? How exactly is it not working?
Start small. You haven't even read the file in yet. How have you tried to read the file in? What have you gotten? And how exactly is the file formatted?
Posts: 58
Threads: 15
Joined: Jun 2017
Thanks for the reply!
I don't have any of the code I've tried, because after spending 3-4 hours trying to figure out syntax, structure, and deciphering error codes (when I get them), I just delete the code/files and try to start fresh. Then I spend another 3-4 hours 'playing around' until I'm on the verge of a stroke. I've done tutorials like the ones at Codecademy and Learning Tree, copied code from places like Stack Overflow, and watched at least a dozen videos on YouTube and other learning sites.
This can be done fairly easily in VBA, and with a bit more effort, SQL - but I really want to try to get Python to do something.
Doesn't
file = open("f://HITS//fixnames.txt", "r")
read the file in? Last night I added "print(file)" and it was giving me back the contents of the file (that was a moment of sheer ecstasy for me).
But this morning, having changed nothing, I get this error back:
<_io.TextIOWrapper name='f://HITS//fixnames.txt' mode='r' encoding='cp1252'>
The structure of the file:
I've played around with this quite a bit as well and have two versions
1) text file, tab delimited, with 2 columns
2) csv with 2 columns
I've tried importing pandas and reading the file in (also tried reading it in as a table).
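A note on the line quoted above: <_io.TextIOWrapper ...> is not an error. It is how Python 3 displays an open file object when that object itself is passed to print(). To see the text, something has to read from the object, for example with read() or by looping over its lines. The sketch below illustrates the difference; the sample file name, its contents, and the read_lines helper are made up for the illustration and are not from the thread.

```python
import os
import tempfile


def read_lines(path):
    """Return the file's lines as a list of strings, without trailing newlines."""
    with open(path, "r") as in_file:
        return [line.rstrip("\n") for line in in_file]


# Hypothetical sample file standing in for fixnames.txt (tab-delimited).
sample_path = os.path.join(tempfile.mkdtemp(), "sample_fixnames.txt")
with open(sample_path, "w") as out_file:
    out_file.write("MSFT\tMicrosoft Corp\nMSFT\tMicrosft Co\n")

handle = open(sample_path, "r")
print(handle)         # shows a description of the file object, not its contents
print(handle.read())  # shows the text stored in the file
handle.close()

lines = read_lines(sample_path)
print(lines)
```

Using "with open(...)" as in read_lines closes the file automatically, which is why it is usually preferred over a bare open() call.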
Posts: 8,169
Threads: 160
Joined: Sep 2016
playing around is nice, but one should have at least a limited understanding of what one is trying to do. start by reading some tutorials
1. working with files - https://python-forum.io/Thread-Basic-Files
2. lists and dicts - https://python-forum.io/Thread-Basic-Lists and https://python-forum.io/Thread-Basic-Dictionaries
Posts: 58
Threads: 15
Joined: Jun 2017
Buran - I've done dozens of tutorials, but I don't think I've seen those, specifically. I'll certainly give them a read through after I get this project done (I guess I just have to do it the easy way and put Python on the back burner again).
Thank you so much for your time and the links!
Posts: 8,169
Threads: 160
Joined: Sep 2016
well it's several lines of python code (between 2 and 7) to produce a variation of the desired output
Posts: 58
Threads: 15
Joined: Jun 2017
Thanks Buran - I'm at least 12 hours into this project and so far, have got nothing other than reading in a file, which is no longer working. I already have the script done in VBA to do it, and have it running on a virtual machine, so hopefully it won't crash.
I would love to spend more time trying to figure it out (even if it's 50-100 lines), but right now, my manager just wants the output and I can't keep trying to piece this puzzle together.
I will absolutely take a look at the links you so kindly provided - I'm sure they'll be a lot more helpful than all the other random stuff I've been reading/trying. Thank you again!
Posts: 8,169
Threads: 160
Joined: Sep 2016
msft.txt:
MSFT,Microsoft Corp
MSFT,Microsft Co
MSFT,Microsoft Corporation
MSFT,Microsoft Corp
with open ('msft.txt', 'r') as in_file:
    print set(tuple(line.strip().split(',')) for line in in_file)
Output: set([('MSFT', 'Microsoft Corporation'), ('MSFT', 'Microsoft Corp'), ('MSFT', 'Microsft Co')])
Posts: 58
Threads: 15
Joined: Jun 2017
Jun-14-2017, 01:09 PM
(This post was last modified: Jun-14-2017, 01:12 PM by JP_ROMANO.)
Thank you...
I get back
SyntaxError: invalid syntax
And when I put the text after PRINT in parens, I get back
),...
I appreciate your time, honestly, but you've wasted enough of it on me. This is just not going to work - I may have to consider a different project to get my feet wet in this language!
Posts: 8,169
Threads: 160
Joined: Sep 2016
my example was python2, you obviously use python3 where print is a function, so it should be
with open ('msft.txt', 'r') as in_file:
    print(set(tuple(line.strip().split(',')) for line in in_file))
and the output would be
Output: {('MSFT', 'Microsoft Corp'), ('MSFT', 'Microsoft Corporation'), ('MSFT', 'Microsft Co')}
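A set removes the duplicate rows but does not keep the original order, and it also keeps codes that only ever have one description. As a rough sketch of one way to get closer to the output asked for in the first post (one line per distinct description, only for codes whose descriptions vary), the rows can be grouped by code. The function name variants_by_code, the in-memory sample, and the extra IBM row are assumptions added for illustration; OrderedDict is used because plain dicts do not guarantee order in Python 3.5.

```python
import csv
import io
from collections import OrderedDict


def variants_by_code(rows):
    """Map each code to its distinct descriptions, in first-seen order."""
    seen = OrderedDict()
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        code, description = row[0].strip(), row[1].strip()
        descriptions = seen.setdefault(code, [])
        if description not in descriptions:
            descriptions.append(description)
    return seen


# In-memory stand-in for msft.txt; the IBM row is a hypothetical addition
# showing that a code with a single description is left out of the report.
sample = io.StringIO(
    "MSFT,Microsoft Corp\n"
    "MSFT,Microsft Co\n"
    "MSFT,Microsoft Corporation\n"
    "MSFT,Microsoft Corp\n"
    "IBM,IBM Corp\n"
)
variants = variants_by_code(csv.reader(sample))

# Keep only the codes that appear with more than one description.
conflicting = [
    (code, description)
    for code, descriptions in variants.items()
    if len(descriptions) > 1
    for description in descriptions
]
for code, description in conflicting:
    print(code, description)
```

For a real file, the StringIO object would be replaced by something like open('msft.txt', 'r', newline=''), and for the tab-delimited version csv.reader accepts delimiter='\t'.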