Python Forum

Full Version: How can I parse a text file?
I have a file with the following content:
-----myfile.txt---------
Name - Mike
City - Moscow
Have - dog
something
Name - Dasha
City - Voronesh
Have - cat
something
Name - Sveta
Citi - Vologda
Have - mouse
something
-----------------------
How can I turn it into this?
1 Name Mike City Moscow Have dog
2 Name Dasha City Voronesh Have cat
3 Name Sveta City Vologda Have mouse
What have you tried?
This also looks like a typical school task.
No, this is not a school task. That file is just an example. Actually, I need to parse a different file: I run the command 'sudo iwlist scanning' and need to extract some formatted data from its output. That is why I decided to start with this file. I don't even know how to do it.
You start with the basics, like reading in the file.
If this is all new, you need to study basic Python more first.
# Read file line by line
with open('myfile.txt') as f:
   for line in f:
       line = line.strip()
       print(line)
# Read to a list
with open('myfile.txt') as f:
   result = [i.strip() for i in f]
   print(result)
# Read to a string
with open('myfile.txt') as f:
    result = f.read()
    print(result)
Now you can start to think about the options you have.
For example, what happens if you do:
if 'something' in line:
   print(line)
Or what happens if you split() on 'something' in the last example?
>>> with open('myfile.txt') as f:
...     result = f.read()
...     
>>> result.split('something')
['Name - Mike\nCity - Moscow\nHave - dog\n',
 '\nName - Dasha\nCity - Voronesh\nHave - cat\n',
 '\nName - Sveta\nCiti - Vologda\nHave - mouse\n',
 '']
Almost there, just too many \n ;)
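To finish that idea, here is a minimal sketch. It assumes the records are always separated by a line reading 'something' and that no value is a bare '-'. The sample text is inlined instead of read from myfile.txt so the snippet runs on its own, and format_records is a made-up helper name, not anything from the thread.

```python
def format_records(text, separator='something'):
    """Turn 'Key - value' blocks separated by `separator` into numbered lines."""
    rows = []
    for block in text.split(separator):
        # split() with no arguments swallows the extra '\n'; then drop the dashes
        words = [word for word in block.split() if word != '-']
        if words:  # skip the empty chunk after the last separator
            rows.append(' '.join(words))
    return ['{} {}'.format(number, row) for number, row in enumerate(rows, 1)]


# Inlined stand-in for the contents of myfile.txt (first two records only)
sample = (
    "Name - Mike\nCity - Moscow\nHave - dog\nsomething\n"
    "Name - Dasha\nCity - Voronesh\nHave - cat\nsomething\n"
)
for line in format_records(sample):
    print(line)
# 1 Name Mike City Moscow Have dog
# 2 Name Dasha City Voronesh Have cat
```

Splitting on whitespace is what gets rid of the stray \n characters; if a value could itself contain the word 'something', splitting line by line would be safer.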
Thank you very much!

Here is what I have just coded :)
import subprocess

def command_save(command):
   var = subprocess.check_output(command.split(), universal_newlines=True)
   output = open('scanning.txt', 'w')
   print(var, file=output)
   output.close()

def scannig():
   var = subprocess.check_output(['sudo', 'iwlist', 'scanning'], universal_newlines=True)
   output = open('scanning.txt', 'w')
   print(var, file=output)
   output.close()

address = []
channel = []
essid = []
f = open('scanning.txt', 'r').readlines()
for x in f:
   if "Address" in x:
       address.append(x)


for x in f:
   if "Channel" in x:
       channel.append(x)

for x in f:
   if "ESSID" in x:
       essid.append(x)

address = ''.join(address)
channel = ''.join(channel)
essid = ''.join(essid)

address = address.split()
channel = channel.split()
essid = essid.split()
Now I cannot work out how to join address, channel, and essid together.
If you have 3 lists with the related items in order, then why not just use zip to walk through the 3 lists together?
Susan
If the file has this structure (name, city, have, something), you can print it the way you want in just one loop. What should happen with 'something'?
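A rough sketch of that one-loop idea, assuming 'something' simply marks the end of a record and is not wanted in the output. records_in_one_loop is a hypothetical name, and the list of lines stands in for iterating over the open file.

```python
def records_in_one_loop(lines, separator='something'):
    """Collect 'Key - value' lines into one output row per record."""
    rows, current = [], []
    for line in lines:
        line = line.strip()
        if line == separator:
            # the separator closes the record collected so far
            if current:
                rows.append(' '.join(current))
            current = []
        elif ' - ' in line:
            key, value = line.split(' - ', 1)
            current.extend([key, value])
    if current:  # the input did not end with a separator
        rows.append(' '.join(current))
    return rows


# Stand-in for "for line in f"; the second record has no closing separator
lines = ["Name - Mike\n", "City - Moscow\n", "Have - dog\n", "something\n",
         "Name - Sveta\n", "Citi - Vologda\n", "Have - mouse\n"]
for number, row in enumerate(records_in_one_loop(lines), 1):
    print(number, row)
# 1 Name Mike City Moscow Have dog
# 2 Name Sveta Citi Vologda Have mouse
```

Lines that are neither the separator nor a 'Key - value' pair (such as the dashed header line) are ignored.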
(May-30-2017, 10:57 PM)Mike Ru Wrote:
def command_save(command):
   var = subprocess.check_output(command.split(), universal_newlines=True)
   output = open('scanning.txt', 'w')
   print(var, file=output)
   output.close()

def scannig():
   var = subprocess.check_output(['sudo', 'iwlist', 'scanning'], universal_newlines=True)
   output = open('scanning.txt', 'w')
   print(var, file=output)
   output.close()

I have a couple of recommendations for you:
  1. Those 2 functions are identical apart from the command string, which violates the DRY principle (Don't Repeat Yourself). Write just one function and always pass the command line to it.
  2. You hard-code the name of your output file, which is bad practice because you may unintentionally overwrite it. If you want to keep writing to the same file, open it with the 'a' (append) flag. In any case, pass the file name as a parameter too.
  3. You read the output only to write it to a file; there is a more direct way (below).
  4. A plain split is not good enough for complex command lines; use shlex.split.
import shlex
import subprocess

def exec_command_store_out(command_line, out_file):
   cmd_process = subprocess.Popen(shlex.split(command_line), stdout=open(out_file, 'a'))
   cmd_process.wait()
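To illustrate point 4, a small comparison of the two ways of splitting. The grep command line is made up for the example and is never executed.

```python
import shlex

command_line = 'grep "hello world" notes.txt'  # hypothetical command, only split here

# str.split() cuts at every space, so the quoted argument is broken in two
plain = command_line.split()
# shlex.split() follows shell quoting rules and keeps it as one argument
quoted = shlex.split(command_line)

print(plain)   # ['grep', '"hello', 'world"', 'notes.txt']
print(quoted)  # ['grep', 'hello world', 'notes.txt']
```

For a command like 'sudo iwlist scanning', with no quotes or escapes, both give the same list, so the difference only shows up on more complex lines.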
(May-30-2017, 10:57 PM)Mike Ru Wrote:
f = open('scanning.txt', 'r').readlines()
for x in f:
  if "Address" in x:
      address.append(x)
.......
  • Too many empty lines
  • Each line read from the file still has its line ending at the end, so strip it
  • There's a more Pythonic way to handle the file
  • You make 3 loops; one is enough
  • You will probably need zip to combine your data, but that is another story
  • If you are on Python 3, subprocess output is bytes by default and would need decoding; passing universal_newlines=True, as you do, already gives you str.
with open(<file name>) as in_file:
   for line in in_file:
       line = line.strip()
       if "Address" in line:
           address.append(line)
       elif "Channel" in line:
           channel.append(line)
...............
Why you are joining and then splitting your lines beats me.
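The single loop above can be taken one step further so that no zip is needed afterwards: keep one dict per access point and fill it as the lines go by. This is only a sketch. The SAMPLE text is invented and merely imitates the general shape of iwlist output (an Address line starting each cell, plus a Frequency line that also mentions the channel), and parse_cells is a hypothetical helper.

```python
# Invented sample; real `iwlist scanning` output has many more fields per cell
SAMPLE = """\
Cell 01 - Address: 00:11:22:33:44:55
          Channel:6
          Frequency:2.437 GHz (Channel 6)
          ESSID:"HomeNet"
Cell 02 - Address: 66:77:88:99:AA:BB
          Channel:11
          Frequency:2.462 GHz (Channel 11)
          ESSID:"Cafe WiFi"
"""


def parse_cells(lines):
    """Group Address/Channel/ESSID lines into one dict per access point."""
    cells = []
    for line in lines:
        line = line.strip()
        if 'Address:' in line:
            # an Address line starts a new access point
            cells.append({'address': line.split('Address:', 1)[1].strip()})
        elif not cells:
            continue  # ignore anything before the first Address line
        elif line.startswith('Channel:'):
            # startswith() skips the Frequency line, which also contains "Channel"
            cells[-1]['channel'] = int(line.split(':', 1)[1])
        elif line.startswith('ESSID:'):
            cells[-1]['essid'] = line.split(':', 1)[1].strip('"')
    return cells


for cell in parse_cells(SAMPLE.splitlines()):
    print(cell['essid'], cell['address'], cell['channel'])
# HomeNet 00:11:22:33:44:55 6
# Cafe WiFi 66:77:88:99:AA:BB 11
```

Note that a plain `"Channel" in line` test would match the Frequency line in this sample as well, which is why the sketch checks the start of the line instead.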
Using a regex is sometimes shorter, but it is not always the better choice.

import re

address_regex = re.compile(r'Address: ([A-Z0-9:]+)')
channel_regex = re.compile(r'Channel:(\d+)')
essid_regex = re.compile(r'ESSID:"(.+)"')

# var is the text returned by subprocess.check_output(..., universal_newlines=True)
addresses = address_regex.findall(var)
channels = channel_regex.findall(var)
essids = essid_regex.findall(var)

####
# Regardless of how you parsed the data, you can stick the lists together with zip.
# a = [1,2,3]
# b = [4,5,6]
# c = [7,8,9]
# list(zip(a, b, c))
# => [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
# It's similar to the transpose function in Excel, but more powerful.
# In Python 3, zip is a lazily evaluated iterator.
# That also lets you process data that would not fit in memory all at once.

# now make the data structure in the order you want:
access_points_list = list(zip(essids, addresses, channels))
# or as a dict, where the essids are the keys:
access_points_dict = {elements[0]: elements[1:] for elements in zip(essids, addresses, channels)}
# or with address as key:
access_points_dict = {elements[0]: elements[1:] for elements in zip(addresses, essids, channels)}
# or a list with a nested dict:
access_points_list_with_dicts = [{'essid': elements[0], 'address': elements[1], 'channel': int(elements[2])} for elements in access_points_list]
First you run your program, then you parse its output without saving it to disk, then you transform the data, and only at the end do you write the data to disk.
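That order of steps could look roughly like this. It is a sketch under assumptions: access_points and write_csv are hypothetical helpers, the scan text is an invented one-cell sample instead of real subprocess output, and CSV is just one possible output format. The ESSID pattern uses .* rather than .+ so that an empty ESSID, if one appeared, would still produce an entry and keep the three lists aligned for zip.

```python
import csv
import io
import re


def access_points(scan_text):
    """Return (essid, address, channel) tuples found in iwlist-style text."""
    addresses = re.findall(r'Address: ([A-F0-9:]+)', scan_text)
    channels = re.findall(r'Channel:(\d+)', scan_text)
    essids = re.findall(r'ESSID:"(.*)"', scan_text)
    return list(zip(essids, addresses, channels))


def write_csv(rows, out_file):
    """Write the rows with a header line to an already opened text file."""
    writer = csv.writer(out_file)
    writer.writerow(['essid', 'address', 'channel'])
    writer.writerows(rows)


# In real use this text would come from subprocess.check_output(...)
scan_text = 'Cell 01 - Address: 00:11:22:33:44:55\nChannel:6\nESSID:"HomeNet"\n'

buffer = io.StringIO()  # stands in for open('some_file.csv', 'w', newline='')
write_csv(access_points(scan_text), buffer)
print(buffer.getvalue())
```

Keeping the parsing in a function that takes plain text means it can be tried on saved samples without running the scan each time.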
(May-31-2017, 12:19 PM)DeaD_EyE Wrote:
...{elements[0]: elements[1:] for elements in zip(...)}
Well, then, why not
{key: elements for key, *elements in zip(...)}
which is Python3-ic?

And I am not sure that OP is ready for all that - he's still pretty much struggling with the basics
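For reference, the two dict comprehensions side by side on made-up values. The only visible difference in the result is that slicing a zip tuple gives a tuple, while star-unpacking (Python 3 only) collects the rest into a list.

```python
# Made-up sample values standing in for the parsed lists
essids = ['HomeNet', 'Cafe WiFi']
addresses = ['00:11:22:33:44:55', '66:77:88:99:AA:BB']
channels = ['6', '11']

by_slice = {elements[0]: elements[1:] for elements in zip(essids, addresses, channels)}
by_star = {key: elements for key, *elements in zip(essids, addresses, channels)}

print(by_slice['HomeNet'])  # ('00:11:22:33:44:55', '6')  -- a tuple
print(by_star['HomeNet'])   # ['00:11:22:33:44:55', '6']  -- a list
```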