Regex help for newbie

mcmpdx · Jul-01-2019, 12:15 AM

Hi,
I'm new to this site and to Python. As a first project, I am trying to use regex to extract entire lines from a text file (hosts.txt). Based on the matches, I want to write those lines into ini-style sections of a flat text file, 'final_host'. So I am trying to put 'cars' matches under [cars], 'trucks' matches under [trucks], etc.

Here is an example hosts.txt file:

car1
car2
truck1
truck12
truck13

Here is my code:

import re

source = open("hosts", "r") 
car = re.compile(".*car\S+")
truck = re.compile(".*truck\S+")

for line in source:
  car_result = car.findall(line)
  truck_result = truck.findall(line)
source.close()

with open('final_host', 'w') as y:
  y.write("[car]\n")

  for x in car_result:
    print(x)

  y.write("[truck]\n")

  for x in truck_result:
    print(x)

y.close()

But all I'm getting for results are the ini headers. Is there an easier way to regex?

[car]
[truck]

OS: Ubuntu
Python: 2.7.15

Thank you in advance for any pointers.

**scidam** · Jul-01-2019, 01:07 AM

You need to add from __future__ import print_function and pass the file keyword to the print function, e.g. print(x, file=y), or use y.write('{}\n'.format(x)). This should work, but I have not tested it.
Why are you still using Python 2.x? If the words car or truck are always followed by digits, it would be better to use a regexp like re.compile("car\d+").
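
Two things in the posted code seem worth spelling out. The loop rebinds car_result and truck_result on every line, so only the last line's matches survive, and print(x) goes to the console rather than to the file. Below is a minimal sketch of one way to restructure it, assuming the goal is one section per pattern. The helper names group_hosts and render_ini and the inline sample data are made up for illustration, not taken from the original post.

```python
import re

# Raw strings keep the backslash in \S from being read as a string escape.
CAR = re.compile(r"car\S+")
TRUCK = re.compile(r"truck\S+")


def group_hosts(lines):
    """Collect matching host names across all lines (hypothetical helper)."""
    cars, trucks = [], []
    for line in lines:
        # extend() accumulates; plain assignment would keep only the
        # matches from the last line read.
        cars.extend(CAR.findall(line))
        trucks.extend(TRUCK.findall(line))
    return cars, trucks


def render_ini(cars, trucks):
    """Build the section text as one string (hypothetical helper)."""
    out = ["[car]"] + cars + ["[truck]"] + trucks
    return "\n".join(out) + "\n"


# Stand-in for iterating over the hosts file line by line.
sample = ["car1\n", "car2\n", "truck1\n", "truck12\n", "truck13\n"]
cars, trucks = group_hosts(sample)
text = render_ini(cars, trucks)
```

The resulting text can then be written in one call, e.g. y.write(text) inside the with open('final_host', 'w') as y: block.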

mcmpdx · Jul-01-2019, 02:35 AM

Hi scidam,
Thanks for the quick response. As to why I'm using Python 2.x: I use Ansible a lot, and I wanted to be able to take advantage of some of its advanced features, like writing my own modules, Jinja2, etc. As of now, Python 3 is not fully supported for most of those features.

I did try your suggestions and each combination but am getting the same results as before my post. Also, I used

re.compile("car\S+")

because my hosts file was just an example and the characters following the match may be alpha, digit, etc.
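
To make the difference between the two patterns concrete, here is a small sketch. The host names are invented for the comparison. Note that \S+ still requires at least one non-whitespace character after the prefix, so a bare "car" matches neither pattern.

```python
import re

digits_only = re.compile(r"car\d+")    # "car" followed by one or more digits
any_non_space = re.compile(r"car\S+")  # "car" followed by any non-whitespace run

# Made-up host names, chosen to show where the two patterns disagree.
hosts = ["car1", "car-east", "carA2", "car"]

digit_hits = [h for h in hosts if digits_only.search(h)]
loose_hits = [h for h in hosts if any_non_space.search(h)]
```

Here digit_hits keeps only "car1", while loose_hits also keeps "car-east" and "carA2".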

**perfringo** · Jul-01-2019, 06:54 AM

It seems to me that if you are looking for certain strings, it will be easier without regex.

I am not familiar with Python 2 syntax, so I will use Python 3 code:

with open('cars.txt', 'r') as source:
    cars = ['[cars]\n']
    trucks = ['[trucks]\n']
    for row in source:
        if 'car' in row:
            cars.append(row)
        elif 'truck' in row:
            trucks.append(row)

with open('cars_result.txt', 'w') as filtered:
    print(*cars, *trucks, file=filtered)

cars_result.txt will look like:

Output:
[cars]
 car1
 car2
 [trucks]
 truck1
 truck12
 truck13
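
The leading space on every line after the first comes from print() itself: it joins its arguments with a single space by default, and each row already ends with its own newline. If the spaces are unwanted, writelines() is one alternative. The sketch below uses io.StringIO as a stand-in for the output file so it can run without touching disk; the rows list is sample data. It is Python 3 only because of the * unpacking inside print().

```python
import io

rows = ["[cars]\n", "car1\n", "car2\n", "[trucks]\n", "truck1\n"]

# print() with its default sep=" " puts a space before every item after
# the first, and its default end="\n" adds one more newline at the end.
spaced = io.StringIO()
print(*rows, file=spaced)

# writelines() writes each string exactly as given: no separator and
# no extra trailing newline.
exact = io.StringIO()
exact.writelines(rows)
```

Passing sep='' and end='' to print() would give the same result as writelines() here.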

Ansible is not my cup of tea but quick check of documentation revealed statements about support of Python 3.

In your code you use both the old-style open()/close() and 'with open(...)'. What is the reason for mixing the two?
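
For comparison, a minimal sketch of the two styles side by side. The temporary path is only there so the example can run anywhere; the section headers are borrowed from the original post. The point is that a file opened in a with block is already closed when the block ends, so a separate y.close() after it is redundant.

```python
import os
import tempfile

# Throwaway location for the demo file (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "final_host")

# Style 1: explicit open/close. try/finally makes sure close() runs
# even if the write raises.
f = open(path, "w")
try:
    f.write("[car]\n")
finally:
    f.close()

# Style 2: the with statement closes the file when the block ends.
with open(path, "a") as y:
    y.write("[truck]\n")

closed_after_with = y.closed  # already closed here, no y.close() needed

with open(path) as check:
    contents = check.read()
```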

mcmpdx · Jul-01-2019, 04:50 PM

Thanks perfringo, this works for my purposes!

There is support for Python 3 in Ansible but not entirely and it's a work in progress.

As for why I'm using 'with open', again I'm a newbie. This is really my first project, so I am just cobbling together what I can. I am happy to hear about an alternative to 'with open'.

Thanks again!

***ichabod801*** · Jul-01-2019, 06:17 PM

Note that there are only six months of support left for Python 2.7.

mcmpdx · Jul-01-2019, 10:21 PM

Thank you, yes, I will be branching out and learning 3.x in parallel.
