Searching a text file to find words matching a pattern

***snippsat*** · (This post was last modified: Nov-07-2017, 07:28 PM by snippsat.)

(Nov-07-2017, 06:37 PM)Micael Wrote: It's in Swedish so there is of course åäö in the file and that's a problem as well.

A lot has changed regarding Unicode; it was one of the biggest changes in moving to Python 3 (as mentioned by @heiner55, you should use Python 3).
In Python 3, open() has a built-in encoding parameter.
So the simple rule is to keep it UTF-8 in and out when reading a file.
In Python 3 all strings are sequences of Unicode characters; data that has not been decoded, or whose encoding Python 3 does not recognize, stays as bytes (b'hello').
Python 3 will not guess the way Python 2 does.
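
To make the str/bytes split concrete, here is a small sketch for Python 3. The sample word is just an illustration I picked, not something from the file in question.

```python
text = "hälsning"              # str: a sequence of 8 Unicode characters
data = text.encode("utf-8")    # bytes: the UTF-8 encoded form of the same word

print(type(text).__name__, len(text))
print(type(data).__name__, len(data))   # one longer, because "ä" needs two bytes in UTF-8

try:
    text + data                # mixing str and bytes
    mixed_ok = True
except TypeError:
    mixed_ok = False           # Python 3 raises TypeError instead of guessing

# Decoding with the same encoding gives the original string back.
print(data.decode("utf-8") == text)
```

So as long as you decode on the way in and encode on the way out with the same encoding, åäö behave like any other character.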

So, borrowing the code from @heiner55, it looks like this:

import re

with open('ss.txt', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        # match() only anchors at the start, so also require exactly 7 characters
        if re.match(r"h...g..", line) and len(line) == 7:
            print(line)

There is no need for # -*- coding: utf-8 -*- in Python 3, because UTF-8 is the default source encoding.
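
As a side note, in Python 3 the length check can be folded into the regex call with re.fullmatch(), which only succeeds when the whole string matches. This is a sketch with a made-up word list instead of the real file, so the sample words are my assumption.

```python
import re

pattern = re.compile(r"h...g..")

# Hypothetical sample words standing in for the lines of ss.txt.
words = ["halvgud", "hårdgör", "halvgudar", "helgdag", "hund"]

# fullmatch() requires the entire word to fit the pattern,
# so no separate len() test is needed; "." matches å, ä, ö as one character.
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
```

Here "halvgudar" is dropped because it is longer than the pattern, and "helgdag" because its fifth letter is not g.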

In Python 2 it would look like this; the same rule applies: UTF-8 in and out.
But you have to use a library, io or codecs, and # -*- coding: utf-8 -*-, because Python 2 has ASCII as its default encoding.

# -*- coding: utf-8 -*-
import re
import io

with io.open('ss.txt', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        # match() only anchors at the start, so also require exactly 7 characters
        if re.match(r"h...g..", line) and len(line) == 7:
            print(line)
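
For the codecs alternative, I assume codecs.open() is the call to use; a rough sketch could look like this. It writes its own small sample file to a temporary directory (the words are made up), so it can be run as is, and it is written to work under both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import re
import tempfile

# Hypothetical sample data, written to a temporary file just for this demo.
path = os.path.join(tempfile.mkdtemp(), 'ss.txt')
with codecs.open(path, 'w', encoding='utf-8') as f:
    f.write(u'halvgud\nhårdgör\nhund\n')

found = []
# codecs.open() decodes each line to Unicode while reading, like io.open().
with codecs.open(path, encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if re.match(u"h...g..", line) and len(line) == 7:
            found.append(line)

print(found == [u'halvgud', u'hårdgör'])
```

On Python 3 the plain open() with encoding='utf-8' is still the better choice; this is only for code that has to stay on Python 2.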
