(Nov-07-2017, 06:37 PM)Micael Wrote: It's in Swedish, so there is of course åäö in the file, and that's a problem as well.

A lot has changed regarding Unicode; it was one of the biggest changes in moving to Python 3 (as mentioned by @heiner55, you should use Python 3).
In Python 3, `open()` has a built-in `encoding` parameter. So the simple rule is: keep it UTF-8 in and out, and pass `encoding='utf-8'` both when reading and when writing a file.
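Here is a minimal sketch of "UTF-8 in and out": it writes some Swedish text and reads it back. The temporary file `demo.txt` is made up for the example and stands in for `ss.txt`. The sketch also prints `locale.getpreferredencoding(False)`, which is what `open()` falls back to when no `encoding` is given. That value is not UTF-8 on every system (on many Windows setups it is cp1252), so passing `encoding='utf-8'` explicitly is the safe choice.

```python
import locale
import os
import tempfile

# What open() uses when no encoding is passed; varies by system and locale.
print(locale.getpreferredencoding(False))

text = "räksmörgås\nåäö ÅÄÖ\n"  # sample Swedish text

# Hypothetical file in a fresh temp directory, standing in for ss.txt.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# UTF-8 out: state the encoding when writing ...
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# ... and UTF-8 in: state the same encoding when reading.
with open(path, encoding="utf-8") as f:
    read_back = f.read()

print(read_back == text)  # True
```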
Inside Python 3, all strings (`str`) are sequences of Unicode characters. Data that has not been decoded, such as a file opened in binary mode, is `bytes` (`b'hello'`). Python 3 will not guess or implicitly convert between the two the way Python 2 does.
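Here is a small sketch of the `str`/`bytes` split, using the three Swedish letters from the question. Encoding turns `str` into `bytes`, and decoding with the right codec turns it back. Decoding with the wrong codec (Latin-1 here, picked only for illustration) does not make Python guess the right one. You just get garbage.

```python
text = "åäö"                      # str: 3 Unicode characters
data = text.encode("utf-8")       # bytes: å, ä and ö take 2 bytes each in UTF-8

print(type(data), len(data))      # <class 'bytes'> 6
print(data)                       # b'\xc3\xa5\xc3\xa4\xc3\xb6'

# Decoding with the encoding the data was actually written in gives the text back.
print(data.decode("utf-8") == text)   # True

# Latin-1 accepts any byte, so the wrong codec raises no error here; it gives mojibake.
print(data.decode("latin-1"))     # Ã¥Ã¤Ã¶
```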
So, borrowing the code from @heiner55, it looks like this:
```python
import re

with open('ss.txt', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if re.match(r"h...g..", line) and len(line) == 7:
            print(line)
```

There is no need for `# -*- coding: utf-8 -*-` in Python 3, because UTF-8 is the default source file encoding. In Python 2 it would look like this, with the same rule of UTF-8 in and out.
But you have to use the io (or codecs) library and `# -*- coding: utf-8 -*-`, because Python 2's default encoding is ASCII:

```python
# -*- coding: utf-8 -*-
import re
import io

with io.open('ss.txt', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if re.match(r"h...g..", line) and len(line) == 7:
            print(line)
```
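This is why decoding matters for this particular search. The `len(line) == 7` check counts characters only when `line` is a `str`. The sketch below uses a made-up sample word (`hörngås`, not taken from `ss.txt`) and a hypothetical helper `matches()` that repeats the test from the loop. The word is 7 characters but 9 bytes in UTF-8, so the same test run on undecoded bytes would miss it.

```python
import re

PATTERN = re.compile(r"h...g..")

def matches(word):
    """Hypothetical helper: same regex + length test as in the loop above."""
    return bool(PATTERN.match(word)) and len(word) == 7

word = "hörngås"              # made-up sample word with Swedish letters
raw = word.encode("utf-8")    # what you would have without decoding: bytes

print(len(word))              # 7 characters
print(len(raw))               # 9 bytes, because ö and å take 2 bytes each
print(matches(word))          # True

# On the raw bytes the 5th position is 'n', not 'g', so the pattern fails.
print(bool(re.match(rb"h...g..", raw)))  # False
```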