Python Forum

I am trying to find a string in a text file and get an error on the first find statement.

            xrefstart=irec.find("\x - \xo ")
            if xrefstart > 0:
                  #find the \xt which is the start of the xref
                  xstar=irec.find("x*")
                  xref = "<RX>"+irec[chr+10:xstar]+"<Rx>"
                  # replace everything up to the "x* " with xref
                  regexp = "\x - \xo .*?\x* "
                  re.sub(regexp, xref, irec)

The error reads:
(unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape

How do I correct this?
"irec" contains:
\v 1 \x - \xo 1:1 \xt Ps 90:2; Jes 40:21-22; Joh 1:1-3; Hand 17:24; Kol 1:16-17; Heb 1:10; 11:3\x*In die begin het God die hemel en die aarde geskep.

Hi,

I guess something went wrong earlier. In a normal string literal, \x marks the start of a hex escape sequence, and Python expects exactly two hex digits (0-9, a-f) to follow it. That is not the case in your code.
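To illustrate the point about \x, here is a small sketch. The variable names are made up for the example, and compile() is only used so that the faulty literal can be shown without stopping the script itself:

```python
# "\x" in a normal string literal must be followed by two hex digits.
valid = "\x41"        # hex 41 is the letter "A"
raw = r"\x - \xo "    # raw literal: the backslashes are kept as plain characters

# Compile the non-raw literal at run time to reproduce the error message.
bad_source = r'"\x - \xo "'
try:
    compile(bad_source, "<example>", "eval")
    error_message = None
except SyntaxError as exc:
    error_message = str(exc)

print(valid)
print(len(raw))
print(error_message)
```

The last line should print a message containing "truncated \xXX escape", the same complaint as in the question.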

However, the question is why your string includes hex sequences at all. It looks like an extract from a Bible or something similar, and such texts rarely contained hex sequences back in the day. I guess something went wrong with the encoding when the data that ended up in the string was read?

Regards, noisefloor

What is irec? Is it str or bytes? If it is bytes, your search pattern must also be bytes. Either way you should use a raw literal for the pattern to prevent "\x" being interpreted as an escape sequence marking the start of a hexadecimal number.

xrefstart=irec.find(r"\x - \xo ")
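Applied to the whole snippet from the question, a possible version looks like the sketch below. It rests on some assumptions that the thread does not confirm: that irec is a str, that the text wanted inside the tags is whatever sits between "\xt " and "\x*", and that the "<RX>"/"<Rx>" tags are meant exactly as written in the question. Note also that re.sub returns a new string instead of changing irec in place, and that find returns -1 (not 0) when nothing is found.

```python
import re

# Sample line copied from the question, written as raw literals.
irec = (r"\v 1 \x - \xo 1:1 \xt Ps 90:2; Jes 40:21-22; Joh 1:1-3; "
        r"Hand 17:24; Kol 1:16-17; Heb 1:10; 11:3\x*In die begin het God "
        r"die hemel en die aarde geskep.")

start_marker = r"\x - \xo "
xrefstart = irec.find(start_marker)
if xrefstart != -1:  # -1 means "not found"; 0 would be a valid position
    # re.escape makes the markers literal, so "\" and "*" lose their regex meaning.
    # Assumption: the wanted text is the part between "\xt " and "\x*".
    pattern = re.escape(start_marker) + r".*?\\xt (.*?)" + re.escape(r"\x*")
    # A function as replacement avoids backslash handling in the replacement text.
    irec = re.sub(pattern, lambda m: "<RX>" + m.group(1) + "<Rx>", irec)

print(irec)
```

If irec turns out to be bytes, the same idea should work with bytes patterns (for example rb"\x - \xo "), but the tags and the replacement then have to be bytes as well.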

WJSwan

noisefloor

deanhystad