Python Forum

Additional slashes being added to string
I have the following code, where I have the variable domain_with_escapes the way I want it. When I add it as part of a value in a dictionary, there are additional slashes that are getting added, and I can't figure out why.

#!/usr/local/bin/python3.4

import re

domain_with_escapes='aw\.me\.org'
print("Domain with escapes is: " + str(domain_with_escapes) + '\r\r')
d0 = "hw"
s0 = "site"
d1 = "cmc"
s1 = "sitename"

key = d0 + " in " + s0
match_string = '.*' + str(d1) + '.*' + str(s1) + '.*' + domain_with_escapes
print("match_string is: " + str(match_string) + '\r\r' )

params_for = dict()
params_for[key] = {'name~' :  str(match_string) }
print("Dict params_for is: " + str(params_for) + '\r\r')

quit()
When I run it, I get this:

$ ./test.py
Domain with escapes is: aw\.me\.org
match_string is: .*cmc.*sitename.*aw\.me\.org
Dict params_for is: {'hw in site': {'name~': '.*cmc.*sitename.*aw\\.me\\.org'}}
Any thoughts on why the additional slashes are getting added?
Use an r prefix before the string to make it a raw string, like:

domain_with_escapes= r'aw\.me\.org'
print(domain_with_escapes)
result:
Output:
aw\.me\.org
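If you're using an interactive Python session, it may appear that there are two slashes, because the interpreter echoes the repr() of the string, which shows the backslash escaped.
When printing, or writing to a file, only the single backslash is output.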
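To make the difference between the two views concrete, here is a small sketch (the dictionary d is only an illustration, not part of the original code). It compares print(), which uses str(), with repr(), which is also what a dict uses to display its values:

```python
s = r'aw\.me\.org'
d = {'name~': s}

print(s)               # str() view: one backslash before each dot
print(repr(s))         # repr() view: each backslash is shown doubled
print(d)               # a dict displays its values with repr(), so doubled again
print(d['name~'])      # pulling the value back out and printing it: single backslashes

# The string stored in memory contains exactly two backslash characters.
backslash_count = s.count('\\')
print(backslash_count)
```

So the doubled backslashes are only a display convention; the value stored in the dictionary is the same object as s.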
OK, thanks. So I take it that in certain circumstances, i.e. if it's printed or writing to a file, there may appear two slashes, but there is really only one in memory?

Most of all, I'm trying to figure out how this would differ from the equivalent string in the following dictionary:

params_for = {
    "hw1 in location1" : {'name~' : '.*cmc.*site1.*\.aw\.me\.org'},
    "hw2 in location2" : {'name~' : '.*drc.*site1.*\.aw\.me\.org'},
}
I ask because when I pass this data structure for the Infoblox API to search, I get the expected results, i.e. I see data I expect to see. Moreover, I didn't have to set this as a raw string, and it still returned the desired results. However, when I use the data structure as earlier (the only difference being the double slashes), I see no records returned. So there must be something about how '.*cmc.*site1.*\.aw\.me\.org' differs from the contents of the match_string variable in my code earlier.
I'm not sure what the full rule is, but it does show the 'escape' sometimes.
I never really investigated it or thought about it in depth, because by nature
(probably after so many years of programming) I just seem to get it right.
So, anybody know what the full rule is?

It seems that whenever the string is assigned as an element in a dictionary, python adds an extra slash to '\.' to make '\\.'. I know this can't be just a representation of the hash, as the Infoblox API is seeing it differently, meaning that python is passing it different input as a request.

I changed the code to the following, and still get the same thing, although I don't see the same behavior when assigning to a variable, as opposed to an element in a dictionary. Anything special about assigning to a dictionary that would trigger the additional slash to be added?

domain_with_escapes=r'aw\.me.org'
print("Domain with escapes is: " + str(domain_with_escapes) + '\r\r')
d0 = "hw"
s0 = "site"
d1 = "cmc"
s1 = "sitename"

key = d0 + " in " + s0
match_string = '.*' + str(d1) + '.*' + str(s1) + '.*' + str(domain_with_escapes)
tmp_dict = dict()
tmp_dict['name~'] = str(match_string)
var1 = match_string

print("Dict tmp_dict is: " + str(tmp_dict) + '\r\r')

print("match_string is: " + str(match_string) + '\r\r' )
print("match_string without str is: " + match_string + '\r\r' )
print("var1 is: " + var1 + '\r\r' )

params_for = dict()
#params_for[key] = {'name~' :  str(match_string) }
params_for[key] = tmp_dict
print("Dict params_for is: " + str(params_for) + '\r\r')
I run it and get the same results.

Domain with escapes is: aw\.me.org
Dict tmp_dict is: {'name~': '.*cmc.*sitename.*aw\\.me.org'}
match_string is: .*cmc.*sitename.*aw\.me.org
match_string without str is: .*cmc.*sitename.*aw\.me.org
var1 is: .*cmc.*sitename.*aw\.me.org
Dict params_for is: {'hw in site': {'name~': '.*cmc.*sitename.*aw\\.me.org'}}
I do know that '\' is special: it introduces the escapes for line feed, carriage return,
form feed, etc., and was already in use when I started programming back in the 1960s.
It therefore needs to be escaped itself.
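As a quick check of that point, here is a minimal sketch (the variable names are just for illustration) showing how many characters each literal actually produces:

```python
newline = '\n'           # escape sequence: a single line-feed character
raw_newline = r'\n'      # raw string: a backslash followed by the letter n
single_backslash = '\\'  # escaped backslash: one backslash character

print(len(newline), len(raw_newline), len(single_backslash))
print(repr(single_backslash))  # repr() shows the one backslash in its escaped form
```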

This might be helpful: https://stackoverflow.com/questions/4020...-in-python
Yes, I know the slash is a special character, and needs to be escaped. However, I'm not wanting to escape it; rather, I'm using it to escape the period in the domain name that it precedes. What's happening here (so far as I can tell) is that when I put in that slash to escape the period, the python interpreter is thinking I need to escape my slash, which makes the regular expression wrong. As the regexp is wrong, I'm not getting the search results I need.

That said, I'm putting in a workaround for now by putting in wildcards for the period, i.e. instead of \.aw\.me\.org, I've changed my code to use .*aw.*me.*org. Of course, that's more resource intensive than it has to be, but it seems the only way I can get this to work now.
Why not use a raw string, as suggested by Larz in post #2?
The backslash is the escape character. If you want to use a backslash, it therefore needs to be escaped... by a backslash. Which is why almost all non-trivial regexes are raw strings.

So there are three options:
1) escape the backslashes
2) use a raw string
3) don't use a backslash at all, and instead let the re module do the escaping with re.escape(). If you don't want to use a raw string, this might be the best option for you.

>>> "spam\\.eggs"
'spam\\.eggs'
>>> r"spam\.eggs"
'spam\\.eggs'
>>> import re
>>> re.escape("spam.eggs")
'spam\\.eggs'
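The three options produce the same string, and the doubled backslash in the echo is only the repr. A small sketch to confirm this (the params dictionary and the test strings are made up for illustration):

```python
import re

escaped = "spam\\.eggs"              # option 1: escape the backslash
raw = r"spam\.eggs"                  # option 2: raw string
generated = re.escape("spam.eggs")   # option 3: let re do the escaping

all_equal = escaped == raw == generated
print(all_equal)
print(len(raw))   # 10 characters: only one backslash is stored

# Storing the pattern in a dictionary does not change it.
params = {'name~': '.*' + raw}
pattern = params['name~']
matches_dot = re.match(pattern, "green spam.eggs") is not None
matches_other = re.match(pattern, "green spamXeggs") is not None
print(matches_dot, matches_other)
```

If the dictionary really held two backslashes before the dot, the pattern would require a literal backslash in the input and the first match would fail.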