Python Forum
how to detect \x in string so it can be removed - Printable Version




how to detect \x in string so it can be removed - azimmermann - Jun-12-2017

I have some garbled data with random \x values in the strings. These need to be skipped for the import to work correctly.
I am trying to write code to skip them, but can't figure it out.
Sample code is below.

The first if statement finds 764 in the second element without issue (it prints true), but the second if statement never prints true, even though I would like it to. How do I deal with the special characters in a case like this?


somedata = ['5', '#5\x029.764', '3.768', '3.757', '3.776', '3.787', '3.778', '3.788', '3.760', '3.777', '3.791', '3.792', '3.791', '3.796', '3.798', '3.787', '3.785', '3.802', '3.782']
if any("764" in s for s in somedata):
    print 'true'
    #does print true
if any(r"\x" in s for s in somedata):
    print 'true'
#does not print true but \x is in somedata



RE: how to detect \x in string so it can be removed - wavic - Jun-12-2017

>>> somedata = ['5', '#5\x029.764', '3.768', '3.757', '3.776', '3.787', '3.778',
...  '3.788', '3.760', '3.777', '3.791', '3.792', '3.791', '3.796', '3.798', '3.
... 787', '3.785', '3.802', '3.782']

>>> if any(r'\x'):
...     print('True')
True



RE: how to detect \x in string so it can be removed - buran - Jun-12-2017

@azimmermann I don't know what your data are/represent, but \x is the escape sequence that denotes a hex value. At the same time you search for the raw string r'\x'. The two are not the same, and you cannot search for a plain '\x' either, because you will get ValueError: Invalid \x escape

@wavic r'\x' will always evaluate True and so will any(r'\x'), because r'\x' is a non-empty string
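
One way to make the second check succeed is to search for the actual character instead of the backslash-x text. This is only a minimal sketch in Python 3 syntax, assuming the unwanted characters are ASCII control characters; the helper name has_control_char is made up for illustration.

```python
import re

somedata = ['5', '#5\x029.764', '3.768', '3.757']

# In a normal string literal '\x02' is ONE character (STX, code point 2),
# not a backslash followed by the letter x.
print(any('\x02' in s for s in somedata))

# Assumption: "garbage" means any ASCII control character (0x00-0x1f or 0x7f).
CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')


def has_control_char(text):
    """Return True if text contains at least one ASCII control character."""
    return CONTROL_CHARS.search(text) is not None


# Collect the entries that need cleaning.
flagged = [s for s in somedata if has_control_char(s)]
print(flagged)
```

The regular expression avoids having to know in advance which control character ended up in the data.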


RE: how to detect \x in string so it can be removed - wavic - Jun-12-2017

You are right.
I've never used any() before. I just saw the documentation.


RE: how to detect \x in string so it can be removed - nilamo - Jul-05-2017

(Jun-12-2017, 10:08 PM)wavic Wrote: I've never used any() before. I just saw the documentation.
any() checks whether any of the values is truthy, and returns as soon as it spots one (that's important, because if you pass it a generator, the rest of the sequence won't be iterated over).

It's basically this:
def any(items):
   for item in items:
       if item:
           return True
   return False
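
The early return can be observed with a generator that records what it hands out. A small sketch using the built-in any(); the names noisy_values and consumed are invented for this example.

```python
consumed = []  # every value the generator was actually asked for


def noisy_values():
    """Yield a few values, recording each one as it is requested."""
    for value in [0, '', 'x', 'never reached']:
        consumed.append(value)
        yield value


result = any(noisy_values())
print(result)    # True, because 'x' is truthy
print(consumed)  # the last value was never requested
```

any() stops at the first truthy value, so the final item is never pulled from the generator.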



RE: how to detect \x in string so it can be removed - DeaD_EyE - Jul-05-2017

The sequence #5\x029.764 represents: #5__STX__9.764.
print '#5\x029.764'
Output:
#59.764
You should read this: https://docs.python.org/2.0/ref/strings.html
'\x029' is not octal: \x takes exactly two hexadecimal digits, so Python reads '\x02' (the STX control character) followed by a literal '9'.
Here you'll find all ASCII Codes: http://www.asciitable.com/

If you use the raw string instead, Python won't interpret the escape sequences:

print r'#5\x029.764'
Output:
#5\x029.764
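
The difference between the two literals is easy to check by counting characters. A short sketch in Python 3 syntax; the variable names escaped and raw are just for illustration.

```python
escaped = '#5\x029.764'  # normal literal: \x02 collapses into one character
raw = r'#5\x029.764'     # raw literal: backslash, x, 0, 2 stay as typed

print(len(escaped))  # 8
print(len(raw))      # 11

# The third character of the normal literal is STX (code point 2),
# and the '9' after it is an ordinary character.
print(ord(escaped[2]), escaped[3])

# Only the raw literal contains a real backslash followed by 'x'.
print('\\x' in raw, '\\x' in escaped)
```

This is why searching for r'\x' finds nothing in the sample data: the data never contained a backslash in the first place.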
I guess you want to filter out non-printable characters.
import string
new_list = [filter(lambda e: e in string.printable, st) for st in somedata]
# or as generator expression
filter_non_printable = (filter(lambda e: e in string.printable, st) for st in somedata)
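
The filter() calls above rely on Python 2, where filtering a string gives back a string. In Python 3 filter() returns an iterator, so a rough equivalent could join the kept characters instead. This is a sketch only; keep_printable is a hypothetical helper name.

```python
import string

somedata = ['5', '#5\x029.764', '3.768']

# string.printable also contains whitespace such as tab and newline,
# so those characters survive this filter.
PRINTABLE = set(string.printable)


def keep_printable(text):
    """Return text with every character outside string.printable removed."""
    return ''.join(ch for ch in text if ch in PRINTABLE)


new_list = [keep_printable(st) for st in somedata]
print(new_list)
```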
If you want to clean your data from all, except digits and the decimal point:

import string
import pprint


allowed_chars = string.digits + '.'
new_list = [filter(lambda e: e in allowed_chars, st) for st in somedata]
pprint.pprint(new_list)
Output:
By the way, use Python 3.x. The code is much cleaner and it's more fun.
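
For Python 3, the digits-and-decimal-point cleanup could also be written with a regular expression instead of filter(). A minimal sketch, assuming only ASCII digits and '.' should survive; the name NOT_ALLOWED is made up here, and the sample list is shortened.

```python
import re

somedata = ['5', '#5\x029.764', '3.768']

# Matches any single character that is neither an ASCII digit nor a decimal point.
NOT_ALLOWED = re.compile(r'[^0-9.]')

# Replace every disallowed character with the empty string.
new_list = [NOT_ALLOWED.sub('', st) for st in somedata]
print(new_list)
```

Note that this removes the '#' as well as the control character, so it is worth checking that the cleaned value is still the number you expect.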