Posts: 1
Threads: 1
Joined: Jun 2017
I have some garbled data with random \x values in the strings. These need to be skipped for the import to work correctly.
I am trying to write code to skip them, but I can't figure it out.
Sample code is below.
The first if statement finds 764 in the second element without issue (it prints true), but the second if statement never prints true, even though I would like it to. How do I deal with special characters in a case like this?
somedata = ['5', '#5\x029.764', '3.768', '3.757', '3.776', '3.787', '3.778', '3.788', '3.760', '3.777', '3.791', '3.792', '3.791', '3.796', '3.798', '3.787', '3.785', '3.802', '3.782']
if any("764" in s for s in somedata):
    print 'true'
# does print true
if any(r"\x" in s for s in somedata):
    print 'true'
# does not print true, but \x is in somedata
Posts: 2,953
Threads: 48
Joined: Sep 2016
>>> somedata = ['5', '#5\x029.764', '3.768', '3.757', '3.776', '3.787', '3.778',
... '3.788', '3.760', '3.777', '3.791', '3.792', '3.791', '3.796', '3.798',
... '3.787', '3.785', '3.802', '3.782']
>>> if any(r'\x'):
...     print('True')
...
True
Posts: 8,170
Threads: 160
Joined: Sep 2016
Jun-12-2017, 07:19 PM
(This post was last modified: Jun-12-2017, 07:30 PM by buran.)
@azimmermann I don't know what your data represent, but \x is the escape sequence that introduces a hex value, so '\x02' in your list is a single character. You, however, are searching for the raw string r'\x', which is a backslash followed by the letter x. The two are not the same, and you cannot search for a plain '\x' either, because you will get ValueError: invalid \x escape.
@wavic r'\x' always evaluates as True, and so does any(r'\x'), because r'\x' is a non-empty string.
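To make the distinction concrete, here is a minimal sketch. It assumes Python 3 (the thread itself uses Python 2), and the helper name has_control_char is hypothetical, not something from the original posts. It shows that the raw string is two characters, that the escape is one, and one possible way to detect such control characters in the list.

```python
somedata = ['5', '#5\x029.764', '3.768']

# r'\x' is two characters: a backslash and the letter x.
print(len(r'\x'))   # 2
# '\x02' is one character with code point 2 (STX).
print(len('\x02'))  # 1

# The data contains no backslash, so searching for the raw string finds nothing.
found_raw = any(r'\x' in s for s in somedata)


def has_control_char(s):
    """Hypothetical helper: True if s contains an ASCII control character."""
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in s)


found_control = any(has_control_char(s) for s in somedata)
print(found_raw, found_control)  # False True
```

Checking code points rather than searching for a literal backslash is the design choice here: the backslash only exists in the source code spelling of the string, not in the string itself.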
Posts: 2,953
Threads: 48
Joined: Sep 2016
You are right.
I've never used any() before. I just saw the documentation.
Posts: 3,458
Threads: 101
Joined: Sep 2016
(Jun-12-2017, 10:08 PM)wavic Wrote: I've never used any() before. I just saw the documentation.
any() checks whether any of the values is truthy, and returns as soon as it finds the first one (that's important, because if you pass it a generator, the rest of the sequence is never iterated over).
It's basically this:
def any(items):
    for item in items:
        if item:
            return True
    return False
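The short-circuit behaviour described above can be observed with a small sketch. The generator noisy and the list seen are hypothetical names used only for illustration; the code uses the built-in any() and assumes Python 3.

```python
def noisy(values, seen):
    """Yield values one by one, recording each in `seen` as it is consumed."""
    for v in values:
        seen.append(v)
        yield v


seen = []
result = any(noisy([0, '', 'x', 'never reached'], seen))
print(result)  # True
print(seen)    # [0, '', 'x']

# An empty iterable has no truthy item, so any() returns False.
empty_result = any([])
```

The last item is never pulled from the generator, because any() stops at 'x', the first truthy value.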
Posts: 2,130
Threads: 11
Joined: May 2017
Jul-05-2017, 10:46 PM
(This post was last modified: Jul-05-2017, 10:46 PM by DeaD_EyE.)
The sequence #5\x029.764 represents: #5__STX__9.764.
print '#5\x029.764'
Output: #59.764
You should read this: https://docs.python.org/2.0/ref/strings.html
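'\x029' is not an octal escape. Python reads \x plus the two hex digits that follow, so it interprets '\x02' (the hexadecimal representation of STX) and treats the 9 as an ordinary character.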
Here you'll find all ASCII Codes: http://www.asciitable.com/
If you use the raw string instead, Python won't interpret the escape sequences:
print r'#5\x029.764'
Output: #5\x029.764
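The difference between the two spellings is easier to see by inspecting lengths and code points than by printing. This is a minimal sketch assuming Python 3; the variable names are only illustrative.

```python
s = '#5\x029.764'     # escape is interpreted: contains the STX character
raw = r'#5\x029.764'  # raw string: contains a literal backslash and 'x'

print(len(s))    # 8
print(len(raw))  # 11

# repr() makes the invisible control character visible again.
print(repr(s))

# Code points of the first four characters: '#', '5', STX, '9'.
codes = [ord(ch) for ch in s[:4]]
print(codes)     # [35, 53, 2, 57]
```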
I guess you want to filter out non-printable characters.
import string
# ''.join() keeps the result a string on both Python 2 and Python 3
new_list = [''.join(filter(lambda e: e in string.printable, st)) for st in somedata]
# or as a generator expression
filter_non_printable = (''.join(filter(lambda e: e in string.printable, st)) for st in somedata)
If you want to strip everything from your data except digits and the decimal point:
import string
import pprint
allowed_chars = string.digits + '.'
# ''.join() keeps the result a string on both Python 2 and Python 3
new_list = [''.join(filter(lambda e: e in allowed_chars, st)) for st in somedata]
pprint.pprint(new_list)
Output:
['5',
'59.764',
'3.768',
'3.757',
'3.776',
'3.787',
'3.778',
'3.788',
'3.760',
'3.777',
'3.791',
'3.792',
'3.791',
'3.796',
'3.798',
'3.787',
'3.785',
'3.802',
'3.782']
By the way, use Python 3.x. It makes for much cleaner code and is more fun.
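For anyone following that advice, here is a rough Python 3 sketch of the same cleanup, followed by a conversion to float. The names ALLOWED and keep_allowed are hypothetical, and the shortened list is only a sample of the original data.

```python
import string

somedata = ['5', '#5\x029.764', '3.768', '3.757']

# Characters to keep: digits and the decimal point.
ALLOWED = set(string.digits + '.')


def keep_allowed(text, allowed=ALLOWED):
    """Hypothetical helper: drop every character that is not in `allowed`."""
    return ''.join(ch for ch in text if ch in allowed)


cleaned = [keep_allowed(s) for s in somedata]
print(cleaned)  # ['5', '59.764', '3.768', '3.757']

# Once only digits and '.' remain, the strings convert cleanly.
values = [float(s) for s in cleaned]
```

Note that the garbled entry becomes '59.764' because the 5 before the STX character is a digit and is kept. Whether that or 9.764 is the intended value depends on the data source, which the thread does not say.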