how to detect \x in string so it can be removed

azimmermann · Jun-12-2017, 06:36 PM

I have some garbled data that has random \x values in the strings. These need to be skipped for the import to work correctly.
I am trying to write code to skip them, but can't figure it out.
Sample code is below.

I can get the code to find 764 in the second element without issue (the first if statement prints true), but the second if statement doesn't print true, even though I want it to. How do I deal with the special characters in a case like this?

somedata = ['5', '#5\x029.764', '3.768', '3.757', '3.776', '3.787', '3.778', '3.788', '3.760', '3.777', '3.791', '3.792', '3.791', '3.796', '3.798', '3.787', '3.785', '3.802', '3.782']
if any("764" in s for s in somedata):
    print('true')
    # does print true
if any(r"\x" in s for s in somedata):
    print('true')
# does not print true, but \x is in somedata

wavic · Jun-12-2017, 07:09 PM

>>> somedata = ['5', '#5\x029.764', '3.768', '3.757', '3.776', '3.787', '3.778',
...             '3.788', '3.760', '3.777', '3.791', '3.792', '3.791', '3.796', '3.798',
...             '3.787', '3.785', '3.802', '3.782']

>>> if any(r'\x'):
...     print('True')
True

**buran** · (This post was last modified: Jun-12-2017, 07:30 PM by buran.)

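@azimmermann I don't know what your data are/represent, but \x is an escape sequence that denotes a hex value, so '\x02' in your string literal is a single character (code 2), not a backslash followed by an x. At the same time, you are searching for the raw string r'\x', which really is a backslash followed by an x. The two are not the same, and you cannot search for a plain '\x' either, because you will get ValueError: Invalid \x escape (Python 3 raises a SyntaxError instead).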

@wavic r'\x' will always evaluate to True, and so will any(r'\x'): any() iterates over the characters of the string (a backslash and an x), and every non-empty string is truthy.
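To make the difference concrete, here is a small Python 3 sketch (not from the original posts) that looks for the actual control character instead of a backslash followed by an x. The helper name has_control_char is hypothetical, and treating Unicode category 'Cc' as "garbage" is an assumption about the data.

```python
import unicodedata

somedata = ['5', '#5\x029.764', '3.768']

# '\x02' in the source is ONE character (code point 2, STX), not a backslash plus 'x'
print(len('\x02'))                          # 1
print(any(r'\x' in s for s in somedata))    # False: no literal backslash-x in the data
print(any('\x02' in s for s in somedata))   # True: the STX character is there

def has_control_char(s):
    """Hypothetical helper: True if s contains a Unicode control character (category 'Cc')."""
    return any(unicodedata.category(ch) == 'Cc' for ch in s)

print([s for s in somedata if has_control_char(s)])
```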

wavic · Jun-12-2017, 10:08 PM

You are right.
I've never used any() before. I just saw the documentation.

**nilamo** · Jul-05-2017, 09:20 PM

(Jun-12-2017, 10:08 PM)wavic Wrote: I've never used any() before. I just saw the documentation.

any() checks whether any of the values is truthy and returns True as soon as it finds one. That's important: if you pass it a generator, it won't iterate over the whole sequence.

It's basically this:

def any(items):
    for item in items:
        if item:
            return True
    return False
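One way to see the early return in action is to pass any() a generator that records every value it produces. The generator function noisy below is just an illustration, not part of any library.

```python
def noisy(values, seen):
    """Yield values one at a time, recording each one in `seen` as it is produced."""
    for v in values:
        seen.append(v)
        yield v

seen = []
result = any(noisy([0, '', 3, 4, 5], seen))
print(result)   # True
print(seen)     # [0, '', 3]: any() stopped at the first truthy value

# and the mistake from earlier in the thread: any() over a string iterates its characters
print(list(r'\x'))   # ['\\', 'x'], both non-empty, hence truthy
```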

DeaD_EyE · (This post was last modified: Jul-05-2017, 10:46 PM by DeaD_EyE.)

The sequence #5\x029.764 represents: #5__STX__9.764.

print('#5\x029.764')

Output:
#59.764

You should read this: https://docs.python.org/2.0/ref/strings.html
'\x029' is not read as one number: a \x escape takes exactly two hex digits, so Python reads '\x02' (hex 02, the STX character) followed by a literal '9'.
A full ASCII table is here: http://www.asciitable.com/
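The parsing rule is easy to check in an interpreter. This Python 3 sketch (not from the thread) also shows that repr(), which is what you see when you print a whole list, displays the STX character as \x02; that may be why the data appears to contain \x at all.

```python
s = '#5\x029.764'

# \x takes exactly two hex digits, so '\x029' is '\x02' followed by '9'
print(len(s))             # 8
print(s[2] == chr(2))     # True
print(s[3])               # 9

# repr() shows non-printable characters as escapes, which is where "\x02" appears
print(repr(s))            # '#5\x029.764'

# the raw string keeps the backslash and the 'x' as two ordinary characters
raw = r'#5\x029.764'
print(len(raw))           # 11
```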

If you use the raw string instead, Python won't interpret the escape sequences:

print(r'#5\x029.764')

Output:
#5\x029.764

I guess you want to filter out non-printable characters.

import string
# keep only printable characters (''.join works on both Python 2 and 3)
new_list = [''.join(ch for ch in st if ch in string.printable) for st in somedata]
# or, lazily, as a generator expression
filter_non_printable = (''.join(ch for ch in st if ch in string.printable) for st in somedata)
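In Python 3 you could also drop control characters with a regular expression. This is a swapped-in alternative to the string.printable filter above, assuming the garbage is limited to ASCII control characters; note that, unlike string.printable, this pattern also removes tabs and newlines.

```python
import re

somedata = ['5', '#5\x029.764', '3.768']

# matches ASCII control characters (0x00-0x1F) and DEL (0x7F)
CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')

cleaned = [CONTROL_CHARS.sub('', s) for s in somedata]
print(cleaned)   # ['5', '#59.764', '3.768']
```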

If you want to strip everything except digits and the decimal point:

import string
import pprint


allowed_chars = string.digits + '.'
# keep only digits and the decimal point in each string
new_list = [''.join(ch for ch in st if ch in allowed_chars) for st in somedata]
pprint.pprint(new_list)

Output:

['5',
 '59.764',
 '3.768',
 '3.757',
 '3.776',
 '3.787',
 '3.778',
 '3.788',
 '3.760',
 '3.777',
 '3.791',
 '3.792',
 '3.791',
 '3.796',
 '3.798',
 '3.787',
 '3.785',
 '3.802',
 '3.782']
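For the import itself, a minimal Python 3 sketch might do the same digits-and-dot cleanup with re.sub and then convert to float. The regex is a swapped-in equivalent of the allowed_chars filter above, and the float conversion assumes every cleaned value is a valid number.

```python
import re

somedata = ['5', '#5\x029.764', '3.768', '3.757']

# keep only digits and '.', mirroring the allowed_chars filter above
cleaned = [re.sub(r'[^0-9.]', '', s) for s in somedata]

# float() raises ValueError on leftovers such as '' or '1.2.3'
values = [float(s) for s in cleaned]
print(values)   # [5.0, 59.764, 3.768, 3.757]
```

Note that dropping the STX turns '#5\x029.764' into 59.764; whether that is the intended reading depends on what produced the data, so it may be worth flagging such rows rather than cleaning them silently.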

By the way, use Python 3.x. The code is much cleaner and it's more fun to write.
