Python Forum
how to detect \x in string so it can be removed
#1
I have some garbled data with random \x values in the strings. These need to be skipped for the import to work correctly.
I am trying to write code to skip them, but I can't figure it out.
Sample code is below.

The code finds 764 in the second element without issue (it prints true), but the second if statement does not print true, although I would like it to. How do I deal with special characters in a case like this?


somedata = ['5', '#5\x029.764', '3.768', '3.757', '3.776', '3.787', '3.778', '3.788', '3.760', '3.777', '3.791', '3.792', '3.791', '3.796', '3.798', '3.787', '3.785', '3.802', '3.782']
if any("764" in s for s in somedata):
    print 'true'
    #does print true
if any(r"\x" in s for s in somedata):
    print 'true'
#does not print true but \x is in somedata
Reply
#2
>>> somedata = ['5', '#5\x029.764', '3.768', '3.757', '3.776', '3.787', '3.778',
...  '3.788', '3.760', '3.777', '3.791', '3.792', '3.791', '3.796', '3.798', '3.
... 787', '3.785', '3.802', '3.782']

>>> if any(r'\x'):
...     print('True')
True
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#3
@azimmermann I don't know what your data represent, but \x is an escape sequence that denotes a hex value, while you are searching for the raw string r'\x'. The two are not the same. You also cannot search for a bare '\x' in a normal string literal, because you will get ValueError: invalid \x escape.

@wavic r'\x' always evaluates to True, and so does any(r'\x'), because r'\x' is a non-empty string.
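To make the difference concrete, here is a minimal sketch in Python 3 syntax. The helper name has_control_char is made up for illustration, and the list is shortened. The string in the data holds the single character STX (code 2), not a backslash followed by x, so the check has to look at character codes rather than at the text \x.

```python
somedata = ['5', '#5\x029.764', '3.768', '3.757']

def has_control_char(s):
    """Return True if s contains an ASCII control character.

    Assumption: "control character" means code < 32 or code 127,
    which also covers tab and newline.
    """
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in s)

# r'\x' is a backslash plus the letter x; neither occurs in the data.
print(any(r'\x' in s for s in somedata))        # False
# The character actually stored in the string is '\x02'.
print(any('\x02' in s for s in somedata))       # True
print([has_control_char(s) for s in somedata])  # [False, True, False, False]
```

If tabs or newlines are legitimate in your data, the condition inside has_control_char would need to exclude them.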
Reply
#4
You are right.
I've never used any() before. I just saw the documentation.
Reply
#5
(Jun-12-2017, 10:08 PM)wavic Wrote: I've never used any() before. I just saw the documentation.
any() checks whether any of the values is truthy and returns as soon as it finds one (that's important, because if you pass it a generator, the rest of the sequence is never iterated over).

It's basically this:
def any(items):
   for item in items:
       if item:
           return True
   return False
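A small sketch of that short-circuit behaviour, in Python 3 syntax. The generator numbers and the list seen are only there for illustration; they record which values any() actually asked for.

```python
seen = []

def numbers():
    """Yield values one by one and record each value that gets requested."""
    for n in [0, 0, 3, 4, 5]:
        seen.append(n)
        yield n

result = any(numbers())
print(result)  # True
print(seen)    # [0, 0, 3]  (4 and 5 were never requested)

# A string is iterated character by character, and every character
# is a non-empty (truthy) string, so any non-empty string gives True.
print(any(r'\x'))  # True
print(any(''))     # False
```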
Reply
#6
The sequence #5\x029.764 represents: #5__STX__9.764.
print '#5\x029.764'
Output:
#59.764
You should read this: https://docs.python.org/2.0/ref/strings.html
'\x029' is not octal. Python reads '\x02' as a hexadecimal escape (exactly two hex digits, here the STX control character), and the 9 that follows is an ordinary character.
Here you'll find all ASCII Codes: http://www.asciitable.com/

If you use the raw string instead, Python won't interpret the escape sequences:

print r'#5\x029.764'
Output:
#5\x029.764
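You can check this yourself by counting characters. A minimal sketch in Python 3 syntax (the variable names are just for illustration):

```python
s = '#5\x029.764'      # normal literal: \x02 becomes one character
raw = r'#5\x029.764'   # raw literal: backslash, x, 0 and 2 stay separate

print(len(s))        # 8
print(repr(s[2]))    # '\x02'
print(ord(s[2]))     # 2
print(len(raw))      # 11
print(raw[2:6])      # \x02
```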
I guess you want to filter out non-printable characters.
import string
new_list = [filter(lambda e: e in string.printable, st) for st in somedata]
# or as generator expression
filter_non_printable = (filter(lambda e: e in string.printable, st) for st in somedata)
If you want to clean your data from all, except digits and the decimal point:

import string
import pprint


allowed_chars = string.digits + '.'
new_list = [filter(lambda e: e in allowed_chars, st) for st in somedata]
pprint.pprint(new_list)
Output:
By the way, use Python 3.x. The code is much cleaner and it's more fun.
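One thing to watch for in Python 3: filter() returns an iterator there instead of a string, so the snippets above would give you filter objects. A rough Python 3 equivalent, using a shortened copy of the list and join to rebuild each string, could look like this (the names cleaned and numeric_only are only for illustration):

```python
import string

somedata = ['5', '#5\x029.764', '3.768']

# Keep only printable characters.
printable = set(string.printable)
cleaned = [''.join(ch for ch in s if ch in printable) for s in somedata]
print(cleaned)       # ['5', '#59.764', '3.768']

# Keep only digits and the decimal point.
allowed = set(string.digits + '.')
numeric_only = [''.join(ch for ch in s if ch in allowed) for s in somedata]
print(numeric_only)  # ['5', '59.764', '3.768']
```

Whether 59.764 is really the value that belongs in that slot is something only you can judge from the source of the data; the filter just drops characters.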
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply

