Posts: 24
Threads: 8
Joined: Sep 2016
Hello,
I currently have a script that pulls data from rows of a spreadsheet, appends that to a list, then writes that list to a csv file.
Recently, I discovered that I need to do some "clean-up" on the data - eliminating special characters from the content (should they be there).
Some pseudo code to give you an idea of how I achieved this:
# characters to strip; the backslash is doubled so it is a literal backslash
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'
# column 1 of every row after the first ten rows
my_list = [data.cell_value(row + 10, 1) for row in range(data.nrows - 10)]
new_list = [''.join(c for c in i if c not in invalid_char) for i in my_list]
Is it possible to do the above in one list comprehension?
Seems kind of wasteful to create a list based on another list, if all I'm trying to do is remove certain characters from it.
Posts: 4,220
Threads: 97
Joined: Sep 2016
my_list = [''.join(char for char in data.cell_value(row + 10, 1) if char not in invalid_char) for row in range(data.nrows - 10)]
You could probably speed that up by using the string's translate method instead of calling join on a generator expression.
Posts: 24
Threads: 8
Joined: Sep 2016
ichabod801 -
Thanks for taking the time to reply!
I had something similar to your code, however I bailed on it because I thought the line was too long.
Your line is ~130 characters in length. While I know PEP-8 isn't law, is there any downside to having a line so long in Python?
I've heard people mention that the line-length rule is in place because the interpreter can yield unintended results on longer lines. On the other hand, I've also heard it's just for readability.
I know this is outside the scope of my original question, but I'd love to know the community's thoughts on this.
Also, I am not familiar with the translate method - if you could go into some more detail here, it would be greatly appreciated!
Posts: 7,324
Threads: 123
Joined: Sep 2016
Oct-09-2016, 08:25 AM
(This post was last modified: Oct-09-2016, 02:04 PM by snippsat.)
(Oct-09-2016, 01:10 AM)ATXpython Wrote: Is it possible to do the above in one list comprehension?
Seems kind of wasteful to create a list based on another list, if all I'm trying to do is remove certain characters from it.
No, putting too much in one list comprehension makes it long and harder to read.
So it's not ideal at all.
I think it's okay as you have it now.
There are some different ways to do it, like translate as mentioned, and I can show one with regex.
>>> import re
>>> lst = ['hello?', 'wo+rld@', 'toge?the]']
>>> [re.sub(r'[?@\]+.,]', '', item) for item in lst]
['hello', 'world', 'togethe']
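The character class in that regex only covers a handful of characters. A minimal sketch of building the class from the full invalid_char string of the original question, assuming re.escape is acceptable for protecting characters like ], ^, - and \ inside the brackets (the names pattern, cells and cleaned are only for illustration):

```python
import re

# Same characters as in the original question; the backslash is doubled
# so that it is a literal backslash in a normal string.
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'

# re.escape() protects characters that would otherwise have a special
# meaning inside a character class.
pattern = re.compile('[' + re.escape(invalid_char) + ']')

# Hypothetical stand-in for the spreadsheet cell values.
cells = ['hello?', 'wo+rld@', 'toge?the]']
cleaned = [pattern.sub('', cell) for cell in cells]
print(cleaned)  # ['hello', 'world', 'togethe']
```

Compiling the pattern once outside the comprehension also avoids looking it up again for every item.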
Quote: While I know PEP-8 isn't law, is there any downside to having a line so long in Python?
It's too long once you get over 100, if you ask me; around 90-ish is okay.
Posts: 4,220
Threads: 97
Joined: Sep 2016
I think the line limit length is for readability. However, when you get to heavily indented code it gets in the way of readability because it makes the lines too short. Of course, you have to ask yourself if you can make your code less indented. I generally write my code with a line length of 108 characters, although I keep docstrings to 79 characters so they are readable in the shell. I also find list comprehensions hard to read in general, and for a complicated one I will often just make a for loop so it is clearer.
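As a rough sketch of what the plain for loop version might look like (cell_values is a hypothetical stand-in for the values read with data.cell_value, so the snippet runs on its own):

```python
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'

# Hypothetical stand-in for data.cell_value(row + 10, 1) over all rows.
cell_values = ['hello?', 'wo+rld@', 'plain text']

new_list = []
for value in cell_values:
    # keep only the characters that are not in invalid_char
    kept = []
    for char in value:
        if char not in invalid_char:
            kept.append(char)
    new_list.append(''.join(kept))

print(new_list)  # ['hello', 'world', 'plain text']
```

It is longer than the comprehension, but each step sits on its own short line.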
For translate, you first have to make a translation table with the maketrans method. It takes three arguments: a string of characters to look for in the original string, a string of characters to replace them with (in the same order), and a string of characters to delete. You can also use one argument that is a dictionary; see the documentation for details. So for your example:
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'
# nothing is replaced (two empty strings); invalid_char is only deleted
trans = str.maketrans('', '', invalid_char)
my_list = [data.cell_value(row + 10, 1) for row in range(data.nrows - 10)]
new_list = [i.translate(trans) for i in my_list]
The above works in Python 3.x. For Python 2.x, translate was done through the string module, see the documentation for that.
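For the one-argument dictionary form mentioned above, a minimal Python 3 sketch could look like this (the sample string is made up for illustration):

```python
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'

# Mapping a character to None tells translate() to delete it.
trans = str.maketrans({char: None for char in invalid_char})

# Same table as the three-argument form used above.
assert trans == str.maketrans('', '', invalid_char)

print('wo+rld@ (test)'.translate(trans))  # world test
```

The dictionary form is mostly useful when some characters should be replaced and others deleted in the same pass.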
Now I mentioned this in terms of efficiency, although it also makes the code easier to read. The idea is that the translate method is part of the base language, and is written in C. Therefore it's probably going to be faster than a join method that is doing a lot of processing in Python. The same is probably true of snippsat's regex method.
Posts: 7,324
Threads: 123
Joined: Sep 2016
Oct-09-2016, 02:57 PM
(This post was last modified: Oct-09-2016, 08:16 PM by snippsat.)
(Oct-09-2016, 12:55 PM)ichabod801 Wrote: Now I mentioned this in terms of efficiency, although it also makes the code easier to read. The idea is that the translate method is part of the base language, and is written in C. Therefore it's probably going to be faster than a join method that is doing a lot of processing in Python. The same is probably true of snippsat's regex method.
Yes, translate is the fastest, as expected. It wins by about 10 seconds over the other solutions when running 1,000,000 times with timeit.
List used to test:
lst = ['hello?', 'wo+rld@', 'toge?the'] * 5
But all optimization can be thrown out the window if using PyPy.
The regex version runs 3 times faster in PyPy than the translate version run through Python 3.4.
I didn't rewrite translate for Python 2 to test with PyPy,
but when I have done this before, PyPy smoothed out the time difference and it became small.
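A comparison along these lines can be sketched with timeit. This is not the exact script used for the numbers above: the function names are made up, the regex here covers the whole invalid_char string instead of the shorter character class, and number is kept small so it finishes quickly.

```python
import re
import timeit

invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'
lst = ['hello?', 'wo+rld@', 'toge?the'] * 5

trans = str.maketrans('', '', invalid_char)
pattern = re.compile('[' + re.escape(invalid_char) + ']')

def with_join():
    return [''.join(c for c in item if c not in invalid_char) for item in lst]

def with_translate():
    return [item.translate(trans) for item in lst]

def with_regex():
    return [pattern.sub('', item) for item in lst]

# number is kept small here; raise it for more stable measurements
for func in (with_join, with_translate, with_regex):
    seconds = timeit.timeit(func, number=10000)
    print('{:<15} {:.3f} s'.format(func.__name__, seconds))
```

The absolute times depend on the machine and the interpreter, so only the relative order on one setup is meaningful.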
Posts: 4,653
Threads: 1,496
Joined: Sep 2016
it depends on typical terminal size. even with a big screen, terminal programs usually default to 80 for the width. many do not "fix" wrapping issues, so limiting lines to 79 on these is better. people often leave terminal programs at the defaults.
i have changed mine to nearly full screen (166 wide, 46 lines, 14 pt font, on 1920x1080). but this also keeps me from putting 2 terminal windows side by side. if i shrink 2 terminal windows to fit, they end up at nearly 80 wide. so, despite some of my code being very wide (i have some that exceeds 1000), i suggest making code fit in 79 ... not even 90-ish ... just 79.
Posts: 2,953
Threads: 48
Joined: Sep 2016
Readability shouldn't be a problem even with a long list comprehension. It can be wrapped. I often do that when the code is too long or just for readability.
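For example, the one-line comprehension from earlier in the thread could be wrapped like this. FakeSheet is a hypothetical stand-in for the spreadsheet object, only there so the snippet runs on its own:

```python
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'


class FakeSheet:
    """Hypothetical stand-in for the spreadsheet object in the thread."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, row, col):
        return self._rows[row][col]


# Ten header rows followed by two data rows
rows = [('', 'header')] * 10 + [('', 'hello?'), ('', 'wo+rld@')]
data = FakeSheet(rows)

# Line breaks are allowed anywhere inside the brackets
my_list = [
    ''.join(char for char in data.cell_value(row + 10, 1)
            if char not in invalid_char)
    for row in range(data.nrows - 10)
]
print(my_list)  # ['hello', 'world']
```

Every line stays under 79 characters and the two loops are still easy to tell apart.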