Python Forum
Shorten this List Comprehension - Printable Version




Shorten this List Comprehension - ATXpython - Oct-09-2016

Hello,

I currently have a script that pulls data from rows of a spreadsheet, appends that to a list, then writes that list to a csv file.
Recently, I discovered that I need to do some "clean-up" on the data - eliminating special characters from the content (should they be there).

Some pseudo code to give you an idea of how I achieved this:

# define the bad special characters (raw string, so the backslash is literal)
invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'

# create a list based on content in the spreadsheet (starting at a specific row in the document)
my_list = [data.cell_value(row + 10, 1) for row in range(data.nrows - 10)]

# create a new list by cleaning up the content of the previously made list
new_list = [''.join(c for c in i if c not in invalid_char) for i in my_list]

Is it possible to do the above in one list comprehension?
Seems kind of wasteful to create a list based on another list, if all I'm trying to do is remove certain characters from it.
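
One way to avoid the intermediate list might be to make the first stage a generator expression, so the cell values are read lazily and only the cleaned list is ever built. This is a minimal sketch: FakeSheet is a hypothetical stand-in for the spreadsheet object, which the post only shows through data.cell_value and data.nrows, and the sample rows are made up.

```python
# Characters to strip (raw string, so the backslash is taken literally).
invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'


class FakeSheet:
    """Hypothetical stand-in for the spreadsheet object in the post."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, row, col):
        return self._rows[row][col]


# Ten rows to skip, then two data rows; column 1 holds the text.
data = FakeSheet([('', 'skip')] * 10 + [('a', 'he!llo'), ('b', 'wor#ld?')])

# Generator expression: no list is stored for this stage.
raw_values = (data.cell_value(row + 10, 1) for row in range(data.nrows - 10))

# Only this list is actually built.
new_list = [''.join(c for c in value if c not in invalid_char)
            for value in raw_values]

print(new_list)  # ['hello', 'world']
```

The two steps stay on separate, short lines, but only one list exists in memory.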


RE: Shorten this List Comprehension - ichabod801 - Oct-09-2016

my_list = [''.join(char for char in data.cell_value(row + 10, 1) if char not in invalid_char) for row in range(data.nrows - 10)]
You could probably speed that up by using the translate method of the string instead of the join on the generator expression.


RE: Shorten this List Comprehension - ATXpython - Oct-09-2016

ichabod801 -

Thanks for taking the time to reply!

I had something similar to your code, but I bailed on it because I thought the line was too long.
Your line is ~130 characters long. While I know PEP 8 isn't law, is there any downside to having a line that long in Python?
I've heard people mention that the line-length rule is in place because the interpreter can yield unintended results on longer lines; on the other hand, I've also heard it's just for readability.
I know this is outside the scope of my original question, but I'd love to know the community's thoughts on it.

Also, I am not familiar with the translate method - if you could go into some more detail here, it would be greatly appreciated!


RE: Shorten this List Comprehension - snippsat - Oct-09-2016

(Oct-09-2016, 01:10 AM)ATXpython Wrote: Is it possible to do the above in one list comprehension?
Seems kind of wasteful to create a list based on another list, if all I'm trying to do is remove certain characters from it.
No, putting too much in one list comprehension makes it long and harder to read.
So it's not ideal at all.
I think it's okay as you have it now.

There are some other ways, like translate as mentioned, and I can show one with regex.
>>> import re
>>> lst = ['hello?', 'wo+rld@', 'toge?the]r']
>>> [re.sub(r'[?@\]+.,]', '', item) for item in lst]
['hello', 'world', 'together']
Quote: While I know PEP 8 isn't law, is there any downside to having a line so long in Python?
It's too long when you get over 100 if you ask me; around 90-ish is okay.
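
The character class in the regex example could also be built from the invalid_char string in the original post instead of being typed by hand. A sketch, assuming re.escape is used to escape the characters that are special inside a class (such as the backslash, the hyphen and the closing bracket); the sample list is made up for illustration.

```python
import re

invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'

# re.escape backslash-escapes the special characters, so the string is safe inside [...].
pattern = re.compile('[' + re.escape(invalid_char) + ']')

lst = ['hello?', 'wo+rld@', 'toge?the]r']
cleaned = [pattern.sub('', item) for item in lst]
print(cleaned)  # ['hello', 'world', 'together']
```

Compiling the pattern once keeps the comprehension itself short.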


RE: Shorten this List Comprehension - ichabod801 - Oct-09-2016

I think the line length limit is for readability. However, when you get to heavily indented code it gets in the way of readability because it makes the lines too short. Of course, you have to ask yourself if you can make your code less indented. I generally write my code with a line length of 108 characters, although I keep docstrings to 79 characters so they are readable in the shell. I also find list comprehensions hard to read in general, and for a complicated one I will often just make a for loop so it is clearer.
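
As a rough sketch of what the for-loop alternative could look like for this clean-up (the rows list is a made-up stand-in for the spreadsheet cells, not something from the original script):

```python
invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'

# Hypothetical sample values standing in for the spreadsheet column.
rows = ['he!llo', 'wor#ld?', 'a-b_c']

new_list = []
for value in rows:
    kept = []
    for char in value:
        # Keep only the characters that are not in the invalid set.
        if char not in invalid_char:
            kept.append(char)
    new_list.append(''.join(kept))

print(new_list)  # ['hello', 'world', 'abc']
```

It is longer than the comprehension, but every line stays well under 79 characters.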

For translate, you first have to make a translation table with the maketrans method. In its three-argument form it takes a string of characters to replace, a string of characters to replace them with (same length, in the same order), and a string of characters to delete. You can also use one argument that is a dictionary; see the documentation for details. So for your example:

# define the bad special characters (raw string, so the backslash is literal)
invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'
# empty first two arguments: nothing is replaced, everything in invalid_char is deleted
trans = str.maketrans('', '', invalid_char)

# create a list based on content in the spreadsheet (starting at a specific row in the document)
my_list = [data.cell_value(row + 10, 1) for row in range(data.nrows - 10)]

# create a new list by cleaning up the content of the previously made list
new_list = [i.translate(trans) for i in my_list]

The above works in Python 3.x. For Python 2.x, the translation table was made through the string module (string.maketrans), and str.translate took the characters to delete as a second argument; see the documentation for that.
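
To illustrate the two ways of building a table described above, here is a small Python 3 sketch; the sample string and the replacement characters are made up for the illustration.

```python
# Three-argument form: map 'a' -> '4' and 'e' -> '3', and delete '!' and '?'.
table = str.maketrans('ae', '43', '!?')
result = 'leet speak!?'.translate(table)
print(result)  # l33t sp34k

# One-argument form: a dict of replacements, where None means "delete".
table_from_dict = str.maketrans({'a': '4', 'e': '3', '!': None, '?': None})
result_from_dict = 'leet speak!?'.translate(table_from_dict)
print(result_from_dict)  # l33t sp34k
```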

Now I mentioned this in terms of efficiency, although it also makes the code easier to read. The idea is that the translate method is part of the base language, and is written in C. Therefore it's probably going to be faster than a join method that is doing a lot of processing in Python. The same is probably true of snippsat's regex method.


RE: Shorten this List Comprehension - snippsat - Oct-09-2016

(Oct-09-2016, 12:55 PM)ichabod801 Wrote: Now I mentioned this in terms of efficiency, although it also makes the code easier to read. The idea is that the translate method is part of the base language, and is written in C. Therefore it's probably going to be faster than a join method that is doing a lot of processing in Python. The same is probably true of snippsat's regex method.
Yes, translate is the fastest, as expected; it wins by about 10 seconds (over the other solutions) running 1,000,000 times with timeit.
List used to test:
lst = ['hello?', 'wo+rld@', 'toge?the'] * 5
But all optimization can be thrown out the window if you use PyPy.
The regex version runs 3 times faster in PyPy than the translate version runs in Python 3.4.
I didn't rewrite translate for Python 2 to test it with PyPy,
but when I have done this before, PyPy smoothed out the time difference and it became small.
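
A comparison along these lines could be set up roughly as follows. This is only a sketch of one possible timeit harness, not the exact script used for the numbers above: the function names are made up, the regex is built from the full invalid_char string rather than the shorter class shown earlier, and the run count is kept small.

```python
import re
import timeit

invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'
lst = ['hello?', 'wo+rld@', 'toge?the'] * 5

table = str.maketrans('', '', invalid_char)
pattern = re.compile('[' + re.escape(invalid_char) + ']')


def with_join():
    return [''.join(c for c in item if c not in invalid_char) for item in lst]


def with_translate():
    return [item.translate(table) for item in lst]


def with_regex():
    return [pattern.sub('', item) for item in lst]


# All three should agree before they are timed.
assert with_join() == with_translate() == with_regex()

# number is kept small here; the post used 1,000,000 runs.
for func in (with_join, with_translate, with_regex):
    seconds = timeit.timeit(func, number=10000)
    print('{:<15} {:.3f} s'.format(func.__name__, seconds))
```

The absolute times depend on the machine and the interpreter, so only the relative order is worth comparing.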


RE: Shorten this List Comprehension - Skaperen - Oct-10-2016

(Oct-09-2016, 08:25 AM)snippsat Wrote:
Quote: While I know PEP 8 isn't law, is there any downside to having a line so long in Python?
It's too long when you get over 100 if you ask me; around 90-ish is okay.

it depends on typical terminal size.  even on a big screen, terminal programs usually default to 80 for the width.  many do not "fix" wrapping issues, so limiting lines to 79 on these is better.  people often leave terminal programs at the defaults.

i have changed mine to nearly full screen (166 wide, 46 lines, 14 pt font, on 1920x1080).  but this also keeps me from having 2 terminal windows side by side.  if i shrink 2 terminal windows to fit, they end up at nearly 80 wide.  so, despite some of my code being very wide (i have some that exceeds 1000's) i suggest making code fit in 79 ... not even 90-ish ... just 79.


RE: Shorten this List Comprehension - wavic - Oct-10-2016

Readability shouldn't be a problem even with a long list comprehension. It can be wrapped. I do it often when the code is too long or for readability.
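
For instance, the clean-up comprehension from this thread could be wrapped like this. A sketch only; the rows list is a made-up stand-in for the spreadsheet cells.

```python
invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'

# Hypothetical sample values standing in for the spreadsheet column.
rows = ['he!llo', 'wor#ld?']

# Inside brackets a line break needs no backslash, so each clause gets its own line.
new_list = [
    ''.join(char for char in value if char not in invalid_char)
    for value in rows
]
print(new_list)  # ['hello', 'world']
```

Python treats the wrapped form exactly like the one-line form, so the choice is purely about readability.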