Shorten this List Comprehension

ATXpython · Oct-09-2016, 01:10 AM

Hello,

I currently have a script that pulls data from rows of a spreadsheet, appends that to a list, then writes that list to a CSV file.
Recently, I discovered that I need to do some "clean-up" on the data - eliminating special characters from the content (should they be there).

Some pseudocode to give you an idea of how I achieved this:

        
    # define the bad special characters (raw string, so the backslash is literal)
    invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'

    # create a list based on content in the spreadsheet (starting at a specific row in the document)
    my_list = [data.cell_value(row + 10, 1) for row in range(data.nrows - 10)]

    # create a new list by cleaning up the content of the previously made list
    new_list = [''.join(c for c in i if c not in invalid_char) for i in my_list]

Is it possible to do the above in one list comprehension?
Seems kind of wasteful to create a list based on another list, if all I'm trying to do is remove certain characters from it.

***ichabod801*** · Oct-09-2016, 01:21 AM

        
    my_list = [''.join(char for char in data.cell_value(row + 10, 1) if char not in invalid_char) for row in range(data.nrows - 10)]

You could probably speed that up by using the string's translate method instead of join on a generator expression.

ATXpython · Oct-09-2016, 01:42 AM

ichabod801 -

Thanks for taking the time to reply!

I had something similar to your code; however, I bailed on it because I thought the line was too long.
Your line is ~130 characters in length. While I know PEP-8 isn't law, is there any downside to having a line so long in Python?
I've heard people mention that the line-length rule is in place because the interpreter can yield unintended results on longer lines - on the other hand, I've also heard it's just for readability.
I know this question is out of the scope of my original question - but I'd love to know the community's thoughts on this.

Also, I am not familiar with the translate method - if you could go into some more detail here, it would be greatly appreciated!

***snippsat*** · (This post was last modified: Oct-09-2016, 02:04 PM by snippsat.)

(Oct-09-2016, 01:10 AM)ATXpython Wrote: Is it possible to do the above in one list comprehension?
Seems kind of wasteful to create a list based on another list, if all I'm trying to do is remove certain characters from it.

No, too much in one list comprehension makes it long and harder to read.
So it's not ideal at all.
I think it's okay as you have it now.

There are some other ways, like translate as mentioned, and I can show one with regex.

        
    >>> import re
    >>> lst = ['hello?', 'wo+rld@', 'toge?the]r']
    >>> [re.sub(r'[?@\]+.,]', '', item) for item in lst]
    ['hello', 'world', 'together']
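
A hedged sketch of how the same regex idea could cover the whole `invalid_char` string from the first post: `re.escape` builds the character class without hand-escaping `]`, `\` or `-`. The names `pattern` and `strip_invalid` are made up for illustration.

```python
import re

# The "bad" characters from the original post (raw string, so the backslash is literal).
invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'

# re.escape backslash-escapes anything that is special in a regex,
# so the characters can be dropped into a [...] class safely.
pattern = re.compile('[' + re.escape(invalid_char) + ']')


def strip_invalid(text):
    """Return text with every character from invalid_char removed."""
    return pattern.sub('', text)


lst = ['hello?', 'wo+rld@', 'toge?the]r']
cleaned = [strip_invalid(item) for item in lst]
print(cleaned)  # ['hello', 'world', 'together']
```

Compiling the pattern once, outside the comprehension, keeps the comprehension itself short.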

Quote: While I know PEP-8 isn't law, is there any downside to having a line so long in Python?

It's too long when you get over 100, if you ask me; around 90-ish is okay.

***ichabod801*** · Oct-09-2016, 12:55 PM

(Oct-09-2016, 01:42 AM)ATXpython Wrote: While I know PEP-8 isn't law, is there any downside to having a line so long in Python? [...] Also, I am not familiar with the translate method - if you could go into some more detail here, it would be greatly appreciated!

I think the line limit length is for readability. However, when you get to heavily indented code it gets in the way of readability because it makes the lines too short. Of course, you have to ask yourself if you can make your code less indented. I generally write my code with a line length of 108 characters, although I keep docstrings to 79 characters so they are readable in the shell. I also find list comprehensions hard to read in general, and for a complicated one I will often just make a for loop so it is clearer.
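
To illustrate the for-loop alternative, here is a minimal sketch. `data` in the original post is presumably a spreadsheet sheet object; the `FakeSheet` class below is a hypothetical stand-in that exposes only `nrows` and `cell_value(row, col)` so the example runs without a spreadsheet.

```python
class FakeSheet:
    """Hypothetical stand-in for the spreadsheet sheet object (`data`)."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, row, col):
        return self._rows[row][col]


invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'
first_row = 10  # the data starts at this row, as in the original post

# Ten header rows followed by three data rows; column 1 holds the text.
rows = [['', 'header']] * first_row + [['', 'hello?'], ['', 'wo+rld@'], ['', 'a_b-c']]
data = FakeSheet(rows)

new_list = []
for row in range(first_row, data.nrows):
    value = data.cell_value(row, 1)
    cleaned = ''.join(char for char in value if char not in invalid_char)
    new_list.append(cleaned)

print(new_list)  # ['hello', 'world', 'abc']
```

The loop is longer than the comprehension, but each step gets its own name and its own short line.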

For translate, you first have to make a translation table with the maketrans method. It takes three arguments: a string of characters to replace in the original string, a string of characters to replace them with (in the same order), and a string of characters to delete. You can also use one argument that is a dictionary; see the documentation for details. So for your example:

        
    # define the bad special characters (raw string, so the backslash is literal)
    invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'
    # build a table that maps nothing and deletes every character in invalid_char
    trans = str.maketrans('', '', invalid_char)

    # create a list based on content in the spreadsheet (starting at a specific row in the document)
    my_list = [data.cell_value(row + 10, 1) for row in range(data.nrows - 10)]

    # create a new list by cleaning up the content of the previously made list
    new_list = [i.translate(trans) for i in my_list]

The above works in Python 3.x. For Python 2.x, translate was done through the string module; see the documentation for that.
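
For reference, a small Python 3 sketch of the `maketrans` forms described above, using made-up sample strings. In the dictionary form, mapping a character to `None` deletes it.

```python
# Three-argument form: replace 'a' -> '1' and 'b' -> '2', delete '?' and '!'.
table_three = str.maketrans('ab', '12', '?!')
print('abc?!'.translate(table_three))  # 12c

# One-argument (dictionary) form: values are replacement strings,
# and None means "delete this character".
table_dict = str.maketrans({'a': '1', 'b': '2', '?': None, '!': None})
print('abc?!'.translate(table_dict))  # 12c

# Delete-only table for the characters from the original post.
invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'
trans = str.maketrans('', '', invalid_char)
print('wo+rld@'.translate(trans))  # world
```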

Now I mentioned this in terms of efficiency, although it also makes the code easier to read. The idea is that the translate method is part of the base language, and is written in C. Therefore it's probably going to be faster than a join method that is doing a lot of processing in Python. The same is probably true of snippsat's regex method.

***snippsat*** · (This post was last modified: Oct-09-2016, 08:16 PM by snippsat.)

(Oct-09-2016, 12:55 PM)ichabod801 Wrote: Now I mentioned this in terms of efficiency, although it also makes the code easier to read. The idea is that the translate method is part of the base language, and is written in C. Therefore it's probably going to be faster than a join method that is doing a lot of processing in Python. The same is probably true of snippsat's regex method.

Yes, translate is the fastest, as expected. It wins by about 10 seconds (over the other solutions) when running 1,000,000 times with timeit.
List used to test:

        
    lst = ['hello?', 'wo+rld@', 'toge?the'] * 5
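
A rough sketch of how such a comparison might be set up with `timeit`. The function names are invented here, the repeat count is kept small so it finishes quickly, and timings vary by machine and interpreter, so no figures are shown.

```python
import re
import timeit

invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'
lst = ['hello?', 'wo+rld@', 'toge?the'] * 5

# Build the translation table and the regex once, outside the timed code.
trans = str.maketrans('', '', invalid_char)
pattern = re.compile('[' + re.escape(invalid_char) + ']')


def with_join():
    return [''.join(c for c in item if c not in invalid_char) for item in lst]


def with_translate():
    return [item.translate(trans) for item in lst]


def with_regex():
    return [pattern.sub('', item) for item in lst]


# All three approaches should agree before their speed is compared.
assert with_join() == with_translate() == with_regex()

for func in (with_join, with_translate, with_regex):
    seconds = timeit.timeit(func, number=10000)
    print('{:<15} {:.4f} s'.format(func.__name__, seconds))
```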

But all of this optimization can be thrown out the window if you use PyPy.
The regex version runs 3 times faster in PyPy than the translate version runs in Python 3.4.
I didn't rewrite translate for Python 2 to test it with PyPy,
but when I have done this before, PyPy smoothed out the time difference and it became small.

Skaperen · Oct-10-2016, 07:09 AM

(Oct-09-2016, 08:25 AM)snippsat Wrote: It's too long when you get over 100, if you ask me; around 90-ish is okay.

it depends on typical terminal size. even with a big screen, terminal programs usually default to 80 for the width. many do not "fix" wrapping issues, so limiting at 79 on these is better. people often leave term programs at the defaults.

i have changed mine to nearly full screen (166 wide, 46 lines, 14 pt font, on 1920x1080). but this also limits me from doing 2 term windows side by side. if i shrink 2 term windows they end up nearly at 80 wide. so, despite some of my code being very wide (i have some that exceed 1000) i suggest making code fit in 79 .... not even 90-ish ... just 79.

wavic · Oct-10-2016, 07:50 AM

Readability shouldn't be a problem even with a long list comprehension. It can be wrapped. I do it often when the code is too long or for readability.
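
As a sketch of what wrapping might look like, the one-line comprehension from earlier in the thread can be split across lines inside its brackets, since Python continues a statement implicitly until the bracket closes. `FakeSheet` is a hypothetical stand-in for the spreadsheet object, not part of any library.

```python
class FakeSheet:
    """Hypothetical stand-in exposing nrows and cell_value(row, col)."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, row, col):
        return self._rows[row][col]


invalid_char = r'!@#$%^&*()-_=+<>?,./\[]{};:'
data = FakeSheet([['', 'header']] * 10 + [['', 'hello?'], ['', 'wo+rld@']])

# Same logic as the ~130-character line, with every line under 79 characters.
my_list = [
    ''.join(char for char in data.cell_value(row + 10, 1)
            if char not in invalid_char)
    for row in range(data.nrows - 10)
]

print(my_list)  # ['hello', 'world']
```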
