Python Forum
Shorten this List Comprehension
#1
Hello,

I currently have a script that pulls data from rows of a spreadsheet, appends that to a list, then writes that list to a csv file.
Recently, I discovered that I need to do some "clean-up" on the data - eliminating special characters from the content (should they be there).

Some pseudo code to give you an idea of how I achieved this:

# define the bad special characters
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'

# create a list based on content in the spreadsheet (starting at a specific row in the document)
my_list = [(data.cell_value(row+10,1)) for row in range(data.nrows-10)]

# create a new list by cleaning up the content of the previously made list
new_list = [(''.join(c for c in i if c not in invalid_char)) for i in my_list]
Is it possible to do the above in one list comprehension?
Seems kind of wasteful to create a list based on another list, if all I'm trying to do is remove certain characters from it.
#2
my_list = [''.join(char for char in data.cell_value(row + 10, 1) if char not in invalid_char) for row in range(data.nrows - 10)]
You could probably speed that up by using the string's translate method instead of calling join on a generator expression.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#3
ichabod801 -

Thanks for taking the time to reply!

I had something similar to your code, however I bailed on it because I thought the line was too long.
Your line is ~130 characters in length. While I know PEP 8 isn't law, is there any downside to having a line so long in Python?
I've heard people mention that the line-length rule is in place because the interpreter can yield unintended results on longer lines - on the other hand, I've also heard it's just for readability.
I know this question is outside the scope of my original question - but I'd love to know the community's thoughts on this.

Also, I am not familiar with the translate method - if you could go into some more detail here, it would be greatly appreciated!
#4
(Oct-09-2016, 01:10 AM)ATXpython Wrote: Is it possible to do the above in one list comprehension?
Seems kind of wasteful to create a list based on another list, if all I'm trying to do is remove certain characters from it.
No, putting too much in one list comprehension makes it long and harder to read.
So it's not ideal at all.
I think it's okay as you have it now.

There are some different ways, like translate as mentioned, and I can show one with regex.
>>> import re
>>> lst = ['hello?', 'wo+rld@', 'toge?the]r']
>>> [re.sub(r'[?@\]+.,]', '', item) for item in lst]
['hello', 'world', 'together']
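The regex idea can also be applied to the full character set from the first post. This is only a sketch: the names `invalid_re`, `sample`, and `cleaned` are made up here, and the sample strings stand in for the real spreadsheet cells. It builds the character class with `re.escape` so that characters such as `]`, `^`, `-`, and the backslash are safe inside the brackets.

```python
import re

# Character set from the first post; the backslash is written as "\\".
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'

# re.escape protects characters that are special inside a character class.
invalid_re = re.compile('[' + re.escape(invalid_char) + ']')

# Hypothetical sample values standing in for the spreadsheet cells.
sample = ['hello?', 'wo+rld@', 'a-b_c', 'path\\name']

cleaned = [invalid_re.sub('', item) for item in sample]
print(cleaned)  # ['hello', 'world', 'abc', 'pathname']
```

Compiling the pattern once keeps the comprehension itself short.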
Quote: While I know PEP 8 isn't law, is there any downside to having a line so long in Python?
It's too long when you get over 100, if you ask me; around 90-ish is okay.
#5
(Oct-09-2016, 01:42 AM)ATXpython Wrote: While I know PEP 8 isn't law, is there any downside to having a line so long in Python? [...] Also, I am not familiar with the translate method - if you could go into some more detail here, it would be greatly appreciated!

I think the line length limit is for readability. However, when you get to heavily indented code, it gets in the way of readability because it makes the lines too short. Of course, you have to ask yourself if you can make your code less indented. I generally write my code with a line length of 108 characters, although I keep docstrings to 79 characters so they are readable in the shell. I also find list comprehensions hard to read in general, and for a complicated one I will often just write a for loop so it is clearer.
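For comparison, the for-loop alternative mentioned above might look roughly like this. It is a sketch only: `cells` is a hypothetical list standing in for the values read from the spreadsheet.

```python
# Character set from the first post; the backslash is written as "\\".
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'

# Hypothetical sample values standing in for the spreadsheet cells.
cells = ['hello?', 'wo+rld@', 'a-b_c']

new_list = []
for cell in cells:
    # Keep only the characters that are not in the invalid set.
    kept = []
    for char in cell:
        if char not in invalid_char:
            kept.append(char)
    new_list.append(''.join(kept))

print(new_list)  # ['hello', 'world', 'abc']
```

It is longer than the comprehension, but every line stays short and each step can be read on its own.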

For translate, you first have to make a translation table with the maketrans method. It takes up to three arguments: a string of characters to replace in the original string, a string of characters to replace them with (same length, in the same order), and a string of characters to delete. You can also pass a single argument that is a dictionary; see the documentation for details. So for your example:

# define the bad special characters (the backslash is escaped so the literal is valid)
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'
# build a table that deletes every invalid character
trans = str.maketrans('', '', invalid_char)

# create a list based on content in the spreadsheet (starting at a specific row in the document)
my_list = [data.cell_value(row + 10, 1) for row in range(data.nrows - 10)]

# create a new list by cleaning up the content of the previously made list
new_list = [i.translate(trans) for i in my_list]
The above works in Python 3.x. In Python 2.x, the table was built with the string module's maketrans function; see the documentation for that.
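To make the different maketrans forms concrete, here is a small self-contained sketch for Python 3. The table names and the sample string are invented for illustration and are not part of the original script.

```python
# Character set from the first post; the backslash is written as "\\".
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'

# Three-argument form: nothing to replace, only characters to delete.
delete_table = str.maketrans('', '', invalid_char)

# One-argument dictionary form: mapping a character to None deletes it.
dict_table = str.maketrans({char: None for char in invalid_char})

# Replace and delete together: 'a' -> 'x', 'b' -> 'y', and '!' is removed.
swap_table = str.maketrans('ab', 'xy', '!')

sample = 'wo+rld@ (to-get_her)'  # hypothetical cell value
print(sample.translate(delete_table))  # world together
print(sample.translate(dict_table))    # world together
print('abc!'.translate(swap_table))    # xyc
```

The table only needs to be built once and can then be reused for every cell.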

Now I mentioned this in terms of efficiency, although it also makes the code easier to read. The idea is that the translate method is part of the base language, and is written in C. Therefore it's probably going to be faster than a join method that is doing a lot of processing in Python. The same is probably true of snippsat's regex method.
#6
(Oct-09-2016, 12:55 PM)ichabod801 Wrote: Now I mentioned this in terms of efficiency, although it also makes the code easier to read. The idea is that the translate method is part of the base language, and is written in C. Therefore it's probably going to be faster than a join method that is doing a lot of processing in Python. The same is probably true of snippsat's regex method.
Yes, translate is the fastest, as expected. It wins by about 10 seconds over the other solutions when running 1,000,000 times with timeit.
List used to test:
lst = ['hello?', 'wo+rld@', 'toge?the'] * 5
But all optimization can be thrown out the window if you use PyPy.
The regex version runs 3 times faster in PyPy than the translate version runs in Python 3.4.
I didn't rewrite the translate version for Python 2 to test it with PyPy,
but when I have done this before, PyPy smoothed out the time difference so that it became small.
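A comparison along these lines could be set up as follows. This is a sketch, not the exact benchmark that was run: the function names are made up, all three variants strip the same small character set as the regex example, and the repeat count is reduced so it finishes quickly. Timings depend on the machine and interpreter, so none are shown.

```python
import re
import timeit

# Test list from the post above.
lst = ['hello?', 'wo+rld@', 'toge?the'] * 5

# Same characters as the regex example, so all three do equal work.
bad_chars = '?@]+.,'
table = str.maketrans('', '', bad_chars)
pattern = re.compile(r'[?@\]+.,]')


def with_join():
    return [''.join(c for c in item if c not in bad_chars) for item in lst]


def with_translate():
    return [item.translate(table) for item in lst]


def with_regex():
    return [pattern.sub('', item) for item in lst]


# number is an assumption; raise it (e.g. to 1_000_000) for steadier numbers.
for func in (with_join, with_translate, with_regex):
    seconds = timeit.timeit(func, number=10_000)
    print(f'{func.__name__}: {seconds:.3f} s')
```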
#7
(Oct-09-2016, 08:25 AM)snippsat Wrote: It's too long when you get over 100, if you ask me; around 90-ish is okay.

It depends on the typical terminal size. Even on a big screen, terminal programs usually default to a width of 80. Many do not "fix" wrapping issues, so limiting lines to 79 is better on these. People often leave terminal programs at the defaults.

I have changed mine to nearly full screen (166 wide, 46 lines, 14 pt font, on 1920x1080), but this also keeps me from putting 2 terminal windows side by side. If I shrink 2 terminal windows to fit, they end up at nearly 80 wide. So, despite some of my code being very wide (I have some lines that run past 1000 characters), I suggest making code fit in 79 ... not even 90-ish ... just 79.
#8
Readability shouldn't be a problem even with a long list comprehension. It can be wrapped. I do that often when the line is too long, or just for readability.
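As an illustration of wrapping, the one-line comprehension from post #2 could be split across lines like this. The `FakeSheet` class is a hypothetical stand-in for the spreadsheet object used in the thread, added only so the example runs on its own.

```python
class FakeSheet:
    """Hypothetical stand-in for the spreadsheet object in the thread."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, row, col):
        return self._rows[row][col]


# Ten filler rows, then two rows of data in column 1.
data = FakeSheet([['', 'header']] * 10 + [['', 'hello?'], ['', 'wo+rld@']])
invalid_char = '!@#$%^&*()-_=+<>?,./\\[]{};:'

# Brackets allow line breaks, so each clause gets its own line.
my_list = [
    ''.join(
        char
        for char in data.cell_value(row + 10, 1)
        if char not in invalid_char
    )
    for row in range(data.nrows - 10)
]
print(my_list)  # ['hello', 'world']
```

No backslash continuations are needed because the line breaks sit inside the brackets.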