Python Forum

Full Version: can itertools compact a list removing all of some value?
i have a huge list (approaching a billion items) which is mostly strings and/or numbers, but there are a lot of None values in there, too. i would like to compact the list in place by removing all the None values. can a method in some module do this or do i need to figure this out? the important part is to not duplicate the list, since that would cause more swapping delays. but something that copies the remainder of the list each time it finds a None is also a bad idea, since it bumps it up to O(n**2). something that morphs the list into a tuple is OK, since it is not modified after this.

i have looked over itertools but nothing seems obvious.
>>> startlist = [4,None,'harry',67,None,67,'sss']
>>> startlist = [x for x in startlist if x is not None]
>>> startlist
[4, 'harry', 67, 67, 'sss']
>>>
that makes a new list while the old one exists, before it is garbage collected. it would swap like crazy with a huge list as the new one will be in a new memory location.
It's usually dangerous to mutate a list while iterating over it. But if you know what you are doing then...

Simple code to remove elements in place (starting from the end so as not to mess up the indices):

>>> test = [None, 'b', None, 'a', None]
>>> for i in range(len(test) - 1, -1, -1):
...     if test[i] is None:
...         del test[i]
...
>>> test
['b', 'a']
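One thing to watch with the backward loop: in CPython, each del on a list moves every element after the deleted slot one position to the left, so many deletions near the front of a long list add up. The sketch below only counts those moves to make the cost visible. The function name count_shifts_backward_delete is made up for this illustration, and the count is a model of the work done, not a timing measurement.

```python
def count_shifts_backward_delete(items):
    """Delete None entries from the end backward, in place.

    Returns how many elements had to be moved left by the deletions
    (hypothetical helper, for illustration only).
    """
    shifts = 0
    for i in range(len(items) - 1, -1, -1):
        if items[i] is None:
            # Everything after position i slides one slot to the left.
            shifts += len(items) - i - 1
            del items[i]
    return shifts


# Worst-case shape: all the None values sit in front of the kept values.
data = [None] * 100 + [1] * 100
moved = count_shifts_backward_delete(data)
print(len(data), moved)  # 100 kept items, 100 * 100 moves
```

With k None values in front of k kept values, every one of the k deletions moves all k kept values, so the work grows as k * k. If the None values cluster near the end of the list instead, the backward loop does far less moving.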
my first thought was:
n = 0
for i in range(len(test)):
    if test[i] is not None:
        test[n] = test[i]
        n += 1
test[n:] = []
but i don't know how safe or fast that is.
if you remove data from the list starting from the end and moving towards the beginning, then there will be no index discrepancies during the purge.
but, would my way work? that's how i would do it in C (then store the new length where the length is kept). i can understand the way that works from the end, but it would just seem to be slower because the list is shortened so many times.
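As far as I can tell, the index-copying loop does work: the write index never gets ahead of the read index, so no value is overwritten before it has been read, and each element is touched once. Here is a minimal sketch of it wrapped in a function. The name compact_in_place and the unwanted parameter are my own choices for illustration, not something from a library.

```python
def compact_in_place(items, unwanted=None):
    """Remove every entry that is `unwanted` (by identity) from `items`.

    Works in place in a single pass and returns the new length.
    Hypothetical helper, shown as a sketch only.
    """
    write = 0  # next slot to fill with a kept element
    for read in range(len(items)):
        item = items[read]
        if item is not unwanted:
            items[write] = item
            write += 1
    # Drop the leftover tail in one operation instead of one del per None.
    del items[write:]
    return write


startlist = [4, None, 'harry', 67, None, 67, 'sss']
new_length = compact_in_place(startlist)
print(new_length, startlist)  # 5 [4, 'harry', 67, 67, 'sss']
```

The list is shortened only once, at the end, so the number of element moves stays proportional to the length of the list. No second list is built, although the loop itself still runs at Python speed over every item.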