I was debugging a tag module at work today when I discovered something not overly intuitive about ists in python.
What I was seeing appeared to be that when iterating over a list to conditionally remove its members, every other element was being removed. After debugging the code, I isolated it down to this example.
[source lang='python']
fruit = ['apple', 'orange', 'pear']
for f in fruit:
fruit.remove(f)
>>> fruit
['orange']
[/source]
It isn't as clear with only 3 elements, but it is removing every other item in the list here. My immediate thought was that it was a pointer issue, so I tried cloning the list.
[source lang='python']
fruit2 = ['apple', 'orange', 'pear']
for f in list(fruit2): # New cloned list is implicitly created
fruit2.remove(f)
>>> fruit2
[] #Success!
[/source]
The solution
The quick solution is to wrap the original list in a list() call to clone the list which will create a whole new spot in memory. Then you can safely iterate over that and modify your existing list. Note: There is probably a better solution using list compression.
Oh the joys of debugging others' code. It works now. Problem solved... but I like to know why things happen.
The moral
So, with a little googling and asking around at work, I learn that removing items on the list you are iterating over a bad practice. It has to do with the underlying datastructure and pointers as I suspected. As one of my coworkers joke of my example "You forgot starfruit" ... a joke about * being a pointer in many langs.
Two Good Resources
Laurent Luce wrote a nice recap of
Python's list implementation including description of underlying data structure and Big O notation of complexities. It is a good read.
Also, there is a
nice stack overflow article answering the loop/remove question with the suggested answer of using list compression - although it similarly clones the list.
Anywho, hopefully the google juice will save someone some head scratching.