I have the following code in Python:
def point_to_index(point):
if point not in points:
points.append(point)
return points.index(point)
This code is awfully inefficient, especially since I expect points
to grow to hold a few million elements.
If the point isn't in the list, I traverse the list 3 times:
- look for it and decide it isn't there
- go to the end of the list and add a new element
- go to the end of the list until I find the index
If it is in the list, I traverse it twice: 1. look for it and decide it is there 2. go almost to the end of the list until I find the index
Is there any more efficient way to do this? For instance, I know that:
- I'm more likely to call this function with a point that isn't in the list.
- If the point is in the list, it's likelier to be near the end than in the beginning.
So if I could have the line:
if point not in points:
search the list from the end to the beginning it would improve performance when the point is already in the list.
However, I don't want to do:
if point not in reversed(points):
because I imagine that reversed(points)
itself will come at a huge cost.
Nor do I want to add new points to the beginning of the list (assuming I knew how to do that in Python) because that would change the indices, which must remain constant for the algorithm to work.
The only improvement I can think of is to implement the function with only one pass, if possible from the end to the beginning. The bottom line is:
- Is there a good way to do this?
- Is there a better way to optimize the function?
Edit: I've gotten suggestions for implementing this with only one pass. Is there any way for index()
to go from the end to the beginning?
Edit: People have asked why the index is critical. I'm trying to describe a 3D surface using the OFF file format. This format describes a surface using its vertices and faces. First the vertices are listed, and the faces are described using a list of indices of vertices. That's why once I add a vortex to the list, its index must not change.
Edit: There have been some suggestions (such as igor's) to use a dict. This is a good solution for scanning the list. However, when I'm done I need to print out the list in the same order it was created. If I use a dict, I need to print out its keys sorted by value. Is there a good way to do that?
Edit: I implemented www.brool.com's suggestion. This was the simplest and fastest. It is essentially an ordered Dict, but without the overhead. The performance is great!