Here, a "polygon" refers to a triangle. All . However, as you point out, there are many more variables than just the number of triangles which determine performance.
Key issues that matter are:
- The format of storage (indexed or not; list, fan, or strip)
- The location of storage (host-memory vertex arrays, host-memory vertex buffers, or GPU-memory vertex buffers)
- The mode of rendering (is the draw primitive command issued fully from the host, or via instancing)
- Triangle size
Taken together, those variables can account for far more than a 2x variation in performance.
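To make the "mode of rendering" point concrete, here is a minimal OpenGL sketch (assuming a GL 3.1+ context and a loader such as GLEW, with the mesh already bound in GPU-side vertex/index buffers; `indexCount` and `instanceCount` are placeholder values) contrasting per-object draw commands issued from the host with a single instanced call:

```c
/* Illustrative only: assumes an OpenGL 3.1+ context, a loader such as GLEW,
 * and a mesh already bound in GPU-side vertex and index buffers. */
#include <GL/glew.h>

void draw_host_issued(GLsizei indexCount, int instanceCount)
{
    /* One draw command per object, issued from the CPU each time:
       host-side submission overhead scales with the number of objects. */
    for (int i = 0; i < instanceCount; ++i) {
        /* (per-object uniforms such as a transform would be set here) */
        glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0);
    }
}

void draw_instanced(GLsizei indexCount, GLsizei instanceCount)
{
    /* One draw command for all objects: the GPU replays the mesh
       instanceCount times, so CPU submission cost stays roughly flat. */
    glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0,
                            instanceCount);
}
```

Same triangle count either way, but the CPU cost of submitting the work differs substantially, which is exactly why triangle count alone is a poor predictor of performance.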
Similarly, real-world hardware varies by 10x or more: a GPU (or integrated graphics processor) that was low-end in 2005 will perform 10-100x slower, by any meaningful metric, than a current top-of-the-line GPU.
All told, any recommendation that you use 2,000-4,000 triangles is so ridiculously outdated that it should be entirely ignored today. Even low-end hardware today can easily push 100,000 triangles in a frame under reasonable conditions. Further, most visually interesting applications today are dominated by pixel shading performance, not triangle count.
General rules of thumb for achieving good triangle throughput today:
- Use [indexed] triangle (or quad) lists
- Store data in GPU-memory vertex buffers
- Draw large batches with each draw call (thousands of primitives per call)
- Use triangles mostly >= 16 pixels on screen
- Don't use the Geometry Shader (especially for geometry amplification)
Do all of those things, and any machine today should be able to render tens or hundreds of thousands of triangles with ease.
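For concreteness, here is a minimal sketch of that setup in OpenGL (assuming a 3.3 core context and a loader such as GLEW; the vertex layout, function names, and counts are illustrative, not prescriptive): upload once into GPU-memory buffers, then draw the whole indexed triangle list in a single call.

```c
/* Sketch of the rules of thumb above in OpenGL. Assumes a 3.3 core context
 * and a loader such as GLEW; vertex layout and names are illustrative. */
#include <GL/glew.h>
#include <stddef.h>

GLuint upload_mesh(const float *verts, GLsizeiptr vertBytes,
                   const unsigned *indices, GLsizeiptr indexBytes)
{
    GLuint vao, vbo, ibo;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    /* Vertex data goes into a GPU-memory buffer, uploaded once. */
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, vertBytes, verts, GL_STATIC_DRAW);

    /* Indexed triangle list: the index buffer lives on the GPU too. */
    glGenBuffers(1, &ibo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexBytes, indices, GL_STATIC_DRAW);

    /* Illustrative layout: 3 floats of position per vertex. */
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void *)0);

    glBindVertexArray(0);
    return vao;
}

void draw_mesh(GLuint vao, GLsizei indexCount)
{
    /* One large batch per draw call rather than many tiny ones. */
    glBindVertexArray(vao);
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0);
}
```

Once the data lives on the GPU and draws are batched like this, triangle throughput is rarely the bottleneck; pixel shading usually is.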