ansaurus

Question

Selecting a good SQL Server 2008 spatial index with large polygons

Answer 1

+1 A:

Splitting Data

If the query is only for displaying data, you could split up your large polygons using a grid. The smaller pieces would then be very quick to retrieve with an index. If you render them without outlines, the features would still look contiguous.
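As a rough illustration of the grid idea, the sketch below generates the cell rectangles of a regular grid as WKT polygons. The function name, the 4x4 size and the world extent are assumptions chosen for the example; the resulting WKT strings would still have to be handed to whatever intersection tool you use to actually cut the polygons.

```python
def grid_cells_wkt(xmin, ymin, xmax, ymax, nx, ny):
    """Return one WKT rectangle per cell of an nx-by-ny grid over the extent."""
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    cells = []
    for row in range(ny):
        for col in range(nx):
            x0 = xmin + col * dx
            y0 = ymin + row * dy
            x1, y1 = x0 + dx, y0 + dy
            # Closed ring, counter-clockwise from the lower-left corner.
            ring = [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]
            coords = ", ".join(f"{x:g} {y:g}" for x, y in ring)
            cells.append(f"POLYGON(({coords}))")
    return cells


# Hypothetical example: a 4x4 grid over the whole world (longitude on X).
cells = grid_cells_wkt(-180, -90, 180, 90, 4, 4)
print(len(cells))
print(cells[0])
```

Each cell can then be intersected with the large polygons, so every stored piece is no bigger than one grid cell.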

Most commercial GIS packages will have tools to split one polygon dataset by another. Search for tools that do intersections.

If you are using open source tools, have a look at QGIS and http://www.ftools.ca, which "perform geoprocessing operations including intersections, differencing, unions, dissolves, and clipping." I've not used the latter myself.

Have a look at: http://postgis.refractions.net/docs/ch04.html#id2790790 for why large features are bad.

Filter and Intersects

There is more on the Filter clause here - http://blogs.msdn.com/b/isaac/archive/2010/03/04/filter-one-odd-duck.aspx

Spatial Indexes

Something else to check is that the spatial index is actually being used in the query plan. You may have to force the query to use the index with a WITH (INDEX(...)) table hint:

http://blogs.msdn.com/b/isaac/archive/2008/08/29/is-my-spatial-index-being-used.aspx

More details on indexes below:

http://blogs.msdn.com/b/isaac/archive/2009/05/28/sql-server-spatial-indexing.aspx

Also try running sp_help_spatial_geometry_index against your data to see what settings to use for your spatial index:

http://msdn.microsoft.com/en-us/library/cc627426.aspx

Running this SP with some test geometry produces all sorts of statistics to try and tailor your index to your data. A full list of properties is at http://msdn.microsoft.com/en-us/library/cc627425.aspx

These include values such as:

CellArea_To_BoundingBoxArea_Percentage_In_Level1
Number_Of_Rows_Selected_By_Primary_Filter

Messed Up Geometry

From the results of sp_help_spatial_geometry_index it looks like you may have issues with the geometry itself rather than the spatial index.

The Base_Table_Rows count looks to be a bug - http://connect.microsoft.com/SQLServer/feedback/details/475838/number-of-rows-in-base-table-incorrect-in-sp-help-spatial-geography-index-xml It may be worth recreating table / database and trying the index from scratch.

Total_Number_Of_ObjectCells_In_Level0_In_Index is 60956, which is a lot of features to find at level 0. They are likely either outside the spatial index extent or NULL. The query then runs the Intersect (Number_Of_Times_Secondary_Filter_Is_Called 60956) on all of these features, which would explain why it is slow. The docs claim there is no performance hit for NULL features, but I believe the records still have to be looked up, even if no intersect is performed. The docs say:

"NULL and empty instances are counted at level 0 but will not impact performance. Level 0 will have as many cells as NULL and empty instances at the base table."

The Primary_Filter_Efficiency of 0.003281055 is extremely low: read as a percentage (which I believe it is), that is about 0.003% efficiency!
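To make the efficiency figure concrete, here is a small sketch of the ratio as I understand it: rows in the final output divided by rows selected by the primary filter, expressed as a percentage. The function name is hypothetical, and the output count of 2 is an assumption picked because it reproduces the reported value; the real output count is not shown in the statistics quoted here.

```python
def primary_filter_efficiency(output_rows, primary_filter_rows):
    """Percentage of primary-filter candidates that end up in the final result."""
    return 100.0 * output_rows / primary_filter_rows


# 60956 is the primary-filter row count from the statistics above;
# 2 output rows is an assumed value, not taken from the original post.
eff = primary_filter_efficiency(2, 60956)
print(f"{eff:.9f}")
```

Under that assumption, almost every row that gets through the primary filter is a false positive that the secondary filter has to reject.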

A few things to try:

Anything strange from SELECT * FROM sys.spatial_indexes?

The MakeValid statement:

UPDATE MyTable SET GeomFieldName = GeomFieldName.MakeValid()

Reset / double check SRID:

UPDATE MyTable SET GeomFieldName.STSrid = 4326

Add in some fields to show the extents of your features. This may highlight issues / NULL geometries.

ALTER TABLE MyTable ADD MinX AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((1)).STX,0)) PERSISTED
ALTER TABLE MyTable ADD MinY AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((1)).STY,0)) PERSISTED
ALTER TABLE MyTable ADD MaxX AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((3)).STX,0)) PERSISTED
ALTER TABLE MyTable ADD MaxY AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((3)).STY,0)) PERSISTED
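The extent columns rely on the envelope ring starting at the lower-left corner, so that point 1 is the minimum corner and point 3 the maximum. The sketch below mirrors that logic outside the database; the function names and sample coordinates are made up for illustration, and the ring order is an assumption taken from how the statements above use STPointN.

```python
def envelope_ring(coords):
    """Axis-aligned bounding ring of (x, y) pairs, lower-left corner first."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    minx, maxx = min(xs), max(xs)
    miny, maxy = min(ys), max(ys)
    return [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy), (minx, miny)]


def extent_columns(coords):
    """Mimic the MinX/MinY/MaxX/MaxY columns; None stands in for a NULL geometry."""
    if not coords:
        return None
    ring = envelope_ring(coords)
    (minx, miny), (maxx, maxy) = ring[0], ring[2]  # 1-based points 1 and 3
    # int() truncates toward zero, which is good enough for a sanity check.
    return int(minx), int(miny), int(maxx), int(maxy)


# Hypothetical rows: one ordinary feature and one empty (NULL-like) geometry.
rows = {
    "a": [(-10.5, 40.2), (3.0, 55.9), (-2.0, 48.0)],
    "b": [],
}
extents = {key: extent_columns(coords) for key, coords in rows.items()}
print(extents)
```

Rows that come back empty, or with extents beyond the index bounding box, are the ones worth inspecting first.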

geographika 2010-05-30 19:17:36

The query is being generated by GeoServer, which I am using to generate tiles that are rendered using Bing maps. Splitting up the polygons was one thing I had already considered doing, but I had put it off so far as I know it will take me a day or two to perfect the query/build a tool to do it. The query time for Filter is 50% lower than the query time for STIntersects.

andynormancx 2010-05-30 22:00:24

I am still puzzled as to why the Filter returns quite so many records. There are lots of large polygons in the data set, but the description of how the SQL indexes work would lead me to expect a few thousand returned by the filter, not 60,000+ (60% of the whole data set).

andynormancx 2010-05-30 22:06:00

I just did a quick query to see exactly how many truly huge polygons I have. Looking at the area of the envelope of each polygon, only 3 of the 106,000 polygons occupy more than 50% of the globe and only 23 of them are bigger than one eighth of the globe.

andynormancx 2010-05-30 22:13:49

I realised I misunderstood your question about the query time. Combining a Filter with an STIntersects takes the same time as an STIntersects by itself, whether you combine them with a sub query or add both to the WHERE clause. This isn't a surprise, as what I have read says that STIntersects does a Filter internally anyway when there is a spatial index available. Just doing a Filter takes 6 seconds; anything involving STIntersects takes 12 seconds.

andynormancx 2010-05-30 22:23:31

I've added links to how the Filter works, and splitting the polygons - there should be no need to develop your own tools to do this.

geographika 2010-05-31 07:38:22

Thanks for the links, I didn't know QGIS could do things like that (I've used it a bit).

andynormancx 2010-05-31 07:51:57

However, I still don't understand why the index doesn't cut down the results from Filter more. If I select "Low" for the number of cells in the top level grid there will be 16 cells in the grid. My inspection of the data suggests only 23 of my polygons should appear in more than two of those cells. The target of my query will only appear in one of the cells. So why does the Filter return 60,000 rows?

andynormancx 2010-05-31 08:02:30

The Filter may be using the bounding box of the polygon to check if it is in an index cell. If you drew a box round the extent of every feature would this account for the difference?

geographika 2010-05-31 08:49:16

No, I don't think so. The way I calculated how many huge polygons I have is by doing .STEnvelope().STArea() on each one. That was how I got the result that only 23 cover more than 1/8th of the globe.

andynormancx 2010-05-31 09:58:59

I have done the splitting. I ended up doing it myself, as I couldn't see how QGIS could do it (plus it is hugely slow on this data set). It didn't give me the results I expected; see the extra detail I added to the question.

andynormancx 2010-05-31 10:07:34

I have tried a huge range of index settings. Yes, I have tried MEDIUM and 16 cells per object, as that is the default that SQL Management Studio uses.

andynormancx 2010-05-31 13:36:49

I had already tried running sp_help_spatial_geometry_index and then tweaking the index settings. But whatever I did, even going from one extreme to the other, still resulted in about 60,000 rows being returned by .Filter().

andynormancx 2010-05-31 13:42:54

Nothing strange looking in sys.spatial_indexes. MakeValid has already been used on this data. I checked the Srids, they were all correct, I reset them for good measure. I used your extra fields to check the geometries, none of them are null and they all fit within the bounds of my index.

andynormancx 2010-06-02 09:06:24

Thanks for your efforts, I can only assume SQL Server really doesn't like my data or I have hit some sort of bug.

andynormancx 2010-06-02 09:07:06

One final thing - have you tried with the GEOGRAPHY type? As you are using 4326 and world-wide data it may be more suitable. Just make sure no features cross the equator (you may need to split again).

geographika 2010-06-02 09:26:56

I just tried to give geography a go. Unfortunately one or more (probably lots) of my geometry rows aren't valid for geography. I haven't looked to see which of the various rules for geography that they break, that would be a big effort. Also, I don't think the GeoServer SQL Server plugin supports geography data types yet. None of my features cross the equator since I split them using the 4x4 grid.

andynormancx 2010-06-02 12:26:28

Answer 2

+2 A:

In your index query you use:

CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit] 
(
    [geom]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(-90, -180, 90, 180),
...

The BOUNDING_BOX therefore maps to:

xmin = -90
ymin = -180
xmax = 90
ymax = 180

Longitude (-180 to 180, designating east / west of the Prime Meridian) should map to X
Latitude (-90 to 90, designating how far north or south of the Equator) should map to Y

So to create the BOUNDING_BOX for the world you should use:

CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit] 
(
    [geom]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(-180, -90, 180, 90),
...

This should create an index that fits your data and means all your features are covered by the index.
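A quick way to see the effect of the swapped values is to test a few longitude/latitude points against both boxes. This is only a sketch: the helper names and the sample cities are illustrative, and it checks plain rectangle containment rather than anything SQL Server does internally.

```python
def inside(box, x, y):
    """True if point (x, y) lies within box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def covered_fraction(index_box, data_box):
    """Share of data_box's area that overlaps index_box."""
    ix0, iy0, ix1, iy1 = index_box
    dx0, dy0, dx1, dy1 = data_box
    w = max(0, min(ix1, dx1) - max(ix0, dx0))
    h = max(0, min(iy1, dy1) - max(iy0, dy0))
    return (w * h) / ((dx1 - dx0) * (dy1 - dy0))


swapped = (-90, -180, 90, 180)  # the bounding box from the original index
world = (-180, -90, 180, 90)    # longitude on X, latitude on Y

# Sample (longitude, latitude) points; the city choices are illustrative only.
points = {
    "London": (-0.13, 51.51),
    "Sydney": (151.21, -33.87),
    "Anchorage": (-149.90, 61.22),
    "Tokyo": (139.69, 35.68),
}

for name, (lon, lat) in points.items():
    print(name, inside(swapped, lon, lat), inside(world, lon, lat))

print(covered_fraction(swapped, world))
```

With the swapped box, anything east of 90 or west of -90 longitude lies outside the index extent, which is half of the world by area, so a large share of features would never be assigned to proper grid cells.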

geographika 2010-06-07 10:21:15

Thank you, thank you, thank you. It would appear that the tool I used to import the shape files created the indexes with the wrong bounding box initially and I have been slavishly copying the same error ever since.

andynormancx 2010-06-07 11:38:23
