Best practices for geospatial analytics
Vertica recommends the following best practices when performing geospatial analytics in Vertica.
Performance optimization
Recommendation | Details |
---|---|
Use the minimum column size for spatial data. | Performance degrades as column widths increase. When creating columns for your spatial data, use the smallest size column that can accommodate your data. For example, use GEOMETRY(85) for point data. |
Use GEOMETRY types where possible. | Performance of functions on GEOGRAPHY types is slower than functions that support GEOMETRY types. Use GEOMETRY types where possible. |
To improve the performance of the following functions, sort projections on spatial columns:
|
You may improve the pruning efficiency of these functions by sorting the projection on the GEOMETRY column. However, sorting on a large GEOMETRY column may slow down data load. |
Spatial joins with points and polygons
Vertica provides two ways to identify whether a set of points intersect with a set of polygons. Depending on the size of your data set, choose the approach that gives the best performance.
For a detailed example of best practices with spatial joins, see Best practices for spatial joins.
Recommendation | Details |
---|---|
Create a spatial index only when performing spatial joins with STV_Intersect. | Spatial indexes should only be used with STV_Intersect. Creating a spatial index and then performing spatial joins with ST_Intersects will not improve performance. |
Use the STV_Intersect function when you intersect a set of points with a set of polygons. |
Determine if a set of points intersects with a set of polygons in a medium to large data set. First, create a spatial index using STV_Create_Index. Then, use one of the STV_Intersect functions to return the set of pairs that intersect. Spatial indexes provide the best performance for accessing a large number of polygons. |
When using the STV_Intersect transform function, partition the data and use an OVER(PARTITION BEST) clause. | The STV_Intersect transform function does not require that you partition the data. However, you may improve performance by partitioning the data and using an OVER(PARTITION BEST) clause. |
Spatial indexes
The STV_Create_Index function can consume large amounts of processing time and memory. When you index new data for the first time, monitor memory usage to be sure it stays within safe limits. Memory usage depends on:
-
Number of polygons
-
Number of vertices
-
Amount of overlap among polygons
Recommendation | Details |
---|---|
Segment polygon data when a table contains a large number of polygons. | Segmenting the data allows the index creation process to run in parallel. This is advantageous because sometimes STV_Create_Index tasks cannot be completed when large tables that are not segmented prior to index creation. |
Adjust STV_Create_Index parameters as needed for memory allocation and CPU usage. |
The Default: 256 Valid values: Any value less than or equal to the amount of memory in the GENERAL resource pool. Assigning a higher value results in an error. |
Make changes if STV_Create_Index cannot allocate 300 MB memory. |
Before STV_Create_Index starts creating the index, it tries to allocate about 300 MB of memory. If that much memory is not available, the function fails. If you get a failure message, try these solutions:
|
Create misplaced indexes again, if needed. | When you back up your Vertica database, spatial index files are not included. If you misplace an index, use STV_Create_Index to re-create it. |
Use STV_Refresh_Index to add new or updated polygons to an existing index. | Instead of rebuilding your spatial index each time you add new or updated polygons to a table, you can use STV_Refresh_Index to append the polygons to your existing spatial index. |
Checking polygon validity
Recommendation | Details |
---|---|
Run ST_IsValid to check if polygons are valid. |
Many spatial functions do not check the validity of polygons.
For more information, see Ensuring polygon validity before creating or refreshing an index. |