Best practices for geospatial analytics

Vertica recommends the following best practices when performing geospatial analytics in Vertica.

Vertica recommends the following best practices when performing geospatial analytics in Vertica.

Performance optimization

Recommendation Details
Use the minimum column size for spatial data. Performance degrades as column widths increase. When creating columns for your spatial data, use the smallest size column that can accommodate your data. For example, use GEOMETRY(85) for point data.
Use GEOMETRY types where possible. Performance of functions on GEOGRAPHY types is slower than functions that support GEOMETRY types. Use GEOMETRY types where possible.

To improve the performance of the following functions, sort projections on spatial columns:

  • STV_Intersect scalar function

  • ST_Distance

  • ST_Area

  • ST_Length

You may improve the pruning efficiency of these functions by sorting the projection on the GEOMETRY column. However, sorting on a large GEOMETRY column may slow down data load.

Spatial joins with points and polygons

Vertica provides two ways to identify whether a set of points intersect with a set of polygons. Depending on the size of your data set, choose the approach that gives the best performance.

For a detailed example of best practices with spatial joins, see Best practices for spatial joins.

Recommendation Details
Use ST_Intersects to intersect a constant geometry with a set of geometries.

When you intersect a set of geometries with a geometry, use the ST_Intersects function. Express the constant geometry argument as a WKT or WKB. For example:

ST_Intersects(geom,ST_GeomFromText( 'POLYGON((43.1 50.1,43.1 59.0, 48.9 59.0,43.1 50.1))'))

For more information, see Performing spatial joins with ST_Intersects.

Create a spatial index only when performing spatial joins with STV_Intersect. Spatial indexes should only be used with STV_Intersect. Creating a spatial index and then performing spatial joins with ST_Intersects will not improve performance.
Use the STV_Intersect function when you intersect a set of points with a set of polygons.

Determine if a set of points intersects with a set of polygons in a medium to large data set. First, create a spatial index using STV_Create_Index. Then, use one of the STV_Intersect functions to return the set of pairs that intersect.

Spatial indexes provide the best performance for accessing a large number of polygons.

When using the STV_Intersect transform function, partition the data and use an OVER(PARTITION BEST) clause. The STV_Intersect transform function does not require that you partition the data. However, you may improve performance by partitioning the data and using an OVER(PARTITION BEST) clause.

Spatial indexes

The STV_Create_Index function can consume large amounts of processing time and memory. When you index new data for the first time, monitor memory usage to be sure it stays within safe limits. Memory usage depends on:

  • Number of polygons

  • Number of vertices

  • Amount of overlap among polygons

Recommendation Details
Segment polygon data when a table contains a large number of polygons. Segmenting the data allows the index creation process to run in parallel. This is advantageous because sometimes STV_Create_Index tasks cannot be completed when large tables that are not segmented prior to index creation.
Adjust STV_Create_Index parameters as needed for memory allocation and CPU usage.

The max_mem_mb parameter can affect the resource usage of STV_Create_Index. max_mem_mb assigns a limit to the amount of memory that STV_Create_Index can allocate.

Default: 256

Valid values: Any value less than or equal to the amount of memory in the GENERAL resource pool. Assigning a higher value results in an error.

Make changes if STV_Create_Index cannot allocate 300 MB memory.

Before STV_Create_Index starts creating the index, it tries to allocate about 300 MB of memory. If that much memory is not available, the function fails. If you get a failure message, try these solutions:

  • Create the index at a time of less load on the system.

  • Avoid concurrent index creation.

  • Add more memory to your system.

Create misplaced indexes again, if needed. When you back up your Vertica database, spatial index files are not included. If you misplace an index, use STV_Create_Index to re-create it.
Use STV_Refresh_Index to add new or updated polygons to an existing index. Instead of rebuilding your spatial index each time you add new or updated polygons to a table, you can use STV_Refresh_Index to append the polygons to your existing spatial index.

Checking polygon validity

Recommendation Details
Run ST_IsValid to check if polygons are valid.

Many spatial functions do not check the validity of polygons.

  • Run ST_IsValid on all polygons to determine if they are valid.

  • If your object is not valid, run STV_IsValidReason to get information about the location of the invalid polygon.

For more information, see Ensuring polygon validity before creating or refreshing an index.