1 - Defining table columns for spatial data

To define columns to contain GEOMETRY and GEOGRAPHY data, use this command:.

To define columns to contain GEOMETRY and GEOGRAPHY data, use this command:

=> CREATE TABLE [[db-name.]schema.]table-name (
   column-name GEOMETRY[(length)],
   column-name GEOGRAPHY[(length)]);

If you omit the length specification, the default column size is 1 MB. The maximum column size is 10 MB. The upper limit is not enforced, but the geospatial functions can only accept or return spatial data up to 10 MB.

You cannot modify the size or data type of a GEOMETRY or GEOGRAPHY column after creation. If the column size you created is not sufficient, create a new column with the desired size. Then copy the data from the old column, and drop the old column from the table.

You cannot import data to or export data from tables that contain spatial data from another Vertica database.

2 - Exporting spatial data from a table

You can export spatial data from a table in your Vertica database to a shapefile.

You can export spatial data from a table in your Vertica database to a shapefile.

To export spatial data from a table to a shapefile:

  1. As the superuser., set the shapefile export directory.

    => SELECT STV_SetExportShapefileDirectory(USING PARAMETERS path = '/home/geo/temp');
                   STV_SetExportShapefileDirectory
    ------------------------------------------------------------
     SUCCESS. Set shapefile export directory: [/home/geo/temp]
    (1 row)
    
  2. Export your spatial data to a shapefile.

    => SELECT STV_Export2Shapefile(*
                  USING PARAMETERS shapefile = 'visualizations/city-data.shp',
                                shape = 'Polygon') OVER() FROM spatial_data;
     Rows Exported |                          File Path
    ---------------+----------------------------------------------------------------
           185873 | v_geo-db_node0001: /home/geo/temp/visualizations/city-data.shp
    (1 row)
    
    • The value asterisk (*) is the equivalent to listing all columns in the FROM clause.

    • You can specify sub-directories when exporting your shapefile.

    • Your shapefile must end with the file extension .shp.

  3. Verify that three files now appear in the shapefile export directory.

    $ ls
    city-data.dbf  city-data.shp   city-data.shx
    

3 - Identifying null spatial objects

You can identify null GEOMETRY and GEOGRAPHY objects using the Vertica IS NULL and IS NOT NULL constructs.

You can identify null GEOMETRY and GEOGRAPHY objects using the Vertica IS NULL and IS NOT NULL constructs.

This example uses the following table, where the row with id=2 has a null value in the geog field.


=> SELECT id, ST_AsText(geom), ST_AsText(geog) FROM locations
   ORDER BY 1 ASC;
 id |           ST_AsText              |                 ST_AsText
----+----------------------------------+--------------------------------------
  1 | POINT (2 3)                      | POINT (-85 15)
  2 | POINT (4 5)                      |
  3 | POLYGON ((-1 2, 0 3, 1 2, -1 2)) | POLYGON ((-24 12, -15 23, -20 27, -24 12))
  4 | LINESTRING (-1 2, 1 5)           | LINESTRING (-42.74 23.98, -62.19 23.78)
(4 rows)

Identify all the rows that have a null geog value:

=> SELECT id, ST_AsText(geom), (ST_AsText(geog) IS NULL) FROM locations
   ORDER BY 1 ASC;
 id |            ST_AsText             | ?column?
----+----------------------------------+----------
  1 | POINT (2 3)                      | f
  2 | POINT (4 5)                      | t
  3 | POLYGON ((-1 2, 0 3, 1 2, -1 2)) | f
  4 | LINESTRING (-1 2, 1 5)           | f
(4 rows)

Identify the rows where the geog value is not null:

=> SELECT id, ST_AsText(geom), (ST_AsText(geog) IS NOT NULL) FROM locations
   ORDER BY 1 ASC;
 id |            st_astext             | ?column?
----+----------------------------------+----------
  1 | POINT (2 3)                      | t
  2 | POINT (4 5)                      | f
  3 | LINESTRING (-1 2, 1 5)           | t
  4 | POLYGON ((-1 2, 0 3, 1 2, -1 2)) | t
(4 rows)

4 - Loading spatial data from shapefiles

Vertica provides the capability to load and parse spatial data that is stored in shapefiles.

Vertica provides the capability to load and parse spatial data that is stored in shapefiles. Shapefiles describe points, lines, and polygons. A shapefile is made up of three required files; all three files must be present and in the same directory to define the geometries:

  • .shp—Contains the geometry data.

  • .shx—Contains the positional index of the geometry.

  • .dbf—Contains the attributes for each geometry.

To load spatial data from a shapefile:

  1. Use STV_ShpCreateTable to generate a CREATE TABLE statement.

    => SELECT STV_ShpCreateTable ( USING PARAMETERS file = '/home/geo/temp/shp-files/spatial_data.shp')
                                   OVER() AS spatial_data;
               spatial_data
    ----------------------------------
     CREATE TABLE spatial_data(
       gid IDENTITY(64) PRIMARY KEY,
       uniq_id INT8,
       geom GEOMETRY(85)
    );
    (5 rows)
    
  2. Create the table.

    => CREATE TABLE spatial_data(
       gid IDENTITY(64) PRIMARY KEY,
       uniq_id INT8,
       geom GEOMETRY(85));
    
  3. Load the shapefile.

    => COPY spatial_data WITH SOURCE STV_ShpSource(file='/home/geo/temp/shp-files/spatial_data.shp')
        PARSER STV_ShpParser();
     Rows Loaded
    -------------
              10
    (1 row)
    

Supported shapefile shape types

The following table lists the shapefile shape types that Vertica supports.

Shapefile Shape Type Supported
Null shape Yes
Point Yes
Polyline Yes
Polygon Yes
MultiPoint Yes
PointZ No
PolylineZ No
PolygonZ No
MultiPointZ No
PointM No
PolylineM No
PolygonM No
MultiPointM No
MultiPatch No

5 - Loading spatial data into tables using COPY

You can load spatial data into a table in Vertica using a COPY statement.

You can load spatial data into a table in Vertica using a COPY statement.

To load data into Vertica using a COPY statement:

  1. Create a table.

    => CREATE TABLE spatial_data (id INTEGER, geom GEOMETRY(200));
    CREATE TABLE
    
  2. Create a text file named spatial.dat with the following data.

    1|POINT(2 3)
    2|LINESTRING(-1 2, 1 5)
    3|POLYGON((-1 2, 0 3, 1 2, -1 2))
    
  3. Use COPY to load the data into the table.

    => COPY spatial_data (id, gx FILLER LONG VARCHAR(605), geom AS ST_GeomFromText(gx)) FROM LOCAL 'spatial.dat';
     Rows Loaded
    -------------
               3
    (1 row)
    

    The statement specifies a LONG VARCHAR(32000000) filler, which is the maximum size of WKT. You must specify a filler value large enough to hold the largest WKT you want to insert into the table.

6 - Retrieving spatial data from a table as well-known text (WKT)

GEOMETRY and GEOGRAPHY data is stored in Vertica tables as LONG VARBINARY, which isn't human readable.

GEOMETRY and GEOGRAPHY data is stored in Vertica tables as LONG VARBINARY, which isn't human readable. You can use ST_AsText to return the spatial data as Well-Known Text (WKT).

To return spatial data as WKT:

=> SELECT id, ST_AsText(geom) AS WKT FROM spatial_data;
 id |               WKT
----+----------------------------------
  1 | POINT (2 3)
  2 | LINESTRING (-1 2, 1 5)
  3 | POLYGON ((-1 2, 0 3, 1 2, -1 2))
(3 rows)

7 - Working with GeoHash data

Vertica supports GeoHashes.

Vertica supports GeoHashes. A GeoHash is a geocoding system for hierarchically encoding increasingly granular spatial references. Each additional character in a GeoHash drills down to a smaller section of a map.

You can use Vertica to generate spatial data from GeoHashes and GeoHashes from spatial data. Vertica supports the following functions for use with GeoHashes:

For example, to generate a full precision and partial precision GeoHash from a single point.

=> SELECT ST_GeoHash(ST_GeographyFromText('POINT(3.14 -1.34)')), LENGTH(ST_GeoHash(ST_GeographyFromText('POINT(3.14 -1.34)'))),
                     ST_GeoHash(ST_GeographyFromText('POINT(3.14 -1.34)') USING PARAMETERS numchars=5) partial_hash;
      ST_GeoHash      | LENGTH | partial_hash
----------------------+--------+--------------
 kpf0rkn3zmcswks75010 |     20 | kpf0r
(1 row)

This example shows how to generate a GeoHash from a multipoint point object. The returned polygon is a geometry object of the smallest tile that encloses that GeoHash.

=> SELECT ST_AsText(ST_GeomFromGeoHash(ST_GeoHash(ST_GeomFromText('MULTIPOINT(0 0, 0.0002 0.0001)')))) AS region_1,
                    ST_AsText(ST_GeomFromGeoHash(ST_GeoHash(ST_GeomFromText('MULTIPOINT(0.0001 0.0001, 0.0003 0.0002)')))) AS region_2;
 -[ RECORD 1 ]---------------------------------------------------------------------------------------------
    region_1 | POLYGON ((0 0, 0.00137329101562 0, 0.00137329101562 0.00137329101562, 0 0.00137329101562, 0 0))
    region_2 | POLYGON ((0 0, 0.010986328125 0, 0.010986328125 0.0054931640625, 0 0.0054931640625, 0 0))

8 - Spatial joins with ST_Intersects and STV_Intersect

Spatial joins allow you to identify spatial relationships between two sets of spatial data.

Spatial joins allow you to identify spatial relationships between two sets of spatial data. For example, you can use spatial joins to:

  • Calculate the density of mobile calls in various regions to determine the location of a new cell phone tower.

  • Identify homes that fall within the impact zone of a hurricane.

  • Calculate the number of users who live within a certain ZIP code.

  • Calculate the number of customers in a retail store at any given time.

8.1 - Best practices for spatial joins

Use these best practices to improve overall performance and optimize your spatial queries.

Use these best practices to improve overall performance and optimize your spatial queries.

Best practices for using spatial joins in Vertica include:

  • Table segmentation to speed up index creation

  • Adequately sizing a geometry column to store point data

  • Loading Well-Known Text (WKT) directly into a Geometry column, using STV_GeometryPoint in a COPY statement

  • Using OVER (PARTITION BEST) with STV_Intersect transform queries

Best practices example

Before performing the steps in the following example, download place_output.csv.zip from the Vertica Place GitHub repository (https://github.com/vertica/Vertica-Geospatial). You need to use the data set from this repository.

  1. Create the table for the polygons. Use a GEOMETRY column width that fits your data without being excessively large. A good column-width fit improves performance. In addition, segmenting the table by HASH provides the advantages of parallel computation.

    => CREATE TABLE artworks (gid int, g GEOMETRY(700)) SEGMENTED BY HASH(gid) ALL NODES;
    
  2. Use a copy statement with ST_Buffer to create and load the polygons on which to run the intersect. By using ST_Buffer in your copy statement, you can use that function to create the polygons.

    => COPY artworks(gid, gx FILLER LONG VARCHAR, g AS ST_Buffer(ST_GeomFromText(gx),8)) FROM STDIN DELIMITER ',';
    >> 1, POINT(10 45)
    >> 2, POINT(25 45)
    >> 3, POINT(35 45)
    >> 4, POINT(35 15)
    >> 5, POINT(30 5)
    >> 6, POINT(15 5)
    >> \.
    
  3. Create a table for the location data, represented by points. You can store point data in a GEOMETRY column of 100 bytes. Avoid over-fitting your GEOMETRY column. Doing so can significantly degrade spatial intersection performance. Also, segment this table by HASH, to take advantage of parallel computation.

    => CREATE TABLE usr_data (gid identity, usr_id int, date_time timestamp, g GEOMETRY(100))
         SEGMENTED BY HASH(gid) ALL NODES;
    
  4. During the copy statement, transform the raw location data to GEOMETRY data. You must perform this transformation because your location data needs to use the GEOMETRY data type. Use the function STV_GeometryPoint to transform the x and y columns of the source table.

    => COPY usr_data (usr_id, date_time, x FILLER LONG VARCHAR,
                      y FILLER LONG VARCHAR, g AS STV_GeometryPoint(x, y))
       FROM LOCAL 'place_output.csv' DELIMITER ',' ENCLOSED BY '';
    
  5. Create the spatial index for the polygons. This index helps you speed up intersection calculations.

    => SELECT STV_Create_Index(gid, g USING PARAMETERS index='art_index', overwrite=true) OVER() FROM artworks;
    
  6. Write an analytic query that returns the number of intersections per polygon. Specify that Vertica ignore any usr_id that intersects less than 20 times with a given polygon.

    => SELECT pol_gid,
           COUNT(DISTINCT(usr_id)) AS count_user_visit
       FROM
         (SELECT pol_gid,
           usr_id,
           COUNT(usr_id) AS user_points_in
        FROM
           (SELECT STV_Intersect(usr_id, g USING PARAMETERS INDEX='art_index') OVER(PARTITION BEST) AS (usr_id,
                                                            pol_gid)
        FROM usr_data
          WHERE date_time BETWEEN '2014-07-02 09:30:20' AND '2014-07-02 17:05:00') AS c
        GROUP BY pol_gid,
        usr_id HAVING COUNT(usr_id) > 20) AS real_visits
        GROUP BY pol_gid
        ORDER BY count_user_visit DESC;
    

Optimizations in the example query

This query has the following optimizations:

  • The time predicated appears in the subquery.

  • Using the location data table avoids the need for an expensive join.

  • The query uses OVER (PARTITION BEST), to improve performance by partitioning the data.

  • The user_points_in provides an estimate of the combined time spent intersecting with the artwork by all visitors.

8.2 - Ensuring polygon validity before creating or refreshing an index

When Vertica creates or updates a spatial index it does not check polygon validity.

When Vertica creates or updates a spatial index it does not check polygon validity. To prevent getting invalid results when you query your spatial index, you should check the validity of your polygons prior to creating or updating your spatial index.

The following example shows you how to check the validity of polygons.

  1. Create a table and load spatial data.

    => CREATE TABLE polygon_validity_test (gid INT, geom GEOMETRY);
    CREATE TABLE
    => COPY polygon_validity_test (gid, gx FILLER LONG VARCHAR, geom AS St_GeomFromText(gx)) FROM STDIN;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 2|POLYGON((-31 74,8 70,8 50,-36 53,-31 74))
    >> 3|POLYGON((-38 50,4 13,11 45,0 65,-38 50))
    >> 4|POLYGON((-12 42,-12 42,27 48,14 26,-12 42))
    >> 5|POLYGON((0 0,1 1,0 0,2 1,1 1,0 0))
    >> 6|POLYGON((3 3,2 2,2 1,2 3,3 3))
    >> \.
    
  2. Use ST_IsValid and STV_IsValidReason to find any invalid polygons.

    => SELECT gid, ST_IsValid(geom), STV_IsValidReason(geom) FROM polygon_validity_test;
     gid | ST_IsValid |            STV_IsValidReason
    -----+------------+------------------------------------------
       4 | t          |
       6 | f          | Self-intersection at or near POINT (2 1)
       2 | t          |
       3 | t          |
       5 | f          | Self-intersection at or near POINT (0 0)
    (5 rows)
    

Now that we have identifed the invalid polygons in our table, there are a couple different ways you can handle the invalid polygons when creating or refreshing a spatial index.

Filtering invalid polygons using a WHERE clause

This method is slower than filtering before creating an index because it checks the validity of each polygon at execution time.

The following example shows you how to exclude invalid polygons using a WHERE clause.

```
=> SELECT STV_Create_Index(gid, geom USING PARAMETERS index = 'valid_polygons') OVER()
   FROM polygon_validity_test
   WHERE ST_IsValid(geom) = 't';
```

Filtering invalid polygons before creating or refreshing an index

This method is faster than filtering using a WHERE clause because you incur the performance cost prior to building the index.

The following example shows you how to exclude invalid polygons by creating a new table excluding invalid polygons.

```
=> CREATE TABLE polygon_validity_clean AS
   SELECT *
   FROM polygon_validity_test
   WHERE ST_IsValid(geom) = 't';
CREATE TABLE
=> SELECT STV_Create_Index(gid, geom USING PARAMETERS index = 'valid_polygons') OVER()
   FROM polygon_validity_clean;
```

8.3 - STV_Intersect: scalar function vs. transform function

The STV_Intersect functions are similar in purpose, but you use them differently.

The STV_Intersect functions are similar in purpose, but you use them differently.

STV_Intersect
Function Type
Description Performance
Scalar Matches a point to a polygon. If several polygons contain the point, this function returns a gid value. The result is a polygon gid or, if no polygon contains the point, the result is NULL. Eliminates points that do not intersect with any indexed polygons, avoiding unnecessary comparisons.
Transform Matches a point to all the polygons that contain it. When a point does not intersect with any polygon in the index, the function returns no rows. Processes all input points regardless of whether or not they intersect with the indexed polygons.

In the following example, the STV_Intersect scalar function compares the points in the points table to the polygons in a spatial index named my_polygons. STV_Intersect returns all points and polygons that match exactly:


=> SELECT gid AS pt_gid
   STV_Intersect(geom USING PARAMETERS index='my_polygons') AS pol_gid
   FROM points ORDER BY pt_gid;
 pt_gid | pol_gid
--------+---------
    100 |       2
    101 |
    102 |       2
    103 |
    104 |
    105 |       3
    106 |
    107 |
 (8 rows)

The following example shows how to use the STV_Intersect transform function to return information about the three point-polygon pairs that match and each of the polygons they match:


=> SELECT STV_Intersect(gid, geom
   USING PARAMETERS index='my_polygons')
   OVER (PARTITION BEST) AS (pt_gid, pol_id)
   FROM points;
 pt_gid | pol_id
--------+--------
    100 |      1
    100 |      2
    100 |      3
    102 |      2
    105 |      3
(3 rows)

See also

8.4 - Performing spatial joins with STV_Intersect functions

Suppose you want to process a medium-to-large spatial data set and determine which points intersect with which polygons.

Suppose you want to process a medium-to-large spatial data set and determine which points intersect with which polygons. In that case, first create a spatial index using STV_Create_Index. A spatial index provides efficient access to the set of polygons.

Then, use the STV_Intersect scalar or transform function to identify which point-polygon pairs match.

8.4.1 - Spatial indexes and STV_Intersect

Before performing a spatial join using one of the STV_Intersect functions, you must first run STV_Create_Index to create a database object that contains information about polygons.

Before performing a spatial join using one of the STV_Intersect functions, you must first run STV_Create_Index to create a database object that contains information about polygons. This object is called a spatial index of the set of polygons. The spatial index improves the time it takes for the STV_Intersect functions to access the polygon data.

Vertica creates spatial indexes in a global space. Thus, any user with access to the STV_*_Index functions can describe, rename, or drop indexes created by any other user.

Vertica provides functions that work with spatial indexes:

8.5 - When to use ST_Intersects vs. STV_Intersect

Vertica provides two capabilities to identify whether a set of points intersect with a set of polygons.

Vertica provides two capabilities to identify whether a set of points intersect with a set of polygons. Depending on the size of your data set, choose the approach that gives the best performance:

  • When comparing a set of geometries to a single geometry to see if they intersect, use the ST_Intersects function.

  • To determine if a set of points intersects with a set of polygons in a medium-to-large data set, first create a spatial index using STV_Create_Index. Then, use one of the STV_Intersect functions to return the set of pairs that intersect.

8.5.1 - Performing spatial joins with ST_Intersects

The ST_Intersects function determines if two GEOMETRY objects intersect or touch at a single point.

The ST_Intersects function determines if two GEOMETRY objects intersect or touch at a single point.

Use ST_Intersects when you want to identify if a small set of geometries in a column intersect with a given geometry.

Example

The following example uses ST_Intersects to compare a column of point geometries to a single polygon. The table that contains the points has 1 million rows.

ST_Intersects returns only the points that intersect with the polygon. Those points represent about 0.01% of the points in the table:

=> CREATE TABLE points_1m(gid IDENTITY, g GEOMETRY(100)) ORDER BY g;
=> COPY points_1m(wkt FILLER LONG VARCHAR(100), g AS ST_GeomFromText(wkt))
   FROM LOCAL '/data/points.dat';
 Rows Loaded
-------------
     1000000
(1 row)
=> SELECT ST_AsText(g) FROM points_1m WHERE
   ST_Intersects
     (
      g,
      ST_GeomFromText('POLYGON((-71 42, -70.9 42, -70.9 42.1, -71 42.1, -71 42))')
      );
         st_astext
----------------------------
 POINT (-70.97532 42.03538)
 POINT (-70.97421 42.0376)
 POINT (-70.99004 42.07538)
 POINT (-70.99477 42.08454)
 POINT (-70.99088 42.08177)
 POINT (-70.98643 42.07593)
 POINT (-70.98032 42.07982)
 POINT (-70.95921 42.00982)
 POINT (-70.95115 42.02177)
...
(116 rows)

Vertica recommends that you test the intersections of two columns of geometries by creating a spatial index. Use one of the STV_Intersect functions as described in STV_Intersect: scalar function vs. transform function.