Hadoop functions

This section contains functions to manage interactions with Hadoop.

1 - CLEAR_HDFS_CACHES

Clears the configuration information copied from HDFS and any cached connections.

This function affects reads using the hdfs scheme in the following ways:

  • This function flushes information loaded from configuration files copied from Hadoop (such as core-site.xml). These files are found on the path set by the HadoopConfDir configuration parameter.

  • This function flushes information about which NameNode is active in a High Availability (HA) Hadoop cluster. Therefore, the first request to Hadoop after calling this function is slower than expected.

Vertica maintains a cache of open connections to NameNodes to reduce latency. This function flushes that cache.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

CLEAR_HDFS_CACHES ( )

Privileges

Superuser

Examples

The following example clears the Hadoop configuration information:

=> SELECT CLEAR_HDFS_CACHES();
 CLEAR_HDFS_CACHES
-------------------
 Cleared
(1 row)
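
If you have changed the HadoopConfDir configuration parameter or edited the copied configuration files, you might clear the caches immediately afterward so that the next read using the hdfs scheme loads the new settings. The following is a sketch; it assumes HadoopConfDir is set with ALTER DATABASE (see Hadoop parameters), and the path is illustrative:

=> ALTER DATABASE DEFAULT SET PARAMETER HadoopConfDir = '/etc/hadoop/conf';
=> SELECT CLEAR_HDFS_CACHES();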

See also

Hadoop parameters

2 - EXTERNAL_CONFIG_CHECK

Tests the Hadoop configuration of a Vertica cluster. This function tests HDFS configuration files, HCatalog Connector configuration, and Kerberos configuration.

This function calls the following functions:

  • KERBEROS_CONFIG_CHECK

  • HADOOP_IMPERSONATION_CONFIG_CHECK

  • HDFS_CLUSTER_CONFIG_CHECK

  • HCATALOGCONNECTOR_CONFIG_CHECK

If you call this function with an argument, it passes the argument to functions it calls that also take an argument.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

EXTERNAL_CONFIG_CHECK( ['what_to_test' ] )

Arguments

what_to_test
A string specifying the authorities, nameservices, and/or HCatalog schemas to test. The format is a comma-separated list of "key=value" pairs, where keys are "authority", "nameservice", and "schema". The value is passed to all of the sub-functions; see those reference pages for details on how values are interpreted.

Privileges

This function does not require privileges.

Examples

The following example tests the configuration of only the nameservice named "ns1". Output has been omitted due to length.

=> SELECT EXTERNAL_CONFIG_CHECK('nameservice=ns1');
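
To test more than one value, combine key=value pairs in a single comma-separated string as described under Arguments. The following sketch tests one nameservice and one hypothetical HCatalog schema; output is again omitted:

=> SELECT EXTERNAL_CONFIG_CHECK('nameservice=ns1, schema=hcat1');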

3 - GET_METADATA

Returns the metadata of a Parquet file. Metadata includes the number and sizes of row groups, column names, and information about chunks and compression. Metadata is returned as JSON.

This function inspects a single file. Because Parquet data usually spans many files in a single directory, choose one file to inspect; the function does not accept a directory name as an argument.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

GET_METADATA( 'filename' )

Arguments

filename
The name of a Parquet file. Any path that is valid for COPY is valid for this function. This function does not operate on files in other formats.

Privileges

Superuser, or non-superuser with READ privileges on the USER-accessible storage location (see GRANT (storage location)).

Examples

You must call this function with a single file, not a directory or glob:

=> SELECT GET_METADATA('/data/emp-row.parquet');
                GET_METADATA
----------------------------------------------------------------------------------------------------
 schema:
required group field_id=-1 spark_schema {
  optional int32 field_id=-1 employeeID;
  optional group field_id=-1 personal {
    optional binary field_id=-1 name (String);
    optional group field_id=-1 address {
      optional binary field_id=-1 street (String);
      optional binary field_id=-1 city (String);
      optional int32 field_id=-1 zipcode;
    }
    optional int32 field_id=-1 taxID;
  }
  optional binary field_id=-1 department (String);
}

 data page version:
  data page v1

 metadata:
{
  "FileName": "/data/emp-row.parquet",
  "FileFormat": "Parquet",
  "Version": "1.0",
  "CreatedBy": "parquet-mr version 1.10.1 (build a89df8f9932b6ef6633d06069e50c9b7970bebd1)",
  "TotalRows": "4",
  "NumberOfRowGroups": "1",
  "NumberOfRealColumns": "3",
  "NumberOfColumns": "7",
  "Columns": [
     { "Id": "0", "Name": "employeeID", "PhysicalType": "INT32", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} },
     { "Id": "1", "Name": "personal.name", "PhysicalType": "BYTE_ARRAY", "ConvertedType": "UTF8", "LogicalType": {"Type": "String"} },
     { "Id": "2", "Name": "personal.address.street", "PhysicalType": "BYTE_ARRAY", "ConvertedType": "UTF8", "LogicalType": {"Type": "String"} },
     { "Id": "3", "Name": "personal.address.city", "PhysicalType": "BYTE_ARRAY", "ConvertedType": "UTF8", "LogicalType": {"Type": "String"} },
     { "Id": "4", "Name": "personal.address.zipcode", "PhysicalType": "INT32", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} },
     { "Id": "5", "Name": "personal.taxID", "PhysicalType": "INT32", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} },
     { "Id": "6", "Name": "department", "PhysicalType": "BYTE_ARRAY", "ConvertedType": "UTF8", "LogicalType": {"Type": "String"} }
  ],
  "RowGroups": [
     {
       "Id": "0",  "TotalBytes": "642",  "TotalCompressedBytes": "0",  "Rows": "4",
       "ColumnChunks": [
          {"Id": "0", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "51513", "Min": "17103" },
           "Compression": "SNAPPY", "Encodings": "PLAIN RLE BIT_PACKED ", "UncompressedSize": "67", "CompressedSize": "69" },
          {"Id": "1", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "Sheldon Cooper", "Min": "Howard Wolowitz" },
           "Compression": "SNAPPY", "Encodings": "PLAIN RLE BIT_PACKED ", "UncompressedSize": "142", "CompressedSize": "145" },
          {"Id": "2", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "52 Broad St", "Min": "100 Main St Apt 4A" },
           "Compression": "SNAPPY", "Encodings": "PLAIN RLE BIT_PACKED ", "UncompressedSize": "139", "CompressedSize": "123" },
          {"Id": "3", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "Pasadena", "Min": "Pasadena" },
           "Compression": "SNAPPY", "Encodings": "RLE PLAIN_DICTIONARY BIT_PACKED ", "UncompressedSize": "95", "CompressedSize": "99" },
          {"Id": "4", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "91021", "Min": "91001" },
           "Compression": "SNAPPY", "Encodings": "PLAIN RLE BIT_PACKED ", "UncompressedSize": "68", "CompressedSize": "70" },
          {"Id": "5", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "4", "DistinctValues": "0", "Max": "0", "Min": "0" },
           "Compression": "SNAPPY", "Encodings": "PLAIN RLE BIT_PACKED ", "UncompressedSize": "28", "CompressedSize": "30" },
          {"Id": "6", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "Physics", "Min": "Astronomy" },
           "Compression": "SNAPPY", "Encodings": "RLE PLAIN_DICTIONARY BIT_PACKED ", "UncompressedSize": "103", "CompressedSize": "107" }
        ]
     }
  ]
}

(1 row)

4 - HADOOP_IMPERSONATION_CONFIG_CHECK

Reports the delegation tokens Vertica will use when accessing Kerberized data in HDFS. The HadoopImpersonationConfig configuration parameter specifies one or more authorities, nameservices, and HCatalog schemas and their associated tokens. For each tested value, the function reports what doAs user or delegation token Vertica will use for access. Use this function to confirm that you have defined your delegation tokens as you intended.

You can call this function with an argument to specify the authority, nameservice, or HCatalog schema to test, or without arguments to test all configured values.

This function does not check that you can use these delegation tokens to access HDFS.

See Proxy users and delegation tokens for more about impersonation.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

HADOOP_IMPERSONATION_CONFIG_CHECK( ['what_to_test' ] )

Arguments

what_to_test
A string specifying the authorities, nameservices, and/or HCatalog schemas to test. For example, a value of 'nameservice=ns1' means the function tests only access to the nameservice "ns1" and ignores any other authorities and schemas. A value of 'nameservice=ns1, schema=hcat1' means the function tests one nameservice and one HCatalog schema.

If you do not specify this argument, the function tests all authorities, nameservices, and schemas defined in HadoopImpersonationConfig.

Privileges

This function does not require privileges.

Examples

Consider the following definition of HadoopImpersonationConfig:

[{
        "nameservice": "ns1",
        "token": "RANDOM-TOKEN-STRING"
    },
    {
        "nameservice": "*",
        "doAs": "Paul"
    },
    {
        "schema": "hcat1",
        "doAs": "Fred"
    }
]

The following query tests only the "ns1" name service:

=> SELECT HADOOP_IMPERSONATION_CONFIG_CHECK('nameservice=ns1');

-- hadoop_impersonation_config_check --
Connections to nameservice [ns1] will use a delegation token with hash [b3dd9e71cd695d91]

This function returns a hash of the token for security reasons. You can call HASH_EXTERNAL_TOKEN with the expected value and compare that hash to the one in this function's output.

A query with no argument tests all values:

=> SELECT HADOOP_IMPERSONATION_CONFIG_CHECK();

-- hadoop_impersonation_config_check --
Connections to nameservice [ns1] will use a delegation token with hash [b3dd9e71cd695d91]
JDBC connections for HCatalog schema [hcat1] will doAs [Fred]
[!] hadoop_impersonation_config_check : [PASS]

5 - HASH_EXTERNAL_TOKEN

Returns a hash of a string token, for use with HADOOP_IMPERSONATION_CONFIG_CHECK. Call HASH_EXTERNAL_TOKEN with the delegation token you expect Vertica to use and compare it to the hash in the output of HADOOP_IMPERSONATION_CONFIG_CHECK.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

HASH_EXTERNAL_TOKEN( 'token' )

Arguments

token
A string specifying the token to hash. The token is configured in the HadoopImpersonationConfig parameter.

Privileges

This function does not require privileges.

Examples

The following query tests the expected value shown in the example on the HADOOP_IMPERSONATION_CONFIG_CHECK reference page.

=> SELECT HASH_EXTERNAL_TOKEN('RANDOM-TOKEN-STRING');
hash_external_token
---------------------
b3dd9e71cd695d91
(1 row)

6 - HCATALOGCONNECTOR_CONFIG_CHECK

Tests the configuration of a Vertica cluster that uses the HCatalog Connector to access Hive data. The function first verifies that the HCatalog Connector is properly installed and reports on the values of several related configuration parameters. It then tests the connection using HiveServer2. This function does not support the WebHCat server.

If you specify an HCatalog schema, and if you have defined a delegation token for that schema, this function uses the delegation token. Otherwise, the function uses the default endpoint without a delegation token.

See Proxy users and delegation tokens for more about delegation tokens.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

HCATALOGCONNECTOR_CONFIG_CHECK( ['what_to_test' ] )

Arguments

what_to_test
A string specifying the HCatalog schemas to test. For example, a value of 'schema=hcat1' means the function tests only the "hcat1" schema and ignores any others that are found.

Privileges

This function does not require privileges.

Examples

The following query tests with the default endpoint and no delegation token.

=> SELECT HCATALOGCONNECTOR_CONFIG_CHECK();

-- hcatalogconnector_config_check --

    HCatalogConnectorUseHiveServer2 : [1]
    EnableHCatImpersonation : [1]
    HCatalogConnectorUseORCReader : [1]
    HCatalogConnectorUseParquetReader : [1]
    HCatalogConnectorUseTxtReader : [0]
  [INFO] Vertica is not configured to use its internal parsers for delimited files.
  [INFO] This is off by default, but will be changed in a future release.
    HCatalogConnectorUseLibHDFSPP : [1]

  [OK] HCatalog connector library is properly installed.
  [INFO] Creating JDBC connection as session user.
  [OK] Successful JDBC connection to HiveServer2 as user [USER].

  [!] hcatalogconnector_config_check : [PASS]

To test with the configured delegation token, pass the schema as an argument:

=> SELECT HCATALOGCONNECTOR_CONFIG_CHECK('schema=hcat1');

7 - HDFS_CLUSTER_CONFIG_CHECK

Tests the configuration of a Vertica cluster that uses HDFS. The function scans the Hadoop configuration files found in HadoopConfDir and performs configuration checks on each cluster it finds. If you have more than one cluster configured, you can specify which one to test instead of testing all of them.

For each Hadoop cluster, it reports properties including:

  • Nameservice name and associated NameNodes

  • High-availability status

  • RPC encryption status

  • Kerberos authentication status

  • HTTP(S) status

It then tests connections using http(s), hdfs, and webhdfs URL schemes. It tests the latter two using both the Vertica and session user.

See Configuring HDFS access for information about configuration files and HadoopConfDir.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

HDFS_CLUSTER_CONFIG_CHECK( ['what_to_test' ] )

Arguments

what_to_test
A string specifying the authorities or nameservices to test. For example, a value of 'nameservice=ns1' means the function tests only the "ns1" cluster. If you specify both an authority and a nameservice, the authority must be a NameNode in the specified nameservice for the check to pass.

If you do not specify this argument, the function tests all cluster configurations found in HadoopConfDir.

Privileges

This function does not require privileges.

Examples

The following example tests all clusters.

=> SELECT HDFS_CLUSTER_CONFIG_CHECK();

-- hdfs_cluster_config_check --

    Hadoop Conf Path : [/conf/hadoop_conf]
  [OK] HadoopConfDir verified on all nodes
    Connection Timeout (seconds) : [60]
    Token Refresh Frequency (seconds) : [0]
    HadoopFSBlockSizeBytes (MiB) : [64]

  [OK] Found [1] hadoop cluster configurations

------------- Cluster 1 -------------
    Is DefaultFS : [true]
    Nameservice : [vmns]
    Namenodes : [node1.example.com:8020, node2.example.com:8020]
    High Availability : [true]
    RPC Encryption : [false]
    Kerberos Authentication : [true]
    HTTPS Only : [false]
  [INFO] Checking connections to [hdfs:///]
    vertica : [OK]
    dbuser : [OK]

  [INFO] Checking connections to [http://node1.example.com:50070]
  [INFO] Node is in standby
  [INFO] Checking connections to [http://node2.example.com:50070]
  [OK] Can make authenticated external curl connection
  [INFO] Checking webhdfs
    vertica : [OK]
    USER : [OK]

  [!] hdfs_cluster_config_check : [PASS]
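
To check only one cluster, pass its nameservice as the argument. The following sketch tests only the nameservice reported above; output is omitted:

=> SELECT HDFS_CLUSTER_CONFIG_CHECK('nameservice=vmns');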

8 - KERBEROS_HDFS_CONFIG_CHECK

This function is deprecated and will be removed in a future release.

Tests the Kerberos configuration of a Vertica cluster that uses HDFS. The function succeeds if it can use both the Vertica keytab file and the session user to access HDFS, and reports errors otherwise. This function is a more specific version of KERBEROS_CONFIG_CHECK.

If the current session is not Kerberized, this function will not be able to use secured HDFS connections and will fail.

You can call this function with arguments to specify an HDFS configuration to test, or without arguments. If you call it with no arguments, this function reads the HDFS configuration files and fails if it does not find them. See Configuring HDFS access. If it finds configuration files, it tests all configured nameservices.

The function performs the following tests, in order:

  • Are Kerberos services available?

  • Does a keytab file exist and are the Kerberos and HDFS configuration parameters set in the database?

  • Can Vertica read and invoke kinit with the keys to authenticate to HDFS and obtain the database Kerberos ticket?

  • Can Vertica perform hdfs and webhdfs operations using both the database Kerberos ticket and user-forwardable tickets for the current session?

  • Can Vertica connect to HiveServer2? (This function does not support WebHCat.)

If any test fails, the function returns a descriptive error message.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

KERBEROS_HDFS_CONFIG_CHECK( ['hdfsHost:hdfsPort',
  'webhdfsHost:webhdfsPort', 'webhcatHost' ] )

Arguments

hdfsHost, hdfsPort
The hostname or IP address and port of the HDFS NameNode. Vertica uses this server to access data that is specified with hdfs URLs. If the value is ' ', the function skips this part of the check.
webhdfsHost, webhdfsPort
The hostname or IP address and port of the WebHDFS server. Vertica uses this server to access data that is specified with webhdfs URLs. If the value is ' ', the function skips this part of the check.
webhcatHost
Pass any value in this position. WebHCat is deprecated and this value is ignored but must be present.

Privileges

This function does not require privileges.
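
Examples

The following calls are sketches; the host names and ports are hypothetical. The first call reads the configuration files in HadoopConfDir and tests all configured nameservices. The second names an HDFS NameNode and a WebHDFS server explicitly; the third argument is ignored but must be present:

=> SELECT KERBEROS_HDFS_CONFIG_CHECK();

=> SELECT KERBEROS_HDFS_CONFIG_CHECK('namenode.example.com:8020',
   'namenode.example.com:50070', 'none');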

9 - SYNC_WITH_HCATALOG_SCHEMA

Copies the structure of a Hive database schema available through the HCatalog Connector to a Vertica schema. If the HCatalog schema and the target Vertica schema have matching table names, SYNC_WITH_HCATALOG_SCHEMA overwrites the Vertica tables.

This function can synchronize the HCatalog schema directly. In this case, call it with the same schema name for the vertica_schema and hcatalog_schema parameters. The function can also synchronize a different schema to the HCatalog schema.

If you change the settings of HCatalog Connector configuration parameters, you must call this function again.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

SYNC_WITH_HCATALOG_SCHEMA( vertica_schema, hcatalog_schema, [drop_non_existent] )

Parameters

vertica_schema
The target Vertica schema to store the copied HCatalog schema's metadata. This can be the same schema as hcatalog_schema, or it can be a separate one created with CREATE SCHEMA.
hcatalog_schema
The HCatalog schema to copy, created with CREATE HCATALOG SCHEMA.
drop_non_existent
If true, drop any tables in vertica_schema that do not correspond to a table in hcatalog_schema.

Privileges

Non-superuser: CREATE privileges on vertica_schema.

Users also require access to Hive data, one of the following:

  • USAGE permissions on hcatalog_schema, if Hive does not use an authorization service to manage access.

  • Permission through an authorization service (Sentry or Ranger), and access to the underlying files in HDFS. (Sentry can provide that access through ACL synchronization.)

  • dbadmin user privileges, with or without an authorization service.

Data type matching

Hive STRING and BINARY data types are matched, in Vertica, to the VARCHAR(65000) and VARBINARY(65000) types. Adjust the data types with ALTER TABLE as needed after creating the schema. The maximum size of a VARCHAR or VARBINARY in Vertica is 65000, but you can use LONG VARCHAR and LONG VARBINARY to specify larger values.

Hive and Vertica define string length in different ways. In Hive the length is the number of characters; in Vertica it is the number of bytes. Thus, a character encoding that uses more than one byte, such as Unicode, can cause mismatches between the two. To avoid data truncation, set values in Vertica based on bytes, not characters.

If data size exceeds the column size, Vertica logs an event at read time in the QUERY_EVENTS system table.
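
The following sketch shows one way to look for such events after a load; the filter on EVENT_TYPE is illustrative, and the exact event type string can vary:

=> SELECT event_type, event_description, event_details
   FROM query_events
   WHERE event_type ILIKE '%TRUNCAT%';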

Examples

The following example uses SYNC_WITH_HCATALOG_SCHEMA to synchronize an HCatalog schema named hcat:

=> CREATE HCATALOG SCHEMA hcat WITH hostname='hcathost' HCATALOG_SCHEMA='default'
   HCATALOG_USER='hcatuser';
CREATE SCHEMA
=> SELECT sync_with_hcatalog_schema('hcat', 'hcat');
sync_with_hcatalog_schema
----------------------------------------
Schema hcat synchronized with hcat
tables in hcat = 56
tables altered in hcat = 0
tables created in hcat = 56
stale tables in hcat = 0
table changes erred in hcat = 0
(1 row)

=> -- Use vsql's \d command to describe a table in the synced schema

=> \d hcat.messages
List of Fields by Tables
  Schema   |   Table  | Column  |      Type      | Size  | Default | Not Null | Primary Key | Foreign Key
-----------+----------+---------+----------------+-------+---------+----------+-------------+-------------
hcat       | messages | id      | int            |     8 |         | f        | f           |
hcat       | messages | userid  | varchar(65000) | 65000 |         | f        | f           |
hcat       | messages | "time"  | varchar(65000) | 65000 |         | f        | f           |
hcat       | messages | message | varchar(65000) | 65000 |         | f        | f           |
(4 rows)

The following example uses SYNC_WITH_HCATALOG_SCHEMA followed by ALTER TABLE to adjust a column value:

=> CREATE HCATALOG SCHEMA hcat WITH hostname='hcathost' HCATALOG_SCHEMA='default'
-> HCATALOG_USER='hcatuser';
CREATE SCHEMA
=> SELECT sync_with_hcatalog_schema('hcat', 'hcat');
...
=> ALTER TABLE hcat.t ALTER COLUMN a1 SET DATA TYPE long varchar(1000000);
=> ALTER TABLE hcat.t ALTER COLUMN a2 SET DATA TYPE long varbinary(1000000);

The following example uses SYNC_WITH_HCATALOG_SCHEMA with a local (non-HCatalog) schema:

=> CREATE HCATALOG SCHEMA hcat WITH hostname='hcathost' HCATALOG_SCHEMA='default'
-> HCATALOG_USER='hcatuser';
CREATE SCHEMA
=> CREATE SCHEMA hcat_local;
CREATE SCHEMA
=> SELECT sync_with_hcatalog_schema('hcat_local', 'hcat');

10 - SYNC_WITH_HCATALOG_SCHEMA_TABLE

Copies the structure of a single table in a Hive database schema available through the HCatalog Connector to a Vertica table.

This function can synchronize the HCatalog schema directly. In this case, call it with the same schema name for the vertica_schema and hcatalog_schema parameters. The function can also synchronize a different schema to the HCatalog schema.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

SYNC_WITH_HCATALOG_SCHEMA_TABLE( vertica_schema, hcatalog_schema, table_name )

Parameters

vertica_schema
The existing Vertica schema to store the copied HCatalog schema's metadata. This can be the same schema as hcatalog_schema, or it can be a separate one created with CREATE SCHEMA.
hcatalog_schema
The HCatalog schema to copy, created with CREATE HCATALOG SCHEMA.
table_name
The table in hcatalog_schema to copy. If table_name already exists in vertica_schema, the function overwrites it.

Privileges

Non-superuser: CREATE privileges on vertica_schema.

Users also require access to Hive data, one of the following:

  • USAGE permissions on hcatalog_schema, if Hive does not use an authorization service to manage access.

  • Permission through an authorization service (Sentry or Ranger), and access to the underlying files in HDFS. (Sentry can provide that access through ACL synchronization.)

  • dbadmin user privileges, with or without an authorization service.

Data type matching

Hive STRING and BINARY data types are matched, in Vertica, to the VARCHAR(65000) and VARBINARY(65000) types. Adjust the data types with ALTER TABLE as needed after creating the schema. The maximum size of a VARCHAR or VARBINARY in Vertica is 65000, but you can use LONG VARCHAR and LONG VARBINARY to specify larger values.

Hive and Vertica define string length in different ways. In Hive the length is the number of characters; in Vertica it is the number of bytes. Thus, a character encoding that uses more than one byte, such as Unicode, can cause mismatches between the two. To avoid data truncation, set values in Vertica based on bytes, not characters.

If data size exceeds the column size, Vertica logs an event at read time in the QUERY_EVENTS system table.

Examples

The following example uses SYNC_WITH_HCATALOG_SCHEMA_TABLE to synchronize the "nation" table:

=> CREATE SCHEMA hcat_local;
CREATE SCHEMA

=> CREATE HCATALOG SCHEMA hcat WITH hostname='hcathost' HCATALOG_SCHEMA='hcat'
   HCATALOG_USER='hcatuser';
CREATE SCHEMA

=> SELECT sync_with_hcatalog_schema_table('hcat_local', 'hcat', 'nation');
sync_with_hcatalog_schema_table
-----------------------------------------------------------------------------
    Schema hcat_local synchronized with hcat for table nation
    table nation is created in schema hcat_local
(1 row)

The following example shows the behavior if the "nation" table already exists in the local schema:

=> SELECT sync_with_hcatalog_schema_table('hcat_local','hcat','nation');
sync_with_hcatalog_schema_table
-----------------------------------------------------------------------------
    Schema hcat_local synchronized with hcat for table nation
    table nation is altered in schema hcat_local
(1 row)

11 - VERIFY_HADOOP_CONF_DIR

Verifies that the Hadoop configuration that is used to access HDFS is valid on all Vertica nodes. The configuration is valid if:

  • all required configuration files are found on the path defined by the HadoopConfDir configuration parameter

  • all properties needed by Vertica are set in those files

This function does not attempt to validate the settings of those properties; it only verifies that they have values.

It is possible for the Hadoop configuration to be valid on some nodes and invalid on others. The function reports a validation failure if the configuration is invalid on any node; the rest of the output reports the details.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

VERIFY_HADOOP_CONF_DIR( )

Parameters

This function has no parameters.

Privileges

This function does not require privileges.

Examples

The following example shows the results when the Hadoop configuration is valid.

=> SELECT VERIFY_HADOOP_CONF_DIR();
    verify_hadoop_conf_dir
-------------------------------------------------------------------
Validation Success
v_vmart_node0001: HadoopConfDir [PG_TESTOUT/config] is valid
v_vmart_node0002: HadoopConfDir [PG_TESTOUT/config] is valid
v_vmart_node0003: HadoopConfDir [PG_TESTOUT/config] is valid
v_vmart_node0004: HadoopConfDir [PG_TESTOUT/config] is valid
(1 row)

In the following example, the Hadoop configuration is valid on one node, but on other nodes a needed value is missing.

=> SELECT VERIFY_HADOOP_CONF_DIR();
    verify_hadoop_conf_dir
-------------------------------------------------------------------
Validation Failure
v_vmart_node0001: HadoopConfDir [PG_TESTOUT/test_configs/config] is valid
v_vmart_node0002: No fs.defaultFS parameter found in config files in [PG_TESTOUT/config]
v_vmart_node0003: No fs.defaultFS parameter found in config files in [PG_TESTOUT/config]
v_vmart_node0004: No fs.defaultFS parameter found in config files in [PG_TESTOUT/config]
(1 row)