Computing flex table keys
After loading data into a flex table, you can determine the set of keys that exist in the __raw__
column (the map data). Two helper functions compute keys from flex table map data:
-
COMPUTE_FLEXTABLE_KEYS determines which keys exist as virtual columns in the flex map.
-
COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW performs the same functionality as COMPUTE_FLEXTABLE_KEYS, additionally building a new view. See also Updating Flex Table Views.
Note
If the length of a key exceeds 65,000, Vertica truncates the key.Using COMPUTE_FLEXTABLE_KEYS
During execution, this function calculates the following information for the flex keys table columns:
Column | Description |
---|---|
key_name |
The name of the virtual column (key). |
frequency |
The number of times the key occurs in the map. |
data_type_guess |
The type guess that the helper functions determine for each distinct key in the map data. The function determines the type of each non-string value, depending on the length of the key, and whether the key includes nested maps. Changing the default value of the EnableBetterFlexTypeGuessing configuration parameter to 0 (OFF) results in the functions determining all flex table keys as string types ([LONG]VARCHAR ) or ([LONG] VARBINARY ). |
Determining key data types
By default, using COMPUTE_FLEXTABLE_KEYS
determines non-string key values from the __raw__
column LONG VARBINARY type. The non-string keys include these data types (and others listed in Data types):
-
BOOLEAN
-
INTEGER
-
FLOAT
-
TIMESTAMP
-
DATE
Assigning flex key data types
Use the sample CSV data in this section to compare the results of using or not using the EnableBetterFlexTypeGuessing
configuration parameter. When the parameter is ON, the function determines key non-string data types in your map data more accurately. The default for the parameter is 1 (ON).
Year,Quarter,Region,Species,Grade,Pond Value,Number of Quotes,Available
2015,1,2 - Northwest Oregon & Willamette,Douglas-fir,1P,$615.12 ,12,No
2015,1,2 - Northwest Oregon & Willamette,Douglas-fir,SM,$610.78 ,12,Yes
2015,1,2 - Northwest Oregon & Willamette,Douglas-fir,2S,$596.00 ,20,Yes
2015,1,2 - Northwest Oregon & Willamette,Hemlock,P,$520.00 ,6,Yes
2015,1,2 - Northwest Oregon & Willamette,Hemlock,SM,$510.00 ,6,No
2015,1,2 - Northwest Oregon & Willamette,Hemlock,2S,$490.00 ,14,No
To compare the data type assignment results, complete the following steps:
-
Save the CSV data file (here, as
trees.csv
). -
Create a flex table (
trees
) and loadtrees.csv
using thefcsvparser
:=> CREATE FLEX TABLE trees(); => COPY trees FROM '/home/dbadmin/tempdat/trees.csv' PARSER fcsvparser();
-
Use
COMPUTE_FLEXTABLE_KEYS
with thetrees
flex table.=> SELECT COMPUTE_FLEXTABLE_KEYS('trees'); COMPUTE_FLEXTABLE_KEYS ----------------------------------------------- Please see public.trees_keys for updated keys (1 row)
-
Query the
trees_keys
table output.:=> SELECT * FROM trees_keys; key_name | frequency | data_type_guess ------------------+-----------+----------------- Year | 6 | Integer Quarter | 6 | Integer Region | 6 | Varchar(66) Available | 6 | Boolean Number of Quotes | 6 | Integer Grade | 6 | Varchar(20) Species | 6 | Varchar(22) Pond Value | 6 | Numeric(8,3) (8 rows)
-
Set the
EnableBetterFlexTypeGuessing
parameter to 0 (OFF). -
Call
COMPUTE_FLEXTABLE_KEYS
with thetrees
flex table again. -
Query the
trees_keys
table to compare thedata_type_guess
values with the previous results. Without the configuration parameter set, all of the non-string data types are VARCHARS of various lengths:=> SELECT * FROM trees_keys; key_name | frequency | data_type_guess ------------------+-----------+----------------- Year | 6 | varchar(20) Quarter | 6 | varchar(20) Region | 6 | varchar(66) Available | 6 | varchar(20) Grade | 6 | varchar(20) Number of Quotes | 6 | varchar(20) Pond Value | 6 | varchar(20) Species | 6 | varchar(22) (8 rows)
-
To maintain accurate results for non-string data types, set the
EnableBetterFlexTypeGuessing
parameter back to 1 (ON).
For more information about the EnableBetterFlexTypeGuessing
configuration parameter, see EnableBetterFlexTypeGuessing.
Calculating key value column widths
The COMPUTE_FLEXTABLE_KEYS
function determines the column width for keys by determining the length of the largest value for each key, multiplied by the FlexTableDataTypeGuessMultiplier factor.
The next example shows the results of populating the _keys table after creating a flex table (darkdata1
) and loading data. The column widths are shown in parentheses, where applicable, after the value of the data_type_guess
column:
=> SELECT compute_flextable_keys('darkdata1');
compute_flextable_keys
--------------------------------------------------
Please see public.darkdata1_keys for updated keys
(1 row)
=> SELECT * from darkdata1_keys;
key_name | frequency | data_type_guess
----------------------------------------------------------+-----------+----------------------
created_at | 8 | TimestampTz
delete.status.id_str | 4 | Integer
delete.status.user_id | 4 | Integer
entities.hashtags | 8 | long varbinary(186)
favorited | 8 | Boolean
id_str | 8 | Integer
in_reply_to_screen_name | 8 | Varchar(24)
retweeted_status.contributors | 1 | Varchar(20)
retweeted_status.coordinates | 1 | Varchar(20)
retweeted_status.created_at | 1 | TimestampTz
.
.
.
(125 rows)