入门

以下教程演示了创建、浏览和使用 Flex 表的基本信息。

将这些 JSON 数据用于本节其余部分的练习:

{"name": "Everest", "type":"mountain", "height":29029, "hike_safety": 34.1}
{"name": "Mt St Helens", "type":"volcano", "height":29029, "hike_safety": 15.4}
{"name": "Denali", "type":"mountain", "height":17000, "hike_safety": 12.2}
{"name": "Kilimanjaro", "type":"mountain", "height":14000 }
{"name": "Mt Washington", "type":"mountain", "hike_safety": 50.6}

创建 Flex 表并加载数据

  1. 创建名为 mountains 的 Flex 表:

    => CREATE flex table mountains();
    
  2. 使用 Flex 表解析器 fjsonparser 加载您保存的 JSON 文件:

    => COPY mountains from '/home/dbadmin/data/flex/mountains.json'
    parser fjsonparser();
     Rows Loaded
    -------------
               5
    (1 row)
    
  3. 查询示例文件中的值:

    => SELECT name, type, height from mountains;
         name      |   type   | height
    ---------------+----------+--------
     Everest       | mountain | 29029
     Mt St Helens  | volcano  | 29029
     Denali        | mountain | 17000
     Kilimanjaro   | mountain | 14000
     Mt Washington | mountain |
    (5 rows)
    

现在,您已经创建一个 Flex 表并加载数据。接下来,了解更多有关在数据库中使用 Flex 表数据的信息。

更多地查询 Flex 表

  1. 查询 Flex 表以查看您加载的数据,它们存储在 __raw__ 列中。示例说明了表的内容,并为了便于说明而添加了换行符:

    => \x
    Expanded display is on.
    => SELECT * from mountains;
    [ RECORD 1 ]+----------------------------------------------------------------------------
    __identity__ | 1
    __raw__      | \001\000\000\000,\000\000\000\004\000\000\000\024\000\000\000\031\000\000\000\
    035\000\000\000$\000\000\0002902934.1Everestmountain\004\000\000\000\024\000\000\000\032\000\
    000\000%\000\000\000)\000\000\000heighthike_safetynametype
    [ RECORD 2 ]+----------------------------------------------------------------------------
    __identity__ | 2
    __raw__      | \001\000\000\0000\000\000\000\004\000\000\000\024\000\000\000\031\000\000\000\
    035\000\000\000)\000\000\0002902915.4Mt St Helensvolcano\004\000\000\000\024\000\000\000\032\000\
    000\000%\000\000\000)\000\000\000heighthike_safetynametype
    [ RECORD 3 ]+----------------------------------------------------------------------------
    __identity__ | 3
    __raw__      | \001\000\000\000+\000\000\000\004\000\000\000\024\000\000\000\031\000\000\000\
    035\000\000\000#\000\000\0001700012.2Denalimountain\004\000\000\000\024\000\000\000\032\000\000
    \000%\000\000\000)\000\000\000heighthike_safetynametype
    [ RECORD 4 ]+----------------------------------------------------------------------------
    __identity__ | 4
    __raw__      | \001\000\000\000(\000\000\000\003\000\000\000\020\000\000\000\025\000\000\000\
    000\000\00014000Kilimanjaromountain\003\000\000\000\020\000\000\000\026\000\000\000\032\000\
    000\000heightnametype
    [ RECORD 5 ]+----------------------------------------------------------------------------
    __identity__ | 5
    __raw__      | \001\000\000\000)\000\000\000\003\000\000\000\020\000\000\000\024\000\000\000\
    000\000\00050.6Mt Washingtonmountain\003\000\000\000\020\000\000\000\033\000\000\000\037\000\
    000\000hike_safetynametype
    
  2. 使用 mapToString() 函数(用于 mountains__raw__ 列)检查其采用可读 JSON 文本格式的内容:

    => SELECT maptostring(__raw__) from mountains;
                                             MAPTOSTRING
    ------------------------------------------------------------------------------------------
      {
       "height" : "29029",
       "hike_safety" : "34.1",
       "name" : "Everest",
       "type" : "mountain"
    }
    
     {
       "height" : "29029",
       "hike_safety" : "15.4",
       "name" : "Mt St Helens",
       "type" : "volcano"
    }
    
     {
       "height" : "17000",
       "hike_safety" : "12.2",
       "name" : "Denali",
       "type" : "mountain"
    }
    
     {
       "height" : "14000",
       "name" : "Kilimanjaro",
       "type" : "mountain"
    }
    
     {
       "hike_safety" : "50.6",
       "name" : "Mt Washington",
       "type" : "mountain"
    }
    
  3. 现在,使用 compute_flextable_keys() 函数填充 mountain_keys 表。Vertica 在您创建 Flex 表时自动生成此表。

    => SELECT compute_flextable_keys('mountains');
        compute_flextable_keys
    ---------------------------------------------------
    Please see public.mountains_keys for updated keys
    (1 row)
    
  4. 查询键表 (mountains_keys),并检查结果:

    
    => SELECT * from public.mountains_keys;
      key_name   | frequency | data_type_guess
    -------------+-----------+-----------------
     hike_safety |         4 | varchar(20)
     name        |         5 | varchar(26)
     height      |         4 | varchar(20)
     type        |         5 | varchar(20)
    (4 rows)
    

构建 Flex 表视图

  1. 使用 build_flextable_view() 函数填充从 mountains_keys 表生成的视图。

    => SELECT build_flextable_view('mountains');
                     build_flextable_view
    ------------------------------------------------------
     The view public.mountains_view is ready for querying
    (1 row)
    
  2. 查询视图 mountains_view

    
    => SELECT * from public.mountains_view;
     hike_safety |     name      |   type   | height
    -------------+---------------+----------+--------
     50.6        | Mt Washington | mountain |
     34.1        | Everest       | mountain | 29029
     22.8        | Kilimanjaro   | mountain | 14000
     15.4        | Mt St Helens  | volcano  | 29029
     12.2        | Denali        | mountain | 17000
    (5 rows)
    
  3. 使用 view_columns 系统表查询 mountains_viewcolumn_name 列和 data_type 列:

    => SELECT column_name, data_type from view_columns where table_name = 'mountains_view';
     column_name |  data_type
    -------------+-------------
     hike_safety | varchar(20)
     name        | varchar(26)
     type        | varchar(20)
     height      | varchar(20)
    (4 rows)
    
  4. 审阅查询结果:

    • 请注意 data_type 列及其值和大小。这些均为在您使用 compute_flextable_keys() 计算 Flex 表键时计算得出。

    • 您在调用该函数后查询 mountains_keys 表时,是否还注意到 data_type_guess 列?

  5. 使用来自 mountains_viewdata_type 信息,覆盖 hike_safetydata_type_guess。然后,COMMIT 更改,并使用 build_flextable_view() 重新构建视图:

    => UPDATE mountains_keys SET data_type_guess = 'float' where key_name = 'hike_safety';
     OUTPUT
    --------
          1
    (1 row)
    
    => commit;
    => SELECT build_flextable_view('mountains');
                     build_flextable_view
    ------------------------------------------------------
     The view public.mountains_view is ready for querying
    (1 row)
    
  6. 接下来,使用 view_columns 系统表。请注意,hike_safety 现在属于 float 数据类型:

    => SELECT column_name, data_type from view_columns where table_name = 'mountains_view';
     column_name |  data_type
    -------------+-------------
     hike_safety | float
     name        | varchar(26)
     type        | varchar(20)
     height      | varchar(20)
    (4 rows)
    

创建混合 Flex 表

如果您已经知道您定期加载和查询的部分数据需要全面的 Vertica 性能和支持,您可以创建一个混合 Flex 表。混合 Flex 表具有您定义的一个或多个实际列,以及一个用于存储您加载的所有非结构化数据的 __raw__ 列。查询实际列比查询 raw 列中的可变数据更快。您可以定义各列的默认值。

  1. 创建一个混合 Flex 表,并加载相同的示例 JSON 文件:

    => CREATE flex table mountains_hybrid(name varchar(41) default name::varchar(41), hike_safety float
    default hike_safety::float);
    => COPY mountains_hybrid from '/home/dbadmin/Downloads/mountains.json' parser fjsonparser();
     Rows Loaded
    -------------
               5
    (1 row)
    
  2. 使用 compute_flextable_keys_and_build_view() 函数为 mountains_hybrid 填充键表并构建视图:

    => SELECT compute_flextable_keys_and_build_view('mountains_hybrid');
                                            compute_flextable_keys_and_build_view
    ------------------------------------------------------------------------------------------------
     Please see public.mountains_hybrid_keys for updated keys
    The view public.mountains_hybrid_view is ready for querying
    (1 row)
    
  3. 查询 mountains_hybrid 键表。再次审阅 data_type_guesses 列的值。类型列出了您在创建混合表时声明的列定义:

    => SELECT * from mountains_hybrid_keys;
      key_name   | frequency | data_type_guess
    -------------+-----------+-----------------
     height      |         4 | varchar(20)
     name        |         5 | varchar(41)
     type        |         5 | varchar(20)
     hike_safety |         4 | float
    (4 rows)
    

如果您创建了一个基本 Flex 表,随后发现您希望将一个或多个虚拟列提升为实际列,请参阅实体化 Flex 表以添加列。

实体化混合 Flex 表中的虚拟列

浏览 Flex 表数据后,您可以将 Flex 表中的一个或多个虚拟列提升为实际列。您无需创建单独的列式表。

  1. 在混合表上调用 materialize_flextable_columns() 函数,指定要实体化的虚拟列数量:

    => SELECT materialize_flextable_columns('mountains_hybrid', 3);
                                     materialize_flextable_columns
    --------------------------------------------------------------------------------------------
     The following columns were added to the table public.mountains_hybrid:
            type
    For more details, run the following query:
    SELECT * FROM v_catalog.materialize_flextable_columns_results WHERE
    table_schema = 'public' and table_name = 'mountains_hybrid';
    
    (1 row)
    
  2. 您指定要实体化三 (3) 列,但该表是使用两个实际列(namehike_safety)创建的。为了满足您指定三列的愿望,该函数仅再提升另外一列 type。下一个示例已展开显示,以垂直列出各列。请注意,刚实体化的列处于 ADDED 状态,而不是您在创建该表时所定义的两列所处的 EXISTS 状态:

    => \x
    Expanded display is on.
    => SELECT * from materialize_flextable_columns_results where table_name = 'mountains_hybrid';
    -[ RECORD 1 ]-+-------------------------------------------------------
    table_id      | 45035996273766044
    table_schema  | public
    table_name    | mountains_hybrid
    creation_time | 2013-11-30 20:09:37.765257-05
    key_name      | type
    status        | ADDED
    message       | Added successfully
    -[ RECORD 2 ]-+-------------------------------------------------------
    table_id      | 45035996273766044
    table_schema  | public
    table_name    | mountains_hybrid
    creation_time | 2013-11-30 20:09:37.765284-05
    key_name      | hike_safety
    status        | EXISTS
    message       | Column of same name already exists in table definition
    -[ RECORD 3 ]-+-------------------------------------------------------
    table_id      | 45035996273766044
    table_schema  | public
    table_name    | mountains_hybrid
    creation_time | 2013-11-30 20:09:37.765296-05
    key_name      | name
    status        | EXISTS
    message       | Column of same name already exists in table definition
    
  3. 现在,显示混合表定义,列出 __raw__ 列和三个已实体化的列。Flex 表的数据类型由相关联的键表派生而来,因此您可以在必要时对其进行更新。请注意,__raw__ 列具有默认的 NOT NULL 约束:

    => \d mountains_hybrid
    List of Fields by Tables
    -[ RECORD 1 ]------------------------------------------------------------
    Schema      | public
    Table       | mountains_hybrid
    Column      | __raw__
    Type        | long varbinary(130000)
    Size        | 130000
    Default     |
    Not Null    | t
    Primary Key | f
    Foreign Key |
    -[ RECORD 2 ]------------------------------------------------------------
    Schema      | public
    Table       | mountains_hybrid
    Column      | name
    Type        | varchar(41)
    Size        | 41
    Default     | (MapLookup(mountains_hybrid.__raw__, 'name'))::varchar(41)
    Not Null    | f
    Primary Key | f
    Foreign Key |
    -[ RECORD 3 ]------------------------------------------------------------
    Schema      | public
    Table       | mountains_hybrid
    Column      | hike_safety
    Type        | float
    Size        | 8
    Default     | (MapLookup(mountains_hybrid.__raw__, 'hike_safety'))::float
    Not Null    | f
    Primary Key | f
    Foreign Key |
    -[ RECORD 4 ]------------------------------------------------------------
    Schema      | public
    Table       | mountains_hybrid
    Column      | type
    Type        | varchar(20)
    Size        | 20
    Default     | (MapLookup(mountains_hybrid.__raw__, 'type'))::varchar(20)
    Not Null    | f
    Primary Key | f
    Foreign Key |
    

您现在已经完成有关 Flex 表基本信息、混合 Flex 表和 Flex 函数运用的入门指南。