Welcome to Getting Started. This guide walks you through the process of configuring a Vertica Analytics Platform database and running typical queries.
For short tutorial on how to install Vertica, create a database, and load data, see the Quickstart guide.
1 - Using this guide
This guide shows how to set up a Vertica database and run simple queries that perform common database tasks.
This guide shows how to set up a Vertica database and run simple queries that perform common database tasks.
Who should use this guide?
This guide targets anyone who wants to learn how to create and run a Vertica database. This guide requires no special knowledge at this point, although a rudimentary knowledge of basic SQL commands is useful when you begin to run queries.
For short tutorial on how to install Vertica, create a database, and load data, see the Quickstart guide.
What you need
The examples in this guide require one of the following:
Vertica installed on one host or a cluster of hosts. Vertica recommends a minimum of three hosts in the cluster.
You access your database with an SSH client or the terminal utility in your Linux console, such as vsql. Throughout this guide, you use the following user interfaces:
Copy the installation file to your home directory. The example shows an rpm file for CentOS/RHEL, but you may have a deb file for Debian.
$ scp vertica-10.1.0.x86_64.RHEL6.rpm /~
Identify the IP address of the current node.
$ ipconfig -a
If the ipconfig command is not found in your path, you can try running it directly using the paths /sbin/ipconfig or /usr/sbin/ipconfig. If neither of those work, use the ip command:
$ ip a
Note
The previous commands can return multiple addresses. For example, if your system is configured to use both IPv4 and IPv6 addressing, the commands will list two addresses, one for each address family. In this case, you must determine which address you want to use.
Ensure your packages are up to date. Run the command based on your distribution.
On CentOS and RedHat:
$ sudo yum update -y
On openSUSE:
$ sudo zypper up
On Debian and Ubuntu:
$ sudo apt-get update && sudo apt-get upgrade
Set swappiness to 0 (recommended).
$ sudo systemctl vm.swappiness=0
Verify that SELinux is running in permissive mode or is disabled.
The admintools utility included in the installation provides a number of administrative functions.
The admintools utility included in the installation provides a number of administrative functions. The following steps show how to create a database and users with this utility.
View the status of your cluster. It should return an empty table.
$ admintools -t view_cluster
DB | Host | State
----+------+-------
Create a database called "vdb" in the home directory with the password "vertica". This command also sets the plaintext password "vertica" for both the database and dbadmin.
Vertica offers several solutions for loading files with structured and unstructred data, and from several formats.
Vertica offers several solutions for loading files with structured and unstructred data, and from several formats.
Creating a sample data file
Create a sample CSV file called cities.csv with the following contents and save it to /home/dbadmin/cities.csv.
City,State,Zip,Population
Boston,MA,02108,694583
Chicago,IL,60601,2705994
Seattle,WA,98101,744955
Dallas,TX,75201,1345047
New York,NY,10001,8398748
Loading structured data from a file
Run vsql.
$ vsql
Create the cities table.
=> CREATE TABLE cities (
city varchar(20),
state char(2),
zip int,
population int
);
Use the COPY statement to load the data from the cities.csv file. The following command logs exceptions and rejections in the home directory.
=> COPY cities FROM LOCAL '/home/dbadmin/cities.csv' DELIMITER ',' NULL '' EXCEPTIONS '/home/dbadmin/cities_exceptions.log'
REJECTED DATA '/home/dbadmin/cities_rejections.log';
Review the rejections log for what data was excluded. Here, the header was excluded.
Review the exceptions for details on the error. In this case, the header failed Vertica's integer data type verification.
$ cat /home/dbadmin/cities_exceptions.log
COPY: Input record 1 has been rejected (Invalid integer format 'Zip' for column 3 (zip)).
Please see /home/dbadmin/cities_rejections.log, record 1 for the rejected record. This record was record 1 from cities.csv
To fix this, add SKIP 1 to the original COPY statement. This excludes the first row.
=> COPY cities FROM LOCAL '/home/dbadmin/cities.csv' DELIMITER, 'NULL'
EXCEPTIONS '/home/dbadmin/cities_exceptions.log'
REJECTED DATA '/home/dbadmin/cities_rejections.log' SKIP 1;
Loading unstructured data with flex tables
To load data from another source, Vertica uses Flex tables. Flex tables simplify data loading by allowing you to load unstructured or "semi-structured" data without having to create a schema or column definitions.
Supported formats include:
Avro Data
CEF
CSV
Delimited
JSON
Create a table called cities_flex. Notice how it does not include column names or data types.
=> CREATE FLEXIBLE TABLE cities_flex();
Load the CSV file into the table.
=> COPY cities_flex FROM '/source/cities.csv' PARSER FDELIMITEDPARSER (delimiter=',');
Query the cities_flex table, specifying the column names from the original CSV file.
=> SELECT city, state FROM cities_flex;
2.5 - Monitoring the database
This page includes a collection of general-purpose SQL statements useful for monitoring your database.
This page includes a collection of general-purpose SQL statements useful for monitoring your database.
Check disk space
Check disk space used by tables.
=> SELECT projection_schema, anchor_table_name, to_char(sum(used_bytes)/1024/1024/1024,'999,999.99')
as disk_space_used_gb FROM
projection_storage
GROUP by projection_schema, anchor_table_name ORDER by
disk_space_used_gb desc limit 50;
Check total disk space used.
=> SELECT to_char(sum(used_bytes)/1024/1024/1024,'999,999.99') AS gb FROM projection_storage;
Check the amount of free disk space.
=> SELECT to_char(sum(disk_space_free_mb)/1024,'999,999,999') AS
disk_space_free_gb, to_char(sum(disk_space_used_mb)/1024,'999,999,999') AS
disk_space_used_gb FROM disk_storage;
Adjust data types
Change the Zip and Population columns from VARCHAR to INT.
=> UPDATE cities_flex_keys set data_type_guess='int' WHERE key_name='Zip';
=> UPDATE cities_flex_keys set data_type_guess='int' WHERE key_name='Population';
=> COMMIT;
Refresh the cities_flex_view with the new data types
=> SELECT build_flextable_view('cities_flex');
Materialize the flex table
Materialize the flex table and all columns into a persistent Vertica table.
=> CREATE TABLE cities AS SELECT * from cities_flex_view;
View user and role information
View user information.
=> SELECT user_name, is_super_user, resource_pool, memory_cap_kb, temp_space_cap_kb, run_time_cap FROM users;
Identify users.
=> SELECT * FROM user_sessions;
View queries by user.
=> SELECT * FROM query_profiles WHERE user_name ILIKE '%dbadmin%';
View roles.
=> SELECT * FROM roles;
View database information
View resource pool assignments.
=> SELECT user_name, resource_pool FROM users;
View table information.
=> SELECT table_name, is_flextable, is_temp_table, is_system_table, count(*) FROM tables GROUP by 1,2,3,4;
View projection information.
=> SELECT is_segmented, is_aggregate_projection, has_statistics, is_super_projection, count(*) FROM projections GROUP by 1,2,3,4,5;
View update information.
=> SELECT substr(query, 0, instr(query, '')+1) count(*) FROM (SELECT transaction_id, statement_id, upper(query::varchar(30000)) as query FROM query_profiles
WHERE regexp_like(query,'^\s*update\s','i')) sq GROUP BY 1 ORDER BY 1;
View active events.
=> SELECT * FROM active_events WHERE event_problem_description NOT ILIKE '%state to UP';
View backups.
=> SELECT * FROM database_backups;
View disk storage.
=> SELECT node_name, storage_path, storage_usage, storage_status, disk_space_free_percent FROM disk_storage;
View long-running queries
=> SELECT query_duration_us/1000000/60 AS query_duration_mins, table_name, user_name, processed_row_count AS rows_processed, substr(query,0,70) FROM query_profiles
ORDER BY query_duration_us DESCLIMIT 250;
View sizes and counts of Read Optimized Store (ROS) containers.
=> SELECT node_name, projection_name, sum(ros_count), sum(ros_used_bytes) FROM projection_storage GROUP BY 1,2 HAVING sum(ros_count) >= 50
ORDER BY 3 DESC LIMIT 250;
View license information
View license consumption.
=> SELECT GET_COMPLIANCE_STATUS();
View how the database complies with your license.
=> SELECT AUDIT('');
Audit the database to check if it complies with raw storage allowance of your license.
=> SELECT AUDIT_LICENSE_SIZE;
Compare storage size of database the database and your license.
=> SELECT /*+(license_utilization)*/
audit_start_timestamp,
database_size_bytes / (1024^3) AS database_size_gb,
license_size_bytes / (1024^3) AS license_size_gb, usage_percent
FROM v_catalog.license_audits ORDER BYaudit_start_timestamp DESC LIMIT 30;
2.6 - Connecting clients
Vertica supports several third-party clients.
Vertica supports several third-party clients. A list of Vertica client drivers can be found here.
The Vertica Community Edition (CE) is a free, limited license that Vertica provides users so that they can get a hands-on introduction to the platform.
The Vertica Community Edition (CE) is a free, limited license that Vertica provides users so that they can get a hands-on introduction to the platform. It allows you to deploy up to three nodes using a maximum of 1TB of data.
As part of the CE license, you agree to the collection of some anonymous, non-identifying usage data. This data lets Vertica understand how customers use the product, and helps guide the development of new features. None of your personal data is collected. For details on what is collected, see the Community Edition End User License Agreement.
Vertica provides two options to use the Community Edition:
CE container image. Container images require a container engine such as Docker Desktop. For installation details, see the official Docker documentation.
Vertica Community Edition Virtual Machine (Vertica CE VM), which is available for download on the Vertica website.
CE container image
The CE image is a single-node, lightweight alternative to the Vertica CE VM. Vertica provides two options to get the CE container image:
Vertica provides tools to perform administrative tasks quickly and easily:.
Vertica provides tools to perform administrative tasks quickly and easily:
Management Console (MC) provides a unified view of your Vertica cluster through a browser connection.
Administration Tools provides a simple graphical user interface for you to perform certain tasks such as starting and stopping a database, running Database Designer, and more.
The following sections provide detailed information about both tools.
Management Console
Management Console (MC) is the Vertica in-browser monitoring and management tool. Its graphical user interface provides a unified view of your Vertica database operations. Through user-friendly, step-by-step screens, you can create, configure, manage, and monitor your Vertica databases and their associated clusters. You can use MC to operate your Vertica database in Eon Mode or in Enterprise Mode. You can use MC to provision and deploy a Vertica Eon Mode database.
If possible, always run the Administration Tools using the database administrator account (dbadmin) on the administration host.
When you run Administration Tools, the Main Menu dialog box appears with a dark blue background and a title on top. The screen captures used in this documentation set are cropped down to the dialog box itself, as shown in the following screenshot.
The Administration Tools interface responds to mouse clicks in some terminal windows, but it might respond only to keystrokes. The following table is a quick reference to keystroke usage in the Administration Tools interface:
Key
Action
Return
Run selected command.
Tab
Cycle between OK, Cancel, Help, and menu.
Up/Down Arrow
Move cursor up and down in menu, window, or help file.
The first time you log in as the database administrator and run the Administration Tools, complete the following steps:
Accept the end-user license agreement (EULA) to proceed.
A window displays, requesting the location of the license key file you downloaded from the OpenText website. The default path is /tmp/vlicense.dat.
Enter the absolute path to your license key and select OK.
To return to the command line, select Exit and click OK.
5 - Introducing the VMart example database
Vertica ships with a sample multi-schema database called the VMart Example Database, which represents a database that might be used by a large supermarket (VMart) to access information about its products, customers, employees, and online and physical stores.
Vertica ships with a sample multi-schema database called the VMart Example Database, which represents a database that might be used by a large supermarket (VMart) to access information about its products, customers, employees, and online and physical stores. Using this example, you can create, run, optimize, and test a multi-schema database.
The VMart database contains the following schema:
public (automatically created in any newly created Vertica database)
store
online_Sales
VMart database location and scripts
If you installed Vertica from the RPM package, the VMart schema is installed in the /opt/vertica/examples/VMart_Schema directory. This folder contains the following script files that you can use to get started quickly. Use the scripts as templates for your own applications.
Script/file name
Description
vmart_count_data.sql
SQL script that counts rows of all example database tables, which you can use to verify load.
vmart_define_schema.sql
SQL script that defines the logical schema for each table and referential integrity constraints.
vmart_gen.cpp
Data generator source code (C++).
vmart_gen
Data generator executable file.
vmart_load_data.sql
SQL script that loads the generated sample data to the corresponding tables using COPY.
vmart_ queries.sql
SQL script that contains concatenated sample queries for use as a training set for the Database Designer.
vmart_query_##.sql
SQL scripts that contain individual queries; for example, vmart_query_01 through vmart_query_09.sql
vmart_schema_drop.sql
SQL script that drops all example database tables.
For more information about the schema, tables, and queries included with the VMart example database, see the Appendix.
6 - Installing and connecting to the VMart example database
Follow the steps in this section to create the fully functioning, multi-schema VMart example database to run sample queries.
Follow the steps in this section to create the fully functioning, multi-schema VMart example database to run sample queries. The number of example databases you create within a single Vertica installation is limited only by the disk space available on your system. However, Vertica strongly recommends that you start only one example database at a time to avoid unpredictable results.
Vertica provides two options to install the example database:
Quick Installation Using a Script: This option lets you create the example database and start using it immediately. Use this method to bypass the schema and table creation processes and start querying immediately.
Advanced Installation. The advance option is an advanced-but-simple example database installation using the Administration Tools interface. Use this method to better understand the database creation process and practice creating a schema, creating tables, and loading data.
Note
Both installation methods create a database named VMart. If you try both installation methods, you need to drop the VMart database you created (see Restoring the Status of Your Host) or create the subsequent database with a new name. However, Vertica strongly recommends that you start only one example database at a time to avoid unpredictable results
This tutorial uses Vertica-provided queries, but if you create your own design and use your own queries, you can follow the same set of procedures.
6.1 - Quick installation using a script
The script you need to perform a quick installation is located in /opt/vertica/sbin and is called install_example.
The script you need to perform a quick installation is located in /opt/vertica/sbin and is called install_example. This script creates a database on the default port (5433), generates data, creates the schema and a default superprojection, and loads the data. The folder also contains a delete_example script, which stops and drops the database.
In a terminal window, log in as the database administrator.
$ su dbadmin
Password: (your password)
Change to the /examples directory.
$ cd /opt/vertica/examples
Run the install script:
$ /opt/vertica/sbin/install_example VMart
After installation, you should see the following:
[dbadmin@localhost examples]$ /opt/vertica/sbin/install_example VMart
Installing VMart example example database
Mon Jul 22 06:57:40 PDT 2013
Creating Database
Completed
Generating Data. This may take a few minutes.
Completed
Creating schema
Completed
Loading 5 million rows of data. Please stand by.
Completed
Removing generated data files
Example data
The example database log files, ExampleInstall.txt and ExampleDelete.txt, are written to /opt/vertica/examples/log.
To perform an advanced-but-simple installation, set up the VMart example database environment and then create the database using the Administration Tools or Management Console.
To perform an advanced-but-simple installation, set up the VMart example database environment and then create the database using the Administration Tools or Management Console.
Note
If you installed the VMart database using the quick installation method, you cannot complete the following steps because the database has already been created.
To try the advanced installation, drop the example database (see Restoring the Status of Your Host on this guide) and perform the advanced Installation, or create a new example database with a different name. However, Vertica strongly recommends that you install only one example database at a time to avoid unpredictable results.
The advanced installation requires the following steps:
6.2.1 - Step 1: setting up the example environment
Stop all databases running on the same host on which you plan to install your example database.
Stop all databases running on the same host on which you plan to install your example database.
If you are unsure if other databases are running, run the Administration Tools and select View Cluster State. The State column should show DOWN values on pre-existing databases.
If databases are running, click Stop Database in the Main Menu of the Administration Tools interface and click OK.
In a terminal window, log in as the database administrator:
$ su dbadmin
Password:
Change to the /VMart_Schema directory.
$ cd /opt/vertica/examples/VMart_Schema
Do not change directories while following this tutorial. Some steps depend on being in a specific directory.
Run the sample data generator.
$ ./vmart_gen
Let the program run with the default parameters, which you can review in the README file.
To create the example database: use the Administration Tools or Management Console, as described in this section.
To create the example database: use the Administration Tools or Management Console, as described in this section.
Creating the example database using the administration tools
In this procedure, you create the example database using the Administration Tools. To use the Management Console, go to the next section.
Note
If you have not used Administration Tools before, see Vertica interfaces.
Run the Administration Tools.
$ /opt/vertica/bin/admintools
or simply type admintools
From the Administration Tools Main Menu, click Configuration Menu and click OK.
Click Create Database and click OK.
Name the database VMart and click OK.
Click OK to bypass the password and click Yes to confirm.
There is no need for a database administrator password in this tutorial. When you create a production database, however, always specify an administrator password. Otherwise, the database is permanently set to trust authentication (no passwords).
Select the hosts you want to include from your Vertica cluster and click OK.
This example creates the database on a one-host cluster. Vertica recommends a minimum of three hosts in the cluster. If you are using the Vertica Community Edition, you are limited to three nodes.
Click OK to select the default paths for the data and catalog directories.
Catalog and data paths must contain only alphanumeric characters and cannot have leading space characters. Failure to comply with these restrictions could result in database creation failure.
When you create a production database, you’ll likely specify other locations than the default. See Prepare Disk Storage Locations in the Administrator’s Guide for more information.
Since this tutorial uses a one-host cluster, a K-safety warning appears. Click OK.
Click Yes to create the database.
During database creation, Vertica automatically creates a set of node definitions based on the database name and the names of the hosts you selected and returns a success message.
Click OK to close the Database VMart created successfully message.
Creating the example database using Management Console
In this procedure, you create the example database using Management Console. To use the Administration Tools, follow the steps in the preceding section.
Note
To use Management Console, the console should already be installed and you should be familiar with its concepts and layout. For details, see Management Console.
Connect to Management Console and log in.
On the Home page, click Infrastructure to go to the Databases and Clusters page.
Click to select the appropriate existing cluster and click Create Database.
Follow the on-screen wizard, which prompts you to provide the following information:
Database name, which must be between 3–25 characters, starting with a letter, and followed by any combination of letters, numbers, or underscores.
(Optional) database administrator password for the database you want to create and connect to.
IP address of a node in your database cluster, typically the IP address of the administration host.
Click Next.
6.2.3 - Step 3: connecting to the database
Regardless of the installation method you used, follow these steps to connect to the database.
Regardless of the installation method you used, follow these steps to connect to the database.
As
dbadmin, run the Administration Tools.
$ /opt/vertica/bin/admintools
or simply type admintools.
If you are already in the Administration Tools, navigate to the Main Menu page.
Select Connect to Database, click OK.
To configure and load data into the VMart database, complete the following steps:
If you installed the VMart database using the Quick Installation method, the schema, tables, and data are already defined. You can choose to drop the example database (see Restoring the Status of Your Host in this guide) and perform the Advanced Installation, or continue straight to Querying Your Data in this guide.
6.2.4 - Step 4: defining the database schema
The VMart database installs with sample scripts with SQL commands that are intended to represent queries that might be used in a real business.
The VMart database installs with sample scripts with SQL commands that are intended to represent queries that might be used in a real business. The vmart_define_schema.sql script runs a script that defines the VMart schema and creates tables. You must run this script before you load data into the VMart database.
This script performs the following tasks:
Defines two schemas in the VMart database schema: online_sales and store.
Now that you have created the schemas and tables, you can load data into a table by running the vmart_load_data.sql script.
Now that you have created the schemas and tables, you can load data into a table by running the vmart_load_data.sql script. This script loads data from the 15 .tbl text files in /opt/vertica/examples/VMart_Schema into the tables that vmart_design_schema.sql created.
It might take several minutes to load the data on a typical hardware cluster. Check the load status by monitoring the vertica.log file, as described in Monitoring Log Files in the Administrator’s Guide.
The VMart database installs with sample scripts that contain SQL commands that represent queries that might be used in a real business.
The VMart database installs with sample scripts that contain SQL commands that represent queries that might be used in a real business. Use basic SQL commands to query the database, or try out the following command. Once you’re comfortable running the example queries, you might want to write your own.
Note
The data that your queries return might differ from the example output shown in this guide because the sample data generator is random.
Type the following SQL command to return the values for five products with the lowest fat content in the Dairy department. The command selects the fat content from Dairy department products in the product_dimension table in the public schema, orders them from low to high and limits the output to the first five (the five lowest fat contents).
VMart => SELECT fat_content
FROM ( SELECT DISTINCT fat_content
FROM product_dimension
WHERE department_description
IN ('Dairy') ) AS food
ORDER BY fat_content
LIMIT 5;
Your results will be similar to the following:
fat_content
-------------
80
81
82
83
84
(5 rows)
The preceding example is from the vmart_query_01.sql file. You can execute more sample queries using the scripts that installed with the VMart database or write your own. For a list of the sample queries supplied with Vertica, see Appendix: VMart example database schema, tables, and scripts.
8 - Backing up and restoring the database
Vertica supplies a comprehensive utility, vbr, that lets you back up and restore a full database, as well as create backups of specific schema or tables.
Important
Inadequate security on backups can compromise overall database security. Be sure to secure backup locations and strictly limit access to backups only to users who already have permissions to access all database data.
Vertica supplies a comprehensive utility, vbr, that lets you back up and restore a full database, as well as create backups of specific schema or tables. You should back up your database regularly and before major or destructive operations.
All vbr operations rely on a configuration file that describes your database, backup locations, and other parameters. Typically you use the same configuration file for both the backup and restore operations. To create your first configuration file, copy one of the sample files for backup listed in Sample vbr configuration files. Edit the copy to specify a snapshot (backup) name, your database details, and where to back up. The comments in the sample file guide you.
By default, there is no screen output other than the progress bar.
You can restore the entire database or selected schemas and tables. You can also use vbr to replicate data from one database to another or to copy an entire cluster. For more information about vbr, see Backing up and restoring the database.
9 - Using Database Designer to create a comprehensive design
Vertica Database Designer:.
Vertica Database Designer:
Analyzes your logical schema, sample data, and, optionally, your sample queries.
Creates a physical schema design (a set of projections) that can be deployed automatically or manually.
Does not require specialized database knowledge.
Can be run and rerun any time for additional optimization without stopping the database.
Uses strategies to provide optimal query performance and data compression.
Use Database Designer to create a comprehensive design, which allows you to create projections for all tables in your database.
You can also use Database Designer to create an incremental design, which creates projections for all tables referenced in the queries you supply.
You can create a comprehensive design with Database Designer using Management Console or through Administration Tools. You can also choose to run Database Designer programmatically.
9.1 - Running Database Designer with Management Console
In this tutorial, you'll create a comprehensive design with Database Designer through the Management Console interface.
In this tutorial, you'll create a comprehensive design with Database Designer through the Management Console interface. If, in the future, you have a query that you want to optimize, you can create an enhanced (incremental) design with additional projections. You can tune these projections specifically for the query you provide.
Note
To run Database Designer outside Administration Tools, you must be a dbadmin user. If you are not a dbadmin user, you must have the DBDUSER role assigned to you and own the tables for which you are designing projections. For details, see Database Designer access requirements.
You can choose to create the design manually or use the Management Console wizard, as described below.
Important
Set your browser so it does not cache pages. If a browser caches pages, you might be unable to see the new design added.
Follow these steps to to create a comprehensive design with the Management Console wizard:
Log in to Management Console.
Verify that your database is up and running.
Choose the database for which you want to create the design. You can find the database under the Recent Databases section or by clicking Existing Infrastructure to reach the Databases and Clusters page.
The database overview page opens.
At the bottom of the screen, click the Design button.
In the New Design dialog box, enter the design name.
Click Wizard to continue.
Create an initial design. For Design Type, select Comprehensive and click Next.
In the Optimization Objective window, select Balance Load and Performance to create a design that is balanced between database size and query performance. Click Next.
Select the schemas. Because the VMart design is a multi-schema database, select all three schemas (public, store, and online_sales) for your design in the Select Sample Data window. Click Next.
If you include a schema that contains tables without data, the design could be suboptimal. You can choose to continue, but Vertica recommends that you deselect the schemas that contain empty tables before you proceed.
Choose the K-safety value for your design. The K-Safety value determines the number of buddy projections you want Database Designer to create.
Choose Analyze Correlations Mode. Analyze Correlations Mode determines if Database Designer analyzes and considers column correlations when creating the design.
Ignore: When creating a design, ignore any column correlations in the specified tables.
Consider existing: Consider the existing correlations in the tables when creating the design. If you set the mode to 1, and there are no existing correlations, Database Designer does not consider correlations.
Analyze missing: Analyze column correlations on tables where the correlation analysis was not previously performed. When creating the design, consider all column correlations (new and existing).
Analyze all: Analyze all column correlations in the tables and consider them when creating the design. Even if correlations exist for a table, reanalyze the table for correlations.
Click Next.
Submit query files to Database Designer in one of two ways:
Supply your own query files by selecting the Browse button.
Click Use Query Repository, which submits recently executed queries from the QUERY_REQUESTS system table.
Click Next.
In the Execution Options window, select any of the following options:
Analyze statistics: Select this option to run statistics automatically after design deployment, so Database Designer can make better decisions for its proposed design.
Auto-build: Select this option to run Database Designer as soon as you complete the wizard. This option only builds the proposed design.
Auto-deploy: Select this option for auto-build designs that you want to deploy automatically.
Click Submit Design. The Database Designer page opens:
If you chose to automatically deploy your design, Database Designer executes in the background.
If you did not select the Auto-build or Auto-deploy options, you can click Build Design or Deploy Design on the Database Designer page.
In the My Designs pane, view the status of your design:
When the deployment completes, the My Design pane shows Design Deployed.
The event history window shows the details of the design build and deployment.
9.2 - Running Database Designer with administration tools
In this procedure, you create a comprehensive design with Database Designer using the Administration Tools interface.
In this procedure, you create a comprehensive design with Database Designer using the Administration Tools interface. If, in the future, you have a query that you want to optimize, you can create an enhanced (incremental) design with additional projections. You can tune these projections specifically for the query you provide. See Incremental Design for more information.
Follow these steps to create the comprehensive design using Database Designer in Administration Tools:
If you are not in Administration Tools, exit the vsql session and access Administration Tools:
Type \q to exit vsql.
Type admintools to access the Administration Tools Main Menu.
Start the database for which you want to create a design.
From the Main Menu, click Configuration Menu and then click OK.
From the Configuration Menu, click Run Database Designer and then click OK.
When the Select a database for design dialog box opens, select VMart and then click OK.
If you are prompted to enter the password for the database, click OK to bypass the message. Because no password was assigned when you installed the VMart database, you do not need to enter one now.
Click OK to accept the default directory for storing Database Designer output and log files.
In the Database Designer window, enter a name for the design, for example, vmart_design, and click OK. Design names can contain only alphanumeric characters or underscores. No other special characters are allowed.
Create a complete initial design. In the Design Type window, click Comprehensive and click OK.
Select the schemas. Because the VMart design is a multi-schema database, you can select all three schemas (online_sales, public, and store) for your design. Click OK.
If you include a schema that contains tables without data, the Administration Tools notifies you that designing for tables without data could be suboptimal. You can choose to continue, but Vertica recommends that you deselect the schemas that contain empty tables before you proceed.
In the Design Options window, accept all three options and click OK.
The three options are:
Optimize with queries: Supplying the Database Designer with queries is especially important if you want to optimize the database design for query performance. Vertica recommends that you limit the design input to 100 queries.
Update statistics: Accurate statistics help the Database Designer choose the best strategy for data compression. If you select this option, the database statistics are updated to maximize design quality.
Deploy design: The new design deploys automatically. During deployment, new projections are added, some existing projections retained, and any necessary existing projections removed. Any new projections are refreshed to populate them with data.
Because you selected the Optimize with queries option, you must enter the full path to the file containing the queries that will be run on your database. In this example, it is:
The queries in the query file must be delimited with semicolons (;). The last query must end with a semicolon (;).
Choose the K-safety value you want and click OK. The design K-Safety determines the number of buddy projections you want Database Designer to create.
If you create a comprehensive design on a single node, you are not prompted to enter a K-safety value.
In the Optimization Objective window, select Balanced query/load performance to create a design that is balanced between database size and query performance. Click OK.
When the informational message displays, click Proceed.
Database Designer automatically performs these actions:
Sets up the design session.
Examines table data.
Loads queries from the query file you provided (in this example, /opt/vertica/examples/VMart_Schema/vmart_queries.sql).
Creates the design.
Deploys the design or saves a SQL file containing the commands to create the design, based on your selections in the Desgin Options window.
Depending on system resources, the design process could take several minutes. You should allow this process to complete uninterrupted. If you must cancel the session, use Ctrl+C.
When Database Designer finishes, press Enter to return to the Administration Tools menu. Examine the steps taken to create the design. The files are in the directory you specified to store the output and log files. In this example, that directory is /opt/vertica/examples/VMart_Schema. For more information about the script files, see About Database Designer.
For additional information about managing your designs, see Creating a database design in the Administrator’s Guide.
10 - Restoring the status of your host
When you finish the tutorial, you can restore your host machines to their original state.
When you finish the tutorial, you can restore your host machines to their original state. Use the following instructions to clean up your host and start over from scratch.
Stopping and dropping the database
Follow these steps to stop and/or drop your database. A database must be stopped before it can be dropped.
If connected to the database, disconnect by typing \q.
In the Administration Tools Main Menu dialog box, click Stop Database and click OK.
In the Select database to stop window, select the database you want to stop and click OK.
After stopping the database, click Configuration Menu and click OK.
Click Drop Database and click OK.
In the Select database to drop window, select the database you want to drop and click OK.
Click Yes to confirm.
In the next window type yes (lowercase) to confirm and click OK.
Alternatively, use the delete_example script, which stops and drops the database:
If connected to the database, disconnect by typing \q.
In the Administration Tools Main Menu dialog box, select Exit.
Remove any example database directories you created.
11 - Appendix: VMart example database schema, tables, and scripts
This appendix provides detailed information about the VMart example database’s schema, tables, and scripts.
This appendix provides detailed information about the VMart example database’s schema, tables, and scripts.
The VMart example database contains three different schemas:
public
store
online_sales
The term “schema” has several related meanings in Vertica:
In SQL statements, a schema refers to named namespace for a logical schema.
Logical schema refers to a set of tables and constraints.
Physical schema refers to a set of projections.
Tables identifies the three schemas and all the data tables in the VMart database. Each schema contains tables that are created and loaded during database installation. See the schema maps for a list of tables and their contents:
Sample scripts describes the sample scripts that contain SQL commands that represent queries that might be used in a real business using a VMart-like database. Once you’re comfortable running the example queries, you might want to write your own.
11.1 - Tables
The three schemas in the VMart database include the following tables:.
The three schemas in the VMart database include the following tables:
public Schema
store Schema
online_sales Schema
inventory_fact
store_orders_fact
online_sales_fact
customer_dimension
store_sales_fact
call_center_dimension
date_dimension
store_dimension
online_page_dimension
employee_dimension
product_dimension
promotion_dimension
shipping_dimension
vendor_dimension
warehouse_dimension
11.2 - Public schema map
The public schema is a snowflake schema.
The public schema is a snowflake schema. The following graphic illustrates the public schema and its relationships with tables in the online_sales and store schemas.
The subsequent subsections describe database tables.
11.2.1 - inventory_fact
This table contains information about each product in inventory.
This table contains information about each product in inventory.
Column Name
Data Type
NULLs
date_key
INTEGER
No
product_key
INTEGER
No
product_version
INTEGER
No
warehouse_key
INTEGER
No
qty_in_stock
INTEGER
No
11.2.2 - customer_dimension
This table contains information about all the retail chain’s customers.
This table contains information about all the retail chain’s customers.
Column Name
Data Type
NULLs
customer_key
INTEGER
No
customer_type
VARCHAR(16)
Yes
customer_name
VARCHAR(256)
Yes
customer_gender
VARCHAR(8)
Yes
title
VARCHAR(8)
Yes
household_id
INTEGER
Yes
customer_address
VARCHAR(256)
Yes
customer_city
VARCHAR(64)
Yes
customer_state
CHAR(2)
Yes
customer_region
VARCHAR(64)
Yes
marital_status
VARCHAR(32)
Yes
customer_age
INTEGER
Yes
number_of_children
INTEGER
Yes
annual_income
INTEGER
Yes
occupation
VARCHAR(64)
Yes
largest_bill_amount
INTEGER
Yes
store_membership_card
INTEGER
Yes
customer_since
DATE
Yes
deal_stage
VARCHAR(32)
Yes
deal_size
INTEGER
Yes
last_deal_update
DATE
Yes
11.2.3 - date_dimension
This table contains information about dates.
This table contains information about dates. It is generated from a file containing correct date/time data.
Column Name
Data Type
NULLs
date_key
INTEGER
No
date
DATE
Yes
full_date_description
VARCHAR(18)
Yes
day_of_week
VARCHAR(9)
Yes
day_number_in_calendar_month
INTEGER
Yes
day_number_in_calendar_year
INTEGER
Yes
day_number_in_fiscal_month
INTEGER
Yes
day_number_in_fiscal_year
INTEGER
Yes
last_day_in_week_indicator
INTEGER
Yes
last_day_in_month_indicator
INTEGER
Yes
calendar_week_number_in_year
INTEGER
Yes
calendar_month_name
VARCHAR(9)
Yes
calendar_month_number_in_year
INTEGER
Yes
calendar_year_month
CHAR(7)
Yes
calendar_quarter
INTEGER
Yes
calendar_year_quarter
CHAR(7)
Yes
calendar_half_year
INTEGER
Yes
calendar_year
INTEGER
Yes
holiday_indicator
VARCHAR(10)
Yes
weekday_indicator
CHAR(7)
Yes
selling_season
VARCHAR(32)
Yes
11.2.4 - employee_dimension
This table contains information about all the people who work for the retail chain.
This table contains information about all the people who work for the retail chain.
Column Name
Data Type
NULLs
employee_key
INTEGER
No
employee_gender
VARCHAR(8)
Yes
courtesy_title
VARCHAR(8)
Yes
employee_first_name
VARCHAR(64)
Yes
employee_middle_initial
VARCHAR(8)
Yes
employee_last_name
VARCHAR(64)
Yes
employee_age
INTEGER
Yes
hire_date
DATE
Yes
employee_street_address
VARCHAR(256)
Yes
employee_city
VARCHAR(64)
Yes
employee_state
CHAR(2)
Yes
employee_region
CHAR(32)
Yes
job_title
VARCHAR(64)
Yes
reports_to
INTEGER
Yes
salaried_flag
INTEGER
Yes
annual_salary
INTEGER
Yes
hourly_rate
FLOAT
Yes
vacation_days
INTEGER
Yes
11.2.5 - product_dimension
This table describes all products sold by the department store chain.
This table describes all products sold by the department store chain.
Column Name
Data Type
NULLs
product_key
INTEGER
No
product_version
INTEGER
No
product_description
VARCHAR(128)
Yes
sku_number
CHAR(32)
Yes
category_description
CHAR(32)
Yes
department_description
CHAR(32)
Yes
package_type_description
CHAR(32)
Yes
package_size
CHAR(32)
Yes
fat_content
INTEGER
Yes
diet_type
CHAR(32)
Yes
weight
INTEGER
Yes
weight_units_of_measure
CHAR(32)
Yes
shelf_width
INTEGER
Yes
shelf_height
INTEGER
Yes
shelf_depth
INTEGER
Yes
product_price
INTEGER
Yes
product_cost
INTEGER
Yes
lowest_competitor_price
INTEGER
Yes
highest_competitor_price
INTEGER
Yes
average_competitor_price
INTEGER
Yes
discontinued_flag
INTEGER
Yes
11.2.6 - promotion_dimension
This table describes every promotion ever done by the retail chain.
This table describes every promotion ever done by the retail chain.
Column Name
Data Type
NULLs
promotion_key
INTEGER
No
promotion_name
VARCHAR(128)
Yes
price_reduction_type
VARCHAR(32)
Yes
promotion_media_type
VARCHAR(32)
Yes
ad_type
VARCHAR(32)
Yes
display_type
VARCHAR(32)
Yes
coupon_type
VARCHAR(32)
Yes
ad_media_name
VARCHAR(32)
Yes
display_provider
VARCHAR(128)
Yes
promotion_cost
INTEGER
Yes
promotion_begin_date
DATE
Yes
promotion_end_date
DATE
Yes
11.2.7 - shipping_dimension
This table contains information about shipping companies that the retail chain uses.
This table contains information about shipping companies that the retail chain uses.
Column Name
Data Type
NULLs
shipping_key
INTEGER
No
ship_type
CHAR(30)
Yes
ship_mode
CHAR(10)
Yes
ship_carrier
CHAR(20)
Yes
11.2.8 - vendor_dimension
This table contains information about each vendor that provides products sold through the retail chain.
This table contains information about each vendor that provides products sold through the retail chain.
Column Name
Data Type
NULLs
vendor_key
INTEGER
No
vendor_name
VARCHAR(64)
Yes
vendor_address
VARCHAR(64)
Yes
vendor_city
VARCHAR(64)
Yes
vendor_state
CHAR(2)
Yes
vendor_region
VARCHAR(32)
Yes
deal_size
INTEGER
Yes
last_deal_update
DATE
Yes
11.2.9 - warehouse_dimension
This table provides information about each of the chain’s warehouses.
This table provides information about each of the chain’s warehouses.
Column Name
Data Type
NULLs
warehouse_key
INTEGER
No
warehouse_name
VARCHAR(20)
Yes
warehouse_address
VARCHAR(256)
Yes
warehouse_city
VARCHAR(60)
Yes
warehouse_state
CHAR(2)
Yes
warehouse_region
VARCHAR(32)
Yes
11.3 - Store schema map
The store schema is a snowflake schema that contains information about the retail chain’s bricks-and-mortar stores.
The store schema is a snowflake schema that contains information about the retail chain’s bricks-and-mortar stores. The following graphic illustrates the store schema and its relationship with tables in the public schema.
The subsequent subsections describe database tables.
11.3.1 - store_orders_fact
This table contains information about all orders made at the company’s brick-and-mortar stores.
This table contains information about all orders made at the company’s brick-and-mortar stores.
Column Name
Data Type
NULLs
product_key
INTEGER
No
product_version
INTEGER
No
store_key
INTEGER
No
vendor_key
INTEGER
No
employee_key
INTEGER
No
order_number
INTEGER
No
date_ordered
DATE
Yes
date_shipped
DATE
Yes
expected_delivery_date
DATE
Yes
date_delivered
DATE
Yes
quantity_ordered
INTEGER
Yes
quantity_delivered
INTEGER
Yes
shipper_name
VARCHAR(32)
Yes
unit_price
INTEGER
Yes
shipping_cost
INTEGER
Yes
total_order_cost
INTEGER
Yes
quantity_in_stock
INTEGER
Yes
reorder_level
INTEGER
Yes
overstock_ceiling
INTEGER
Yes
11.3.2 - store_sales_fact
This table contains information about all sales made at the company’s brick-and-mortar stores.
This table contains information about all sales made at the company’s brick-and-mortar stores.
Column Name
Data Type
NULLs
date_key
INTEGER
No
product_key
INTEGER
No
product_version
INTEGER
No
store_key
INTEGER
No
promotion_key
INTEGER
No
customer_key
INTEGER
No
employee_key
INTEGER
No
pos_transaction_number
INTEGER
No
sales_quantity
INTEGER
Yes
sales_dollar_amount
INTEGER
Yes
cost_dollar_amount
INTEGER
Yes
gross_profit_dollar_amount
INTEGER
Yes
transaction_type
VARCHAR(16)
Yes
transaction_time
TIME
Yes
tender_type
VARCHAR(8)
Yes
11.3.3 - store_dimension
This table contains information about each brick-and-mortar store within the retail chain.
This table contains information about each brick-and-mortar store within the retail chain.
Column Name
Data Type
NULLs
store_key
INTEGER
No
store_name
VARCHAR(64)
Yes
store_number
INTEGER
Yes
store_address
VARCHAR(256)
Yes
store_city
VARCHAR(64)
Yes
store_state
CHAR(2)
Yes
store_region
VARCHAR(64)
Yes
floor_plan_type
VARCHAR(32)
Yes
photo_processing_type
VARCHAR(32)
Yes
financial_service_type
VARCHAR(32)
Yes
selling_square_footage
INTEGER
Yes
total_square_footage
INTEGER
Yes
first_open_date
DATE
Yes
last_remodel_date
DATE
Yes
number_of_employees
INTEGER
Yes
annual_shrinkage
INTEGER
Yes
foot_traffic
INTEGER
Yes
monthly_rent_cost
INTEGER
Yes
11.4 - online_sales schema map
The online_sales schema is a snowflake schema that contains information about the retail chains.
The online_sales schema is a snowflake schema that contains information about the retail chains. The following graphic illustrates the online_sales schema and its relationship with tables in the public schema.
The subsequent subsections describe database tables.
11.4.1 - online_sales_fact
This table describes all the items purchased through the online store front.
This table describes all the items purchased through the online store front.
Column Name
Data Type
NULLs
sale_date_key
INTEGER
No
ship_date_key
INTEGER
No
product_key
INTEGER
No
product_version
INTEGER
No
customer_key
INTEGER
No
call_center_key
INTEGER
No
online_page_key
INTEGER
No
shipping_key
INTEGER
No
warehouse_key
INTEGER
No
promotion_key
INTEGER
No
pos_transaction_number
INTEGER
No
sales_quantity
INTEGER
Yes
sales_dollar_amount
FLOAT
Yes
ship_dollar_amount
FLOAT
Yes
net_dollar_amount
FLOAT
Yes
cost_dollar_amount
FLOAT
Yes
gross_profit_dollar_amount
FLOAT
Yes
transaction_type
VARCHAR(16)
Yes
11.4.2 - call_center_dimension
This table describes all the chain’s call centers.
This table describes all the chain’s call centers.
Column Name
Data Type
NULLs
call_center_key
INTEGER
No
cc_closed_date
DATE
Yes
cc_open_date
DATE
Yes
cc_date
VARCHAR(50)
Yes
cc_class
VARCHAR(50)
Yes
cc_employees
INTEGER
Yes
cc_hours
CHAR(20)
Yes
cc_manager
VARCHAR(40)
Yes
cc_address
VARCHAR(256)
Yes
cc_city
VARCHAR(64)
Yes
cc_state
CHAR(2)
Yes
cc_region
VARCHAR(64)
Yes
11.4.3 - online_page_dimension
This table describes all the pages in the online store front.
This table describes all the pages in the online store front.
Column Name
Data Type
NULLs
online_page_key
INTEGER
No
start_date
DATE
Yes
end_date
DATE
Yes
page_number
INTEGER
Yes
page_description
VARCHAR(100)
Yes
page_type
VARCHAR(100)
Yes
11.5 - Sample scripts
You can create your own queries, but the VMart example directory includes sample query script files to help you get started quickly.
You can create your own queries, but the VMart example directory includes sample query script files to help you get started quickly.
You can find the following sample scripts at this path /opt/vertica/examples/VMart_Schema.
To run any of the scripts, enter
=> \i <script_name>
Alternatively, type the commands from the script file manually.
Note
The data that your queries return might differ from the example output shown in this guide because the sample data generator is random.
11.5.1 - vmart_query_01.sql
Output.
-- vmart_query_01.sql
-- FROM clause subquery
-- Return the values for five products with the
-- lowest-fat content in the Dairy department
SELECT fat_content
FROM (
SELECT DISTINCT fat_content
FROM product_dimension
WHERE department_description
IN ('Dairy') ) AS food
ORDER BY fat_content
LIMIT 5;
Output
fat_content
-------------
80
81
82
83
84
(5 rows)
11.5.2 - vmart_query_02.sql
Output.
-- vmart_query_02.sql
-- WHERE clause subquery
-- Asks for all orders placed by stores located in Massachusetts
-- and by vendors located elsewhere before March 1, 2003:
SELECT order_number, date_ordered
FROM store.store_orders_fact orders
WHERE orders.store_key IN (
SELECT store_key
FROM store.store_dimension
WHERE store_state = 'MA')
AND orders.vendor_key NOT IN (
SELECT vendor_key
FROM public.vendor_dimension
WHERE vendor_state = 'MA')
AND date_ordered < '2012-03-01';
-- vmart_query_03.sql
-- noncorrelated subquery
-- Requests female and male customers with the maximum
-- annual income from customers
SELECT customer_name, annual_income
FROM public.customer_dimension
WHERE (customer_gender, annual_income) IN (
SELECT customer_gender, MAX(annual_income)
FROM public.customer_dimension
GROUP BY customer_gender);
Output
customer_name | annual_income
------------------+---------------
James M. McNulty | 999979
Emily G. Vogel | 999998
(2 rows)
11.5.4 - vmart_query_04.sql
Output.
-- vmart_query_04.sql
-- IN predicate
-- Find all products supplied by stores in MA
SELECT DISTINCT s.product_key, p.product_description
FROM store.store_sales_fact s, public.product_dimension p
WHERE s.product_key = p.product_key
AND s.product_version = p.product_version AND s.store_key IN (
SELECT store_key
FROM store.store_dimension
WHERE store_state = 'MA')
ORDER BY s.product_key;
-- vmart_query_05.sql
-- EXISTS predicate
-- Get a list of all the orders placed by all stores on
-- January 2, 2003 for the vendors with records in the
-- vendor_dimension table
SELECT store_key, order_number, date_ordered
FROM store.store_orders_fact
WHERE EXISTS (
SELECT 1
FROM public.vendor_dimension
WHERE public.vendor_dimension.vendor_key = store.store_orders_fact.vendor_key)
AND date_ordered = '2012-01-02';
-- vmart_query_06.sql
-- EXISTS predicate
-- Orders placed by the vendor who got the best deal
-- on January 4, 2004
SELECT store_key, order_number, date_ordered
FROM store.store_orders_fact ord, public.vendor_dimension vd
WHERE ord.vendor_key = vd.vendor_key
AND vd.deal_size IN (
SELECT MAX(deal_size)
FROM public.vendor_dimension)
AND date_ordered = '2013-01-04';
-- vmart_query_07.sql
-- Multicolumn subquery
-- Which products have the highest cost,
-- grouped by category and department
SELECT product_description, sku_number, department_description
FROM public.product_dimension
WHERE (category_description, department_description, product_cost) IN (
SELECT category_description, department_description,
MAX(product_cost) FROM product_dimension
GROUP BY category_description, department_description);
-- vmart_query_08.sql
-- between online_sales_fact and online_page_dimension
SELECT page_description, page_type, start_date, end_date
FROM online_sales.online_sales_fact f, online_sales.online_page_dimension d
WHERE f.online_page_key = d.online_page_key
AND page_number IN
(SELECT MAX(page_number)
FROM online_sales.online_page_dimension)
AND page_type = 'monthly' AND start_date = '2012-06-02';
-- vmart_query_09.sql
-- Equi join
-- Joins online_sales_fact table and the call_center_dimension
-- table with the ON clause
SELECT sales_quantity, sales_dollar_amount, transaction_type, cc_name
FROM online_sales.online_sales_fact
INNER JOIN online_sales.call_center_dimension
ON (online_sales.online_sales_fact.call_center_key
= online_sales.call_center_dimension.call_center_key
AND sale_date_key = 156)
ORDER BY sales_dollar_amount DESC;
Output
sales_quantity | sales_dollar_amount | transaction_type | cc_name
----------------+---------------------+------------------+-------------------
7 | 589 | purchase | Central Midwest
8 | 589 | purchase | South Midwest
8 | 589 | purchase | California
1 | 587 | purchase | New England
1 | 586 | purchase | Other
1 | 584 | purchase | New England
4 | 584 | purchase | New England
7 | 581 | purchase | Mid Atlantic
5 | 579 | purchase | North Midwest
8 | 577 | purchase | North Midwest
4 | 577 | purchase | Central Midwest
2 | 575 | purchase | Hawaii/Alaska
4 | 573 | purchase | NY Metro
4 | 572 | purchase | Central Midwest
1 | 570 | purchase | Mid Atlantic
9 | 569 | purchase | Southeastern
1 | 569 | purchase | NY Metro
5 | 567 | purchase | Other
7 | 567 | purchase | Hawaii/Alaska
9 | 567 | purchase | South Midwest
1 | 566 | purchase | New England
...