Improving query throughput using subclusters
Improving query throughput increases the number of queries your Eon Mode database processes at the same time. Throughput is usually your main concern when your workload consists of many short-running queries, often referred to as "dashboard queries." This term describes the type of workload you see when a large number of users have web-based dashboard pages open to monitor some sort of status. These dashboards tend to update frequently, using simple, short-running queries instead of analytics-heavy, long-running queries.
The best way to improve your database's throughput is to add new subclusters to the database or start any stopped subclusters. Then distribute the client connections among these subclusters using connection load balancing policies. Subclusters independently process queries. By adding more subclusters, you improve your database's parallelism.
For the best performance, make the number of nodes in your subcluster equal to the number of shards in your database. If you choose to have fewer nodes than shards, make the number of nodes an even divisor of the number of shards. When the number of shards is divisible by the number of nodes, the data in your database is equally divided among the nodes in the subcluster.
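To find your database's shard count, you can query the SHARDS system table. A minimal sketch, assuming the shard_type column distinguishes segmented shards from the replica shard:

=> SELECT COUNT(*) FROM shards WHERE shard_type = 'Segment';

For example, a 12-shard database pairs well with subclusters of 12, 6, 4, 3, or 2 nodes, because each of those node counts divides the shard count evenly.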
The easiest way to add subclusters is to use the MC:

1. From the MC home page, click the database you want to add subclusters to.

2. Click Manage.

3. Click Add Subcluster.

4. Follow the steps in the wizard to add the subcluster. Normally, the only items you need to fill in are the subcluster name and the number of instances to add to it.
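If you prefer the command line, a rough equivalent uses the admintools db_add_subcluster tool. This is a minimal sketch; the database name, password, subcluster name, and host IPs are placeholders, and the exact options can vary by version (see Creating subclusters for the full procedure):

$ admintools -t db_add_subcluster -d verticadb -p 'mypassword' \
     -c query_pool_a -s 10.0.33.77,10.0.33.181,10.0.33.85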
Note

The MC currently does not support creating instances on all platforms. For those platforms where the MC does not support instances, you can manually add subclusters. See Creating subclusters for more information.

Distributing clients among the throughput subclusters
For the added subclusters to improve throughput, the clients that execute short-running queries must connect to the nodes those subclusters contain. Queries run only on the subcluster that contains the initiator node (the node the client is connected to). Use connection load balancing policies to spread connections across all of the subclusters you created to increase query throughput. See Connection load balancing policies for details.
The following example creates a load balancing policy that spreads client connections across two three-node subclusters named query_pool_a and query_pool_b. This example:
- Creates network addresses on the six nodes that are in the two subclusters.
- Creates a load balance group from all the nodes in the two subclusters.
- Creates the routing rule to redirect all incoming connections to the two subclusters.
=> CREATE NETWORK ADDRESS node04 ON v_verticadb_node0004 WITH '203.0.113.1';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node05 ON v_verticadb_node0005 WITH '203.0.113.2';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node06 ON v_verticadb_node0006 WITH '203.0.113.3';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node07 ON v_verticadb_node0007 WITH '203.0.113.4';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node08 ON v_verticadb_node0008 WITH '203.0.113.5';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node09 ON v_verticadb_node0009 WITH '203.0.113.6';
CREATE NETWORK ADDRESS
=> CREATE LOAD BALANCE GROUP query_subclusters WITH SUBCLUSTER query_pool_a,
query_pool_b FILTER '0.0.0.0/0';
CREATE LOAD BALANCE GROUP
=> CREATE ROUTING RULE query_clients ROUTE '0.0.0.0/0' TO query_subclusters;
CREATE ROUTING RULE
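To verify the addresses, group, and rule, you can query Vertica's load balancing system tables. A minimal sketch; the NETWORK_ADDRESSES, LOAD_BALANCE_GROUPS, and ROUTING_RULES system tables are standard, though their exact columns vary by version:

=> SELECT * FROM network_addresses;
=> SELECT * FROM load_balance_groups;
=> SELECT * FROM routing_rules;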
Important
In cloud environments where clients will be connecting from outside the private network, use the external IP address for each node when creating the network addresses. Otherwise, external clients will not be able to connect to the nodes.
If you have a mix of internal and external clients, set up two network addresses for each node and add them to two separate load balancing groups: one for internal clients and another for external clients. Then create two routing rules: one that routes the internal clients to the internal group, and another that routes the external clients to the external group. Make the routing rule for internal clients apply only to the virtual private networks where your internal clients connect from (for example, 10.0.0.0/8). The routing rule for the external clients can use the 0.0.0.0/0 CIDR range (all IP addresses) as the incoming connection address range. The rules work correctly together because the more-specific internal routing rule takes precedence over the broader external rule.
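The following sketch shows one way to set up this split for the two subclusters above. It assumes internal clients arrive from 10.0.0.0/8 and that each node gets a second, private network address; the 10.20.0.x addresses and the object names are placeholders, and in practice this pair of rules replaces the single 0.0.0.0/0 rule created earlier:

=> CREATE NETWORK ADDRESS node04_private ON v_verticadb_node0004 WITH '10.20.0.4';
-- ...create a private network address on each of the five remaining nodes...
=> CREATE LOAD BALANCE GROUP internal_query WITH SUBCLUSTER query_pool_a,
   query_pool_b FILTER '10.0.0.0/8';
=> CREATE LOAD BALANCE GROUP external_query WITH SUBCLUSTER query_pool_a,
   query_pool_b FILTER '203.0.113.0/24';
=> CREATE ROUTING RULE internal_query_clients ROUTE '10.0.0.0/8' TO internal_query;
=> CREATE ROUTING RULE external_query_clients ROUTE '0.0.0.0/0' TO external_query;

Because the FILTER clause selects only the node addresses that fall inside its CIDR range, internal_query contains just the private addresses and external_query just the public ones.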
After you create the policy, any client that opts into load balancing is redirected to one of the nodes in the two subclusters. For example, when you connect to the node at 203.0.113.1 using vsql with the -C flag, you see output similar to this:
$ vsql -h 203.0.113.1 -U dbadmin -w mypassword -C
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
SSL connection (cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, protocol: TLSv1.2)
INFO: Connected using a load-balanced connection.
INFO: Connected to 203.0.113.6 at port 5433.
=>
Connection load balancing policies take stopped nodes into account when picking a node to handle a client connection. If you shut down one or more subclusters to save money during low-demand periods, you do not need to adjust your load balancing policy as long as some of the nodes are still up.
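To see which nodes are currently available for routing, you can check node states in the NODES system table. A minimal sketch; the subcluster_name column is available in Eon Mode databases:

=> SELECT subcluster_name, node_name, node_state FROM nodes
   ORDER BY subcluster_name, node_name;

Nodes reported as DOWN are skipped by the policy, and connections are spread among the remaining UP nodes.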