BALANCE
Returns a view with an equal distribution of the input data based on the response_column.
	Returns a view with an equal distribution of the input data based on the response_column.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
BALANCE ( 'output-view', 'input-relation', 'response-column', 'balance-method'
       [ USING PARAMETERS sampling_ratio=ratio ] )
Arguments
- output-view
- The name of the view where Vertica saves the balanced data from the input relation.
NoteNote: The view that results from this function employs a random function. Its content can differ each time it is used in a query. To make the operations on the view predictable, store it in a regular table.
- input-relation
- The table or view that contains the data the function uses to create a more balanced data set. If the input relation is defined in Hive, use 
SYNC_WITH_HCATALOG_SCHEMAto sync thehcatalogschema, and then run the machine learning function.
- response-column
- Name of the input column that represents the dependent variable, of type VARCHAR or INTEGER.
- balance-method
- Specifies a method to select data from the minority and majority classes, one of the following.
- 
hybrid_sampling: Performs over-sampling and under-sampling on different classes so each class is equally represented.
- 
over_sampling: Over-samples on all classes, with the exception of the most majority class, towards the most majority class's cardinality.
- 
under_sampling: Under-samples on all classes, with the exception of the most minority class, towards the most minority class's cardinality.
- 
weighted_sampling: An alias ofunder_sampling.
 
- 
Parameters
- ratio
- The desired ratio between the majority class and the minority class. This value has no effect when used with balance method hybrid_sampling.Default: 1.0 
Privileges
Non-superusers:
- 
SELECT privileges on the input relation 
- 
CREATE privileges on the output view schema 
Examples
=> CREATE TABLE backyard_bugs (id identity, bug_type int, finder varchar(20));
CREATE TABLE
=> COPY backyard_bugs FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1|Ants
>> 1|Beetles
>> 3|Ladybugs
>> 3|Ants
>> 3|Beetles
>> 3|Caterpillars
>> 2|Ladybugs
>> 3|Ants
>> 3|Beetles
>> 1|Ladybugs
>> 3|Ladybugs
>> \.
=> SELECT bug_type, COUNT(bug_type) FROM backyard_bugs GROUP BY bug_type;
 bug_type | COUNT
----------+-------
        2 |     1
        1 |     3
        3 |     7
(3 rows)
=> SELECT BALANCE('backyard_bugs_balanced', 'backyard_bugs', 'bug_type', 'under_sampling');
         BALANCE
--------------------------
 Finished in 1 iteration
(1 row)
=> SELECT bug_type, COUNT(bug_type) FROM backyard_bugs_balanced GROUP BY bug_type;
----------+-------
        2 |     1
        1 |     2
        3 |     1
(3 rows)