Sampling data
The goal of data sampling is to take a smaller, more manageable sample of a much larger data set.
The goal of data sampling is to take a smaller, more manageable sample of a much larger data set. With a sample data set, you can produce predictive models or use it to help you tune your database. The following example shows how you can use the TABLESAMPLE
clause to create a sample of your data.
Sampling data from a table
Before you begin the example, load the Machine Learning sample data.Using the baseball
table, create a new table named baseball_sample
containing a 25% sample of baseball
. Remember, TABLESAMPLE does not guarantee that the exact percentage of records defined in the clause are returned.
=> CREATE TABLE baseball_sample AS SELECT * FROM baseball TABLESAMPLE(25);
CREATE TABLE
=> SELECT * FROM baseball_sample;
id | first_name | last_name | dob | team | hr | hits | avg | salary
-----+------------+------------+------------+------------+-----+-------+-------+-------------
4 | Amanda | Turner | 1997-12-22 | Maroon | 58 | 177 | 0.187 | 8047721
20 | Jesse | Cooper | 1983-04-13 | Yellow | 97 | 39 | 0.523 | 4252837
22 | Randy | Peterson | 1980-05-28 | Orange | 14 | 16 | 0.141 | 11827728.1
24 | Carol | Harris | 1991-04-02 | Fuscia | 96 | 12 | 0.456 | 40572253.6
32 | Rose | Morrison | 1977-07-26 | Goldenrod | 27 | 153 | 0.442 | 14510752.49
50 | Helen | Medina | 1987-12-26 | Maroon | 12 | 150 | 0.54 | 32169267.91
70 | Richard | Gilbert | 1983-07-13 | Khaki | 1 | 250 | 0.213 | 40518422.76
81 | Angela | Cole | 1991-08-16 | Violet | 87 | 136 | 0.706 | 42875181.51
82 | Elizabeth | Foster | 1994-04-30 | Indigo | 46 | 163 | 0.481 | 33896975.53
98 | Philip | Gardner | 1992-05-06 | Puce | 39 | 239 | 0.697 | 20967480.67
102 | Ernest | Freeman | 1983-10-05 | Turquoise | 46 | 77 | 0.564 | 21444463.92
.
.
.
(227 rows)
With your sample you can create a predictive model, or tune your database.
See also
- FROM clause (for more information about the
TABLESAMPLE
clause)