Sampling data

The goal of data sampling is to take a smaller, more manageable sample of a much larger data set.

The goal of data sampling is to take a smaller, more manageable sample of a much larger data set. With a sample data set, you can produce predictive models or use it to help you tune your database. The following example shows how you can use the TABLESAMPLE clause to create a sample of your data.

Sampling data from a table

Before you begin the example, load the Machine Learning sample data.

Using the baseball table, create a new table named baseball_sample containing a 25% sample of baseball. Remember, TABLESAMPLE does not guarantee that the exact percentage of records defined in the clause are returned.

=> CREATE TABLE baseball_sample AS SELECT * FROM baseball TABLESAMPLE(25);
CREATE TABLE
=> SELECT * FROM baseball_sample;
 id  | first_name | last_name  |    dob     |    team    | hr  | hits  |  avg  |   salary
-----+------------+------------+------------+------------+-----+-------+-------+-------------
   4 | Amanda     | Turner     | 1997-12-22 | Maroon     |  58 |   177 | 0.187 |     8047721
  20 | Jesse      | Cooper     | 1983-04-13 | Yellow     |  97 |    39 | 0.523 |     4252837
  22 | Randy      | Peterson   | 1980-05-28 | Orange     |  14 |    16 | 0.141 |  11827728.1
  24 | Carol      | Harris     | 1991-04-02 | Fuscia     |  96 |    12 | 0.456 |  40572253.6
  32 | Rose       | Morrison   | 1977-07-26 | Goldenrod  |  27 |   153 | 0.442 | 14510752.49
  50 | Helen      | Medina     | 1987-12-26 | Maroon     |  12 |   150 |  0.54 | 32169267.91
  70 | Richard    | Gilbert    | 1983-07-13 | Khaki      |   1 |   250 | 0.213 | 40518422.76
  81 | Angela     | Cole       | 1991-08-16 | Violet     |  87 |   136 | 0.706 | 42875181.51
  82 | Elizabeth  | Foster     | 1994-04-30 | Indigo     |  46 |   163 | 0.481 | 33896975.53
  98 | Philip     | Gardner    | 1992-05-06 | Puce       |  39 |   239 | 0.697 | 20967480.67
 102 | Ernest     | Freeman    | 1983-10-05 | Turquoise  |  46 |    77 | 0.564 | 21444463.92
.
.
.
(227 rows)

With your sample you can create a predictive model, or tune your database.

See also

  • FROM clause (for more information about the TABLESAMPLE clause)