事件系列模式匹配

搜索事件模式时,您可以使用 SQL MATCH 子句 语法来筛选大量历史数据。您可以将模式指定为正则表达式,然后可以在输入事件序列内搜索该模式。MATCH 提供了分析数据分区和排序的子句,以及对连续行集执行的模式匹配。

如果您想在点击流分析中根据用户的 Web 浏览行为(页面点击)确定用户的操作,那么模式匹配就尤其有用。典型的在线点击流漏斗是:

公司主页 -> 产品主页 -> 搜索 -> 结果 -> 在线购买

您可以使用此单击流漏斗,在用户的 Web 点击序列中搜索匹配并标识该用户:

  • 登录公司主页

  • 导航至产品页面

  • 运行查询

  • 单击搜索结果中的链接

  • 购买

单击流漏斗架构

此主题中的示例使用了此点击流漏斗以及以下 clickstream_log 表架构:

=> CREATE TABLE clickstream_log (
  uid INT,             --user ID
  sid INT,             --browsing session ID, produced by previous sessionization computation
  ts TIME,             --timestamp that occurred during the user's page visit
  refURL VARCHAR(20),  --URL of the page referencing PageURL
  pageURL VARCHAR(20), --URL of the page being visited
  action CHAR(1)       --action the user took after visiting the page ('P' = Purchase, 'V' = View)
);

INSERT INTO clickstream_log VALUES (1,100,'12:00','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:01','website2.com/home','website2.com/floby', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:02','website2.com/floby','website2.com/shamwow', 'V');
INSERT INTO clickstream_log values (1,100,'12:03','website2.com/shamwow','website2.com/buy', 'P');
INSERT INTO clickstream_log values (2,100,'12:10','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log values (2,100,'12:11','website2.com/home','website2.com/forks', 'V');
INSERT INTO clickstream_log values (2,100,'12:13','website2.com/forks','website2.com/buy', 'P');
COMMIT;

以下为 clickstream_log 表的输出:

=> SELECT * FROM clickstream_log;
 uid | sid |    ts    |        refURL        |       pageURL        | action
-----+-----+----------+----------------------+----------------------+--------
   1 | 100 | 12:00:00 | website1.com         | website2.com/home    | V
   1 | 100 | 12:01:00 | website2.com/home    | website2.com/floby   | V
   1 | 100 | 12:02:00 | website2.com/floby   | website2.com/shamwow | V
   1 | 100 | 12:03:00 | website2.com/shamwow | website2.com/buy     | P
   2 | 100 | 12:10:00 | website1.com         | website2.com/home    | V
   2 | 100 | 12:11:00 | website2.com/home    | website2.com/forks   | V
   2 | 100 | 12:13:00 | website2.com/forks   | website2.com/buy     | P
(7 rows)

示例

此示例包括 Vertica MATCH 子句函数,以便分析用户在 website2.com 上的浏览历史记录。它会确定用户执行以下任务的模式:

  • 从其他网站登录 website2.com(进入)

  • 浏览任意数量的其他页面(站内)

  • 做出购买(购买)

在以下语句中,模式 P (Entry Onsite* Purchase) 包括三种事件类型:进入、站内和购买。当 Vertica 在输入表中找到匹配时,相关模式实例必须是一个进入事件类型,且后跟零个或多个站内事件类型以及一个购买事件类型

=> SELECT uid,
       sid,
       ts,
       refurl,
       pageurl,
       action,
       event_name(),
       pattern_id(),
       match_id()
FROM clickstream_log
MATCH
  (PARTITION BY uid, sid ORDER BY ts
   DEFINE
     Entry    AS RefURL  NOT ILIKE '%website2.com%' AND PageURL ILIKE '%website2.com%',
     Onsite   AS PageURL ILIKE     '%website2.com%' AND Action='V',
     Purchase AS PageURL ILIKE     '%website2.com%' AND Action = 'P'
   PATTERN
     P AS (Entry Onsite* Purchase)
   ROWS MATCH FIRST EVENT);

在以下输出中,前四行代表用户 1 的浏览活动的模式,而剩下三行显示了用户 2 的浏览习惯。

 uid | sid |    ts    |        refurl        |       pageurl        | action | event_name | pattern_id | match_id
-----+-----+----------+----------------------+----------------------+--------+------------+------------+----------
   1 | 100 | 12:00:00 | website1.com         | website2.com/home    | V      | Entry      |          1 |        1
   1 | 100 | 12:01:00 | website2.com/home    | website2.com/floby   | V      | Onsite     |          1 |        2
   1 | 100 | 12:02:00 | website2.com/floby   | website2.com/shamwow | V      | Onsite     |          1 |        3
   1 | 100 | 12:03:00 | website2.com/shamwow | website2.com/buy     | P      | Purchase   |          1 |        4
   2 | 100 | 12:10:00 | website1.com         | website2.com/home    | V      | Entry      |          1 |        1
   2 | 100 | 12:11:00 | website2.com/home    | website2.com/forks   | V      | Onsite     |          1 |        2
   2 | 100 | 12:13:00 | website2.com/forks   | website2.com/buy     | P      | Purchase   |          1 |        3
(7 rows)

另请参阅