场景

假设有这样的一些需求

  • 计数器(比如PV)
  • 一些统计业务

课题

  • 实时性
  • 与rawdata解耦
  • 数据量
  • 负荷
  • 程序开发成本

PipelineDB特性

  • 不存原始数据,只存结果 (节省空间)
  • 内存计算 (快速)
  • SQL查询统计结果
  • 实现实时统计(流式计算)

概念

pipelinedb

安装与配置

  • http://docs.pipelinedb.com/installation.html
  • http://docs.pipelinedb.com/conf.html

测试

假设统计每人每天的工作量

建表

CREATE STREAM daily_stream (
    work_date date,
    worker varchar(20)
);
CREATE CONTINUOUS VIEW daily_view AS
SELECT
    work_date,
    worker,
    COUNT(*) AS cnt
FROM daily_stream
GROUP BY work_date,worker;

Streaming

INSERT INTO daily_stream (work_date, worker) VALUES (CURRENT_DATE, 'jhon');
INSERT INTO daily_stream (work_date, worker) VALUES (CURRENT_DATE, 'tom');
INSERT INTO daily_stream (work_date, worker) VALUES (CURRENT_DATE, 'jhon');
INSERT INTO daily_stream (work_date, worker) VALUES (CURRENT_DATE, 'brown');


Query

select * from daily_view;
 work_date  | worker | cnt 
------------+--------+-----
 2017-05-05 | jhon   |   2
 2017-05-05 | tom    |   1
 2017-05-05 | brown  |   1
(3 rows)

查看占用空间

你可以通过\l+命令查看它的占用空间,因为它只存结果数据,所以你会发现磁盘占用很少。

Replication

  • create a role on the primary with “REPLICATION” previledges.
[user@HOST1 ~]$ pipeline test_db
pipeline (9.5.3)
Type "help" for help.
 
test_db=# CREATE ROLE replicator WITH LOGIN REPLICATION PASSWORD 'xxxxxx';
  • add an entry for the standby to the “pg_hba.conf” file.
host    replication     replicator      192.168.1.2/32         md5
  • set a few configuration parameters on the primary by either updating the “pipelinedb.conf” file.
wal_level = hot_standby
hot_standby = on
max_replication_slots = 1
max_wal_senders = 2 
  • create a replication slot for the standby.
[user@HOST1 ~]$ pipeline test_db
pipeline (9.5.3)
Type "help" for help.
 
test_db=# SELECT * FROM pg_create_physical_replication_slot('replicator_slot');

This is all the setup work we need to do on the primary. Let’s move on to the standby now.

  • taking a base backup of the primary on the standby.
[user@HOST1 ~]$ pipeline-basebackup -X stream -D /your/path/pipelinedb/data -h 192.168.1.2 -p 5432 -U replicator
  • write a “recovery.conf” in the standby’s data directory.
standby_mode = 'on'
primary_slot_name = 'replicator_slot'
primary_conninfo = 'host=192.168.1.1 port=5432 user=replicator password=xxxxxx'
recovery_target_timeline = 'latest'
  • make sure, connect to the standby and confirm it’s in recovery mode.
[user@HOST1 ~]$ pipeline test_db
pipeline (9.5.3)
Type "help" for help.
 
test_db=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)
  • retrieve a list of WAL sender processes via the “pg_stat_replication” view.
[user@HOST1 ~]$ pipeline test_db
pipeline (9.5.3)
Type "help" for help.
 
test_db=# SELECT * FROM pg_stat_replication;
  pid  | usesysid |  usename   | application_name | client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+------------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 22625 |    16384 | replicator | walreceiver      | 192.168.1.2  |                 |       58904 | 2016-07-02 18:55:28.472062+09 |              | streaming | 0/1C8E9D58    | 0/1C8E9D58     | 0/1C8E9D58     | 0/1C8E9A60      |             0 | async
(1 row)

遇到的坑

  • 如果运行一段时间发现内存和CPU变高的话,找到pipelinedb.conf中的autovacuum配置,给off掉。