Pipelinedb介绍
场景
假设有这样的一些需求
- 计数器(比如PV)
- 一些统计业务
课题
- 实时性
- 与rawdata解耦
- 数据量
- 负荷
- 程序开发成本
PipelineDB特性
- 不存原始数据,只存结果 (节省空间)
- 内存计算 (快速)
- SQL查询统计结果
- 实现实时统计(流式计算)
概念

安装与配置
- http://docs.pipelinedb.com/installation.html
- http://docs.pipelinedb.com/conf.html
测试
假设统计每人每天的工作量
建表
CREATE STREAM daily_stream (
work_date date,
worker varchar(20)
);
CREATE CONTINUOUS VIEW daily_view AS
SELECT
work_date,
worker,
COUNT(*) AS cnt
FROM daily_stream
GROUP BY work_date,worker;
Streaming
INSERT INTO daily_stream (work_date, worker) VALUES (CURRENT_DATE, 'jhon');
INSERT INTO daily_stream (work_date, worker) VALUES (CURRENT_DATE, 'tom');
INSERT INTO daily_stream (work_date, worker) VALUES (CURRENT_DATE, 'jhon');
INSERT INTO daily_stream (work_date, worker) VALUES (CURRENT_DATE, 'brown');
:
:
Query
select * from daily_view;
work_date | worker | cnt
------------+--------+-----
2017-05-05 | jhon | 2
2017-05-05 | tom | 1
2017-05-05 | brown | 1
(3 rows)
查看占用空间
你可以通过\l+命令查看它的占用空间,因为它只存结果数据,所以你会发现磁盘占用很少。
Replication
- create a role on the primary with “REPLICATION” previledges.
[user@HOST1 ~]$ pipeline test_db
pipeline (9.5.3)
Type "help" for help.
test_db=# CREATE ROLE replicator WITH LOGIN REPLICATION PASSWORD 'xxxxxx';
- add an entry for the standby to the “pg_hba.conf” file.
host replication replicator 192.168.1.2/32 md5
- set a few configuration parameters on the primary by either updating the “pipelinedb.conf” file.
wal_level = hot_standby
hot_standby = on
max_replication_slots = 1
max_wal_senders = 2
- create a replication slot for the standby.
[user@HOST1 ~]$ pipeline test_db
pipeline (9.5.3)
Type "help" for help.
test_db=# SELECT * FROM pg_create_physical_replication_slot('replicator_slot');
This is all the setup work we need to do on the primary. Let’s move on to the standby now.
- taking a base backup of the primary on the standby.
[user@HOST1 ~]$ pipeline-basebackup -X stream -D /your/path/pipelinedb/data -h 192.168.1.2 -p 5432 -U replicator
- write a “recovery.conf” in the standby’s data directory.
standby_mode = 'on'
primary_slot_name = 'replicator_slot'
primary_conninfo = 'host=192.168.1.1 port=5432 user=replicator password=xxxxxx'
recovery_target_timeline = 'latest'
- make sure, connect to the standby and confirm it’s in recovery mode.
[user@HOST1 ~]$ pipeline test_db
pipeline (9.5.3)
Type "help" for help.
test_db=# SELECT pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
- retrieve a list of WAL sender processes via the “pg_stat_replication” view.
[user@HOST1 ~]$ pipeline test_db
pipeline (9.5.3)
Type "help" for help.
test_db=# SELECT * FROM pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+------------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
22625 | 16384 | replicator | walreceiver | 192.168.1.2 | | 58904 | 2016-07-02 18:55:28.472062+09 | | streaming | 0/1C8E9D58 | 0/1C8E9D58 | 0/1C8E9D58 | 0/1C8E9A60 | 0 | async
(1 row)
遇到的坑
- 如果运行一段时间发现内存和CPU变高的话,找到pipelinedb.conf中的autovacuum配置,给off掉。