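Check host resource usage first, then analyze the slow query log and execution plans, then look for lock contention and long-running transactions, and finally assess table bloat and maintenance tasks.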

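Diagnosing a PostgreSQL performance regression means working systematically across several dimensions rather than relying on a single metric. The core idea is to locate the bottleneck, narrow the scope, and verify each hypothesis. The practical steps below are listed in the order you would run them, so you can respond quickly.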
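Observe overall load and resource usage
Start by checking whether the database host itself is the bottleneck:
- CPU usage: sustained utilization near 100% usually points to compute pressure from complex queries or high concurrency
- Memory: check for frequent swapping, and whether shared_buffers and work_mem are sized sensibly
- Disk I/O: high I/O wait usually means queries are doing large sequential scans or WAL writes are under pressure
- Network latency: limited bandwidth or high latency between client and database also degrades perceived performance
Recommended tools: top, htop, iostat, vmstat. A monitoring stack such as Prometheus + Grafana makes trends easier to see.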
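The resource checks above can be sketched as a small script that reads one captured vmstat sample. This is only an illustration: the sample text is hand-written, and the function names and thresholds (`cpu_busy_pct`, `iowait_pct`) are assumptions chosen for the example, not PostgreSQL or vmstat recommendations.

```python
# Sketch: flag resource pressure from one captured `vmstat` sample.
# The sample text and the thresholds are illustrative assumptions.

SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 9  1 204800  51200  10240 812000  350  420  1200   800 4100 9000 70 22  3  5  0
"""


def parse_vmstat(text):
    """Return the last data row of vmstat output as a {column: int} dict."""
    lines = [line.split() for line in text.strip().splitlines()]
    # The header row is the one that contains the column names, e.g. "swpd".
    header_index = next(i for i, cols in enumerate(lines) if "swpd" in cols)
    header = lines[header_index]
    last_row = lines[-1]
    return {name: int(value) for name, value in zip(header, last_row)}


def resource_warnings(sample, cpu_busy_pct=90, iowait_pct=20):
    """Map a parsed sample to warnings (hypothetical thresholds)."""
    warnings = []
    if sample["us"] + sample["sy"] >= cpu_busy_pct:
        warnings.append("cpu: user+system time near saturation")
    if sample["si"] > 0 or sample["so"] > 0:
        warnings.append("memory: swapping in progress")
    if sample["wa"] >= iowait_pct:
        warnings.append("disk: high I/O wait")
    return warnings


sample = parse_vmstat(SAMPLE_VMSTAT)
for warning in resource_warnings(sample):
    print(warning)
```

A single sample only shows a moment in time, so in practice you would look at several consecutive samples before drawing a conclusion.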
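Confirm whether slow queries exist and how they are distributed
Enabling and analyzing the slow query log is a key step:
- In postgresql.conf, set:
  - log_min_duration_statement = 1000 (logs every statement that runs longer than 1 second; the unit is milliseconds)
  - log_statement = 'none' (avoids oversized logs)
- Use the pg_stat_statements extension to find the most expensive SQL. The extension must be listed in shared_preload_libraries. On PostgreSQL 12 and earlier the timing columns are total_time and mean_time rather than total_exec_time and mean_exec_time:
SELECT query, calls, total_exec_time, mean_exec_time, rows,
       100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Focus on statements that are both called often and slow on average, and optimize these "hot and heavy" statements first.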
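As a minimal sketch of that prioritization, the snippet below filters rows shaped like pg_stat_statements output for statements that are both frequent and slow on average. The rows are made-up sample data, and the function name `hot_and_heavy` and its cutoffs (`min_calls`, `min_mean_ms`) are hypothetical values you would tune for your own workload.

```python
# Sketch: pick statements that are both frequent and slow on average.
# The rows are made-up sample data shaped like pg_stat_statements output.

SAMPLE_ROWS = [
    # (query, calls, total execution time in ms)
    ("SELECT * FROM orders WHERE customer_id = $1", 120_000, 960_000.0),
    ("UPDATE inventory SET qty = qty - $1 WHERE sku = $2", 300_000, 90_000.0),
    ("SELECT report_rollup($1)", 12, 240_000.0),
    ("SELECT 1", 900_000, 9_000.0),
]


def hot_and_heavy(rows, min_calls=1_000, min_mean_ms=5.0):
    """Return (query, calls, mean_ms, total_ms) tuples that pass both cutoffs,
    ordered by total time, largest first. Both cutoffs are hypothetical."""
    picked = []
    for query, calls, total_ms in rows:
        mean_ms = total_ms / calls if calls else 0.0
        if calls >= min_calls and mean_ms >= min_mean_ms:
            picked.append((query, calls, round(mean_ms, 2), total_ms))
    return sorted(picked, key=lambda row: row[3], reverse=True)


for query, calls, mean_ms, total_ms in hot_and_heavy(SAMPLE_ROWS):
    print(f"{total_ms:>10.0f} ms total  {mean_ms:>8.2f} ms avg  {calls:>7} calls  {query}")
```

With the sample data only the orders lookup passes both cutoffs. The rarely called report query is slow per call but contributes less total time, so it is a lower priority.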
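Check whether the execution plan is reasonable
Run EXPLAIN (ANALYZE, BUFFERS) on each slow query you identified (ANALYZE actually executes the statement, so run data-modifying statements inside a transaction that you roll back) and look for the following:
- A sequential scan (Seq Scan) where an index should have been used. Possible causes are stale statistics or a predicate with poor selectivity
- A nested loop (Nested Loop) that multiplies the row count. Consider rewriting the query or adjusting join_collapse_limit
- A hash or sort that spills to disk (a Hash node reporting more than one batch, or a sort reporting Sort Method: external merge with a Disk figure), which indicates that work_mem is too small for the operation
- A Buffers line dominated by read rather than hit, which means the data was not found in shared_buffers. Review shared_buffers and the operating system cache
Remember to run ANALYZE table_name to refresh statistics. Sometimes that alone brings the plan back to normal.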
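To make the plan-reading checklist concrete, here is a rough sketch that scans the text output of EXPLAIN (ANALYZE, BUFFERS) for those signals. The plan is a hand-written sample rather than real output, the regular expressions cover only the line shapes shown in it, and `plan_warnings` and `misestimate_factor` are names invented for this example. A large gap between estimated and actual rows is treated here as a hint that statistics may be stale.

```python
import re

# Sketch: scan EXPLAIN (ANALYZE, BUFFERS) text output for common warning signs.
# SAMPLE_PLAN is hand-written for illustration, not captured from a server.

SAMPLE_PLAN = """\
Hash Join  (cost=3084.00..9518.00 rows=1000 width=16) (actual time=41.200..310.500 rows=98000 loops=1)
  Hash Cond: (o.customer_id = c.id)
  Buffers: shared hit=120 read=8400, temp read=900 written=900
  ->  Seq Scan on orders o  (cost=0.00..4334.00 rows=1000 width=12) (actual time=0.020..95.100 rows=98000 loops=1)
        Filter: (status = 'open'::text)
        Rows Removed by Filter: 902000
  ->  Hash  (cost=1834.00..1834.00 rows=100000 width=8) (actual time=40.900..40.900 rows=100000 loops=1)
        Buckets: 65536  Batches: 4  Memory Usage: 1501kB
        ->  Seq Scan on customers c  (cost=0.00..1834.00 rows=100000 width=8) (actual time=0.010..15.300 rows=100000 loops=1)
"""

# Node name, planner row estimate, and actual row count from one plan line.
NODE_RE = re.compile(
    r"(?:->\s*)?(?P<node>[A-Z][\w ]*?)\s+\(cost=[\d.]+ rows=(?P<est>\d+) .*?"
    r"\(actual time=[\d.]+ rows=(?P<act>[\d.]+) loops=\d+\)"
)


def plan_warnings(plan_text, misestimate_factor=10):
    """Return a list of warning strings; the factor is a hypothetical cutoff."""
    warnings = []
    for raw in plan_text.splitlines():
        line = raw.strip()
        node = NODE_RE.search(line)
        if node:
            name = node.group("node")
            est, act = int(node.group("est")), round(float(node.group("act")))
            if name.startswith("Seq Scan"):
                warnings.append(f"seq scan: {name}")
            if max(est, act) >= misestimate_factor * max(min(est, act), 1):
                warnings.append(f"row misestimate ({est} vs {act}): {name}")
        batches = re.search(r"Batches: (\d+)", line)
        if batches and int(batches.group(1)) > 1:
            warnings.append("hash spilled to disk: work_mem may be too small")
        if "Sort Method: external" in line:
            warnings.append("sort spilled to disk: work_mem may be too small")
        buffers = re.search(r"shared hit=(\d+) read=(\d+)", line)
        if buffers and int(buffers.group(2)) > int(buffers.group(1)):
            warnings.append("buffers: more blocks read than hit")
    return warnings


for warning in plan_warnings(SAMPLE_PLAN):
    print(warning)
```

Every flag is a prompt to look closer, not a verdict: a sequential scan over a table whose rows are all needed, such as the customers scan in the sample, is often the right plan.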
Investigate lock contention and long-running transactions
Blocking locks cause requests to pile up directly:
- List the sessions that are waiting on a lock, together with the sessions blocking them:
SELECT blocked_locks.pid AS blocked_pid,
       blocking_locks.pid AS blocking_pid,
       blocked_activity.query AS blocked_query,
       blocking_activity.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
  ON blocking_locks.locktype = blocked_locks.locktype
 AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
 AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
 AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
 AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
 AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
 AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
 AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
 AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
 AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
 AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
- Check for long-running transactions:
SELECT pid, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'active')
  AND now() - xact_start > interval '5 minutes';
A transaction left uncommitted for a long time not only holds locks but also prevents VACUUM from removing dead tuples, which degrades performance further.
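When many sessions are queued behind each other, the useful question is which session sits at the head of each chain. The sketch below reduces (blocked_pid, blocking_pid) pairs, such as the first two columns of a lock query result, to those root blockers. The pids are made-up sample data and `root_blockers` is a helper invented for this illustration.

```python
from collections import defaultdict

# Sketch: find the sessions at the head of each blocking chain.
# SAMPLE_PAIRS holds made-up (blocked_pid, blocking_pid) rows.

SAMPLE_PAIRS = [(4102, 4001), (4103, 4001), (4200, 4102), (4300, 4250)]


def root_blockers(pairs):
    """Return {root_pid: sorted list of every pid waiting behind it}."""
    blocked_by = defaultdict(set)   # blocked pid -> pids it waits on
    waiters = defaultdict(set)      # blocking pid -> pids waiting on it
    for blocked, blocking in pairs:
        blocked_by[blocked].add(blocking)
        waiters[blocking].add(blocked)

    # A root blocker holds a lock others wait on but is not itself waiting.
    roots = [pid for pid in waiters if pid not in blocked_by]

    result = {}
    for root in sorted(roots):
        seen, stack = set(), [root]
        while stack:  # depth-first walk over everything queued behind root
            for waiter in waiters.get(stack.pop(), ()):
                if waiter not in seen:
                    seen.add(waiter)
                    stack.append(waiter)
        result[root] = sorted(seen)
    return result


for pid, queued in root_blockers(SAMPLE_PAIRS).items():
    print(f"pid {pid} is blocking {len(queued)} session(s): {queued}")
```

In the sample, pid 4001 is the session to inspect first, because 4102 is itself waiting on it. A pure cycle of waiters has no root and would produce an empty result here.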
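Assess table bloat and maintenance tasks
Tables with frequent UPDATE/DELETE activity are prone to bloat:
- Use the following query to check the dead-tuple ratio, which is a rough proxy for bloat:
SELECT schemaname, relname, n_dead_tup, n_live_tup,
       round(100.0 * n_dead_tup / (n_live_tup + n_dead_tup), 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_ratio DESC;
- Confirm that autovacuum is keeping up: check last_autovacuum in pg_stat_user_tables, or look for "automatic vacuum of table" entries in the server log (these are written only when log_autovacuum_min_duration is enabled)
- If necessary, run VACUUM FULL manually (it takes an ACCESS EXCLUSIVE lock, so reads and writes on the table are blocked while it is rewritten) or rebuild the indexes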
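One way to judge whether autovacuum is keeping up is to compare a table's dead tuples with the point at which autovacuum is expected to trigger, which PostgreSQL documents as the vacuum threshold plus the scale factor times the number of tuples. The sketch below applies that formula with the default settings (50 and 0.2). It uses the live-tuple count as a stand-in for the planner's tuple estimate, which is a simplifying assumption, and the table figures and the `assess` helper are made up for illustration.

```python
# Sketch: compare dead tuples with the autovacuum trigger point.
# Table names and counts below are made-up sample data.


def autovacuum_trigger_point(live_tuples, threshold=50, scale_factor=0.2):
    """threshold + scale_factor * tuples. The defaults mirror
    autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor."""
    return threshold + scale_factor * live_tuples


def dead_ratio(live_tuples, dead_tuples):
    """Percentage of dead tuples, guarding against an empty table."""
    total = live_tuples + dead_tuples
    return round(100.0 * dead_tuples / total, 2) if total else 0.0


def assess(table, live_tuples, dead_tuples, **settings):
    """Summarize one table; `settings` may override threshold/scale_factor."""
    trigger = autovacuum_trigger_point(live_tuples, **settings)
    return {
        "table": table,
        "dead_ratio": dead_ratio(live_tuples, dead_tuples),
        "trigger_point": trigger,
        "overdue": dead_tuples > trigger,
    }


for row in (
    assess("orders", 1_000_000, 350_000),
    assess("audit_log", 10_000, 1_500),
):
    print(row)
```

A table that stays above its trigger point for a long time suggests that autovacuum is not finishing, is throttled, or is held back by a long-running transaction. Per-table overrides of the two settings can be passed as keyword arguments, for example `assess("orders", 1_000_000, 150_000, scale_factor=0.05)`.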
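That is essentially it. Work from the outside in and from macro to micro: check resources first, then SQL, then execution paths and concurrency, and finally data maintenance. None of it is complicated, but the details are easy to overlook.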











