上QQ阅读APP看书，第一时间看更新

Introducing parallel queries

Traditionally, a query had to run on a single CPU. While this was just fine in the OLTP world, it started to be a problem for analytical applications, which were bound to the speed provided by a single core. With PostgreSQL 9.6, parallel queries were introduced. Of course, implementing parallel queries was hard and so a lot of infrastructure has already been implemented over the years. All this infrastructure is now available to provide the end user with parallel sequential scans. The idea is to make many CPUs work on complicated WHERE conditions during a sequential scan. Version 9.6 also allowed for parallel aggregates and parallel joins. Of course, there is a lot of work left, but we are already looking at a major leap forward.

To control parallelism, there are two essential settings:

test=# SHOW max_worker_processes; 
 max_worker_processes  
---------------------- 
 8 
(1 row) 

test=# SHOW max_parallel_workers_per_gather ; 
 max_parallel_workers_per_gather  
--------------------------------- 
 2 
(1 row)

The first one limits the overall number of worker processes available. The second one controls the number of workers allowed per gather node.

A gather node is a new thing you will see in an execution plan. It is in charge of unifying results coming from parallel subprocesses.

In addition to those fundamental settings, there are a couple of new optimizer parameters to adjust the cost of parallel queries.