
Managing software upgrades
Software in the server space is normally fairly stable, but security updates and bug fixes must still be applied. Highly available servers can't be stopped often, yet without important upgrades they could crash or suffer a breach, which would be far more serious.
How, then, do we ensure that updates can be applied safely while maintaining consistent availability? Once again, this often comes down to preparation. We prepare by keeping duplicate online copies of the data and by abstracting the access paths. With an architecture like this in place, we can switch to a backup server while upgrading the primary; thus, the database never actually goes offline.
We'll explore this scenario here, especially as it will be a very common one.
Getting ready
For this section, we need at least one extra server with PostgreSQL installed. This server should be running a copy of our database. We can follow the Managing system migrations recipe to build a copy if we don't already have one available. We will also use ideas introduced in the Exploring the magic of virtual IPs recipe. Reviewing these recipes now might be a good idea.
How to do it...
For this scenario, assume that we have two servers with the addresses 192.168.1.10 and 192.168.1.20, where 192.168.1.10 is currently the primary server. In addition, we have a virtual IP address of 192.168.1.30 on the eth0:pgvip Ethernet device. To upgrade the PostgreSQL software on both nodes, follow these steps:
- Stop the database copy on 192.168.1.20 as the postgres user using this command:
  pg_ctl -D /path/to/database stop -m fast
- Perform any necessary software upgrades. For example, to upgrade a Debian or Ubuntu server to the latest PostgreSQL 9.3, use the following command as a root-capable user on 192.168.1.20:
  sudo apt-get install postgresql-9.3
- Start the database copy on 192.168.1.20 as the postgres user:
  pg_ctl -D /path/to/database start
- As a root-capable user on 192.168.1.10, stop the virtual IP address with the following command:
  sudo ifconfig eth0:pgvip down
- As a database superuser, issue a checkpoint to the database on 192.168.1.10:
  CHECKPOINT;
- Connect to PostgreSQL on 192.168.1.10 and issue the following query to check replication status:
  SELECT sent_location, replay_location
    FROM pg_stat_replication
   WHERE usename = 'rep_user';
- Periodically, repeat the preceding query until sent_location and replay_location match.
- As postgres, stop the PostgreSQL service on 192.168.1.10 with this command:
  pg_ctl -D /path/to/database stop -m fast
- As postgres, promote the PostgreSQL replica on 192.168.1.20 with this command:
  pg_ctl -D /path/to/database promote
- As a root-capable user on 192.168.1.20, start the virtual IP address with the following command:
  sudo ifconfig eth0:pgvip 192.168.1.30 up
- If necessary, inform the developers and support staff to restart the application's database connection pools.
- Repeat any necessary software upgrades on 192.168.1.10 as already performed on 192.168.1.20.
- Erase the existing database on 192.168.1.10 as the postgres user this way:
  rm -Rf /path/to/database
- Use pg_basebackup on 192.168.1.10 to make a copy of the upgraded database on 192.168.1.20:
  pg_basebackup -U rep_user -h 192.168.1.20 -D /path/to/database
- Create a file named recovery.conf in /path/to/database with the following contents:
  standby_mode = 'on'
  primary_conninfo = 'host=192.168.1.20 port=5432 user=rep_user'
- Start the newly created copy as the postgres user on 192.168.1.10 using the following command:
  pg_ctl -D /path/to/database start
How it works...
This entire process is very long, but we hope to illustrate that it is actually very straightforward. The first step is to upgrade the mirror copy of the database under the assumption that it is not actively utilized by applications or users. The role of the secondary node in this case is to act as an emergency backup for the primary database node. As it's not being used, we are able to stop the database, perform any updates necessary, and start it and allow it to synchronize again.
Afterwards, we isolate the primary database node by disabling the virtual IP address. This allows the streaming replica to replay the last few active transactions so that it's fully synchronized before we make it the new primary database. We accomplish this by issuing CHECKPOINT and watching the replication status until it matches on both systems. When the replication status matches, we can stop the primary PostgreSQL server; its role in the process is complete.
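If you would rather not watch the query by hand, a small loop can poll the view until the replica catches up. The following is only a sketch, assuming psql can connect as the postgres superuser and that a single replica streams as the rep_user role used in this recipe:

  # Hedged sketch: wait until the replica has replayed everything the
  # primary has sent before stopping the old primary.
  until [ "$(psql -At -h 192.168.1.10 -U postgres -c \
        "SELECT sent_location = replay_location
           FROM pg_stat_replication
          WHERE usename = 'rep_user';")" = "t" ]
  do
      sleep 1
  done
  echo "Replica is caught up; it is now safe to stop the old primary."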
As software upgrades may take some time to complete or require a server restart, we need to immediately make the secondary node available as the primary database. We start by promoting the replica to become the new primary by sending the promote command to pg_ctl. Once the database is writable, we reinstate the 192.168.1.30 virtual IP address so that applications and users can reconnect safely.
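A quick sanity check can confirm that both the promotion and the virtual IP move took effect. This is only a sketch, assuming psql can connect as the postgres superuser through the virtual IP:

  # The promoted node should report that it is no longer in recovery.
  psql -h 192.168.1.30 -U postgres -c "SELECT pg_is_in_recovery();"
  # Expected output: f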
This process of node switching is fairly quick, provided we already have a replica ready to take over. With the replica acting as a primary, the next step is to perform any upgrades necessary, just as we did on the secondary node. After the upgrades are finished, we cannot simply restart the primary database again, as the replica has been acting as a primary database for a period of time.
This means that we need to rebuild the primary database as a new replica. This makes both nodes ready for the next upgrade and maintains the two-node relationship. We start this process by erasing the old contents of the database and then use pg_basebackup to copy the current primary database. Then, we create a new recovery.conf file and direct the node to act as a new replica. Once the replica is started, we have the same configuration as we had earlier, but now, the roles are reversed; 192.168.1.20 is the primary, and 192.168.1.10 is the replica.
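Once the rebuilt replica is running, it is worth confirming that the roles really have reversed. A minimal check, assuming superuser access to the new primary, is to look for the rebuilt node in pg_stat_replication:

  # On the new primary (192.168.1.20), the rebuilt node should appear as a
  # client connected from 192.168.1.10 with a state of 'streaming'.
  psql -h 192.168.1.20 -U postgres -c \
      "SELECT client_addr, state, sent_location, replay_location
         FROM pg_stat_replication
        WHERE usename = 'rep_user';"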
There's more...
Astute readers may have noticed that using pg_basebackup to copy the entire database following a minor upgrade is somewhat wasteful. We agree! In the later recipes, we will make use of rsync or PostgreSQL-specific software to perform these tasks instead. This recipe was already pretty long, and setting up rsync properly for this operation would have added quite a bit more time. The point is to show you the switching process; feel free to substitute better methods you know for synchronizing data.
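To give a rough idea of that alternative, an rsync-based resynchronization might look something like the following sketch. It assumes SSH access as the postgres user between the nodes and that the old primary's PostgreSQL service is stopped; the later recipes cover the details, such as wrapping the copy in pg_start_backup() and pg_stop_backup() and handling the transaction logs:

  # Hedged sketch only: copy just the changed files from the new primary
  # instead of recreating the entire data directory with pg_basebackup.
  rsync -av --delete --exclude=postmaster.pid \
      postgres@192.168.1.20:/path/to/database/ /path/to/database/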
See also
- In addition to rsync, a newer utility named pg_rewind can make resetting replicas much easier. It is beyond the scope of this chapter, so we recommend that you read more about it at https://github.com/vmware/pg_rewind. A brief usage sketch follows.
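For the curious, a typical invocation is short. This is only a sketch based on the project's documentation, run on the old primary while its PostgreSQL service is stopped, and it assumes the prerequisites described there (such as data checksums or wal_log_hints being enabled):

  # Rewind the old primary's data directory to match the new primary, then
  # create recovery.conf and start the node as a replica as usual.
  pg_rewind --target-pgdata=/path/to/database \
      --source-server='host=192.168.1.20 port=5432 user=postgres'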