Background
Over the weekend, I decided to try running a Postgres database in my Homelab. In my current setup, the most convenient option for storage is NFS. However, NFS is especially tricky for databases: a misconfigured setup can lead to performance problems or data corruption.
After watching this amazing talk, I realized that as long as we can guarantee that the WAL (Write-Ahead Log) buffers are written to the network storage, the database should be able to recover after a failure. Armed with this knowledge, I attempted to store Postgres data on an NFS share. In the rest of this post, I'll cover the requirements for safely running Postgres on NFS and the specific NFS options I used.
Safety Requirements
Starting with the official docs, there are two important considerations for NFS:
The only firm requirement for using NFS with PostgreSQL is that the file system is mounted using the hard option.
And
It is not necessary to use the sync mount option. The behavior of the async option is sufficient […] However, it is strongly recommended to use the sync export option on the NFS server on systems where it exists (mainly Linux).
Adding the `hard` option to the client mount options is straightforward, but the second recommendation requires a bit more attention and explanation.
The `sync` export option on the NFS server differs from the `sync` mount option on the client. When a client mounts an NFS share as `async`, it buffers writes and transmits them to the server with a delay (the exact timing depends on the underlying implementation). While this can enhance client write performance, it does so at the expense of data durability: if the client machine crashes before transmitting the writes to the server, some data may be lost. This scenario is not unique to NFS, as local file systems behave similarly. A database such as Postgres is designed to recover in such situations by replaying the Write-Ahead Log, provided it has been committed to stable storage. That's why Postgres invokes `fsync` on WAL files.
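The write-then-fsync pattern described above can be sketched in a few lines of Python. This is a simplified illustration of the general technique, not Postgres's actual WAL code, and `durable_write` is a made-up name:

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data to path and flush it to stable storage before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        # Block until the kernel reports the data has reached stable storage.
        # Without this call, a crash could lose data sitting in the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)
```

On an NFS mount with a `sync`-exported server, the `fsync` call returning successfully is what lets the database trust that the WAL record is on the remote disk.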
Enabling `async` on the server side allows it to acknowledge requests before committing the changes to storage. In the event of an unexpected crash or power loss, data that was acknowledged but not yet persisted is lost. By enabling the `sync` option on the NFS server, we ensure that transmitted data is written to storage before it is acknowledged. So as long as `fsync` calls succeed, we can be sure that the WAL files have reached the remote storage.
Fsync Failures
Various factors, including network failures, can cause `fsync` calls to fail. As a safety measure, Postgres intentionally panics in response to such failures. This is one of the many reasons to run Postgres in a highly available setup.
NFS Config
I’m using a Synology NAS, so enabling the `sync` option means leaving the Enable asynchronous box unchecked in the NFS rule of the share:
NFS Share Options on Synology
Which roughly translates to the following line in the `/etc/exports` file:
/path/to/share <IP_CIDR>(rw,sync,no_wdelay,no_root_squash,sec=sys)
And with the help of these two pages, I ended up with the following mount options for the client:
defaults,vers=4.1,proto=tcp,suid,rw,timeo=600,retrans=2,hard,fg,rsize=8192,wsize=8192,noatime,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0
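To make the mount persistent, these options can go into `/etc/fstab` on the client. The server hostname, export path, and mount point below are placeholders for your own values:

```
# /etc/fstab — hostname and paths are examples only
nas.local:/volume1/postgres  /mnt/postgres  nfs4  defaults,vers=4.1,proto=tcp,suid,rw,timeo=600,retrans=2,hard,fg,rsize=8192,wsize=8192,noatime,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0  0  0
```

Note the `hard` option from the Postgres docs, and the `ac*` options that disable attribute caching, trading some metadata performance for consistency.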
Conclusion
In this post, we reviewed the requirements for safely running Postgres on NFS. We also touched on `fsync` and Postgres's safeguards against `fsync` failures.
While our main focus was on NFS setup for Postgres, the information is transferable to other databases and storage systems.