PostgreSQL 17 Beta: B-Tree just got promoted to Index CEO
The release of PostgreSQL 17 beta brought a bunch of new interesting features. Improvements to the vacuum execution time, memory consumption, faster ANALYZE, etc.., but the one that most databases and developers will appreciate that also caught my eye right off the bat, are the improvements to the B-Tree Index when using the IN or ANY clauses. I’ve read improvements ranging from 10% to 30% without any change to your database or table structure so I wanted to test it out for one of my production use cases.
The B-Tree is the default index when you don’t specify the type you want to use, likely making it the most used Index by the community. Going in-depth on how it works is out of the scope of this performance test but the following articles make a wonderful job describing it:
In short, the patch written by Peter Geoghegan and Matthias van de Meent avoids duplicate leaf page access that will ultimately result in better query plans and faster query overall.
I’ve created 2 docker containers with PostgreSQL 16 and 17 to test the performance differences between them.
Table
Loaded the following table with 20M records.
postgres=# \d users
Table "public.users"
Column | Type | Collation | Nullable | Default
--------------+--------------------------------+-----------+----------+---------
firstname | character varying(300) | | |
lastname | character varying(300) | | |
name | character varying(300) | | |
createdat | timestamp(6) without time zone | | |
emailaddress | character varying(300) | | |
lockout | boolean | | |
status | numeric(30,0) | | |
lockout_date | timestamp(6) without time zone | | |
Indexes:
"users_at_idx_1" btree (lockout, status, createdat DESC NULLS LAST)
Query
select
*
from
users
where
lockout in (true)
and status in (3, 4)
and createdat in ('2023-05-18 00:00:00', '2023-06-19 00:00:00');
Query plan — v16
Index Scan using users_at_idx_1 on public.users (cost=0.56..24855.55 rows=7245 width=72) (actual time=0.169..49.792 rows=3686 loops=1)
Output: firstname, lastname, name, createdat, emailaddress, lockout, status, lockout_date
Index Cond: ((users.lockout = true) AND (users.status = ANY ('{3,4}'::numeric[])) AND (users.createdat = ANY ('{"2023-05-18 00:00:00","2023-06-19 00:00:00"}'::timestamp without time zone[])))
Buffers: shared hit=56 read=3651
Planning:
Buffers: shared hit=56
Planning Time: 0.562 ms
Execution Time: 50.249 ms
Query plan — v17
Index Scan using users_at_idx_1 on public.users (cost=0.56..24879.59 rows=7255 width=72) (actual time=0.132..33.704 rows=3686 loops=1)
Output: firstname, lastname, name, createdat, emailaddress, lockout, status, lockout_date
Index Cond: ((users.lockout = true) AND (users.status = ANY ('{3,4}'::numeric[])) AND (users.createdat = ANY ('{"2023-05-18 00:00:00","2023-06-19 00:00:00"}'::timestamp without time zone[])))
Buffers: shared hit=48 read=3653
Planning:
Buffers: shared hit=53
Planning Time: 0.584 ms
Execution Time: 34.102 ms
To note the drop down of buffers shared hit, which translates into smaller number of pages being read. (Smaller is better, for a change)
PgBench — v16
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 4
maximum number of tries: 1
duration: 300 s
number of transactions actually processed: 83648
number of failed transactions: 0 (0.000%)
latency average = 72.225 ms
initial connection time = 38.957 ms
tps = 276.911368 (without initial connection time)
PgBench — v17
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 4
maximum number of tries: 1
duration: 300 s
number of transactions actually processed: 180075
number of failed transactions: 0 (0.000%)
latency average = 33.318 ms
initial connection time = 39.670 ms
tps = 600.271976 (without initial connection time)
Note
Depending on how your index is structured you might also see a reduced number on the idx_scan of your tables on pg_stat_all_indexes since in cases of duplicate leaf pages, PostgreSQL 17 wouldn’t need to traverse the Index Tree again and therefore producing a smaller number of scans.
Conclusion
The query plan execution shows a ~32% improvement on PostgreSQL and PgBench more than 100% improvement in the number of transactions possible over the same query in a period of 300 seconds. It’s possible that you can see a bigger overall improvement of your database performance depending on how your workload and how your database is structured but these values are definitely promising.