NBXplorer is working amazingly well overall, but I’ve encountered a performance issue with the nbxv1_keypath_info view under specific conditions. The view is defined as follows:
CREATE OR REPLACE VIEW nbxv1_keypath_info AS
SELECT ws.code,
       ws.script,
       s.addr,
       d.metadata AS descriptor_metadata,
       nbxv1_get_keypath(d.metadata, ds.idx) AS keypath,
       ds.metadata AS descriptors_scripts_metadata,
       ws.wallet_id,
       ds.idx,
       ds.used,
       d.descriptor
FROM ((wallets_scripts ws
     JOIN scripts s ON (((s.code = ws.code) AND (s.script = ws.script))))
     LEFT JOIN ((wallets_descriptors wd
          JOIN descriptors_scripts ds ON (((ds.code = wd.code) AND (ds.descriptor = wd.descriptor))))
          JOIN descriptors d ON (((d.code = ds.code) AND (d.descriptor = ds.descriptor))))
     ON (((wd.wallet_id = ws.wallet_id) AND (wd.code = ws.code) AND (ds.script = ws.script))));
This view is queried using a WHERE clause that filters by code and script along with additional predicates here:
SELECT ts.script,
       ts.addr,
       ts.derivation,
       ts.keypath,
       ts.redeem
FROM unnest(@records) AS r (script),
LATERAL
  (SELECT script,
          addr,
          descriptor_metadata->>'derivation' derivation,
          keypath,
          descriptors_scripts_metadata->>'redeem' redeem,
          descriptors_scripts_metadata->>'blindedAddress' blinded_addr
   FROM nbxv1_keypath_info ki
   WHERE ki.code = @code
     AND ki.script = r.script) ts;
Issue Details
In my environment, the data distribution is as follows:
BTC: 182,250 records
LTC: 3,600 records
When filtering on the code and script values:
BTC Query: Returns results in about 300ms.
LTC Query: Returns results in around 20 seconds (despite LTC having far fewer records).
What’s Happening?
PostgreSQL’s query planner relies on up-to-date table statistics to generate efficient execution plans. These statistics are automatically refreshed by the auto-analyze process, which is triggered when a sufficient number of rows have been modified. The decision to run an auto-analyze is based on the formula:
analyze threshold = autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * number of rows in the table
With the default settings (autovacuum_analyze_threshold = 50, autovacuum_analyze_scale_factor = 0.1), an auto-analyze runs once the number of rows inserted, updated, or deleted since the last analyze exceeds that threshold.
In our scenario, after inserting 72,090 BTC records, an auto-analyze is triggered just before the final BTC record is added. This means that the statistics are refreshed for BTC, and any subsequent query using these fresh statistics will perform optimally. The next auto-analyze would only occur after modifying approximately:
182,250 * 0.1 + 50 = 18,275 modified rows
However, after the BTC batch, only 3,600 LTC records are added. Since 3,600 is well below the 18,275 threshold, PostgreSQL does not trigger a new auto-analyze. As a result, the statistics for the LTC data remain outdated. This discrepancy causes the query planner to choose a suboptimal execution plan for queries filtering on code = 'LTC', leading to significantly slower performance (20 seconds).
It’s important to note that after manually running ANALYZE wallets_scripts;, the updated statistics allowed the planner to generate an optimal plan, and query performance improved dramatically.
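For reference (not part of the original report), the staleness of the statistics can be inspected directly; the following uses only standard PostgreSQL settings and the pg_stat_user_tables catalog view:
-- Current auto-analyze settings (defaults: 50 and 0.1)
SHOW autovacuum_analyze_threshold;
SHOW autovacuum_analyze_scale_factor;

-- Rows modified since the last analyze, and when the last (auto-)analyze ran
SELECT relname, n_live_tup, n_mod_since_analyze, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('wallets_scripts', 'scripts', 'descriptors_scripts', 'wallets_descriptors', 'descriptors');
If n_mod_since_analyze for wallets_scripts stays below the computed threshold after the LTC batch, no auto-analyze will fire, which matches the behaviour described above.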
Reproduction Steps
Run the following Python script to simulate the scenario:
Query the view using a filter on the code and script fields (code should be 'LTC'; the script does not have to exist, so you can use a random value).
SELECT ts.script,
       ts.addr,
       ts.derivation,
       ts.keypath,
       ts.redeem
FROM unnest(ARRAY['76a914fbf7c85733d88415245534a9c0239b63d116660488ac']) AS r (script),
LATERAL
  (SELECT script,
          addr,
          descriptor_metadata->>'derivation' derivation,
          keypath,
          descriptors_scripts_metadata->>'redeem' redeem,
          descriptors_scripts_metadata->>'blindedAddress' blinded_addr
   FROM nbxv1_keypath_info ki
   WHERE ki.code = 'LTC'
     AND ki.script = r.script) ts;
Observe the performance differences based on the code filtering.
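As a side note (not in the original report), the plan difference can be made visible with EXPLAIN; for example:
EXPLAIN (ANALYZE, BUFFERS)
SELECT ts.script, ts.addr, ts.keypath
FROM unnest(ARRAY['76a914fbf7c85733d88415245534a9c0239b63d116660488ac']) AS r (script),
LATERAL
  (SELECT script, addr, keypath
   FROM nbxv1_keypath_info ki
   WHERE ki.code = 'LTC'
     AND ki.script = r.script) ts;
Running this once before and once after ANALYZE wallets_scripts; shows which plan the outdated statistics lead the planner to choose.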
Expected Behavior
Filtering on code should yield consistent performance regardless of the underlying data distribution. Ideally, the query planner should use up-to-date statistics to generate an optimal execution plan, even when the auto-analyze threshold is not met by the LTC records.
Actual Behavior
BTC Queries: Benefit from a recent auto-analyze (triggered multiple times while adding BTC records, and a final time just before the last record was added), returning results quickly (around 300ms).
LTC Queries: Suffer from outdated statistics due to the absence of an auto-analyze after the BTC batch. This results in a suboptimal execution plan and slower performance (around 3 seconds to 10 seconds).
Environment
NBXplorer Version: Latest
PostgreSQL Version: Latest
PostgreSQL Settings: Default
Data Distribution: 182,250 BTC records and 3,600 LTC records
Possible Solution
I was able to improve query efficiency by changing the view into a function with parameters (code and script) that filters on wallets_scripts first, instead of evaluating the view over all rows of the table. With this change, the query time dropped from 20 seconds to about 300ms. I have created a PR which will be attached to this issue.
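The PR itself is not reproduced here; purely as an illustration, a function along these lines (the function name, parameter names, and column types below are assumptions, not the actual patch) pushes the code/script filter into wallets_scripts before the joins:
-- Hypothetical sketch of the function-based approach; not the actual PR.
CREATE OR REPLACE FUNCTION nbxv1_get_keypath_info(in_code TEXT, in_script TEXT)
RETURNS TABLE (code TEXT, script TEXT, addr TEXT, descriptor_metadata JSONB, keypath TEXT,
               descriptors_scripts_metadata JSONB, wallet_id TEXT, idx BIGINT, used BOOLEAN, descriptor TEXT)
LANGUAGE sql STABLE AS $$
  -- Same projection as the view, but wallets_scripts is filtered first,
  -- so the joins are only evaluated for the matching rows.
  SELECT ws.code,
         ws.script,
         s.addr,
         d.metadata AS descriptor_metadata,
         nbxv1_get_keypath(d.metadata, ds.idx) AS keypath,
         ds.metadata AS descriptors_scripts_metadata,
         ws.wallet_id,
         ds.idx,
         ds.used,
         d.descriptor
  FROM wallets_scripts ws
  JOIN scripts s ON s.code = ws.code AND s.script = ws.script
  LEFT JOIN (wallets_descriptors wd
       JOIN descriptors_scripts ds ON ds.code = wd.code AND ds.descriptor = wd.descriptor
       JOIN descriptors d ON d.code = ds.code AND d.descriptor = ds.descriptor)
       ON wd.wallet_id = ws.wallet_id AND wd.code = ws.code AND ds.script = ws.script
  WHERE ws.code = in_code AND ws.script = in_script;
$$;
A caller would then select from nbxv1_get_keypath_info(@code, r.script) in the LATERAL subquery instead of filtering the view.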
Thank you for your attention to this matter @NicolasDorier!