
Slow Query Performance When Filtering on Code in nbxv1_keypath_info View Under Specific Conditions #514

Open
juanmigdr opened this issue Mar 7, 2025 · 1 comment · May be fixed by #515



juanmigdr commented Mar 7, 2025

NBXplorer is working amazingly well overall, but I’ve encountered a performance issue with the nbxv1_keypath_info view under specific conditions.

The view is defined as follows:

CREATE OR REPLACE VIEW nbxv1_keypath_info AS
 SELECT ws.code,
    ws.script,
    s.addr,
    d.metadata AS descriptor_metadata,
    nbxv1_get_keypath(d.metadata, ds.idx) AS keypath,
    ds.metadata AS descriptors_scripts_metadata,
    ws.wallet_id,
    ds.idx,
    ds.used,
    d.descriptor
   FROM ((wallets_scripts ws
     JOIN scripts s ON (((s.code = ws.code) AND (s.script = ws.script))))
     LEFT JOIN ((wallets_descriptors wd
     JOIN descriptors_scripts ds ON (((ds.code = wd.code) AND (ds.descriptor = wd.descriptor))))
     JOIN descriptors d ON (((d.code = ds.code) AND (d.descriptor = ds.descriptor)))) ON (((wd.wallet_id = ws.wallet_id) AND (wd.code = ws.code) AND (ds.script = ws.script))));

NBXplorer queries this view with a WHERE clause that filters by code and script, one LATERAL lookup per requested script:

SELECT ts.script,
       ts.addr,
       ts.derivation,
       ts.keypath,
       ts.redeem
FROM unnest(@records) AS r (script),
     LATERAL
         (SELECT script,
                 addr,
                 descriptor_metadata->>'derivation' derivation,
                 keypath,
                 descriptors_scripts_metadata->>'redeem' redeem,
                 descriptors_scripts_metadata->>'blindedAddress' blinded_addr
          FROM nbxv1_keypath_info ki
          WHERE ki.code=@code
            AND ki.script=r.script) ts;

Issue Details

In my environment, the data distribution is as follows:

  • BTC: 182,250 records
  • LTC: 3,600 records

When filtering on the code and script values:

  • BTC query: returns results in about 300 ms.
  • LTC query: returns results in around 20 seconds, despite LTC having far fewer records.

What’s Happening?

PostgreSQL’s query planner relies on up-to-date table statistics to generate efficient execution plans. These statistics are automatically refreshed by the auto-analyze process, which is triggered when a sufficient number of rows have been modified. The decision to run an auto-analyze is based on the formula:

autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * total_rows

In our scenario, after inserting 72,090 BTC records, an auto-analyze is triggered just before the final BTC record is added. This means that the statistics are refreshed for BTC, and any subsequent query using these fresh statistics will perform optimally. The next auto-analyze would only occur after modifying approximately:

50 + 0.1 * 182,250 = 18,275 modified rows

However, after the BTC batch, only 3,600 LTC records are added. Since 3,600 is well below the 18,275 threshold, PostgreSQL does not trigger a new auto-analyze. As a result, the statistics for the LTC data remain outdated. This discrepancy causes the query planner to choose a suboptimal execution plan for queries filtering on code = 'LTC', leading to significantly slower performance (20 seconds).
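
One way to confirm this is to compare the table's modified-row counter against its approximate trigger point. This is a diagnostic sketch using standard PostgreSQL catalogs; wallets_scripts is the driving table of the view above:

-- Current auto-analyze parameters (defaults are 0.1 and 50).
SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum_analyze_scale_factor',
               'autovacuum_analyze_threshold');

-- Rows modified since the last analyze, an approximation of the
-- trigger point, and when the table was last auto-analyzed.
SELECT relname,
       n_live_tup,
       n_mod_since_analyze,
       50 + 0.1 * n_live_tup AS approx_trigger_rows,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'wallets_scripts';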

It’s important to note that after manually running ANALYZE wallets_scripts;, the updated statistics allowed the planner to generate an optimal plan, and query performance improved dramatically.
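
For reference, here is the manual refresh, plus an optional per-table setting that makes auto-analyze fire on smaller batches. The 0.01 and 100 values are illustrative, not a tested recommendation:

-- Immediate fix: refresh the statistics by hand.
ANALYZE wallets_scripts;

-- Optional mitigation: lower this table's auto-analyze trigger so that
-- small per-chain batches (like the 3,600 LTC rows) still cause a refresh.
ALTER TABLE wallets_scripts
  SET (autovacuum_analyze_scale_factor = 0.01,
       autovacuum_analyze_threshold = 100);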

Reproduction Steps

  1. Run the following Python script to simulate the scenario:
import requests
import time

HEADERS = {"Content-Type": "application/json"}

def create_wallet(crypto):
    """Create a derivation scheme (wallet) for the given chain via the NBXplorer API."""
    data = {
        "accountNumber": 0,
        "scriptPubKeyType": "Legacy",
        # The underscore-prefixed keys are not valid API fields; they just
        # keep the alternative script types handy while testing.
        "__scriptPubKeyType": "Taproot",
        "_scriptPubKeyType": "SegwitP2SH"
    }
    response = requests.post(
        f"http://localhost:32838/v1/cryptos/{crypto}/derivations",
        json=data,
        headers=HEADERS,
    )

    if response.status_code == 200:
        derivation_scheme = response.json().get("derivationScheme")
        if derivation_scheme:
            return derivation_scheme
        print("Error: No derivationScheme in response")
    else:
        print(f"Error creating wallet: {response.status_code} - {response.text}")

    return None

def main():
    # 2025 BTC wallets: enough row churn to trigger several auto-analyzes.
    for i in range(2025):
        print(f"BTC iteration {i + 1}")
        create_wallet("btc")
        time.sleep(0.1)

    # 40 LTC wallets: far below the next auto-analyze threshold.
    for i in range(40):
        print(f"LTC iteration {i + 1}")
        create_wallet("ltc")
        time.sleep(0.1)

if __name__ == "__main__":
    main()
  2. Query the view using a filter on the code and script fields (code should be 'LTC'; the script does not have to exist, so a random value works):
SELECT ts.script,
       ts.addr,
       ts.derivation,
       ts.keypath,
       ts.redeem
FROM unnest(ARRAY['76a914fbf7c85733d88415245534a9c0239b63d116660488ac']) AS r (script),
     LATERAL
         (SELECT script,
                 addr,
                 descriptor_metadata->>'derivation' derivation,
                 keypath,
                 descriptors_scripts_metadata->>'redeem' redeem,
                 descriptors_scripts_metadata->>'blindedAddress' blinded_addr
          FROM nbxv1_keypath_info ki
          WHERE ki.code='LTC'
            AND ki.script=r.script) ts;
  3. Observe the performance difference based on the code filter; the EXPLAIN sketch below shows how to inspect the plan the planner chose.
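
To make the difference visible, the inner query can be run under EXPLAIN (a diagnostic sketch; the script value is the same placeholder as in step 2). The telling part is the estimated versus actual row counts on wallets_scripts before and after a manual ANALYZE:

EXPLAIN (ANALYZE, BUFFERS)
SELECT ki.script, ki.addr, ki.keypath
FROM nbxv1_keypath_info ki
WHERE ki.code = 'LTC'
  AND ki.script = '76a914fbf7c85733d88415245534a9c0239b63d116660488ac';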

Expected Behavior

Filtering on code should yield consistent performance regardless of the underlying data distribution. Ideally, the query planner should use up-to-date statistics to generate an optimal execution plan, even when the auto-analyze threshold is not met by the LTC records.

Actual Behavior

  • BTC Queries: Benefit from a recent auto-analyze (triggered multiple times while the BTC records were being added, and a final time just before the last record), returning results quickly (around 300 ms).
  • LTC Queries: Suffer from outdated statistics because no auto-analyze runs after the BTC batch. This results in a suboptimal execution plan and much slower performance (around 3 to 10 seconds).

Environment

  • NBXplorer Version: Latest
  • PostgreSQL Version: Latest
  • PostgreSQL Settings: Default
  • Data Distribution: 182,250 BTC records and 3,600 LTC records

Possible Solution

I was able to improve query efficiency by replacing the view with a function taking code and script parameters, filtering wallets_scripts first instead of evaluating the view logic over all table rows. With this change, the 20-second query dropped to about 300 ms. I have created a PR, which will be attached to this issue.
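
For illustration only, here is a minimal sketch of that approach; the function name, column types, and exact shape are my assumptions, and the actual implementation is in the linked PR:

-- Sketch: same body as the view, but with the (code, script) filter
-- applied to wallets_scripts before the joins. Column types are assumed.
CREATE OR REPLACE FUNCTION nbxv1_get_keypath_info(in_code TEXT, in_script TEXT)
RETURNS TABLE (code TEXT, script TEXT, addr TEXT,
               descriptor_metadata JSONB, keypath TEXT,
               descriptors_scripts_metadata JSONB, wallet_id TEXT,
               idx BIGINT, used BOOLEAN, descriptor TEXT) AS $$
    SELECT ws.code, ws.script, s.addr,
           d.metadata,
           nbxv1_get_keypath(d.metadata, ds.idx),
           ds.metadata,
           ws.wallet_id, ds.idx, ds.used, d.descriptor
    FROM wallets_scripts ws
    JOIN scripts s ON s.code = ws.code AND s.script = ws.script
    LEFT JOIN (wallets_descriptors wd
         JOIN descriptors_scripts ds ON ds.code = wd.code
                                    AND ds.descriptor = wd.descriptor
         JOIN descriptors d ON d.code = ds.code
                           AND d.descriptor = ds.descriptor)
        ON wd.wallet_id = ws.wallet_id
       AND wd.code = ws.code
       AND ds.script = ws.script
    WHERE ws.code = in_code
      AND ws.script = in_script;
$$ LANGUAGE SQL STABLE;

With the filter pushed into wallets_scripts first, the join inputs are a handful of rows instead of the whole table, so stale per-code statistics matter far less.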

Thank you for your attention to this matter @NicolasDorier!

@farukterzioglu
Contributor

I am looking forward to this fix, awesome effort 🚀
