
Slow Query Performance When Filtering on Code in nbxv1_keypath_info View Under Specific Conditions #514

Open
juanmigdr opened this issue Mar 7, 2025 · 1 comment · May be fixed by #515



juanmigdr commented Mar 7, 2025

NBXplorer is working amazingly well overall, but I’ve encountered a performance issue with the nbxv1_keypath_info view under specific conditions.

The view is defined as follows:

CREATE OR REPLACE VIEW nbxv1_keypath_info AS
 SELECT ws.code,
    ws.script,
    s.addr,
    d.metadata AS descriptor_metadata,
    nbxv1_get_keypath(d.metadata, ds.idx) AS keypath,
    ds.metadata AS descriptors_scripts_metadata,
    ws.wallet_id,
    ds.idx,
    ds.used,
    d.descriptor
   FROM ((wallets_scripts ws
     JOIN scripts s ON (((s.code = ws.code) AND (s.script = ws.script))))
     LEFT JOIN ((wallets_descriptors wd
     JOIN descriptors_scripts ds ON (((ds.code = wd.code) AND (ds.descriptor = wd.descriptor))))
     JOIN descriptors d ON (((d.code = ds.code) AND (d.descriptor = ds.descriptor)))) ON (((wd.wallet_id = ws.wallet_id) AND (wd.code = ws.code) AND (ds.script = ws.script))));

NBXplorer queries this view with a WHERE clause that filters by code and script, one LATERAL lookup per requested script:

SELECT ts.script,
       ts.addr,
       ts.derivation,
       ts.keypath,
       ts.redeem
FROM unnest(@records) AS r (script),
     LATERAL
         (SELECT script,
                 addr,
                 descriptor_metadata->>'derivation' derivation,
                 keypath,
                 descriptors_scripts_metadata->>'redeem' redeem,
                 descriptors_scripts_metadata->>'blindedAddress' blinded_addr
          FROM nbxv1_keypath_info ki
          WHERE ki.code=@code
            AND ki.script=r.script) ts;

Issue Details

In my environment, the data distribution is as follows:

  • BTC: 182,250 records
  • LTC: 3,600 records

When filtering on the code and script values:

  • BTC query: returns results in about 300 ms.
  • LTC query: returns results in around 20 seconds, despite LTC having far fewer records.

What’s Happening?

PostgreSQL’s query planner relies on up-to-date table statistics to generate efficient execution plans. These statistics are automatically refreshed by the auto-analyze process, which is triggered when a sufficient number of rows have been modified. The decision to run an auto-analyze is based on the formula:

autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * total_rows

In our scenario, after inserting 72,090 BTC records, an auto-analyze is triggered just before the final BTC record is added. This means that the statistics are refreshed for BTC, and any subsequent query using these fresh statistics will perform optimally. The next auto-analyze would only occur after modifying approximately:

50 + 0.1 * 182,250 = 18,275 modified rows

However, after the BTC batch, only 3,600 LTC records are added. Since 3,600 is well below the 18,275 threshold, PostgreSQL does not trigger a new auto-analyze. As a result, the statistics for the LTC data remain outdated. This discrepancy causes the query planner to choose a suboptimal execution plan for queries filtering on code = 'LTC', leading to significantly slower performance (20 seconds).
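
One way to confirm this is to compare the table's modified-row counter against its approximate trigger point. This is a diagnostic sketch using standard PostgreSQL catalogs; wallets_scripts is the driving table of the view above:

-- Current auto-analyze parameters (defaults are 0.1 and 50).
SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum_analyze_scale_factor',
               'autovacuum_analyze_threshold');

-- Rows modified since the last analyze, an approximation of the
-- trigger point, and when the table was last auto-analyzed.
SELECT relname,
       n_live_tup,
       n_mod_since_analyze,
       50 + 0.1 * n_live_tup AS approx_trigger_rows,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'wallets_scripts';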

It’s important to note that after manually running ANALYZE wallets_scripts;, the updated statistics allowed the planner to generate an optimal plan, and query performance improved dramatically.
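
For reference, here is the manual refresh, plus an optional per-table setting that makes auto-analyze fire on smaller batches. The 0.01 and 100 values are illustrative, not a tested recommendation:

-- Immediate fix: refresh the statistics by hand.
ANALYZE wallets_scripts;

-- Optional mitigation: lower this table's auto-analyze trigger so that
-- small per-chain batches (like the 3,600 LTC rows) still cause a refresh.
ALTER TABLE wallets_scripts
  SET (autovacuum_analyze_scale_factor = 0.01,
       autovacuum_analyze_threshold = 100);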

Reproduction Steps

  1. Run the following Python script to simulate the scenario:
import requests
import time

HEADERS = {"Content-Type": "application/json"}

def create_wallet(crypto):
    """Create a derivation scheme (wallet) for the given chain via the NBXplorer API."""
    data = {
        "accountNumber": 0,
        "scriptPubKeyType": "Legacy",
        # The underscore-prefixed keys are not valid API fields; they just
        # keep the alternative script types handy while testing.
        "__scriptPubKeyType": "Taproot",
        "_scriptPubKeyType": "SegwitP2SH"
    }
    response = requests.post(
        f"http://localhost:32838/v1/cryptos/{crypto}/derivations",
        json=data,
        headers=HEADERS,
    )

    if response.status_code == 200:
        derivation_scheme = response.json().get("derivationScheme")
        if derivation_scheme:
            return derivation_scheme
        print("Error: No derivationScheme in response")
    else:
        print(f"Error creating wallet: {response.status_code} - {response.text}")

    return None

def main():
    # 2025 BTC wallets: enough row churn to trigger several auto-analyzes.
    for i in range(2025):
        print(f"BTC iteration {i + 1}")
        create_wallet("btc")
        time.sleep(0.1)

    # 40 LTC wallets: far below the next auto-analyze threshold.
    for i in range(40):
        print(f"LTC iteration {i + 1}")
        create_wallet("ltc")
        time.sleep(0.1)

if __name__ == "__main__":
    main()
  2. Query the view using a filter on the code and script fields (code should be 'LTC'; the script does not have to exist, so a random value works):
SELECT ts.script,
       ts.addr,
       ts.derivation,
       ts.keypath,
       ts.redeem
FROM unnest(ARRAY['76a914fbf7c85733d88415245534a9c0239b63d116660488ac']) AS r (script),
     LATERAL
         (SELECT script,
                 addr,
                 descriptor_metadata->>'derivation' derivation,
                 keypath,
                 descriptors_scripts_metadata->>'redeem' redeem,
                 descriptors_scripts_metadata->>'blindedAddress' blinded_addr
          FROM nbxv1_keypath_info ki
          WHERE ki.code='LTC'
            AND ki.script=r.script) ts;
  3. Observe the performance difference based on the code filter; the EXPLAIN sketch below shows how to inspect the plan the planner chose.
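
To make the difference visible, the inner query can be run under EXPLAIN (a diagnostic sketch; the script value is the same placeholder as in step 2). The telling part is the estimated versus actual row counts on wallets_scripts before and after a manual ANALYZE:

EXPLAIN (ANALYZE, BUFFERS)
SELECT ki.script, ki.addr, ki.keypath
FROM nbxv1_keypath_info ki
WHERE ki.code = 'LTC'
  AND ki.script = '76a914fbf7c85733d88415245534a9c0239b63d116660488ac';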

Expected Behavior

Filtering on code should yield consistent performance regardless of the underlying data distribution. Ideally, the query planner should use up-to-date statistics to generate an optimal execution plan, even when the auto-analyze threshold is not met by the LTC records.

Actual Behavior

  • BTC Queries: Benefit from a recent auto-analyze (triggered multiple times while the BTC records were being added, and a final time just before the last record), returning results quickly (around 300 ms).
  • LTC Queries: Suffer from outdated statistics because no auto-analyze runs after the BTC batch. This results in a suboptimal execution plan and much slower performance (around 3 to 10 seconds).

Environment

  • NBXplorer Version: Latest
  • PostgreSQL Version: Latest
  • PostgreSQL Settings: Default
  • Data Distribution: 182,250 BTC records and 3,600 LTC records

Possible Solution

I was able to improve query efficiency by replacing the view with a function taking code and script parameters, filtering wallets_scripts first instead of evaluating the view logic over all table rows. With this change, the 20-second query dropped to about 300 ms. I have created a PR, which will be attached to this issue.
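
For illustration only, here is a minimal sketch of that approach; the function name, column types, and exact shape are my assumptions, and the actual implementation is in the linked PR:

-- Sketch: same body as the view, but with the (code, script) filter
-- applied to wallets_scripts before the joins. Column types are assumed.
CREATE OR REPLACE FUNCTION nbxv1_get_keypath_info(in_code TEXT, in_script TEXT)
RETURNS TABLE (code TEXT, script TEXT, addr TEXT,
               descriptor_metadata JSONB, keypath TEXT,
               descriptors_scripts_metadata JSONB, wallet_id TEXT,
               idx BIGINT, used BOOLEAN, descriptor TEXT) AS $$
    SELECT ws.code, ws.script, s.addr,
           d.metadata,
           nbxv1_get_keypath(d.metadata, ds.idx),
           ds.metadata,
           ws.wallet_id, ds.idx, ds.used, d.descriptor
    FROM wallets_scripts ws
    JOIN scripts s ON s.code = ws.code AND s.script = ws.script
    LEFT JOIN (wallets_descriptors wd
         JOIN descriptors_scripts ds ON ds.code = wd.code
                                    AND ds.descriptor = wd.descriptor
         JOIN descriptors d ON d.code = ds.code
                           AND d.descriptor = ds.descriptor)
        ON wd.wallet_id = ws.wallet_id
       AND wd.code = ws.code
       AND ds.script = ws.script
    WHERE ws.code = in_code
      AND ws.script = in_script;
$$ LANGUAGE SQL STABLE;

With the filter pushed into wallets_scripts first, the join inputs are a handful of rows instead of the whole table, so stale per-code statistics matter far less.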

Thank you for your attention to this matter @NicolasDorier!

@farukterzioglu
Contributor

I am looking forward to this fix, awesome effort 🚀
