Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trigger LEFT_JOIN indexing error #9039

Closed
max-hoffman opened this issue Mar 26, 2025 · 1 comment · Fixed by dolthub/go-mysql-server#2913
Closed

Trigger LEFT_JOIN indexing error #9039

max-hoffman opened this issue Mar 26, 2025 · 1 comment · Fixed by dolthub/go-mysql-server#2913
Labels
analyzer bug Something isn't working correctness We don't return the same result as MySQL sql Issue with SQL

Comments

@max-hoffman
Copy link
Contributor

Trigger left join indexing not working as expected. In the plan below, the hash lookup using a.id_modelinfo = b.id_modelinfo indexes b's column at 7, but there are only 3 columns returned by the subquery.

dolthub db: https://www.dolthub.com/repositories/jon-mellon/every-cause-of-human-outcomes
commit for repro: 8omf09d7jr5nubr4v49o61rromd4b5f8

                     └─ LeftOuterHashJoin
                         ├─ Eq
                         │   ├─ a.id_modelinfo:7!null
                         │   └─ b.id_modelinfo:34!null
                         ├─ TableAlias(a)
                         │   └─ ProcessTable
                         │       └─ Table
                         │           ├─ name: modelinfo02
                         │           └─ columns: [id_modelinfo id_coderinfo doi table_name coder llm header1 header2 header3 header4 robustness comparison identification_strategy unit_fixed_effects time_controls estimator model_type n_observations n_geographic_units degrees_of_freedom standard_errors_clustered standard_errors_cluster_level standard_errors_robust coder_model_comments created_at reconciliation test_tails]
                         └─ HashLookup
                             ├─ left-key: TUPLE(a.id_modelinfo:7!null)
                             ├─ right-key: TUPLE(b.id_modelinfo:7!null)
                             └─ SubqueryAlias
                                 ├─ name: b
                                 ├─ outerVisibility: false
                                 ├─ isLateral: false
                                 ├─ cacheable: true
                                 ├─ colSet: (89-91)
                                 ├─ tableId: 8
                                 └─ Project
                                     ├─ columns: [d1.id_modelinfo:2!null, d1.replacement_model_key:3, d1.invalid_model:4]

insert:

INSERT INTO modeldisambig (id_modeldisambig, id_modelinfo, coder, replacement_model_key, invalid_model, id_coderinfo) VALUES ('5840504a-6147-4dc5-a098-a76822c1b5aa','97942431-218a-4bef-8db6-fc878bc51c2a','badfakecoder2','whole table|','0','7e64049e-4508-434e-a2a8-6e0fb70d28a2'),('cf0eabb4-ad5a-43f8-9a0e-bc3e0a294d89','c63ad45e-3a92-4b82-8348-66a2697d1161','badfakecoder2','whole table|','0','7e64049e-4508-434e-a2a8-6e0fb70d28a2');

trigger:

CREATE TRIGGER trg_modeldisambig_insert
AFTER INSERT ON modeldisambig
FOR EACH ROW
BEGIN
  REPLACE INTO modelinfo02_clean
  SELECT a.id_modelinfo, `id_coderinfo`, `doi`, `table_name`, 
  `coder`, `llm`, header1, header2, header3, header4,
  `robustness`, `comparison`, `identification_strategy`,
  `unit_fixed_effects`, `time_controls`, `estimator`, `model_type`,
  `n_observations`, `n_geographic_units`, `degrees_of_freedom`, 
  `standard_errors_clustered`, `standard_errors_cluster_level`, 
  `standard_errors_robust`, `coder_model_comments`, `created_at`,
  `reconciliation`, `test_tails`,
  CASE WHEN b.replacement_model_key IS NULL THEN
    CONCAT(COALESCE(TRIM(REGEXP_REPLACE(LOWER(`header1`), "^model|[:punct:]", "")), ""), "|",
           COALESCE(TRIM(REGEXP_REPLACE(LOWER(`header2`), "^model|[:punct:]", "")), ""), "|",
           COALESCE(TRIM(REGEXP_REPLACE(LOWER(`header3`), "^model|[:punct:]", "")), ""), "|",
           COALESCE(TRIM(REGEXP_REPLACE(LOWER(`header4`), "^model|[:punct:]", "")), ""))
  ELSE b.replacement_model_key END AS model_key,
  CONCAT(COALESCE(`robustness`, ''), "|", 
         COALESCE(`comparison`, ''), "|", 
         COALESCE(`identification_strategy`, ''), "|", 
         COALESCE(`unit_fixed_effects`, ''), "|", 
         COALESCE(`time_controls`, ''), "|", 
         COALESCE(`estimator`, ''), "|", 
         COALESCE(`model_type`, ''), "|", 
         COALESCE(`n_observations`, ''), "|", 
         COALESCE(`n_geographic_units`, ''), "|", 
         COALESCE(`degrees_of_freedom`, ''), "|", 
         COALESCE(`standard_errors_clustered`, ""), "|", 
         COALESCE(`standard_errors_robust`, ''), "|", 
         COALESCE(`test_tails`, '')) AS answer_set
  FROM modelinfo02 a
  LEFT JOIN (
    SELECT d1.id_modelinfo, d1.replacement_model_key, d1.invalid_model
    FROM modeldisambig d1
    JOIN (
      SELECT id_modelinfo, MAX(created_at) AS max_created
      FROM modeldisambig
      GROUP BY id_modelinfo
    ) d2 ON d1.id_modelinfo = d2.id_modelinfo AND d1.created_at = d2.max_created
  ) b ON a.id_modelinfo = b.id_modelinfo
  WHERE a.id_modelinfo = NEW.id_modelinfo 
    AND (b.invalid_model IS NULL OR b.invalid_model = 0);
END
@max-hoffman max-hoffman added analyzer bug Something isn't working correctness We don't return the same result as MySQL sql Issue with SQL labels Mar 26, 2025
@max-hoffman
Copy link
Contributor Author

Some progress here. This specific query is uncovering a lot of bugs.
dolthub/go-mysql-server#2913

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analyzer bug Something isn't working correctness We don't return the same result as MySQL sql Issue with SQL
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant