Skip to content

bug(db): matchit_cem_summary_statistics not working with grouping_attribute #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
CoreyCole opened this issue May 21, 2018 · 0 comments

Comments

@CoreyCole
Copy link
Owner

CoreyCole commented May 21, 2018

To reproduce the issue, run scripts:

SELECT bin_equal_width(
  'flights_weather_demo_airports',
  ARRAY['vism', 'wspdm'],
  ARRAY[10, 9],
  'flights_weather_demo_binned'
);

SELECT matchit_cem(
  'flights_weather_demo_binned',
  'fid',
  'lowpressure',
  ARRAY['vism_ew_binned_10', 'wspdm_ew_binned_9', 'thunder', 'fog', 'rain', 'snow'],
  'flights_weather_demo_matched'
);

SELECT matchit_cem_summary_statistics(
  'flights_weather_demo_binned',
  'flights_weather_demo_matched',
  'lowpressure',
  'depdel15',
  'airport',
  ARRAY['vism', 'wspdm', 'thunder', 'fog', 'rain', 'snow'],
  ARRAY['vism', 'wspdm'],
  ARRAY['vism_ew_binned_10', 'wspdm_ew_binned_9']
);

matchit_cem_summary_statistics calls the functions for the treatment and each covariate that eventually run the SQL like this (i.e. for treatment = lowpressure)

CREATE TABLE flights_weather_demo_matched_ate_cem AS
WITH Blocks AS 
    (SELECT avg(depdel15) AS avg_outcome,
         subclass_id,
         lowpressure,
         airport
    FROM flights_weather_demo_matched
    GROUP BY  subclass_id, lowpressure, airport), Weights AS 
    (SELECT cast(count(*) AS NUMERIC) / 8450 AS block_weight,
         subclass_id
    FROM flights_weather_demo_matched
    GROUP BY  subclass_id, airport)
SELECT Blocks.lowpressure AS treatment,
         sum(avg_outcome * block_weight) AS weighted_avg_outcome,
         airport
FROM Blocks, Weights
WHERE Blocks.subclass_id = Weights.subclass_id
GROUP BY  Blocks.lowpressure,
         airport;
         
SELECT json_agg(
  json_build_object(
    'treatment', treatment, 
    'weighted_avg_outcome', weighted_avg_outcome, 
    'grouping_attribute', airport
  )
)
FROM flights_weather_demo_matched_ate_cem
GROUP BY  airport;

The second query returns 4 results (1 for each airport 'COS', 'MFE', 'MTJ', 'ROC')

[
  [
    {
      "treatment": 0,
      "weighted_avg_outcome": 0.1280433373918213,
      "grouping_attribute": "MFE"
    },
    {
      "treatment": 1,
      "weighted_avg_outcome": 0.19922603196987285,
      "grouping_attribute": "MFE"
    }
  ],
  [
    {
      "treatment": 0,
      "weighted_avg_outcome": 0.16417574517502626,
      "grouping_attribute": "MTJ"
    },
    {
      "treatment": 1,
      "weighted_avg_outcome": 0.1849468684615984,
      "grouping_attribute": "MTJ"
    }
  ],
  [
    {
      "treatment": 0,
      "weighted_avg_outcome": 0.1938166463459516,
      "grouping_attribute": "COS"
    },
    {
      "treatment": 1,
      "weighted_avg_outcome": 0.2036991197643159,
      "grouping_attribute": "COS"
    }
  ],
  [
    {
      "treatment": 0,
      "weighted_avg_outcome": 0.20972574225288967,
      "grouping_attribute": "ROC"
    },
    {
      "treatment": 1,
      "weighted_avg_outcome": 0.2952289561187512,
      "grouping_attribute": "ROC"
    }
  ]
]

But, since it is an aggregate group by query, when saving the results into a variable in postgresql:

CREATE OR REPLACE FUNCTION get_json_ate(
  table_name TEXT,
  treatment TEXT,
  grouping_attribute TEXT
) RETURNS JSONB AS $func$
DECLARE
  command_string TEXT;
  result_arr JSONB;
BEGIN
  command_string := 'SELECT json_agg(json_build_object('
    || '''treatment'', treatment, '
    || '''weighted_avg_outcome'', weighted_avg_outcome';

  IF grouping_attribute IS NOT NULL THEN
    command_string := command_string || ', ''grouping_attribute'', ' || quote_ident(grouping_attribute);
  END IF;

  command_string := command_string || ')) FROM ' || quote_ident(table_name);

  IF grouping_attribute IS NOT NULL THEN
    command_string := command_string || ' GROUP BY ' || quote_ident(grouping_attribute);
  END IF;

  RAISE NOTICE '%', command_string;
  EXECUTE command_string INTO result_arr;

  RETURN result_arr;
END;
$func$ LANGUAGE plpgsql;

only the first result is saved in the result_arr return value

[
    {
      "treatment": 0,
      "weighted_avg_outcome": 0.1280433373918213,
      "grouping_attribute": "MFE"
    },
    {
      "treatment": 1,
      "weighted_avg_outcome": 0.19922603196987285,
      "grouping_attribute": "MFE"
    }
]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant