We provide a set of econometric tasks, and for each one we compare the results obtained from an AI-generated script to the ground truth. Examples include the analysis of a randomized controlled trial (RCT), a clustered RCT, and panel data with a particular structure.
The setup is as follows. The AI being evaluated is given a description of the econometric task, along with the research design and the data. It produces a script, and we compare the results from that script to our benchmark results.
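The comparison step can be illustrated with a minimal sketch. It assumes that both the AI-generated script and the benchmark report their results as named numbers (for example a point estimate and a standard error) and that agreement is judged within a numerical tolerance. The function name `compare_to_benchmark`, the keys `ate` and `se`, the tolerances, and the numbers are all hypothetical and chosen only for illustration; the actual comparison rule may differ.

```python
import math


def compare_to_benchmark(candidate, benchmark, rel_tol=1e-4, abs_tol=1e-8):
    """Check each benchmark quantity against the candidate's value.

    Hypothetical helper: returns a dict mapping each benchmark key to
    True if the candidate reports a value within tolerance, else False.
    A quantity missing from the candidate counts as a failure.
    """
    report = {}
    for name, expected in benchmark.items():
        value = candidate.get(name)
        report[name] = value is not None and math.isclose(
            value, expected, rel_tol=rel_tol, abs_tol=abs_tol
        )
    return report


# Made-up numbers, for illustration only (not real benchmark results).
benchmark = {"ate": 0.250, "se": 0.040}
candidate = {"ate": 0.25000001, "se": 0.050}

report = compare_to_benchmark(candidate, benchmark)
print(report)
```

Reporting agreement per quantity, rather than as a single pass/fail flag, makes it possible to see whether a script recovered the point estimate but not the standard error, which is the kind of discrepancy a clustered design could plausibly produce.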