
[ENH] Adding base protocol and PC algorithm and example #11

Closed
wants to merge 5 commits

Conversation

@adam2392 (Collaborator) commented Aug 9, 2022

Summary

Closes: #10

Assuming one installs graphs and pywhy_graphs as-is from their most up-to-date versions, this now has a working implementation of the PC algorithm that is in line with the API proposal.

One can then run the file examples/plot_pc_alg.py. This is a "huge" PR because it adds all the files needed to do basic book-keeping, but I can run through the example next meeting.

Features for development:

  • black and flake8 for styling
  • sorting of imports (isort)
  • pydocstyle to check doc strings
  • code spell to check spelling
  • mypy to check typing
  • CI just needs someone in the pywhy org to enable it (is it possible to start creating teams to reduce friction here?)
  • unit test coverage will be generated via CI
  • docs will be built and deployed via CI

Features for API:

  • An extensible implementation of constraint-based discovery, skeleton learning, and the PC algorithm that does not require a specific graph class, but instead adds a GraphProtocol and EquivalenceClassProtocol as discussed
  • A basic implementation of the Context class
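As a rough sketch of what such a protocol could look like (the method names below are hypothetical, not the actual dodiscover API), typing.Protocol gives structural typing: any graph backend exposing matching methods is accepted without subclassing.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class GraphProtocol(Protocol):
    """Illustrative structural type; the real protocol's methods may differ."""

    def nodes(self) -> Iterable: ...
    def add_edge(self, u, v) -> None: ...
    def remove_edge(self, u, v) -> None: ...
    def neighbors(self, node) -> Iterable: ...


class TinyGraph:
    """Minimal adjacency-set graph; satisfies GraphProtocol structurally."""

    def __init__(self):
        self._adj = {}

    def nodes(self):
        return self._adj.keys()

    def add_edge(self, u, v):
        self._adj.setdefault(u, set()).add(v)
        self._adj.setdefault(v, set()).add(u)

    def remove_edge(self, u, v):
        self._adj[u].discard(v)
        self._adj[v].discard(u)

    def neighbors(self, node):
        return self._adj.get(node, set())


# No inheritance needed: the isinstance check is structural.
assert isinstance(TinyGraph(), GraphProtocol)
```

Algorithms written against such a protocol would then work with networkx graphs, pywhy_graphs classes, or any other backend exposing the same methods, which is the decoupling this PR is aiming for.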

What's next?

Documentation:

  • From a developer perspective, if our goal is to bring in more researchers/users/contributors, we need to properly document (i) how to use and (ii) how to extend the algorithms presented here. E.g. https://scikit-learn.org/stable/developers/develop.html is good inspiration. This should probably be a few short, self-contained examples and tutorials.

API:

  • To test the abstractions of the Context and BaseConstraintDiscovery classes, we'll want to generalize them and see if they still hold. E.g. implement FCI, which uses the base class, and score-based methods, which use the Context, and see if we're missing anything.

Misc.

pinging: @amit-sharma @petergtz @emrekiciman @bloebp @robertness

cc: @robertness any chance you can give us org privileges for at least the CI portion? I feel bad continuing to ask, but can you activate CircleCI, GitHub Actions, and codecov?

@adam2392 (Collaborator, Author) commented Aug 11, 2022

A note about the upstream dependency: the MixedEdgeGraph is subject to change, and therefore so are the causal graph classes in pywhy_graphs. Thankfully, the coupling is loose, and thanks to mypy and the graph protocols implemented here, this package should be relatively robust to those changes unless the underlying data structures returned differ.

@adam2392 (Collaborator, Author):

Another high-level note: currently there is no robust Python package dedicated to conditional independence (CI) testing. For now, I think it's best to keep it in-house. Eventually it makes sense to out-source it into another package, since dodiscover should be agnostic to which CI tests are used.

from .base import BaseConditionalIndependenceTest


class KernelCITest(BaseConditionalIndependenceTest):
Member:

Note that HSIC and KCI tests are also part of the GCM module: https://github.com/py-why/dowhy/blob/main/dowhy/gcm/independence_test/kernel.py

Collaborator Author:

Ooh nice! I was hoping to find time for the approximate version. In line with the functional API push, would it be desirable to refactor those into class objects with a required API, to make all functionality that relies on CI testing more robust?

Where would be the best place for CI tests to go?

Member:

I am not sure a class object is really necessary (at least, I don't see the benefit). In the GCM package, we ensure a compatible API by requiring unconditional independence tests to be functions of the form Callable[[np.ndarray, np.ndarray], float] and CI tests of the form Callable[[np.ndarray, np.ndarray, np.ndarray], float]. An example would be something like:

def hsic(X: np.ndarray, Y: np.ndarray, other_parameters...) -> float:
    # ... compute p-value ...
    return p_value

def kci(X: np.ndarray, Y: np.ndarray, Z: np.ndarray, other_parameters...) -> float:
    # ... compute p-value ...
    return p_value


def my_pc_algo(
    data: pd.DataFrame,
    independence_test: Callable[[np.ndarray, np.ndarray], float] = hsic,
    conditional_test: Callable[[np.ndarray, np.ndarray, np.ndarray], float] = kci,
    significance_level: float = 0.05,
):
    # ... do PC stuff ...
    if nodeC is None:
        p_value = independence_test(data[nodeA], data[nodeB])
    else:
        p_value = conditional_test(data[nodeA], data[nodeB], data[nodeC])

    reject = p_value < significance_level
    # ... do PC stuff ...

If you want to configure the tests further, you can simply do:

my_pc_algo(data, partial(hsic, my_hsic_parameter_a=0.1, kernel='laplace'))

Ok, just an idea to emphasize that we don't always need class objects for everything :)
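To make the callable-based pattern above concrete, here is a runnable miniature that uses a toy correlation-based check as a stand-in for hsic/kci (the function names and the pseudo p-value are purely illustrative, not a statistically valid test):

```python
import numpy as np


def corr_independence_test(x: np.ndarray, y: np.ndarray) -> float:
    """Toy stand-in for a real test such as HSIC: maps |correlation| to a
    pseudo 'p-value'. Illustrative only, not statistically valid."""
    r = abs(np.corrcoef(x, y)[0, 1])
    return float(1.0 - r)


def run_edge_test(x, y,
                  independence_test=corr_independence_test,
                  significance_level=0.05) -> bool:
    """Mirrors how a PC-style loop would call a pluggable test callable."""
    p_value = independence_test(x, y)
    return p_value < significance_level  # True => 'dependent' (edge kept)


rng = np.random.default_rng(0)
x = rng.normal(size=500)
assert run_edge_test(x, 2 * x + 0.01 * rng.normal(size=500))  # dependent
assert not run_edge_test(x, rng.normal(size=500))             # independent
```

Swapping in a different test is then just passing a different callable (or a functools.partial with pre-bound parameters), which is the configurability being discussed.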

Member:

Btw., generally I like the idea of moving the independence tests into a separate package. I think they can also be very helpful outside of the 'causality' context, and they are often required as core functions in several algorithms (e.g. graph validation).

Collaborator Author:

Yeah, agreed. Perhaps CI testing is something to discuss next time, then?

Collaborator Author:

Re: class vs. no class: I started out with a functional approach too, but I realized I needed to pass a large number of parameters to the structure learning algorithm, so it became really clunky. That was probably my main motivation, but I agree (right now) that there's not much difference, except ease of reading and typing. I'm happy to switch back to functions if that's desired.

E.g.

pc = PC(
    ... PC kwargs ...
    ci_test=kci,
    ... ci_test kwargs ...
)

I think right now the PC algorithm as I have it has 7 parameters. If we also pass the CI test parameters, say for KCI, which has probably ~9 parameters, the PC algorithm has 16+ parameters overall. I think eventually, it might be useful

Member:

Yeah, but we will typically have default parameters, and then in practice only a fraction of users would really change more than one parameter overall :) In the 'average' use case, a user would not change anything and would stick to the defaults. Btw., another way is to create a config object that can be passed (just some ideas :)).
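The config-object idea mentioned above could look something like the following dataclass sketch (all class names, fields, and defaults here are hypothetical, purely to illustrate grouping the many PC and CI-test parameters):

```python
from dataclasses import dataclass, field


@dataclass
class KCIConfig:
    # Hypothetical KCI test knobs.
    kernel: str = "rbf"
    n_approx: int = 1000


@dataclass
class PCConfig:
    # Hypothetical PC-level knobs, nesting the CI-test config.
    alpha: float = 0.05
    max_cond_set_size: int = 3
    ci_config: KCIConfig = field(default_factory=KCIConfig)


# A typical user touches one field; everything else stays default.
cfg = PCConfig(alpha=0.01)
assert cfg.alpha == 0.01 and cfg.ci_config.kernel == "rbf"
```

This keeps the algorithm's signature small (one config argument) while still exposing every knob, which is one way to sidestep the 16+-parameter problem raised above.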

Collaborator Author:

Okay, that makes sense. I'll refactor the CI stuff to just functions then. And maybe eventually a separate repo altogether.

Collaborator Author:

Are you okay with making that a separate issue/PR?

Member:

Yeah, sure. You can also keep the class structure for now and we can discuss this with the others first. I guess both (class or function) have their advantages and disadvantages. Maybe something for the next meeting?

logger = logging.getLogger()


class PC(BaseConstraintDiscovery):
Member:

Wouldn't it make more sense to follow up on the effort to bring the 'causal-learn' package into PyWhy instead of re-implementing the functionality?

@adam2392 (Collaborator, Author) commented Aug 11, 2022:

These were refactored from my package, causal-networkx. From my experience, the causal-learn package was very hard to refactor into a working, extensible version. Moreover, the package lacked unit tests that assert behavior, so I would be wary of copy-pasting here directly without extensive testing.

Is there any reason this approach is not desired?

WDYT?

Just as background: I'm not trying to be too opinionated, btw. The package was great for doing some initial applications on the problem I was working on, but when I was extending structure learning and researching modifications, it was very difficult to do so. Hence I re-implemented the abstractions from scratch, with insight from their package of course. I'm in favor of bringing the functionality they implement into the refactored API form that Robert proposed; otherwise we incur large amounts of technical debt.

Member:

I was more thinking along the lines that we plan to integrate the causal-learn package in the future anyway (at least, we are talking to the authors of the package), and then it might be better to try to refactor their package and make it compatible, rather than re-implementing methods that already exist in many other packages.

Also keep in mind that we cannot simply copy over code and refactor it; that would violate the license :)

Collaborator Author:

I see! Makes sense if they are on board to try. I also took inspiration from their code. This implementation was to address the action item we discussed last time in the meeting: getting an end-to-end PC example using the graphs to be PRed into networkx.

It was difficult for me to parse https://github.com/cmu-phil/causal-learn/blob/main/causallearn/search/ConstraintBased/PC.py to see if we're missing anything in the core PC algorithm, so let me know. I think this PR's implementation follows the abstraction of the PC algorithm, so it should be very easy to add functionality at any stage of the algorithm if we are missing something. E.g., the abstraction has the following steps for all constraint-based algorithms:

  1. initialization
  2. learning a skeleton (this is delegated to a separate algorithm altogether, since there are many proposals in the literature for improving this process)
  3. orienting unshielded colliders
  4. orienting edges

(For example, here it is demonstrated that one can easily implement Conservative/MaxVote PC by overriding one method, the orientation of unshielded colliders, in about 80 LOC.) The algorithm then works as expected locally (unit tests to be added).
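The four-step abstraction above can be sketched as a template method, where a variant such as Conservative PC overrides only one stage (class and method names here are illustrative, not the PR's actual API):

```python
class BaseConstraintDiscovery:
    """Illustrative template: fit() fixes the pipeline; variants override stages."""

    def fit(self, data):
        graph = self.learn_skeleton(data)                # step 2
        graph = self.orient_unshielded_colliders(graph)  # step 3
        graph = self.orient_edges(graph)                 # step 4
        return graph

    def learn_skeleton(self, data):
        # Placeholder: a real implementation prunes edges via CI tests.
        return {"edges": set(), "sep_sets": {}}

    def orient_unshielded_colliders(self, graph):
        graph["collider_rule"] = "standard"
        return graph

    def orient_edges(self, graph):
        # Placeholder for Meek-style orientation rules.
        return graph


class ConservativePC(BaseConstraintDiscovery):
    # Conservative PC differs only in how unshielded colliders are
    # oriented, so only that one method is overridden.
    def orient_unshielded_colliders(self, graph):
        graph["collider_rule"] = "conservative"
        return graph


print(ConservativePC().fit(data=None)["collider_rule"])  # -> conservative
```

Because the pipeline lives in the base class, adding a new variant (or a modified skeleton procedure, as MVPC would need) means overriding a single stage rather than re-implementing the whole algorithm.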

https://github.com/cmu-phil/causal-learn/blob/main/causallearn/search/ConstraintBased/PC.py does show that there is one algorithm missing: the MVPC algorithm for missing values. I think that involves modifying the skeleton procedure, IIUC, and then everything else should work as expected.

How does that sound?

Member:

Sounds good. It's also nothing crucial; I just want to make sure that you don't do double work if we can leverage other packages :) I think the PC algorithm is a good example to see what the required APIs could look like.

Collaborator:

I agree, and I'm happy to get this in there now. The causal-learn team did indicate they would change the license, but we shouldn't block on them.

I would like to see this PR broken up, though (see message below), so we can commit these parts piecemeal.

@robertness (Collaborator):

Thanks for this, @adam2392. Can I trouble you to close this PR and break the changes on this branch into multiple PRs? The development features can go into one PR that we can merge immediately.

For the PC algorithm, it would be nice if we could break that up into a few chunks as well; for example, the conditional independence tests can be their own PR. This would make for a more readable commit history. Also, @darthtrevino is going to start working on dodiscover code and I'm going to be working with him; it would be easier for someone new to PC to see this code broken up into chunks.

@robertness (Collaborator) commented Aug 12, 2022

Closing this for now, starting off new PRs with 15

Successfully merging this pull request may close these issues.

Implement PC algorithm following API
3 participants