Merge branch 'master' of https://github.com/jmschrei/apricot
jmschrei committed Feb 28, 2020
2 parents f1d034b + 965dc1f commit 08ebb13
Showing 7 changed files with 159 additions and 4 deletions.
34 changes: 34 additions & 0 deletions docs/features/sparse.rst
@@ -0,0 +1,34 @@
.. _features.sparse:

Sparse Matrices
===============

There is built-in support for sparse matrices in both the feature-based and graph-based functions. Using a sparse matrix will always (except when ties exist) give the same result as using a dense matrix populated with 0s, but explicitly using a sparse representation can be significantly faster. For feature-based functions, missing values are assumed to be feature values of 0 and thus do not increase the gain of an example. For graph-based functions, missing values are assumed to be similarities of 0 between examples. In practice, it is rare to see a similarity of exactly 0 when the similarities are derived from feature values, so this case most commonly arises when using a pre-defined similarity matrix or an approximation of the dense similarity matrix.

Here is an example of using a feature-based function on a very sparse matrix.

.. code:: python

    import numpy
    from apricot import FeatureBasedSelection
    from scipy.sparse import csr_matrix

    X = numpy.random.choice(2, size=(10000, 100), p=[0.99, 0.01])
    X_sparse = csr_matrix(X)

    selector = FeatureBasedSelection(100, 'sqrt')
    selector.fit(X_sparse)

Here is an example of using a graph-based function on a very sparse matrix.

.. code:: python

    from apricot import FacilityLocationSelection
    from scipy.sparse import csr_matrix

    X = <a dense matrix that has many 0s in it>
    X_sparse = csr_matrix(X)

    selector = FacilityLocationSelection(100, 'precomputed')
    selector.fit(X_sparse)
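
As a quick sanity check of the equivalence described above, one can fit the same feature-based selector to a dense matrix and to its sparse representation and compare the resulting rankings. This is only a sketch: it uses continuous non-zero values so that ties in the gains, the one case where the orderings may legitimately differ, are vanishingly unlikely.

.. code:: python

    import numpy
    from apricot import FeatureBasedSelection
    from scipy.sparse import csr_matrix

    # A mostly-zero matrix with continuous non-zero values, in dense and sparse form.
    mask = numpy.random.random((10000, 100)) < 0.01
    X = numpy.random.exponential(size=(10000, 100)) * mask
    X_sparse = csr_matrix(X)

    dense_selector = FeatureBasedSelection(100, 'sqrt')
    dense_selector.fit(X)

    sparse_selector = FeatureBasedSelection(100, 'sqrt')
    sparse_selector.fit(X_sparse)

    # Barring ties, both selectors choose the same examples in the same order.
    assert list(dense_selector.ranking) == list(sparse_selector.ranking)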


23 changes: 23 additions & 0 deletions docs/optimizers/approx-lazy.rst
@@ -0,0 +1,23 @@
.. _optimizers.approx-lazy:

Approximate Lazy Greedy
=======================

The approximate lazy greedy algorithm is a simple extension of the lazy greedy algorithm that, rather than requiring that an element remain at the top of the priority queue after being re-evaluated, only requires that the re-evaluated gain be within a user-defined percentage of the best remaining gain in the queue for the element to be selected. The key idea is that finding the very best element while maintaining the priority queue may be expensive, but finding an element that is good enough is cheap. While the best percentage to use is data set specific, even values near 1 can lead to large savings in computation.

.. code:: python

    import numpy
    from apricot import FeatureBasedSelection

    X = numpy.random.randint(10, size=(10000, 100))

    selector = FeatureBasedSelection(100, 'sqrt', optimizer='approximate-lazy')
    selector.fit(X)

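To make the selection rule concrete, here is a minimal sketch of an approximate lazy greedy loop written against a generic marginal gain function. The ``gain`` callable, the ``epsilon`` threshold name, and the toy data are illustrative assumptions, not apricot's internals.

.. code:: python

    import heapq
    import numpy

    def approximate_lazy_greedy(gain, n_items, k, epsilon=0.9):
        """gain(i, selected) returns the marginal gain of item i given `selected`."""
        selected = []
        # Max-heap (via negation) of stale upper bounds on each item's gain.
        heap = [(-gain(i, selected), i) for i in range(n_items)]
        heapq.heapify(heap)
        while len(selected) < k and heap:
            _, i = heapq.heappop(heap)
            current = gain(i, selected)
            best_stale = -heap[0][0] if heap else 0.0
            if current >= epsilon * best_stale:
                # Good enough: select without re-checking the rest of the queue.
                selected.append(i)
            else:
                heapq.heappush(heap, (-current, i))
        return selected

    # A feature-based sqrt gain over a small random matrix, for illustration.
    X = numpy.random.randint(10, size=(500, 20))

    def sqrt_gain(i, selected):
        base = X[selected].sum(axis=0) if selected else numpy.zeros(X.shape[1])
        return float((numpy.sqrt(base + X[i]) - numpy.sqrt(base)).sum())

    print(approximate_lazy_greedy(sqrt_gain, X.shape[0], 10))
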
API Reference
-------------

.. automodule:: apricot.optimizers.ApproximateLazyGreedy
:members:
:inherited-members:
25 changes: 25 additions & 0 deletions docs/optimizers/bidirectional.rst
@@ -0,0 +1,25 @@
.. _optimizers.bidirectional:

Bidirectional Greedy
====================

Most submodular optimizers assume that the function is *monotone*, i.e., that the gain from each successive example is positive. However, there are some cases where the key diminishing returns property holds, but the gains are not necessarily positive. In these cases, the naive greedy algorithm is not guaranteed to return a good result.

The bidirectional greedy algorithm was developed to optimize non-monotone submodular functions. While its approximation guarantee is weaker than the one the naive greedy algorithm provides for monotone functions, it generally returns better sets than the greedy algorithm does in this setting.

.. code:: python

    import numpy
    from apricot import FeatureBasedSelection

    X = numpy.random.randint(10, size=(10000, 100))

    selector = FeatureBasedSelection(100, 'sqrt', optimizer='bidirectional')
    selector.fit(X)

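For intuition, here is a minimal sketch of the deterministic double greedy scheme commonly used for non-monotone submodular functions, written against a generic set function ``f``. The function, the toy objective, and the names are illustrative assumptions rather than apricot's implementation.

.. code:: python

    def bidirectional_greedy(f, n_items):
        # Grow X from the empty set while shrinking Y from the full set; after
        # one pass over the items, X == Y and that set is returned.
        X, Y = set(), set(range(n_items))
        for i in range(n_items):
            a = f(X | {i}) - f(X)   # gain of adding i to the growing set
            b = f(Y - {i}) - f(Y)   # gain of removing i from the shrinking set
            if a >= b:
                X.add(i)
            else:
                Y.discard(i)
        return X

    # A toy non-monotone submodular function: concave in the size of the set.
    n = 20
    f = lambda S: len(S) * (n - len(S))
    print(sorted(bidirectional_greedy(f, n)))
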
API Reference
-------------

.. automodule:: apricot.optimizers.BidirectionalGreedy
:members:
:inherited-members:
25 changes: 25 additions & 0 deletions docs/optimizers/greedi.rst
@@ -0,0 +1,25 @@
.. _optimizers.greedi:

GreeDi
======

GreeDi is an optimizer that was designed to work on data sets that are too large to fit into memory. The approach involves first partitioning the data into :math:`m` equally sized chunks without any overlap. Then, :math:`l` elements are selected from each chunk using a standard optimizer like naive or lazy greedy. Finally, these :math:`ml` examples are merged and a standard optimizer selects :math:`k` examples from this set. In this manner, the algorithm sacrifices exactness to ensure that memory limitations are not an issue.

There are a few considerations to keep in mind when using GreeDi. Naturally, :math:`ml` must be larger than :math:`k` but also small enough to fit into memory. The larger :math:`l` is, the closer the solution is to the exact solution, but the more computation is required. Conversely, the larger :math:`m` is, the less exact the solution is. When using a graph-based function, increasing :math:`m` can dramatically reduce the amount of computation that needs to be performed, as well as the memory requirements, because the similarity matrix becomes smaller in size. However, feature-based functions are likely to see less of a speed improvement because the cost of evaluating an example is independent of the size of the ground set.

.. code:: python

    import numpy
    from apricot import FeatureBasedSelection

    X = numpy.random.randint(10, size=(10000, 100))

    selector = FeatureBasedSelection(100, 'sqrt', optimizer='greedi')
    selector.fit(X)

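To illustrate the partition-then-merge idea, here is a rough sketch that wires the scheme together manually using feature-based selectors for both the per-chunk and the final selections. The chunking code and the particular values of :math:`m`, :math:`l`, and :math:`k` are illustrative assumptions, not apricot's internal implementation of the 'greedi' optimizer.

.. code:: python

    import numpy
    from apricot import FeatureBasedSelection

    X = numpy.random.randint(10, size=(10000, 100))
    m, l, k = 10, 50, 100   # chunks, selections per chunk, final subset size

    # Partition the examples into m equally sized, non-overlapping chunks.
    chunks = numpy.array_split(numpy.arange(X.shape[0]), m)

    # Select l examples from each chunk independently and merge the candidates.
    candidates = []
    for chunk in chunks:
        sub_selector = FeatureBasedSelection(l, 'sqrt')
        sub_selector.fit(X[chunk])
        candidates.append(chunk[sub_selector.ranking])
    candidates = numpy.concatenate(candidates)

    # Run a final selection of k examples over the ml merged candidates.
    final_selector = FeatureBasedSelection(k, 'sqrt')
    final_selector.fit(X[candidates])
    selected_indices = candidates[final_selector.ranking]
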
API Reference
-------------

.. automodule:: apricot.optimizers.GreeDi
:members:
:inherited-members:
23 changes: 23 additions & 0 deletions docs/optimizers/modular.rst
@@ -0,0 +1,23 @@
.. _optimizers.modular:

Modular Greedy
==============

The modular greedy optimizer uses a modular upper bound on the gain of each example to perform selection. A defining characteristic of submodular functions is the *diminishing returns* property, where the gain of an example decreases as more examples are selected. In contrast, modular functions have constant gains for examples regardless of the number of selected examples. Thus, approximating the submodular function with a modular one provides an upper bound on the gain of each example during the selection process. This approximation makes the function simple to optimize: one calculates the gain that each example yields before any examples are selected, sorts the examples by this gain, and selects the top :math:`k` examples. While this approach is fast, it is likely best paired with a traditional optimization algorithm after the first few examples are selected.

.. code:: python

    import numpy
    from apricot import FeatureBasedSelection

    X = numpy.random.randint(10, size=(10000, 100))

    selector = FeatureBasedSelection(100, 'sqrt', optimizer='modular')
    selector.fit(X)

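Because the modular approximation ranks examples by their gain against the empty set, the core of the selection can be written directly with numpy. The sqrt-based gain formula below is an illustrative assumption matching the 'sqrt' feature-based function, not apricot's code.

.. code:: python

    import numpy

    X = numpy.random.randint(10, size=(10000, 100))
    k = 100

    # Gain of each example against the empty set under a sqrt feature-based
    # function: f({i}) - f({}) = sum over features of sqrt(X[i, d]).
    singleton_gains = numpy.sqrt(X).sum(axis=1)

    # Sort by that gain once and keep the top k examples.
    selected = numpy.argsort(-singleton_gains)[:k]
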
API Reference
-------------

.. automodule:: apricot.optimizers.ModularGreedy
:members:
:inherited-members:
8 changes: 4 additions & 4 deletions docs/optimizers/stochastic.rst
@@ -1,23 +1,23 @@
.. _optimizers.stochastic:
Stochastic Greedy
=============
=================

The stochastic greedy algorithm is a simple approach that subsamples the full data set with a user-defined sampling probability and then runs an optimization on that subset. This subsampling can lead to obvious speed improvements because fewer elements as selected, but will generally find a lower quality subset because fewer elements are present. This approach is typically used a baseline for other approaches but can save a lot of time on massive data sets that are known to be highly redundant.
The stochastic greedy algorithm is a simple approach that, at each iteration, randomly selects a subset of the data and then finds the best next example within that subset. The distinction between this approach and the sample greedy algorithm is that this subset changes at each iteration, meaning that over the course of the run the algorithm does cover the entire data set. In contrast, the sample greedy algorithm is equivalent to manually subsampling the data once before running a selector on it. The size of this subset is inversely proportional to the number of examples being chosen, and is determined such that the same total amount of computation is done no matter how many elements are selected. A key idea behind this approach is that, while the exact ranking of the elements may differ from the naive/lazy greedy approaches, the set of selected elements is likely to be similar despite the limited amount of computation.

.. code::python
from apricot import FeatureBasedSelection
X = numpy.random.randint(10, size=(10000, 100))
selector = FeatureBasedSelection(100, 'sqrt', optimizer='sample')
selector = FeatureBasedSelection(100, 'sqrt', optimizer='stochastic')
selector.fit(X)
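
To sketch the mechanics, the loop below draws a fresh random subset at each iteration and picks the best element from that subset only; the subset size shrinks as more elements are requested so that the total work stays roughly constant. The ``gain`` callable and parameter names are illustrative assumptions, not apricot's internals.

.. code:: python

    import numpy

    def stochastic_greedy(gain, n_items, k, epsilon=0.01, seed=0):
        rng = numpy.random.RandomState(seed)
        remaining = list(range(n_items))
        selected = []
        # Subset size shrinks as k grows, keeping total work roughly constant.
        subset_size = int(numpy.ceil(n_items / k * numpy.log(1.0 / epsilon)))
        for _ in range(k):
            size = min(max(subset_size, 1), len(remaining))
            candidates = rng.choice(remaining, size=size, replace=False)
            best = max(candidates, key=lambda i: gain(i, selected))
            selected.append(int(best))
            remaining.remove(best)
        return selected
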
API Reference
-------------

.. automodule:: apricot.optimizers.SampleGreedy
.. automodule:: apricot.optimizers.StochasticGreedy
:members:
:inherited-members:
25 changes: 25 additions & 0 deletions docs/optimizers/two-stage.rst
@@ -0,0 +1,25 @@
.. _optimizers.two-stage:

Two-Stage Greedy
================

The two-stage greedy optimizer is a general purpose framework for combining two optimizers: the first :math:`n` of the :math:`k` total selections are made using one optimizer, and the remainder are made using the other. When the first optimizer is random selection and the second is naive/lazy greedy, this becomes partial enumeration. By default, the first algorithm is the naive greedy optimizer and the second is the lazy greedy optimizer. This combination results in the same selection as either optimizer used on its own, but replaces the computationally intensive first few steps of the lazy greedy algorithm, where maintaining the priority queue may require scanning through almost the entire queue, with the easily parallelizable naive greedy algorithm. While, in theory, the lazy greedy algorithm will never perform more function calls than the naive greedy algorithm, there are costs associated both with maintaining a priority queue and with evaluating a single example at a time instead of a batch of examples.

This optimizer, with the naive greedy optimizer first and the lazy greedy optimizer second, is the default optimizer for apricot selectors.

.. code:: python

    import numpy
    from apricot import FeatureBasedSelection

    X = numpy.random.randint(10, size=(10000, 100))

    selector = FeatureBasedSelection(100, 'sqrt', optimizer='two-stage')
    selector.fit(X)

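As a rough illustration of the framework, the sketch below makes the first few selections with a plain naive greedy pass and the rest with a lazy greedy pass over a generic marginal gain function. The ``gain`` callable, the ``n_naive`` parameter, and the structure are illustrative assumptions, not apricot's implementation.

.. code:: python

    import heapq

    def two_stage_greedy(gain, n_items, k, n_naive=10):
        selected = []
        remaining = set(range(n_items))

        # Stage one: naive greedy for the first n_naive selections; every
        # remaining item is re-evaluated at each step, which is easy to batch.
        for _ in range(min(n_naive, k)):
            best = max(remaining, key=lambda i: gain(i, selected))
            selected.append(best)
            remaining.discard(best)

        # Stage two: lazy greedy for the rest, using stale gains as upper bounds.
        heap = [(-gain(i, selected), i) for i in remaining]
        heapq.heapify(heap)
        while len(selected) < k and heap:
            _, i = heapq.heappop(heap)
            current = gain(i, selected)
            if not heap or current >= -heap[0][0]:
                selected.append(i)
            else:
                heapq.heappush(heap, (-current, i))
        return selected
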
API Reference
-------------

.. automodule:: apricot.optimizers.TwoStageGreedy
:members:
:inherited-members:
