Minor changes to code style and documentation (#10)

asreview · Oct 1, 2020 · 40fd7d7
1 parent 6a90fce

Showing 8 changed files with 177 additions and 87 deletions.
diff --git a/README.md b/README.md
@@ -2,21 +2,21 @@
 
 ![Deploy and release](https://github.com/asreview/asreview-visualization/workflows/Deploy%20and%20release/badge.svg)![Build status](https://github.com/asreview/asreview-visualization/workflows/test-suite/badge.svg)
 
-This is a plotting/visualization supplemental package for the 
-[ASReview](https://github.com/asreview/asreview)
-software. It is a fast way to create a visual impression of the ASReview with different
-dataset, models and model parameters.
+This is a plotting/visualization supplemental package for the
+[ASReview](https://github.com/asreview/asreview) software. It is a fast way to
+create a visual impression of the ASReview with different datasets, models and
+model parameters.
 
 ## Installation
 
-The easiest way to install the visualization package is to use the command line:
+The easiest way to install the visualization package is to install from PyPI:
 
 ``` bash
 pip install asreview-visualization
 ```
 
-After installation of the visualization package, asreview should automatically detect it.
-Test this by:
+After installation of the visualization package, `asreview` should automatically
+detect it. Test this by:
 
 ```bash
 asreview --help
@@ -26,12 +26,13 @@ It should list the 'plot' modus.
 
 ## Basic usage
 
-State files that were created with the same ASReview settings can be put together/averaged by putting
-them in the same directory. State files with different settings/datasets should be put in different 
-directories to compare them.
+State files that were created with the same ASReview settings can be put
+together/averaged by putting them in the same directory. State files with
+different settings/datasets should be put in different directories to compare
+them.
 
-As an example consider the following directory structure, where we have two datasets, called `ace` and
-`ptsd`, each of which have 8 runs:
+As an example consider the following directory structure, where we have two
+datasets, called `ace` and `ptsd`, each of which have 8 runs:
 
 ```
 ├── ace
@@ -70,29 +71,29 @@ asreview plot ace ptsd --absolute-values
 
 ## Plot types
 
-There are currently four plot types implemented:
-_inclusion_, _discovery_, _limit_, _progression_.
-They can be individually selected with the `-t` or `--type` switch. Multiple plots
-can be made by using `,` as a separator:
+There are currently four plot types implemented: _inclusion_, _discovery_,
+_limit_, _progression_. They can be individually selected with the `-t` or
+`--type` switch. Multiple plots can be made by using `,` as a separator:
 
 ```bash
 asreview plot ace ptsd --type 'inclusion,discovery'
 ```
 
 ### Inclusion
 
-This figure shows the number/percentage of included papers found as a function of the
-number/percentage of papers reviewed. Initial included/excluded papers are subtracted so that the line
-always starts at (0,0).
+This figure shows the number/percentage of included papers found as a function
+of the number/percentage of papers reviewed. Initial included/excluded papers
+are subtracted so that the line always starts at (0,0).
 
 The quicker the line goes to a 100%, the better the performance.
 
 ![alt text](https://github.com/msdslab/asreview-visualization/blob/master/docs/inclusions.png?raw=true "Inclusions")
 
 ### Discovery
 
-This figure shows the distribution of the number of papers that have to be read before discovering
-each inclusion. Not every paper is equally hard to find.
+This figure shows the distribution of the number of papers that have to be
+read before discovering each inclusion. Not every paper is equally hard to
+find.
 
 The closer to the left, the better.
 
@@ -101,34 +102,39 @@ The closer to the left, the better.
 
 ### Limit
 
-This figure shows how many papers need to be read with a given criterion. A criterion is expressed
-as "after reading _y_ % of the papers, at most an average of _z_ included papers have been not been
-seen by the reviewer, if he is using max sampling.". Here, _y_ is shown on the y-axis, while
-three values of _z_ are plotted as three different lines with the same color. The three values for
-_z_ are 0.1, 0.5 and 2.0.
+This figure shows how many papers need to be read with a given criterion. A
+criterion is expressed as "after reading _y_ % of the papers, at most an
+average of _z_ included papers have not been seen by the reviewer, if he
+is using max sampling.". Here, _y_ is shown on the y-axis, while three values
+of _z_ are plotted as three different lines with the same color. The three
+values for _z_ are 0.1, 0.5 and 2.0.
 
 The quicker the lines touch the black (`y=x`) line, the better.
 
 ![alt text](https://github.com/msdslab/asreview-visualization/blob/master/docs/limits.png?raw=true "Limits")
 
 ### Progression
 
-This figure shows the average inclusion rate as a function of time, number of papers read.
-The more concentrated on the left, the better. The thick line is the average of individual runs
-(thin lines). The visualization package will automatically detect which are directories and which
-are files. The curve is smoothed out by using a Gaussian smoothing algorithm.
+This figure shows the average inclusion rate as a function of time, number of
+papers read. The more concentrated on the left, the better. The thick line is
+the average of individual runs (thin lines). The visualization package will
+automatically detect which are directories and which are files. The curve is
+smoothed out by using a Gaussian smoothing algorithm.
 
 ![alt text](https://github.com/msdslab/asreview-visualization/blob/master/docs/progression.png?raw=true "Progression")
 
 
 ## API
 
-To make use of the more advanced features, you can also use the visualization package
-as a library. The advantage is that you can make more reproducible plots where text, etc. is
-in the place *you* want it. Examples can be found in module `asreviewcontrib.visualization.quick`.
-Those are the scripts that are used for the command line interface.
+To make use of the more advanced features, you can also use the visualization
+package as a library. The advantage is that you can make more reproducible
+plots where text, etc. is in the place *you* want it. Examples can be found in
+module `asreviewcontrib.visualization.quick`. Those are the scripts that are
+used for the command line interface.
 
 ```python
+from asreviewcontrib.visualization.plot import Plot
+
 with Plot.from_paths(["PATH_1", "PATH_2"]) as plot:
 	inc_plot = plot.new("inclusion")
 	inc_plot.set_grid()
@@ -141,5 +147,5 @@ with Plot.from_paths(["PATH_1", "PATH_2"]) as plot:
 
 Of course fill in `PATH_1` and `PATH_2` as the files you would like to plot.
 
-If the customization is not sufficient, you can also directly manipulate the `self.ax` and 
-`self.fig` attributes of the plotting class.
+If the customization is not sufficient, you can also directly manipulate the
+`self.ax` and `self.fig` attributes of the plotting class.
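
The README's Inclusion section and the `add_wss`/`add_rrf` helpers in `plot_inclusions.py` all build on the same cumulative count of inclusions found. Below is a minimal pure-Python sketch of that idea, assuming `labels` is the list of 0/1 review decisions in screening order with prior knowledge already removed and at least one inclusion present. The function names are hypothetical, the WSS formula is the commonly used "work saved over sampling" definition, and none of this is the `asreview` implementation.

```python
def inclusion_curve(labels):
    """Return (% reviewed, % inclusions found) points, starting at (0, 0)."""
    n_total, n_incl = len(labels), sum(labels)
    points, found = [(0.0, 0.0)], 0
    for n_read, label in enumerate(labels, start=1):
        found += label
        points.append((100 * n_read / n_total, 100 * found / n_incl))
    return points


def wss(labels, value=95):
    """Work saved over sampling (in %) at `value` % recall."""
    n_total, n_incl = len(labels), sum(labels)
    target = -(-value * n_incl // 100)  # ceil(value % of the inclusions)
    found = 0
    for n_read, label in enumerate(labels, start=1):
        found += label
        if found >= target:
            # Share of records left unread, minus the share that random
            # screening would leave unread at the same recall.
            return 100 * (n_total - n_read) / n_total - (100 - value)
    return None


def rrf(labels, value=10):
    """Relevant records found (in %) after reading `value` % of all records."""
    n_read = value * len(labels) // 100
    return 100 * sum(labels[:n_read]) / sum(labels)


labels = [1, 1, 0, 1, 0, 0, 1] + [0] * 13  # 20 records, 4 inclusions
curve = inclusion_curve(labels)
```

With these labels the last inclusion is found after 7 of 20 records, so `wss(labels)` compares the 65% left unread against the 5% that random order would leave unread at 95% recall.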
diff --git a/asreviewcontrib/visualization/plot.py b/asreviewcontrib/visualization/plot.py
@@ -12,15 +12,15 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-from collections import OrderedDict
 import os
+from collections import OrderedDict
 
-from asreview.analysis.analysis import Analysis
+from asreview.analysis import Analysis
 
-from asreviewcontrib.visualization.plot_inclusions import PlotInclusions
-from asreviewcontrib.visualization.plot_progression import PlotProgression
 from asreviewcontrib.visualization.plot_discovery import PlotDiscovery
+from asreviewcontrib.visualization.plot_inclusions import PlotInclusions
 from asreviewcontrib.visualization.plot_limit import PlotLimit
+from asreviewcontrib.visualization.plot_progression import PlotProgression
 
 
 class Plot():

diff --git a/asreviewcontrib/visualization/plot_base.py b/asreviewcontrib/visualization/plot_base.py
@@ -1,9 +1,24 @@
+# Copyright 2020 The ASReview Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import matplotlib.pyplot as plt
 
 
 class PlotBase():
     def __init__(self, analyses):
-        """
+        """Base class for plots.
+
         Plot the number of queries that turned out to be included
         in the final review.
         """

diff --git a/asreviewcontrib/visualization/plot_discovery.py b/asreviewcontrib/visualization/plot_discovery.py
@@ -3,6 +3,7 @@
 
 class PlotDiscovery(PlotBase):
     def __init__(self, analyses, result_format="percentage"):
+        """Class for the Discovery plot."""
         super(PlotDiscovery, self).__init__(analyses)
         self.result_format = result_format
 
@@ -13,13 +14,21 @@ def __init__(self, analyses, result_format="percentage"):
             avg_times.append(list(results.values()))
 
         if result_format == "number":
-            self.ax.hist(avg_times, 30, histtype='bar', density=False,
-                         label=self.analyses.keys())
+            self.ax.hist(
+                avg_times,
+                30,
+                histtype='bar',
+                density=False,
+                label=self.analyses.keys())
             self.ax.set_xlabel("# Reviewed")
             self.ax.set_ylabel("# of papers included")
         else:
-            self.ax.hist(avg_times, 30, histtype='bar', density=True,
-                         label=self.analyses.keys())
+            self.ax.hist(
+                avg_times,
+                30,
+                histtype='bar',
+                density=True,
+                label=self.analyses.keys())
             self.ax.set_xlabel("% Reviewed")
             self.ax.set_ylabel("Fraction of papers included")
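
The two `hist` calls above differ only in `density`. As documented for matplotlib, `density=True` divides each bin count by the total count times the bin width, so the bars have unit area and their heights are densities rather than plain fractions. Below is a dependency-free sketch of that normalisation, where the hypothetical `histogram` helper stands in for `Axes.hist` and assumes the values are not all identical.

```python
def histogram(values, n_bins, density=False):
    """Equal-width histogram; returns (heights, bin_width)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The last bin is closed on the right, as in numpy/matplotlib.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    if not density:
        return counts, width
    total = len(values)
    return [c / (total * width) for c in counts], width


values = [0, 10, 20, 30, 40]
counts, width = histogram(values, n_bins=4)
dens, _ = histogram(values, n_bins=4, density=True)
area = sum(d * width for d in dens)
```

Here the first bin holds one of five values (a fraction of 0.2), but its density height is 0.02 because the bin is 10 units wide. Only the total area equals 1.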
 

diff --git a/asreviewcontrib/visualization/plot_inclusions.py b/asreviewcontrib/visualization/plot_inclusions.py
@@ -1,11 +1,14 @@
+import warnings
+
 import numpy as np
 
 from asreviewcontrib.visualization.plot_base import PlotBase
 
 
 class PlotInclusions(PlotBase):
     def __init__(self, analyses, result_format="percentage", thick=None):
-        """
+        """Class for the Inclusions plot.
+
         Plot the number of queries that turned out to be included
         in the final review.
         """
@@ -31,7 +34,7 @@ def __init__(self, analyses, result_format="percentage", thick=None):
             n_after_init = len(analysis.labels) - n_initial
             max_len = max(max_len, n_after_init)
 
-            self.col[data_key] = "C"+str((len(self.analyses)-1-i) % 10)
+            self.col[data_key] = "C" + str((len(self.analyses) - 1 - i) % 10)
             col = self.col[data_key]
 
             if self.thick[data_key]:
@@ -56,54 +59,82 @@ def __init__(self, analyses, result_format="percentage", thick=None):
             self.ax.set_ylabel("% Inclusions found")
         self.fig.tight_layout()
 
-    def add_WSS(self, data_key, value=95, text_at=None, add_value=False,
-                alpha=0.8, text_col="white", add_text=True, **kwargs):
+    def add_WSS(self, *args, **kwargs):  # noqa
+        warnings.warn(
+            "add_WSS is deprecated, use add_wss instead",
+            DeprecationWarning
+        )
+        self.add_wss(*args, **kwargs)
+
+    def add_wss(self,
+                data_key,
+                value=95,
+                text_at=None,
+                add_value=False,
+                alpha=0.8,
+                text_col="white",
+                add_text=True,
+                **kwargs):
         analysis = self.analyses[data_key]
         col = self.col[data_key]
 
         if value is None:
             return
 
         text = f"WSS@{value}%"
-        WSS_val, WSS_x, WSS_y = analysis.wss(
+        wss_val, wss_x, wss_y = analysis.wss(
             value, x_format=self.result_format, **kwargs)
-        if WSS_x is None or WSS_y is None:
+        if wss_x is None or wss_y is None:
             return
 
         if add_value:
-            text += r"$\approx" + f" {round(WSS_val, 2)}" + r"\%$"
+            text += r"$\approx" + f" {round(wss_val, 2)}" + r"\%$"
 
         if text_at is None:
-            text_at = (WSS_x[0] + self.box_dist, (WSS_y[0] + WSS_y[1])/2)
+            text_at = (wss_x[0] + self.box_dist, (wss_y[0] + wss_y[1]) / 2)
 
-        self.ax.plot(WSS_x, WSS_y, color=col, ls="--")
-        self.ax.plot(WSS_x, (0, WSS_y[0]), color=col, ls=":")
+        self.ax.plot(wss_x, wss_y, color=col, ls="--")
+        self.ax.plot(wss_x, (0, wss_y[0]), color=col, ls=":")
         bbox = dict(boxstyle='round', facecolor=col, alpha=alpha)
         if add_text:
             self.ax.text(*text_at, text, color=text_col, bbox=bbox)
 
-    def add_RRF(self, data_key, value=10, text_at=None, add_value=False,
-                alpha=0.8, text_col="white", add_text=True, **kwargs):
+    def add_RRF(self, *args, **kwargs):  # noqa
+        warnings.warn(
+            "add_RRF is deprecated, use add_rrf instead",
+            DeprecationWarning
+        )
+        self.add_rrf(*args, **kwargs)
+
+    def add_rrf(self,
+                data_key,
+                value=10,
+                text_at=None,
+                add_value=False,
+                alpha=0.8,
+                text_col="white",
+                add_text=True,
+                **kwargs):
         analysis = self.analyses[data_key]
         col = self.col[data_key]
         if value is None:
             return
 
-        RRF_val, RRF_x, RRF_y = analysis.rrf(
+        rrf_val, rrf_x, rrf_y = analysis.rrf(
             value, x_format=self.result_format, **kwargs)
-        if RRF_x is None or RRF_y is None:
+        if rrf_x is None or rrf_y is None:
             return
 
         text = f"RRF@{value}%"
         if add_value:
-            text += r"$\approx" + f" {round(RRF_val, 2)}" + r"\%$"
+            text += r"$\approx" + f" {round(rrf_val, 2)}" + r"\%$"
 
-        RRF_x = 0, RRF_x[0]
-        RRF_y = RRF_y[1], RRF_y[1]
+        rrf_x = 0, rrf_x[0]
+        rrf_y = rrf_y[1], rrf_y[1]
         if text_at is None:
-            text_at = (RRF_x[0] + self.box_dist, RRF_y[0] + self.box_dist + 2)
+            text_at = (rrf_x[0] + self.box_dist, rrf_y[0] + self.box_dist + 2)
 
-        self.ax.plot(RRF_x, RRF_y, color=col, ls="--")
+        self.ax.plot(rrf_x, rrf_y, color=col, ls="--")
         bbox = dict(boxstyle='round', facecolor=col, alpha=alpha)
         if add_text:
             self.ax.text(*text_at, text, color=text_col, bbox=bbox)
@@ -115,8 +146,8 @@ def add_random(self, text_at=None, col='black', add_text=True):
             xlim = self.ax.get_xlim()
             ylim = self.ax.get_ylim()
             text_at = (
-                np.average(xlim) - 0.07 * (xlim[1]-xlim[0]),
-                np.average(ylim) + 0.07 * (ylim[1]-ylim[0]),
+                np.average(xlim) - 0.07 * (xlim[1] - xlim[0]),
+                np.average(ylim) + 0.07 * (ylim[1] - ylim[0]),
             )
 
         bbox = dict(boxstyle='round', facecolor='0.65')
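
The `add_WSS` and `add_RRF` wrappers above keep the old names working while steering callers to the snake_case ones. Below is a self-contained sketch of the same pattern. The class is a hypothetical stand-in, not the real `PlotInclusions`, and the `stacklevel=2` argument and the `return` are additions that the commit does not contain.

```python
import warnings


class InclusionsSketch:
    """Hypothetical stand-in for a plot class with a renamed method."""

    def add_wss(self, data_key, value=95):
        # Placeholder body; the real method draws on a matplotlib axis.
        return f"WSS@{value}% for {data_key}"

    def add_WSS(self, *args, **kwargs):  # noqa
        warnings.warn(
            "add_WSS is deprecated, use add_wss instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        # Forward everything unchanged, including the return value.
        return self.add_wss(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning is often filtered
    result = InclusionsSketch().add_WSS("ace", value=95)
```

Because Python's default filters hide most `DeprecationWarning`s, pointing the warning at the calling line with `stacklevel=2` may make it more likely that script authors actually see it.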

diff --git a/asreviewcontrib/visualization/plot_limit.py b/asreviewcontrib/visualization/plot_limit.py
@@ -4,8 +4,11 @@
 
 
 class PlotLimit(PlotBase):
-    def __init__(self, analyses, prob_allow_miss=[0.1, 0.5, 2.0],
+    def __init__(self,
+                 analyses,
+                 prob_allow_miss=[0.1, 0.5, 2.0],
                  result_format="percentage"):
+        """Class for the Limit plot."""
         super(PlotLimit, self).__init__(analyses)
 
         self.legend_plt = []
@@ -15,16 +18,17 @@ def __init__(self, analyses, prob_allow_miss=[0.1, 0.5, 2.0],
 
         for i, data_key in enumerate(self.analyses):
             res = self.analyses[data_key].limits(
-                prob_allow_miss=prob_allow_miss,
-                result_format=result_format)
+                prob_allow_miss=prob_allow_miss, result_format=result_format)
             x_range = res["x_range"]
-            col = "C"+str(i % 10)
+            col = "C" + str(i % 10)
 
             for i_limit, limit in enumerate(res["limits"]):
                 ls = linestyles[i_limit % len(linestyles)]
                 my_plot, = self.ax.plot(
-                    x_range, np.array(limit)+np.array(x_range),
-                    color=col, ls=ls)
+                    x_range,
+                    np.array(limit) + np.array(x_range),
+                    color=col,
+                    ls=ls)
                 if i_limit == 0:
                     self.legend_plt.append(my_plot)
                     self.legend_name.append(f"{data_key}")
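
`PlotLimit.__init__` keeps a list literal as the default for `prob_allow_miss`. The lines shown in this diff only read that list, so nothing goes wrong here, but a default list is created once and shared between calls. A common defensive alternative is a `None` sentinel, sketched below with a hypothetical helper in place of the real constructor.

```python
def limits_sketch(prob_allow_miss=None):
    """Hypothetical helper; only the default handling matters here."""
    if prob_allow_miss is None:
        # A fresh list per call, so callers cannot alter the default.
        prob_allow_miss = [0.1, 0.5, 2.0]
    return prob_allow_miss


first = limits_sketch()
first.append(99)          # mutate the list returned by the first call
second = limits_sketch()  # still gets the untouched default values
```

An immutable tuple default such as `(0.1, 0.5, 2.0)` would be another option, provided the downstream `limits` call accepts any sequence, which is not confirmed by this diff.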