Commit

Minor changes to code style and documentation (#10)
J535D165 authored Oct 1, 2020
1 parent 6a90fce commit 40fd7d7
Showing 8 changed files with 177 additions and 87 deletions.
78 changes: 42 additions & 36 deletions README.md
@@ -2,21 +2,21 @@

![Deploy and release](https://github.com/asreview/asreview-visualization/workflows/Deploy%20and%20release/badge.svg)![Build status](https://github.com/asreview/asreview-visualization/workflows/test-suite/badge.svg)

This is a plotting/visualization supplemental package for the
[ASReview](https://github.com/asreview/asreview)
software. It is a fast way to create a visual impression of the ASReview with different
dataset, models and model parameters.
This is a plotting/visualization supplemental package for the
[ASReview](https://github.com/asreview/asreview) software. It is a fast way to
create a visual impression of the ASReview with different datasets, models and
model parameters.

## Installation

The easiest way to install the visualization package is to use the command line:
The easiest way to install the visualization package is to install from PyPI:

``` bash
pip install asreview-visualization
```

After installation of the visualization package, asreview should automatically detect it.
Test this by:
After installation of the visualization package, `asreview` should automatically
detect it. Test this by:

```bash
asreview --help
@@ -26,12 +26,13 @@ It should list the 'plot' modus.

## Basic usage

State files that were created with the same ASReview settings can be put together/averaged by putting
them in the same directory. State files with different settings/datasets should be put in different
directories to compare them.
State files that were created with the same ASReview settings can be put
together/averaged by putting them in the same directory. State files with
different settings/datasets should be put in different directories to compare
them.

As an example consider the following directory structure, where we have two datasets, called `ace` and
`ptsd`, each of which have 8 runs:
As an example consider the following directory structure, where we have two
datasets, called `ace` and `ptsd`, each of which have 8 runs:

```
├── ace
@@ -70,29 +71,29 @@ asreview plot ace ptsd --absolute-values

## Plot types

There are currently four plot types implemented:
_inclusion_, _discovery_, _limit_, _progression_.
They can be individually selected with the `-t` or `--type` switch. Multiple plots
can be made by using `,` as a separator:
There are currently four plot types implemented: _inclusion_, _discovery_,
_limit_, _progression_. They can be individually selected with the `-t` or
`--type` switch. Multiple plots can be made by using `,` as a separator:

```bash
asreview plot ace ptsd --type 'inclusion,discovery'
```
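
As a rough illustration of the comma-separated `--type` value, the sketch below splits and validates such a string. The helper name `parse_plot_types` and the error handling are assumptions made for this example; the package's own command line parsing may differ.

```python
# The four plot types named in this README.
PLOT_TYPES = ("inclusion", "discovery", "limit", "progression")


def parse_plot_types(type_arg):
    """Split a value such as 'inclusion,discovery' into a list of plot types.

    Hypothetical helper for illustration; not part of asreview-visualization.
    """
    # Tolerate stray spaces and empty entries such as a trailing comma.
    names = [name.strip() for name in type_arg.split(",") if name.strip()]
    unknown = [name for name in names if name not in PLOT_TYPES]
    if unknown:
        raise ValueError(f"Unknown plot type(s): {', '.join(unknown)}")
    return names


selected = parse_plot_types("inclusion,discovery")
```

Here `selected` holds the two requested plot types in the order they were given.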

### Inclusion

This figure shows the number/percentage of included papers found as a function of the
number/percentage of papers reviewed. Initial included/excluded papers are subtracted so that the line
always starts at (0,0).
This figure shows the number/percentage of included papers found as a function
of the number/percentage of papers reviewed. Initial included/excluded papers
are subtracted so that the line always starts at (0,0).

The quicker the line goes to a 100%, the better the performance.
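
A minimal sketch of how such a curve could be computed for a single run, assuming the review order is available as a list of 0/1 labels. The helper name `inclusion_curve`, the `n_initial` parameter, and the choice to take percentages relative to the papers left after the initial set are illustrative assumptions, not the package's API.

```python
import numpy as np


def inclusion_curve(labels, n_initial=0, as_percentage=True):
    """Return (reviewed, found) arrays for one run, starting at (0, 0).

    Hypothetical helper for illustration; not part of asreview-visualization.
    """
    # Drop the initial (prior knowledge) papers so the curve starts at the origin.
    after_init = np.asarray(labels, dtype=int)[n_initial:]
    reviewed = np.arange(len(after_init) + 1)
    found = np.concatenate(([0], np.cumsum(after_init)))
    if as_percentage:
        # Assumption: percentages are relative to what remains after the initial set.
        reviewed = 100 * reviewed / max(len(after_init), 1)
        found = 100 * found / max(found[-1], 1)
    return reviewed, found


# Two initial papers, then six reviewed papers containing two inclusions.
x, y = inclusion_curve([1, 0, 1, 0, 0, 1, 0, 0], n_initial=2)
```

A curve that reaches 100 on the y-axis at a small x value corresponds to a run that found all inclusions early.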

![alt text](https://github.com/msdslab/asreview-visualization/blob/master/docs/inclusions.png?raw=true "Inclusions")

### Discovery

This figure shows the distribution of the number of papers that have to be read before discovering
each inclusion. Not every paper is equally hard to find.
This figure shows the distribution of the number of papers that have to be
read before discovering each inclusion. Not every paper is equally hard to
find.

The closer to the left, the better.
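
The idea can be sketched for a single run as follows, assuming the review order is a list of 0/1 labels. The helper name `discovery_times` is hypothetical, and counting the inclusion itself as a paper read is an assumption of this example; the package averages over runs, which is not shown here.

```python
import numpy as np


def discovery_times(labels, n_initial=0):
    """Number of papers read up to and including each inclusion, for one run.

    Hypothetical helper for illustration; not part of asreview-visualization.
    """
    after_init = np.asarray(labels, dtype=int)[n_initial:]
    # Positions are zero-based, so add 1 to count the inclusion itself.
    return np.flatnonzero(after_init == 1) + 1


times = discovery_times([0, 1, 0, 0, 1, 1])
# Bin the discovery times, as a histogram plot would (three bins for brevity).
counts, edges = np.histogram(times, bins=3, range=(0, 6))
```

A histogram with most of its mass in the leftmost bins means the inclusions were found after reading few papers.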

@@ -101,34 +102,39 @@ The closer to the left, the better.

### Limit

This figure shows how many papers need to be read with a given criterion. A criterion is expressed
as "after reading _y_ % of the papers, at most an average of _z_ included papers have been not been
seen by the reviewer, if he is using max sampling.". Here, _y_ is shown on the y-axis, while
three values of _z_ are plotted as three different lines with the same color. The three values for
_z_ are 0.1, 0.5 and 2.0.
This figure shows how many papers need to be read with a given criterion. A
criterion is expressed as "after reading _y_ % of the papers, at most an
average of _z_ included papers have been not been seen by the reviewer, if he
is using max sampling.". Here, _y_ is shown on the y-axis, while three values
of _z_ are plotted as three different lines with the same color. The three
values for _z_ are 0.1, 0.5 and 2.0.

The quicker the lines touch the black (`y=x`) line, the better.

![alt text](https://github.com/msdslab/asreview-visualization/blob/master/docs/limits.png?raw=true "Limits")

### Progression

This figure shows the average inclusion rate as a function of time, number of papers read.
The more concentrated on the left, the better. The thick line is the average of individual runs
(thin lines). The visualization package will automatically detect which are directories and which
are files. The curve is smoothed out by using a Gaussian smoothing algorithm.
This figure shows the average inclusion rate as a function of time, number of
papers read. The more concentrated on the left, the better. The thick line is
the average of individual runs (thin lines). The visualization package will
automatically detect which are directories and which are files. The curve is
smoothed out by using a Gaussian smoothing algorithm.
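
To make the smoothing step concrete, here is a small sketch of Gaussian smoothing applied to a 0/1 label sequence. The function name `smoothed_inclusion_rate`, the `sigma` default, and the edge normalisation are assumptions for illustration; the package's actual smoothing parameters are not shown in this README.

```python
import numpy as np


def smoothed_inclusion_rate(labels, sigma=2.0):
    """Gaussian-weighted local average of a 0/1 label sequence.

    Hypothetical helper for illustration; not part of asreview-visualization.
    Uses an O(n^2) weight matrix, which is fine for a sketch.
    """
    labels = np.asarray(labels, dtype=float)
    idx = np.arange(len(labels))
    # Weight of paper j when estimating the rate at position i.
    weights = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    # Dividing by the row sums keeps the estimate unbiased near the edges.
    return weights @ labels / weights.sum(axis=1)


rate = smoothed_inclusion_rate([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
```

A larger `sigma` gives a smoother curve at the cost of blurring sharp changes in the inclusion rate.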

![alt text](https://github.com/msdslab/asreview-visualization/blob/master/docs/progression.png?raw=true "Progression")


## API

To make use of the more advanced features, you can also use the visualization package
as a library. The advantage is that you can make more reproducible plots where text, etc. is
in the place *you* want it. Examples can be found in module `asreviewcontrib.visualization.quick`.
Those are the scripts that are used for the command line interface.
To make use of the more advanced features, you can also use the visualization
package as a library. The advantage is that you can make more reproducible
plots where text, etc. is in the place *you* want it. Examples can be found in
module `asreviewcontrib.visualization.quick`. Those are the scripts that are
used for the command line interface.

```python
from asreviewcontrib.visualization.plot import Plot

with Plot.from_paths(["PATH_1", "PATH_2"]) as plot:
    inc_plot = plot.new("inclusion")
    inc_plot.set_grid()
@@ -141,5 +147,5 @@ with Plot.from_paths(["PATH_1", "PATH_2"]) as plot:

Of course fill in `PATH_1` and `PATH_2` as the files you would like to plot.

If the customization is not sufficient, you can also directly manipulate the `self.ax` and
`self.fig` attributes of the plotting class.
If the customization is not sufficient, you can also directly manipulate the
`self.ax` and `self.fig` attributes of the plotting class.
8 changes: 4 additions & 4 deletions asreviewcontrib/visualization/plot.py
@@ -12,15 +12,15 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from collections import OrderedDict
import os
from collections import OrderedDict

from asreview.analysis.analysis import Analysis
from asreview.analysis import Analysis

from asreviewcontrib.visualization.plot_inclusions import PlotInclusions
from asreviewcontrib.visualization.plot_progression import PlotProgression
from asreviewcontrib.visualization.plot_discovery import PlotDiscovery
from asreviewcontrib.visualization.plot_inclusions import PlotInclusions
from asreviewcontrib.visualization.plot_limit import PlotLimit
from asreviewcontrib.visualization.plot_progression import PlotProgression


class Plot():
17 changes: 16 additions & 1 deletion asreviewcontrib/visualization/plot_base.py
@@ -1,9 +1,24 @@
# Copyright 2020 The ASReview Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import matplotlib.pyplot as plt


class PlotBase():
def __init__(self, analyses):
"""
"""Base class for plots.
Plot the number of queries that turned out to be included
in the final review.
"""
17 changes: 13 additions & 4 deletions asreviewcontrib/visualization/plot_discovery.py
@@ -3,6 +3,7 @@

class PlotDiscovery(PlotBase):
def __init__(self, analyses, result_format="percentage"):
"""Class for the Discovery plot."""
super(PlotDiscovery, self).__init__(analyses)
self.result_format = result_format

@@ -13,13 +14,21 @@ def __init__(self, analyses, result_format="percentage"):
avg_times.append(list(results.values()))

if result_format == "number":
self.ax.hist(avg_times, 30, histtype='bar', density=False,
label=self.analyses.keys())
self.ax.hist(
avg_times,
30,
histtype='bar',
density=False,
label=self.analyses.keys())
self.ax.set_xlabel("# Reviewed")
self.ax.set_ylabel("# of papers included")
else:
self.ax.hist(avg_times, 30, histtype='bar', density=True,
label=self.analyses.keys())
self.ax.hist(
avg_times,
30,
histtype='bar',
density=True,
label=self.analyses.keys())
self.ax.set_xlabel("% Reviewed")
self.ax.set_ylabel("Fraction of papers included")

73 changes: 52 additions & 21 deletions asreviewcontrib/visualization/plot_inclusions.py
@@ -1,11 +1,14 @@
import warnings

import numpy as np

from asreviewcontrib.visualization.plot_base import PlotBase


class PlotInclusions(PlotBase):
def __init__(self, analyses, result_format="percentage", thick=None):
"""
"""Class for the Inclusions plot.
Plot the number of queries that turned out to be included
in the final review.
"""
@@ -31,7 +34,7 @@ def __init__(self, analyses, result_format="percentage", thick=None):
n_after_init = len(analysis.labels) - n_initial
max_len = max(max_len, n_after_init)

self.col[data_key] = "C"+str((len(self.analyses)-1-i) % 10)
self.col[data_key] = "C" + str((len(self.analyses) - 1 - i) % 10)
col = self.col[data_key]

if self.thick[data_key]:
@@ -56,54 +59,82 @@ def __init__(self, analyses, result_format="percentage", thick=None):
self.ax.set_ylabel("% Inclusions found")
self.fig.tight_layout()

def add_WSS(self, data_key, value=95, text_at=None, add_value=False,
alpha=0.8, text_col="white", add_text=True, **kwargs):
def add_WSS(self, *args, **kwargs): # noqa
warnings.warn(
"add_WSS is deprecated, use add_wss instead",
DeprecationWarning
)
self.add_wss(*args, **kwargs)

def add_wss(self,
data_key,
value=95,
text_at=None,
add_value=False,
alpha=0.8,
text_col="white",
add_text=True,
**kwargs):
analysis = self.analyses[data_key]
col = self.col[data_key]

if value is None:
return

text = f"WSS@{value}%"
WSS_val, WSS_x, WSS_y = analysis.wss(
wss_val, wss_x, wss_y = analysis.wss(
value, x_format=self.result_format, **kwargs)
if WSS_x is None or WSS_y is None:
if wss_x is None or wss_y is None:
return

if add_value:
text += r"$\approx" + f" {round(WSS_val, 2)}" + r"\%$"
text += r"$\approx" + f" {round(wss_val, 2)}" + r"\%$"

if text_at is None:
text_at = (WSS_x[0] + self.box_dist, (WSS_y[0] + WSS_y[1])/2)
text_at = (wss_x[0] + self.box_dist, (wss_y[0] + wss_y[1]) / 2)

self.ax.plot(WSS_x, WSS_y, color=col, ls="--")
self.ax.plot(WSS_x, (0, WSS_y[0]), color=col, ls=":")
self.ax.plot(wss_x, wss_y, color=col, ls="--")
self.ax.plot(wss_x, (0, wss_y[0]), color=col, ls=":")
bbox = dict(boxstyle='round', facecolor=col, alpha=alpha)
if add_text:
self.ax.text(*text_at, text, color=text_col, bbox=bbox)

def add_RRF(self, data_key, value=10, text_at=None, add_value=False,
alpha=0.8, text_col="white", add_text=True, **kwargs):
def add_RRF(self, *args, **kwargs): # noqa
warnings.warn(
"add_RRF is deprecated, use add_rrf instead",
DeprecationWarning
)
self.add_rrf(*args, **kwargs)

def add_rrf(self,
data_key,
value=10,
text_at=None,
add_value=False,
alpha=0.8,
text_col="white",
add_text=True,
**kwargs):
analysis = self.analyses[data_key]
col = self.col[data_key]
if value is None:
return

RRF_val, RRF_x, RRF_y = analysis.rrf(
rrf_val, rrf_x, rrf_y = analysis.rrf(
value, x_format=self.result_format, **kwargs)
if RRF_x is None or RRF_y is None:
if rrf_x is None or rrf_y is None:
return

text = f"RRF@{value}%"
if add_value:
text += r"$\approx" + f" {round(RRF_val, 2)}" + r"\%$"
text += r"$\approx" + f" {round(rrf_val, 2)}" + r"\%$"

RRF_x = 0, RRF_x[0]
RRF_y = RRF_y[1], RRF_y[1]
rrf_x = 0, rrf_x[0]
rrf_y = rrf_y[1], rrf_y[1]
if text_at is None:
text_at = (RRF_x[0] + self.box_dist, RRF_y[0] + self.box_dist + 2)
text_at = (rrf_x[0] + self.box_dist, rrf_y[0] + self.box_dist + 2)

self.ax.plot(RRF_x, RRF_y, color=col, ls="--")
self.ax.plot(rrf_x, rrf_y, color=col, ls="--")
bbox = dict(boxstyle='round', facecolor=col, alpha=alpha)
if add_text:
self.ax.text(*text_at, text, color=text_col, bbox=bbox)
@@ -115,8 +146,8 @@ def add_random(self, text_at=None, col='black', add_text=True):
xlim = self.ax.get_xlim()
ylim = self.ax.get_ylim()
text_at = (
np.average(xlim) - 0.07 * (xlim[1]-xlim[0]),
np.average(ylim) + 0.07 * (ylim[1]-ylim[0]),
np.average(xlim) - 0.07 * (xlim[1] - xlim[0]),
np.average(ylim) + 0.07 * (ylim[1] - ylim[0]),
)

bbox = dict(boxstyle='round', facecolor='0.65')
16 changes: 10 additions & 6 deletions asreviewcontrib/visualization/plot_limit.py
@@ -4,8 +4,11 @@


class PlotLimit(PlotBase):
def __init__(self, analyses, prob_allow_miss=[0.1, 0.5, 2.0],
def __init__(self,
analyses,
prob_allow_miss=[0.1, 0.5, 2.0],
result_format="percentage"):
"""Class for the Limit plot."""
super(PlotLimit, self).__init__(analyses)

self.legend_plt = []
@@ -15,16 +18,17 @@ def __init__(self, analyses, prob_allow_miss=[0.1, 0.5, 2.0],

for i, data_key in enumerate(self.analyses):
res = self.analyses[data_key].limits(
prob_allow_miss=prob_allow_miss,
result_format=result_format)
prob_allow_miss=prob_allow_miss, result_format=result_format)
x_range = res["x_range"]
col = "C"+str(i % 10)
col = "C" + str(i % 10)

for i_limit, limit in enumerate(res["limits"]):
ls = linestyles[i_limit % len(linestyles)]
my_plot, = self.ax.plot(
x_range, np.array(limit)+np.array(x_range),
color=col, ls=ls)
x_range,
np.array(limit) + np.array(x_range),
color=col,
ls=ls)
if i_limit == 0:
self.legend_plt.append(my_plot)
self.legend_name.append(f"{data_key}")