diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2881971..826591f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,7 @@
## Summary
* [**Unreleased**](#unreleased)
+* [**Release 2022-12-22 v0.0.42**](#release-2022-12-22-v0042) Added summarize_papers script
* [**Release 2022-12-06 v0.0.41**](#release-2022-12-06-v0041) setup.py updates for make target, install
* [**Release 2022-11-26 v0.0.40**](#release-2022-11-28-v0040) Added pmidcite.scripts.icite; pip3, not pip from Python2
* [**Release 2022-11-26 v0.0.38**](#release-2022-11-26-v0038) Added instructions, and console_script to run script, icite
@@ -42,6 +43,11 @@
### Unreleased
+### release 2022-12-30 v0.0.42
+* ADDED summarize_papers script
+* ADDED requests package as a pre-requisite
+* CHANGED API to NCBI E-utils such that a missing LID (Local ID) is ignored on a PubMed entry
+
### release 2022-12-06 v0.0.41
* CHANGED setup.py PACKAGES variable to run install make target
diff --git a/README.md b/README.md
index 1416195..d6e1122 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# PubMed ID (PMID) Cite
+# PubMedj ID (PMID) Cite
[](https://twitter.com/intent/tweet?text=Python%20library%20to%20download%20pubmed%20citation%20counts%20and%20data,%20given%20a%20PMID&url=https://github.com/dvklopfenstein/pmidcite&via=dvklopfenstein&hashtags=pubmed,pmid,citations,pubmed2cite,writingtips,scientificwriting)
[](https://github.com/dvklopfenstein/pmidcite/actions/workflows/build.yml)
@@ -20,6 +20,7 @@ Contact: dvklopfenstein@protonmail.com
* [**1) Download citation counts and data for a research paper**](https://github.com/dvklopfenstein/pmidcite#1-download-citation-counts-and-data-for-a-research-paper)
* [**2) Forward citation search**](https://github.com/dvklopfenstein/pmidcite#2-forward-citation-search): following a paper's *Cited by* links or *Forward snowballing*
* [**3) Backward citation search**](https://github.com/dvklopfenstein/pmidcite#3-backward-citation-search): following the links to a paper's references or *Backward snowballing*
+* [**4) Summarize a group of citations**](https://github.com/dvklopfenstein/pmidcite#4-summarize-a-group-of-citations)
## 1) Download citation counts and data for a research paper
```$ icite -H 26032263```
@@ -56,6 +57,57 @@ Also known as following links to a paper's references or *Backward snowballing*
or
```$ icite -H; icite 26032263 -r | sort -k6 -r```
+## 4) Summarize a group of citations
+* 4a) Examine a paper with PMID `30022098`. Print the column headers(`-H`):
+`icite -H 30022098`
+* 4b) Download the details about each paper(`-c`) that cites `30022098` into a file(`-o goatools_cites.txt`):
+`icite 30022098 -c -o goatools_cites.txt`
+* 4c) Summarize the overall performance of the 300+ citing papers contained in `goatools_cites.txt`
+`summarize_papers goatools_cites.txt -p TOP CIT CLI`
+
+### 4a) Examine a paper with PMID `30022098`. Print the column headers(`-H`):
+```
+$ icite -H 30022098
+COL 2 3 4 5 6 7 8 9 10 au[11](authors)
+TYP PMID RP HAMCc % G YEAR cit cli ref au[00](authors) title
+TOP 30022098 R. .A..c 100 4 2018 318 1 23 au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses.
+```
+
+Paper with PMID `30022098` is cited by `318`(`cit`) other research papers and `1`(`cli`) clinical study. It has `23` references(`ref`).
+
+### 4b) Download the details about each paper(`-c`) that cites `30022098` into a file(`-o goatools_cites.txt`):
+```
+$ icite 30022098 -c -o goatools_cites.txt
+```
+
+The requested paper (PMID=`30022098`) is described in one line in `goatools_cites.txt`:
+```
+$ grep TOP goatools_cites.txt
+TOP 30022098 R. .A..c 100 4 2018 318 1 23 au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses.
+```
+
+The paper (PMID=`30022098`) is cited by 318(`CIT`) research papers plus 1(`CLI`) clinical study:
+```
+$ grep CIT goatools_cites.txt | wc -l
+318
+
+$ grep CLI goatools_cites.txt | wc -l
+1
+```
+
+### 4c) Summarize all the papers in `goatools_cites.txt`
+**NEW FUNCTIONALITY; INPUT REQUESTED: What would you like to see?** [Open an issue](https://github.com/dvklopfenstein/pmidcite/issues) to comment.
+```
+$ summarize_papers goatools_cites.txt -p TOP CIT CLI
+i=033.4% 4=003.4% 3=020.9% 2=021.9% 1=015.9% 0=004.4% 4 years:2018-2022 320 papers goatools_cites.txt
+```
+
+* Output is on one line so many files containing sets of PMIDs may be compared. TBD: Add multiline verbose option.
+* The groups are from newest(`i`) to top-performing(`4`), great(`3`), very good(`2`), and overlooked(`1` and `0`)
+* The percentages of papers in `goatools_cites.txt` in each group follow the group name
+
+
+
# PubMed vs Google Scholar
@@ -456,4 +508,4 @@ Fiorini N ... Lu Zhiyong
dvklopfenstein@protonmail.com
https://orcid.org/0000-0003-0161-7603
-Copyright (C) 2019-present [pmidcite](https://dvklopfenstein.github.io/pmidcite/), DV Klopfenstein. All rights reserved.
+Copyright (C) 2019-present [pmidcite](https://dvklopfenstein.github.io/pmidcite/), DV Klopfenstein, PhD. All rights reserved.
diff --git a/makefile b/makefile
index 4b9b8f7..dbbb089 100644
--- a/makefile
+++ b/makefile
@@ -18,6 +18,9 @@ p:
d:
find src -regextype posix-extended -regex "[a-z./]*" -type d
+cli:
+ find src/pmidcite/cli -name \*.py
+
diff0:
git diff --compact-summary
diff --git a/setup.py b/setup.py
index 6eab562..f579d26 100755
--- a/setup.py
+++ b/setup.py
@@ -10,6 +10,8 @@
from setuptools import setup
# import versioneer
+__copyright__ = 'Copyright (C) 2019, DV Klopfenstein, PhD. All rights reserved'
+__author__ = 'DV Klopfenstein, PhD'
NAME = 'pmidcite'
@@ -42,7 +44,7 @@ def get_long_description():
setup(
name=NAME,
## version=versioneer.get_version(),
- version='0.0.41',
+ version='0.0.42',
author='DV Klopfenstein, PhD',
author_email='dvklopfenstein@protonmail.com',
## cmdclass=versioneer.get_cmdclass(),
@@ -55,6 +57,7 @@ def get_long_description():
entry_points={
'console_scripts':[
'icite=pmidcite.scripts.icite:main',
+ 'summarize_papers=pmidcite.scripts.icite:summarize_papers',
],
},
# https://pypi.org/classifiers/
@@ -68,9 +71,11 @@ def get_long_description():
'Topic :: Scientific/Engineering :: Information Analysis',
],
url='http://github.com/dvklopfenstein/pmidcite',
- description="Augment's a PubMed literature search with citation data from NIH-OCC's iCite.",
+ description="Turbocharge a PubMed literature search using citation data from the NIH",
# https://packaging.python.org/guides/making-a-pypi-friendly-readme/
long_description=get_long_description(),
long_description_content_type='text/markdown',
- # install_requires=['docopt'],
+ install_requires=['requests'],
)
+
+# Copyright (C) 2019, DV Klopfenstein, PhD. All rights reserved
diff --git a/src/bin/icite.py b/src/bin/icite.py
index df17e01..3290e05 100755
--- a/src/bin/icite.py
+++ b/src/bin/icite.py
@@ -1,8 +1,8 @@
#!/usr/bin/env python3
"""Given a PubMed ID (PMID), return a list of citing publications"""
-__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved."
-__author__ = "DV Klopfenstein"
+__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved."
+__author__ = "DV Klopfenstein, PhD"
from pmidcite.cli.icite import NIHiCiteCli # get_argparser
from pmidcite.cfg import get_cfgparser
@@ -16,4 +16,4 @@ def main():
if __name__ == '__main__':
main()
-# Copyright (C) 2019-present, DV Klopfenstein. All rights reserved.
+# Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved.
diff --git a/src/pmidcite/__version__.py b/src/pmidcite/__version__.py
index 4705298..a81b6f2 100644
--- a/src/pmidcite/__version__.py
+++ b/src/pmidcite/__version__.py
@@ -1,3 +1,3 @@
"""Version of pmidcite project"""
-__version__ = '0.0.41'
+__version__ = '0.0.42'
diff --git a/src/pmidcite/cfg.py b/src/pmidcite/cfg.py
index c880c75..d25ef73 100644
--- a/src/pmidcite/cfg.py
+++ b/src/pmidcite/cfg.py
@@ -1,7 +1,7 @@
"""Manage pmidcite Configuration"""
-__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved."
-__author__ = "DV Klopfenstein"
+__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved."
+__author__ = "DV Klopfenstein, PhD"
from os import environ
from os import getcwd
@@ -57,7 +57,7 @@ class Cfg(object):
}
def __init__(self, check=True, prt=stdout, prt_fullname=True):
- self.cfgfile = self._init_cfgfilename()
+ self.cfgfile = self._init_cfgfilename(prt)
self.cfgparser = self._get_dflt_cfgparser()
if check:
self._run_chk(prt, prt_fullname)
@@ -135,14 +135,14 @@ def _get_dirname_str(dirname):
"""Convert None to the str, "None", as needed by configparser"""
return 'None' if dirname is None or dirname == 'None' else dirname
- def get_nihgrouper(self):
+ def get_nihgrouper(self, min1=None, min2=None, min3=None, min4=None):
"""Get an NIH Grouper with default values from the cfg file"""
cfg = self.cfgparser['pmidcite']
return NihGrouper(
- float(cfg['group1_min']),
- float(cfg['group2_min']),
- float(cfg['group3_min']),
- float(cfg['group4_min']))
+ float(cfg['group1_min'] if not min1 else min1),
+ float(cfg['group2_min'] if not min2 else min2),
+ float(cfg['group3_min'] if not min3 else min3),
+ float(cfg['group4_min'] if not min4 else min4))
def _run_chk(self, prt, prt_fullname):
if not self.rd_rc(prt, prt_fullname):
@@ -215,19 +215,18 @@ def prt_rcfile_dflt(self, prt=stdout):
cfgparser = self._get_dflt_cfgparser()
cfgparser.write(prt)
- def _init_cfgfilename(self):
+ def _init_cfgfilename(self, prt=None):
"""Get the configuration filename"""
if self.envvar in environ:
cfgfile = environ[self.envvar]
if exists(cfgfile):
return cfgfile
- print('**WARNING: NO pmidcite CONFIG FILE FOUND AT {ENVVAR}={F}'.format(
- F=cfgfile, ENVVAR=self.envvar))
- if not exists(self.dfltcfgfile):
- print('**WARNING: NO pmidcite CONFIG FILE FOUND: {F}'.format(
- F=self.dfltcfgfile))
+ if prt:
+ prt.write(f'**WARNING: NO pmidcite CONFIG FILE FOUND AT {self.envvar}={cfgfile}\n')
+ if not exists(self.dfltcfgfile) and prt:
+ prt.write(f'**WARNING: NO pmidcite CONFIG FILE FOUND: {self.dfltcfgfile}\n')
return self.dfltcfgfile
-# Copyright (C) 2019-present DV Klopfenstein. All rights reserved.
+# Copyright (C) 2019-present DV Klopfenstein, PhD. All rights reserved.
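Review note on the `_init_cfgfilename` change above: the warnings now go to an optional stream instead of `print`, so a caller that passes `prt=None` gets no output. The following is a minimal standalone sketch of that pattern, not the project's code. The helper name `pick_cfgfile` and the default filename used in the example are assumptions made for illustration; `PMIDCITECONF` is taken from the `--print-rcfile` help text in this patch.

```python
import io
import os
import tempfile


def pick_cfgfile(envvar, dflt, environ, prt=None):
    """Return the file named by envvar if it exists, else dflt (hypothetical helper)."""
    if envvar in environ:
        cfgfile = environ[envvar]
        if os.path.exists(cfgfile):
            return cfgfile
        # Warn only when the caller supplied a stream
        if prt:
            prt.write(f'**WARNING: NO pmidcite CONFIG FILE FOUND AT {envvar}={cfgfile}\n')
    if not os.path.exists(dflt) and prt:
        prt.write(f'**WARNING: NO pmidcite CONFIG FILE FOUND: {dflt}\n')
    return dflt


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmpdir:
        missing = os.path.join(tmpdir, 'missing.cfg')      # made-up path
        default = os.path.join(tmpdir, 'default.cfg')      # made-up path
        env = {'PMIDCITECONF': missing}
        log = io.StringIO()
        pick_cfgfile('PMIDCITECONF', default, env, prt=log)   # writes two warnings
        pick_cfgfile('PMIDCITECONF', default, env, prt=None)  # silent
        print(log.getvalue(), end='')
```

Passing the stream explicitly also makes the warnings easy to capture in a test, as the `io.StringIO` buffer does here.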
diff --git a/src/pmidcite/cli/icite.py b/src/pmidcite/cli/icite.py
index 7f4bfac..db35437 100644
--- a/src/pmidcite/cli/icite.py
+++ b/src/pmidcite/cli/icite.py
@@ -1,7 +1,7 @@
"""Manage args for NIH iCite run for one PubMed ID (PMID)"""
-__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved."
-__author__ = "DV Klopfenstein"
+__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved."
+__author__ = "DV Klopfenstein, PhD"
from sys import stdout
import argparse
@@ -11,7 +11,7 @@
from pmidcite.cli.utils import get_outfile
from pmidcite.cli.utils import get_pmids
from pmidcite.cli.entry_keyset import get_details_cites_refs
-from pmidcite.icite.nih_grouper import NihGrouper
+from pmidcite.icite.nih_grouper import get_nihgrouper
from pmidcite.icite.downloader import get_downloader
from pmidcite.icite.downloader import prt_hdr
from pmidcite.icite.downloader import prt_keys
@@ -61,10 +61,10 @@ def get_argparser(self):
help='Load and print a descriptive list of citations and references for each paper.')
parser.add_argument(
'-c', '--load_citations', action='store_true', default=False,
- help='Load and print a descriptive list of citations for each paper.')
+ help='Load and print the papers and clinical studies that cite the requested paper.')
parser.add_argument(
'-r', '--load_references', action='store_true', default=False,
- help='Load and print a descriptive list of references for each paper.')
+ help='Load and print the references for each requested paper.')
# pylint: disable=line-too-long
parser.add_argument(
'-R', '--no_references', action='store_true',
@@ -120,7 +120,7 @@ def cli(self):
"""Run iCite/PubMed using command-line interface"""
argparser = self.get_argparser()
args = self._get_args(argparser)
- ## print('ICITE ARGS ../pmidcite/src/pmidcite/cli/icite.py', args)
+ ##print('ICITE ARGS ../pmidcite/src/pmidcite/cli/icite.py', args)
self._run(args, argparser)
def _run(self, args, argparser):
@@ -173,7 +173,7 @@ def _get_downloader(args):
args.load_citations,
args.load_references,
args.no_references)
- groupobj = NihGrouper(args.min1, args.min2, args.min3, args.min4)
+ groupobj = get_nihgrouper(args.min1, args.min2, args.min3, args.min4)
return get_downloader(
details_cites_refs,
groupobj,
@@ -261,4 +261,4 @@ def _prt_no_icite(pmids):
Ps=' '.join(str(p) for p in pmids)))
-# Copyright (C) 2019-present DV Klopfenstein. All rights reserved.
+# Copyright (C) 2019-present DV Klopfenstein, PhD. All rights reserved.
diff --git a/src/pmidcite/cli/summarize_papers.py b/src/pmidcite/cli/summarize_papers.py
index 08e96bb..7195643 100644
--- a/src/pmidcite/cli/summarize_papers.py
+++ b/src/pmidcite/cli/summarize_papers.py
@@ -5,6 +5,7 @@
from pmidcite.cli.utils import prt_loc_rcfile
from pmidcite.cli.utils import get_files_exists
from pmidcite.summarize_papers import SummarizePapers
+from pmidcite.icite.top_cit_ref import TopCitRef
__copyright__ = "Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved."
__author__ = "DV Klopfenstein, PhD"
@@ -15,11 +16,12 @@ class SummarizePapersCli:
def __init__(self, cfg):
self.cfg = cfg
+ self.topcitref = TopCitRef()
def get_argparser(self):
"""Argument parser for summarizing the citations on set(s) of papers"""
parser = ArgumentParser(
- description="Summarize NIH's citation on a set(s) of papers",
+ description="Summarize NIH's citation data on a set(s) of papers",
add_help=False)
##cfg = self.cfg
# https://docs.python.org/3/library/argparse.html
@@ -48,28 +50,34 @@ def get_argparser(self):
parser.add_argument(
'--print-rcfile', action='store_true',
help='Print the location of the pmidcite configuration file (env var: PMIDCITECONF)')
+ self.topcitref.add_arguments(parser)
return parser
-
def cli(self):
"""Run citation summary on a set(s) of PMIDs"""
argparser = self.get_argparser()
args = argparser.parse_args()
- print('ARGS CITE SUMMARY ../pmidcite/src/pmidcite/cli/summarize_papers.py', args)
+ ##print('ARGS CITE SUMMARY ../pmidcite/src/pmidcite/cli/summarize_papers.py', args)
if args.print_rcfile:
prt_loc_rcfile(self.cfg, stdout)
- return
- files = get_files_exists(args.files)
+ files = get_files_exists(args.files, stdout)
if args.help or not files:
argparser.print_help()
- print('\nHelp message printed because: -h or --help == True')
- return
- ##self._run(args, argparser)
- nih_grouper = self.cfg.get_nihgrouper()
+ ##print(f'\nHelp message printed because: -h or --help == {args.help} or {args.files}')
+ nih_grouper = self.cfg.get_nihgrouper(args.min1, args.min2, args.min3, args.min4)
+ self._summarize_papers(files, nih_grouper, self.topcitref.adjust_args(args.paper_labels))
+ if args.prt_nihgrpr:
+ print(nih_grouper)
+
+ @staticmethod
+ def _summarize_papers(files, nih_grouper, top_cit_refs):
+ """Summarize papers"""
for filename in files:
- sumpap = SummarizePapers.from_file(filename, nih_grouper)
+ sumpap = SummarizePapers.from_file(
+ filename=filename,
+ nih_grouper=nih_grouper,
+ top_cit_ref=top_cit_refs)
print(sumpap.str_oneline())
- return
# Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved.
diff --git a/src/pmidcite/eutils/pubmed/rdwr.py b/src/pmidcite/eutils/pubmed/rdwr.py
index ec8a6b3..076987c 100755
--- a/src/pmidcite/eutils/pubmed/rdwr.py
+++ b/src/pmidcite/eutils/pubmed/rdwr.py
@@ -1,7 +1,7 @@
"""Write Python module for downloaded abstracts."""
-__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved."
-__author__ = "DV Klopfenstein"
+__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved."
+__author__ = "DV Klopfenstein, PhD"
import sys
import os
@@ -151,8 +151,9 @@ def _lid_add_to_dict(fld2objs, fld, line, pmid):
if fld not in fld2objs:
fld2objs[fld] = {}
key0 = line.rfind('[')
- # TBD Change these fatals to messages
- assert key0 != -1, '**FATAL LID: {} {}'.format(fld, line)
+ if key0 == -1:
+ ##print(f'**WARNING Local ID (LID): {fld} KEY({key0}) {line}')
+ return
assert line[-1] == ']', '**FATAL LID: {} {}'.format(fld, line)
key = line[key0 + 1:-1]
val = line[:key0].strip()
@@ -347,7 +348,7 @@ def _init_date(self, fld2objs, fld, str_date, pmid):
#### mtch = match(r'(\d{4} \S{3} \d{1,2})\s*-', str_date)
#### if mtch:
- #### fld2objs[fld] = datetime.datetime.strptime(mtch.group(1), "%Y %b %d")
+ ## fld2objs[fld] = datetime.datetime.strptime(mtch.group(1), "%Y %b %d")
#### mtch = match(r'(\d{4} \S{3})\w?\s*-', str_date)
#### if mtch:
#### fld2objs[fld] = datetime.datetime.strptime(mtch.group(1), "%Y %b")
@@ -453,4 +454,4 @@ def _extract_fldvals(self, line):
self.fldvals[-1][1].append(line_body)
- # Copyright (C) 2019-present, DV Klopfenstein. All rights reserved.
+ # Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved.
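Review note on the `_lid_add_to_dict` change above: a line with no `[` is now skipped rather than triggering an assertion. The sketch below isolates that parsing step so the new behavior is easy to see. The function name `split_lid` and the sample value are made up for illustration; only the `rfind('[')` logic comes from the patch.

```python
def split_lid(line):
    """Split 'value [key]' into (key, value); None if the line has no '[' (hypothetical helper)."""
    key0 = line.rfind('[')
    if key0 == -1:
        # New behavior in this patch: ignore the entry instead of failing
        return None
    assert line[-1] == ']', f'**FATAL LID: {line}'
    key = line[key0 + 1:-1]
    val = line[:key0].strip()
    return key, val


if __name__ == '__main__':
    print(split_lid('10.1000/example [doi]'))   # made-up identifier
    print(split_lid('no key on this line'))
```

Note that a line containing `[` but not ending in `]` still raises, as in the patched code.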
diff --git a/src/pmidcite/icite/nih_grouper.py b/src/pmidcite/icite/nih_grouper.py
index 7e89d1d..a4e497e 100644
--- a/src/pmidcite/icite/nih_grouper.py
+++ b/src/pmidcite/icite/nih_grouper.py
@@ -1,10 +1,22 @@
"""Groups papers using the NIH percentile"""
-__copyright__ = "Copyright (C) 2021-present, DV Klopfenstein. All rights reserved."
-__author__ = "DV Klopfenstein"
+__copyright__ = "Copyright (C) 2021-present, DV Klopfenstein, PhD. All rights reserved."
+__author__ = "DV Klopfenstein, PhD"
from collections import namedtuple
+def get_nihgrouper(min1, min2, min3, min4):
+ """Get NihGrouper, given NIH percentile dividers"""
+ args = {}
+ if min1:
+ args['group1_min'] = min1
+ if min2:
+ args['group2_min'] = min2
+ if min3:
+ args['group3_min'] = min3
+ if min4:
+ args['group4_min'] = min4
+ return NihGrouper(**args)
class NihGrouper:
"""Groups papers using the NIH percentile"""
@@ -18,6 +30,8 @@ def __init__(self, group1_min=2.1, group2_min=15.7, group3_min=83.9, group4_min=
self.min2 = group2_min
self.min3 = group3_min
self.min4 = group4_min
+ assert group1_min and group2_min and group3_min and group4_min, \
+ f'DIVIDERS MUST BE FLOATs: {str(self)}'
#print(f'group1_min: {group1_min}')
#print(f'group2_min: {group2_min}')
#print(f'group3_min: {group3_min}')
@@ -31,6 +45,7 @@ def str_group(self, nih_percentile):
def get_group(self, nih_percentile):
"""Assign group numbers to the NIH percentile values using the 68-95-99.7 rule"""
# No NIH percentile yet assigned. This paper should be checked out.
+ ##print('DVK SSSSSSSSSS', str(self))
if nih_percentile is None or nih_percentile == -1:
return 5
# 2.1% -3 SD: Very low citation rate
@@ -52,17 +67,23 @@ def add_arguments(self, parser):
"""Add NIH grouper arguments to the parser"""
# pylint: disable=line-too-long
parser.add_argument(
- '-1', metavar='group1_min', dest='min1', default=self.min1, type=float,
+ ##'-1', metavar='group1_min', dest='min1', default=self.min1, type=float,
+ '-1', metavar='group1_min', dest='min1', type=float,
help='Minimum NIH percentile to be placed in group 1 (default: {D})'.format(D=self.min1))
parser.add_argument(
- '-2', metavar='group2_min', dest='min2', default=self.min2, type=float,
+ '-2', metavar='group2_min', dest='min2', type=float,
help='Minimum NIH percentile to be placed in group 2 (default: {D})'.format(D=self.min2))
parser.add_argument(
- '-3', metavar='group3_min', dest='min3', default=self.min3, type=float,
+ '-3', metavar='group3_min', dest='min3', type=float,
help='Minimum NIH percentile to be placed in group 3 (default: {D})'.format(D=self.min3))
parser.add_argument(
- '-4', metavar='group4_min', dest='min4', default=self.min4, type=float,
+ '-4', metavar='group4_min', dest='min4', type=float,
help='Minimum NIH percentile to be placed in group 4 (default: {D})'.format(D=self.min4))
+ # --print-NIH-dividers => prt_nihgrpr=True
+ # => prt_nihgrpr=False
+ parser.add_argument(
+ '--print-NIH-dividers', dest='prt_nihgrpr', action='store_true',
+ help='Print the NIH percentile grouper divider percentages')
def get_list(self):
"""Get the dividing values as a list"""
@@ -74,4 +95,4 @@ def __str__(self):
self.min1, self.min2, self.min3, self.min4)
-# Copyright (C) 2021-present DV Klopfenstein. All rights reserved.
+# Copyright (C) 2021-present DV Klopfenstein, PhD. All rights reserved.
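Review note on the divider handling in `Cfg.get_nihgrouper` and the new module-level `get_nihgrouper`: both use a truthiness test (`if not min1`, `if min1:`), so a command-line value of `0` is treated the same as "not given". The sketch below contrasts that with an explicit `is None` test. Both function names are hypothetical, and whether `0` should be a legal divider is a design question for the author; the new assertion in `NihGrouper.__init__` currently rejects falsy dividers as well.

```python
def pick_truthy(cli_value, cfg_value):
    """Same pattern as the patch: fall back to the cfg value when the CLI value is falsy."""
    return float(cfg_value if not cli_value else cli_value)


def pick_not_none(cli_value, cfg_value):
    """Alternative sketch: fall back only when the CLI value is None."""
    return float(cfg_value if cli_value is None else cli_value)


if __name__ == '__main__':
    for cli in (None, 5.0, 0.0):
        print(cli, pick_truthy(cli, '2.1'), pick_not_none(cli, '2.1'))
```

The two helpers agree for `None` and for non-zero values and differ only when the command-line value is `0.0`.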
diff --git a/src/pmidcite/icite/top_cit_ref.py b/src/pmidcite/icite/top_cit_ref.py
new file mode 100644
index 0000000..a2d8749
--- /dev/null
+++ b/src/pmidcite/icite/top_cit_ref.py
@@ -0,0 +1,47 @@
+"""Manage paper labels: TOP CIT CLI REF"""
+
+__copyright__ = "Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved."
+__author__ = "DV Klopfenstein, PhD"
+
+
+class TopCitRef:
+ """Manage paper labels: TOP CIT CLI REF"""
+
+ label_list = [
+ 'TOP', # Paper of interest
+ 'CIT', # A paper (not a clinical study) citing the paper of interest
+ 'CLI', # A clinical study paper citing the paper of interest
+ 'REF', # A paper in the reference list of the paper of interest
+ ]
+
+ label_set = set(label_list)
+
+ choices = label_list + ['CITS', 'ALL']
+
+ def add_arguments(self, parser):
+ """Manage paper labels arguments: TOP CIT CLI REF"""
+ # pylint: disable=line-too-long
+ parser.add_argument(
+ '-p', metavar='labels', dest='paper_labels', type=str, nargs='*',
+ default=['TOP',],
+ choices=self.choices,
+ help=f'Paper label choices: {" ".join(self.choices)} (default: TOP)',
+ )
+
+ def adjust_args(self, args_paper_labels):
+ """Given labels and aliases (CITS, ALL), return official label names"""
+ if not args_paper_labels:
+ return None
+ ret = set()
+ arg_set = set(args_paper_labels)
+ if 'ALL' in arg_set:
+ ret.update(self.label_list)
+ return ret
+ if 'CITS' in arg_set:
+ ret.add('CIT')
+ ret.add('CLI')
+ ret.update(arg_set.intersection(self.label_list))
+ return ret
+
+
+# Copyright (C) 2022-present DV Klopfenstein, PhD. All rights reserved.
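Review note on `TopCitRef.adjust_args`: the alias expansion (`ALL`, `CITS`) can be read as a small pure function. The sketch below restates it outside the class so the rules are visible at a glance; the name `adjust_labels` is an illustrative stand-in, and the checks mirror the cases in `src/tests/test_topcitref_args.py` from this patch.

```python
LABELS = ['TOP', 'CIT', 'CLI', 'REF']


def adjust_labels(requested):
    """Expand the aliases ALL and CITS into official labels (illustrative stand-in)."""
    if not requested:
        return None
    req = set(requested)
    if 'ALL' in req:
        return set(LABELS)
    ret = set()
    if 'CITS' in req:
        ret.update({'CIT', 'CLI'})
    # Unknown names are dropped silently
    ret.update(req.intersection(LABELS))
    return ret


if __name__ == '__main__':
    print(sorted(adjust_labels(['TOP', 'CITS'])))
    print(sorted(adjust_labels(['ALL'])))
    print(adjust_labels([]))
```

Because unknown names are dropped, a request made up only of unrecognized labels yields an empty set rather than `None`.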
diff --git a/src/pmidcite/scripts/icite.py b/src/pmidcite/scripts/icite.py
index 5ee02cc..a4239ea 100755
--- a/src/pmidcite/scripts/icite.py
+++ b/src/pmidcite/scripts/icite.py
@@ -1,7 +1,7 @@
"""Given a PubMed ID (PMID), return a list of citing publications"""
-__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved."
-__author__ = "DV Klopfenstein"
+__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved."
+__author__ = "DV Klopfenstein, PhD"
from pmidcite.cli.icite import NIHiCiteCli # get_argparser
from pmidcite.cfg import get_cfgparser
@@ -12,4 +12,4 @@ def main():
NIHiCiteCli(get_cfgparser(prt=None)).cli()
-# Copyright (C) 2019-present, DV Klopfenstein. All rights reserved.
+# Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved.
diff --git a/src/pmidcite/summarize_papers.py b/src/pmidcite/summarize_papers.py
new file mode 100644
index 0000000..54d3c33
--- /dev/null
+++ b/src/pmidcite/summarize_papers.py
@@ -0,0 +1,103 @@
+"""Summarize NIH citation data for requested papers from the commandline or in files"""
+
+from collections import namedtuple
+from collections import defaultdict
+
+__copyright__ = "Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved."
+__author__ = "DV Klopfenstein, PhD"
+
+
+class SummarizePapers:
+ """Summarize NIH citation data for requested papers from the commandline or in files"""
+
+ def __init__(self, name, nih_grouper=None):
+ self.name = name
+ self.nts = None
+ self.num_papers_all = None
+ self.nihgrpr = nih_grouper
+
+ def str_oneline(self):
+ """Get str that is a one-line summary of many papers/citations"""
+ grp2nts = self._get_stats_grpr() if self.nihgrpr else self._get_stats_nogrpr()
+ years = self.get_years()
+ year_min = min(years)
+ year_max = max(years)
+ return '{NIHP} {Ys:3} years:{Y0:4}-{Y1:4} {N:5} papers {NAME}'.format(
+ NIHP=self._str_group_percs(grp2nts),
+ Ys=year_max-year_min,
+ Y0=year_min,
+ Y1=year_max,
+ N=self.num_papers_all,
+ NAME=self.name)
+
+ def get_years(self):
+ """Get the years of all publications"""
+ return list(nt.year for nt in self.nts)
+
+ def _str_group_percs(self, grp2nts):
+ """Get percentages of papers in each group"""
+ lst = []
+ for grp in ['i', '4', '3', '2', '1', '0']:
+ num_papers_grp = len(grp2nts[grp]) if grp2nts else 0
+ abc = '{G}={P}'.format(
+ G=grp,
+ P='{:05.1f}%'.format(
+ num_papers_grp/self.num_papers_all*100) if num_papers_grp != 0 else "......")
+ lst.append(abc)
+ return ' '.join(lst)
+
+ def _get_stats_grpr(self):
+ """Get summary information for list of papers"""
+ grp2nts = defaultdict(list)
+ grpr = self.nihgrpr
+ for ntd in self.nts:
+ grp2nts[grpr.str_group(ntd.nih_perc)].append(ntd)
+ ##print('DDDDDDDD', ntd)
+ return grp2nts
+
+ def _get_stats_nogrpr(self):
+ """Get summary information for list of papers"""
+ grp2nts = defaultdict(list)
+ for ntd in self.nts:
+ grp2nts[ntd.nih_group].append(ntd)
+ return grp2nts
+
+ @staticmethod
+ def read_lines(filename, top_cit_ref):
+ """Read paper citation lines"""
+ if top_cit_ref is None:
+ top_cit_ref = {'TOP',} # TOP, CIT, CLI, REF
+ nts = []
+ nto = namedtuple('iciteline', (
+ 'line pmid aart nih_perc nih_group year num_cite_all num_cite num_clin num_refs'))
+ with open(filename) as ifstrm:
+ for line in ifstrm:
+ if line[:3] in top_cit_ref:
+ flds = line.split(maxsplit=10)
+ if flds[1].isdigit():
+ num_cite = int(flds[7])
+ num_clin = int(flds[8])
+ nts.append(nto(
+ line=line.rstrip(),
+ pmid=int(flds[1]),
+ aart=f'{flds[2]} {flds[3]}',
+ nih_perc=int(flds[4]),
+ nih_group=flds[5], # -i or a number
+ year=int(flds[6]),
+ num_cite_all=num_cite + num_clin,
+ num_cite=num_cite,
+ num_clin=num_clin,
+ num_refs=int(flds[9])))
+ return nts
+
+ # -- Constructors ------------------------------------------------------------
+ @classmethod
+ def from_file(cls, filename, nih_grouper=None, top_cit_ref=None):
+ """Get SummarizePapers instance, given a file filled with icite lines w/TOP|CIT|CLI|REF"""
+ obj = cls(filename, nih_grouper)
+ obj.nts = obj.read_lines(filename, top_cit_ref)
+ obj.num_papers_all = len(obj.nts)
+ return obj
+
+
+# Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved.
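Review note on `SummarizePapers`: the two core steps are splitting an `icite` output line into fields and formatting the share of papers per group. The sketch below walks through both on the `TOP` line shown in the README part of this patch. It is a simplified illustration, assuming the column order used by `read_lines`; the function names `parse_icite_line` and `str_group_percs` and the dict return type are choices made for this sketch, not the module's API.

```python
from collections import Counter

# Sample line copied from the README section of this patch
LINE = ('TOP 30022098 R. .A..c 100 4 2018 318 1 23 '
        'au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses.')


def parse_icite_line(line):
    """Split one icite line into typed fields, using the same indices as read_lines."""
    flds = line.split(maxsplit=10)
    return {
        'label': flds[0],
        'pmid': int(flds[1]),
        'nih_perc': int(flds[4]),
        'nih_group': flds[5],
        'year': int(flds[6]),
        'num_cite': int(flds[7]),
        'num_clin': int(flds[8]),
        'num_refs': int(flds[9]),
    }


def str_group_percs(groups):
    """Format the percentage of papers in each group, newest ('i') first."""
    cnts = Counter(groups)
    total = len(groups)
    parts = []
    for grp in ['i', '4', '3', '2', '1', '0']:
        num = cnts[grp]
        # Empty groups print as dots, as in _str_group_percs
        parts.append(f'{grp}={num / total * 100:05.1f}%' if num else f'{grp}=......')
    return ' '.join(parts)


if __name__ == '__main__':
    print(parse_icite_line(LINE))
    print(str_group_percs(['4', '4', '3', 'i']))
```

The division is only evaluated for non-empty groups, so an empty input list does not raise `ZeroDivisionError`.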
diff --git a/src/tests/test_speed_api_dnld.py b/src/tests/test_speed_api_dnld.py
index 8860dca..9afa1a3 100755
--- a/src/tests/test_speed_api_dnld.py
+++ b/src/tests/test_speed_api_dnld.py
@@ -13,7 +13,7 @@
from tests.pmids_i3 import PMIDS
-def test_dnld_speed():
+def test_speed_api_dnld():
"""Test speed for download NIH citation data"""
force_dnld = True
dnldr = _init_dnldr(force_dnld)
@@ -60,6 +60,6 @@ def _init_dnldr(force_dnld):
if __name__ == '__main__':
- test_dnld_speed()
+ test_speed_api_dnld()
-# Copyright (C) 2021-present, DV Klopfenstein. All rights reserved.
+# Copyright (C) 2021-present, DV Klopfenstein, PhD. All rights reserved.
diff --git a/src/tests/test_speed_dnld_load.py b/src/tests/test_speed_dnld_load.py
index cc2b6d2..9058a4e 100755
--- a/src/tests/test_speed_dnld_load.py
+++ b/src/tests/test_speed_dnld_load.py
@@ -16,7 +16,7 @@
from tests.pmids_i3 import PMIDS
-def test_dnld_speed():
+def test_speed_dnld_load():
"""Test speed for download NIH citation data"""
fout_log = 'test_speed_dnld_load.log'
num = 5000
@@ -82,6 +82,6 @@ def _run_download(dnldr, pmids):
if __name__ == '__main__':
- test_dnld_speed()
+ test_speed_dnld_load()
-# Copyright (C) 2021-present, DV Klopfenstein. All rights reserved.
+# Copyright (C) 2021-present, DV Klopfenstein, PhD. All rights reserved.
diff --git a/src/tests/test_topcitref_args.py b/src/tests/test_topcitref_args.py
new file mode 100755
index 0000000..da26087
--- /dev/null
+++ b/src/tests/test_topcitref_args.py
@@ -0,0 +1,36 @@
+#!/usr/bin/env python3
+"""Test paper label args: TOP CIT CLI REF and aliases ALL CITS"""
+
+from pmidcite.icite.top_cit_ref import TopCitRef
+
+__copyright__ = "Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved."
+__author__ = "DV Klopfenstein, PhD"
+
+
+ADJ = TopCitRef().adjust_args
+
+def test_topcitref_args():
+ """Test paper label args: TOP CIT CLI REF and aliases ALL CITS"""
+ # pylint: disable=bad-whitespace
+
+ # Arguments Expected paper labels
+ # ---------------------- -------------------------------
+ _chk(0, set(), None)
+ _chk(1, {'ALL',}, {'TOP', 'CIT', 'CLI', 'REF'})
+ _chk(2, {'CITS',}, {'CIT', 'CLI'})
+ _chk(3, {'TOP', 'CITS',}, {'TOP', 'CIT', 'CLI'})
+ _chk(4, {'TOP', 'CITS', 'REF'}, {'TOP', 'CIT', 'CLI', 'REF'})
+ _chk(5, {'TOP', 'MOCK', 'REF'}, {'TOP', 'REF'})
+
+
+def _chk(num, args, exp):
+ """Check that args produces correct paper label cites"""
+ act = ADJ(args)
+ assert act == exp, f'TEST {num} ACT({act}) != EXP({exp}) WITH ARGS({args})'
+ print(f'**PASSED TEST {num:2}: ARGS({args}) ADJUSTED TO {exp}')
+
+
+if __name__ == '__main__':
+ test_topcitref_args()
+
+# Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved.