A Regulatory R Packages Infrastructure (RRPI):
Supporting a Wider Use of R in the Pharma Industry

“Every business is a software business.” [1]
Watts S. Humphrey,
the “father of software quality”

Background

In line with major changes in many other industries, the pharmaceutical industry is transforming its business model, moving from proprietary software towards open-source software.

The proprietary software model is based on external vendors delivering generic software solutions that are closed-source, self-contained, and independent of other applications (“monolithic”), with limited communication between software developers and software users. In addition, the software release life cycle follows a conventional “waterfall” model with defined deliverables and release dates.

The open-source software model allows contributions from software developers with different backgrounds, including data scientists and statisticians within a specific company or industry. This offers the opportunity to develop the software solutions an industry specifically requires without having to communicate software requirements to an external software vendor, who usually lacks the domain knowledge. In addition, open-source software and its values of transparency, collaboration, and knowledge sharing provide opportunities for collaboration across the pharma industry, regulators, academia, and other stakeholders. Finally, the software release life cycle is based on a more agile model of continuous integration and continuous delivery (CI/CD).

Unfortunately, the transformation in the pharmaceutical industry has been held back so far by uncertainties regarding the quality of the available open-source software, which result from a combination of several factors:

  • openness to contributions from a diverse community of software developers
  • a dynamic, continuous, decentralized software development model
  • a complex, hierarchical, distributed architecture of software dependencies
  • a strictly regulated business environment to protect patient health

For the popular open-source programming language and software “R” and its related packages, a solution to overcome this barrier could be an integrated Regulatory R Packages Infrastructure (RRPI) including the following components:

  • Validation Algorithm: a fully automated, multi-dimensional algorithm to evaluate software quality and to recommend R packages following a risk-based approach
  • Validated R Package Database: a database to store meta-data on R packages and their validation information
  • Validated R Package Repository: a repository to permanently store validated versions of R packages to allow long-term reproducibility of data analysis
  • Validated R/Pharma Reference Image: a stand-alone, integrated, fully validated virtual image of the R computing environment. This reference image aims to be the standard R computing environment for regulatory interactions and for the development of novel R packages in the pharmaceutical industry

Because similar challenges exist in other industries and areas of society, and because the proposed conceptual framework for assessing the software quality of R packages is generic in character, it may become a standard model for evaluating the quality of open-source software.

Parties of Interest

  • Pharmaceutical and biotech industry
  • Regulators
  • Academia
  • R community and related organizations
    • R Consortium [7]
      • R Consortium/R Repositories Working Group [8]
        • R Validation Hub [9]
          • Regulatory R Packages Repository Working Group [10]

RRPI Components

The following components are required for a Regulatory R Packages Infrastructure:

  • Validation algorithm: a fully automated, multi-dimensional algorithm to evaluate software quality and to recommend R packages following a risk-based approach
    • R package, eg R/riskmetric [3], R/riskassessment (Shiny frontend) [11], and R/riskscore (experimental) [12]
    • permanent storage
    • version-controlled
    • reproducible
    • based on a standard, open infrastructure (ie, low risk of provider lock-in)
    • allows self-hosting
    • fully automatic assessment of software quality of existing, updated, or novel R packages
    • recommendations for different use cases based on a risk-based approach
    • separate evaluation of the following main objectives of software quality (and associated risks):
      • validation of building from source code, ie user can successfully build R package
      • validation of installation of binary code, ie user can successfully install R package
      • validation of technical (statistical-algorithmic) correctness, ie all functions in the R package give correct results across a range of testing scenarios
      • validation of user acceptance (ie user expectations based on software documentation), ie based on the conventions of statistical computing and the documentation, the package behaves as expected by the user
        Example: If the documentation does not state correctly or clearly whether a 1-sided or 2-sided test is performed by the package, and the user misinterprets the test results as a consequence, the software cannot be considered fully validated.
  • Validated R Package Database: a database to store meta-data on R packages and their validation information
    • relational database, eg MySQL [4]
    • permanent storage
    • version-controlled
    • reproducible
    • multi-dimensional software quality and developer quality features
  • Validated R Package Repository: a repository to permanently store validated versions of R packages to allow long-term reproducibility of data analysis
    • file system and file server, eg Ext4/OpenSSH [5]
    • permanent storage
    • version-controlled
    • reproducible
    • integrated with R package manager
  • Validated R/Pharma Reference Image: a stand-alone, integrated, fully validated virtual image of the R computing environment. This reference image aims to be the standard R computing environment for regulatory interactions and for the development of novel R packages in the pharmaceutical industry
    • virtual image, eg Docker/Rocker [6]
    • permanent storage
    • version-controlled
    • reproducible
    • standalone, minimal, fully functional R computing environment with core packages
    • compatible with operating systems:
      • Microsoft Windows 10/11
      • GNU/Linux
    • compatible with Integrated Development Environments (IDE), for example:
      • RStudio
      • Visual Studio Code
    • accommodates the particular requirements of an organization (eg, drug developers, regulators, academics)
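
The separate evaluation of the four quality objectives and the risk-based recommendation listed under the validation algorithm could be sketched roughly as follows. This is only an illustration, written in Python rather than R to keep it self-contained. The class name, the use-case names, and the threshold values are hypothetical assumptions and are not taken from R/riskmetric or any other existing tool.

```python
from dataclasses import dataclass

# Hypothetical use cases and the minimum score every objective must reach.
# Names and thresholds are illustrative assumptions, not an agreed standard.
USE_CASE_THRESHOLDS = {
    "exploratory": 0.5,
    "internal_decision": 0.7,
    "regulatory_submission": 0.9,
}


@dataclass(frozen=True)
class Assessment:
    """Scores in [0, 1] for one package version, one per quality objective."""
    package: str
    version: str
    build_from_source: float      # package can be built from source code
    install_binary: float         # binary package can be installed
    technical_correctness: float  # functions give correct results in tests
    user_acceptance: float        # behaviour matches the documentation

    def scores(self):
        """Return the four objective scores keyed by objective name."""
        return {
            "build_from_source": self.build_from_source,
            "install_binary": self.install_binary,
            "technical_correctness": self.technical_correctness,
            "user_acceptance": self.user_acceptance,
        }

    def weakest_objective(self):
        """Return (name, score) of the objective with the lowest score."""
        scores = self.scores()
        name = min(scores, key=scores.get)
        return name, scores[name]


def recommend(assessment):
    """List the use cases for which every objective meets the threshold."""
    _, lowest = assessment.weakest_objective()
    return [
        use_case
        for use_case, threshold in USE_CASE_THRESHOLDS.items()
        if lowest >= threshold
    ]


# Made-up scores for a made-up package, for illustration only.
example = Assessment("examplepkg", "1.0.0", 1.0, 1.0, 0.95, 0.75)
print(example.weakest_objective())
print(recommend(example))
```

The sketch deliberately uses the lowest of the four scores instead of an average, on the assumption that a weakness in one objective (for instance unclear documentation of a 1-sided versus 2-sided test) should not be offset by good results in the others. A real implementation might weight the objectives differently.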

Conclusion

The conceptual framework described here requires further input from the community of R users in pharma and from the wider R community, as well as a commitment to a technical implementation.

Acknowledgements

This blog post was inspired by discussions in the Regulatory R Packages Repository Working Group [10].

References

[1] https://quidgest.com/en/articles/every-business-software-business/

[2] https://pharmar.github.io/regulatory-r-repo-wg/

[3] https://cran.r-project.org/web/packages/riskmetric/index.html

[4] https://www.mysql.com/products/community/

[5] https://www.openssh.com

[6] https://rocker-project.org

[7] https://www.r-consortium.org

[8] https://github.com/RConsortium/r-repositories-wg

[9] https://www.pharmar.org

[10] https://pharmar.github.io/regulatory-r-repo-wg/

[11] https://github.com/pharmaR/riskassessment

[12] https://github.com/pharmaR/riskscore