Getting started with reproduce.work

reproduce.work is being published as an alpha development release and should be considered experimental.

Pre-requisites

Knowledge:

  • Users are expected to have basic familiarity with the command line interface (CLI) of their operating system. The following instructions are for users of Linux and macOS. Windows users should install Windows Subsystem for Linux and follow the instructions for Linux users.

Software:

  • The reproduce.work ecosystem relies on containerization to facilitate cross-platform computing, so you must install Docker or a suitable drop-in replacement such as OrbStack (recommended for Apple Silicon machines). You do not need deep familiarity with Docker or containerization to use reproduce.work, but Docker must be installed and running on your machine; you can confirm this by running docker info in your preferred terminal.
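The Docker check described above can be scripted. The following is a minimal sketch, not part of reproduce.work: the function name check_docker is hypothetical, and it assumes that a reachable Docker daemon makes docker info exit successfully.

```shell
# Report whether the Docker CLI is installed and whether its daemon is reachable.
# check_docker is an illustrative helper name, not an rw command.
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "Docker CLI not found on PATH"
    return 1
  fi
  # docker info exits with a non-zero status when it cannot reach the daemon.
  if ! docker info >/dev/null 2>&1; then
    echo "Docker CLI found, but the daemon is not running"
    return 2
  fi
  echo "Docker is installed and running"
}
```

Calling check_docker before rw build gives a clearer message than a failed build would if Docker has not been started.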

Installation

The reproduce.work command line interface can be installed with the following shell command:

Terminal
curl -sSL https://reproduce.work/install | bash

You will be prompted with two options:

  1. Install to your machine (in /usr/local/bin) for use anywhere in your command line
  2. Install to your current directory. This creates a folder in your current directory named rw-project; with this choice, the rw command line tools can only be executed at the root of your project directory (and you may need to replace any rw command with ./rw).
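Because the two install options lead to different ways of invoking the tool, a script may want to work out which form applies. This is a rough sketch under the assumption that a project-local install leaves an executable named rw at the project root; resolve_rw is a hypothetical helper name.

```shell
# Print the command to use for the rw CLI:
#   "rw"   when it is installed globally (option 1)
#   "./rw" when only a copy in the current directory exists (option 2, assumed layout)
resolve_rw() {
  if command -v rw >/dev/null 2>&1; then
    echo "rw"
  elif [ -x "./rw" ]; then
    echo "./rw"
  else
    echo "rw CLI not found; run the installer first" >&2
    return 1
  fi
}
```

For example, RW=$(resolve_rw) followed by "$RW" build runs the build with whichever copy is available.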

Besides containerization software and the reproduce.work CLI tool, no other software is required. All other dependencies will be installed inside a containerized environment automatically when you run the rw build command. Your reproduce.work projects should not interfere with each other or any other software you have installed on your machine.

Install manually

The binaries for reproduce.work are also available for direct download from GitHub: https://github.com/reproduce-work/reproduce-work-cli/releases

Basic Commands

There are three main commands in the reproduce.work workflow:

  1. rw init: initialize a new project
  2. rw build: download dependencies and install/package them in a self-contained environment
  3. rw launch: launch your project’s scientific environment and begin working

Quick Start

Create a directory for your project:

Terminal
mkdir my_project && cd my_project
Then initialize, build, and launch the project:

Terminal
rw init && rw build && rw launch -o

Suggested usage:

Under the hood, rw quickstart executes the three primary commands described below, in order: rw init, rw build, and rw launch.

To get a sense of the reproduce.work workflow, we recommend running these three commands manually when you start a new project, though quickstart was developed for a reason!
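As a sketch of the manual workflow, the steps can be wrapped in a small shell function. The name start_project and its default directory are illustrative only; the rw commands are the ones listed in this guide.

```shell
# Create a project directory, enter it, and run the three rw steps in order.
# The && chain stops at the first step that fails, so a failed build
# never reaches the launch step.
start_project() {
  project_dir="${1:-my_project}"
  mkdir -p "$project_dir" && cd "$project_dir" || return 1
  rw init && rw build && rw launch -o
}
```

Running start_project my_analysis leaves your shell inside the new directory, just as the mkdir and cd commands above do.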

Primary commands:

1. Initialize: rw init

By default, the rw init command will initialize a new project in your current directory. It is recommended that you create a new directory for each project, and run rw init from within that directory at the start of each project.

Options:

  • -s, --sci-env <env>: Set the scientific environment.
    • The currently supported options are:
      • jupyter (default)
      • python
      • rstudio
  • -f, --force: Force new configuration by overwriting existing config.toml file.
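If you script project setup, it can help to reject a mistyped environment name before calling rw init. The wrapper below is a minimal sketch; init_env is a hypothetical name, and the accepted values are simply the three options listed above.

```shell
# Run rw init for one of the supported scientific environments.
# Defaults to jupyter, matching the documented default.
init_env() {
  env="${1:-jupyter}"
  case "$env" in
    jupyter|python|rstudio)
      rw init --sci-env="$env"
      ;;
    *)
      echo "unsupported environment: $env (expected jupyter, python, or rstudio)" >&2
      return 2
      ;;
  esac
}
```

For example, init_env rstudio is equivalent to rw init --sci-env=rstudio, while init_env rstudo fails without touching the project directory.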

Depending on which options you choose, the way you build and launch your scientific environment will vary:

Terminal
# --sci-env=jupyter by default
rw init 

After running this command, you should see several files and folders added to your project directory:

Project directory structure

  • reproduce/
    • requirements.txt
    • config.toml
    • Dockerfile
  • .gitignore
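To confirm that initialization produced the layout shown above, you could check for each file. This is an illustrative sketch; check_layout is a hypothetical helper, and the file list is copied from the structure above for the jupyter and python environments.

```shell
# List expected files that are missing under the given project root.
# Prints nothing and returns 0 when the layout is complete.
check_layout() {
  root="${1:-.}"
  missing=0
  for path in reproduce/requirements.txt reproduce/config.toml \
              reproduce/Dockerfile .gitignore; do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_layout from the project root right after rw init should print nothing if every file was created.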

2. Build: rw build

After initializing a project, it must be “built”. This is the process of downloading the software required for running your project and packaging it in a container.

Options:

  • --no-cache: Download dependencies from the web without using locally cached versions. Default is false.
  • -v, --verbose: Print the output of your project’s build process to the console. Default is false.
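When a build misbehaves, combining both options and saving the output can make the problem easier to trace. The following is a sketch only; build_with_log and the default file name build.log are assumptions, not features of rw.

```shell
# Run a clean, verbose build and keep the full output in a log file.
# Returns the exit status of rw build.
build_with_log() {
  log_file="${1:-build.log}"
  if rw build --no-cache --verbose >"$log_file" 2>&1; then
    echo "build succeeded; output saved to $log_file"
  else
    status=$?   # exit status of the rw build command above
    echo "build failed with status $status; see $log_file" >&2
    return "$status"
  fi
}
```

Redirecting to a file instead of piping through another program keeps the exit status of rw build intact, so the function can report failure reliably.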

3. Develop: rw launch

This command starts your scientific computing environment and allows you to begin writing code and analyzing data.

Options:

  • -o, --open: Open the scientific environment in your default browser. Default is false.
  • -p, --port <port>: Set the local port for the jupyter server manually; otherwise, an open port will be found automatically.
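If you prefer a fixed port, a small wrapper can validate the value before launching. This is a hedged sketch: launch_on_port is a hypothetical name, and it assumes the port is passed as a separate argument after --port, as the option listing above suggests.

```shell
# Launch the environment in the browser on a specific local port.
# Rejects anything that is not a whole number between 1 and 65535.
launch_on_port() {
  port="$1"
  case "$port" in
    ''|*[!0-9]*|??????*)
      echo "port must be a whole number of at most five digits" >&2
      return 2
      ;;
  esac
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "port must be between 1 and 65535" >&2
    return 2
  fi
  rw launch --open --port "$port"
}
```

For example, launch_on_port 8888 runs rw launch --open --port 8888, while launch_on_port abc stops before rw is called.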

Installing packages and dependencies

While in the scientific development environment, you can install packages in one of two ways:

  • Persistent: Add your desired packages on separate lines to reproduce/requirements.txt and run rw build again. Once the environment has been rebuilt, the packages remain installed every time you stop and restart it.

  • Temporary: While your dev environment is running, you can use pip install <module>. Keep in mind, however, that packages installed this way do not persist across sessions by default (i.e. if you stop and restart your scientific environment, you will need to reinstall them). This method is suitable for development and testing, but packages that are core to your project should be added to reproduce/requirements.txt.
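The persistent route can be sketched as a helper that records a package and then rebuilds. add_requirement is a hypothetical name; the sketch assumes you run it from the project root and that each package sits on its own line, as described above.

```shell
# Append a package to reproduce/requirements.txt (if not already listed)
# and rebuild the environment so the package persists across sessions.
add_requirement() {
  pkg="$1"
  req_file="reproduce/requirements.txt"
  [ -n "$pkg" ] || { echo "usage: add_requirement <package>" >&2; return 2; }
  [ -f "$req_file" ] || { echo "$req_file not found; run rw init first" >&2; return 1; }
  # Add a newline first if the file does not already end with one.
  if [ -n "$(tail -c 1 "$req_file")" ]; then
    echo >>"$req_file"
  fi
  # -x matches whole lines only, -F treats the package name literally.
  if ! grep -qxF -- "$pkg" "$req_file"; then
    printf '%s\n' "$pkg" >>"$req_file"
  fi
  rw build
}
```

For example, add_requirement pandas adds a pandas line once, however many times it is called, and triggers a rebuild each time.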

Terminal
rw init --sci-env=python

After running this command, you should see several files and folders added to your project directory:

Project directory structure

  • reproduce/
    • requirements.txt
    • config.toml
    • Dockerfile
  • .gitignore

Steps 2 (rw build) and 3 (rw launch), along with the instructions for installing packages, are identical to those given for the jupyter environment above.

Terminal
rw init --sci-env=rstudio

After running this command, you should see several files and folders added to your project directory:

Project directory structure

  • reproduce/
    • packages.R
    • config.toml
    • Dockerfile
  • .gitignore

Steps 2 (rw build) and 3 (rw launch) are identical to those given for the jupyter environment above.

Installing packages and dependencies

While in the scientific development environment, you can install packages in one of two ways:

  • Persistent: Add your desired packages on separate lines to reproduce/packages.R and run rw build again. Once the environment has been rebuilt, the packages remain installed every time you stop and restart it.

  • Temporary: While your dev environment is running, you can use install.packages(<pkg>). Keep in mind, however, that packages installed this way do not persist across sessions by default (i.e. if you stop and restart your scientific environment, you will need to reinstall them). This method is suitable for development and testing, but packages that are core to your project should be added to reproduce/packages.R.