Getting started with reproduce.work
reproduce.work is being published as an alpha development release and should be considered experimental.
Pre-requisites
Knowledge:
- Users are expected to have basic familiarity with the command line interface (CLI) of their operating system. The following instructions are for users of Linux and macOS. Windows users should install Windows Subsystem for Linux and follow the instructions for Linux users.
Software:
- The reproduce.work ecosystem relies on containerization to facilitate cross-platform computing; as such, you must install Docker (or a suitable drop-in replacement such as OrbStack, which is recommended for Apple Silicon machines). You do not need deep familiarity with Docker or containerization to use reproduce.work, but you will need to install Docker and ensure that it is running on your machine (which you can confirm by running docker in your preferred terminal).
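As a slightly stricter check than the one above, the sketch below tests both that the CLI is on your PATH and that its daemon responds. It relies on docker info, which contacts the daemon and therefore fails when the daemon is stopped. The function name check_container_runtime and its optional argument are hypothetical helpers for illustration and are not part of reproduce.work.

```shell
# Check that a container CLI is installed and that its daemon answers.
# The optional first argument is the CLI name (defaults to docker).
# check_container_runtime is a hypothetical helper, not an rw command.
check_container_runtime() {
    cli="${1:-docker}"
    if ! command -v "$cli" >/dev/null 2>&1; then
        echo "$cli: not found on PATH"
        return 1
    fi
    # `docker info` talks to the daemon, so it fails if the daemon is stopped.
    if ! "$cli" info >/dev/null 2>&1; then
        echo "$cli: installed, but the daemon is not reachable"
        return 2
    fi
    echo "$cli: installed and running"
}
```

Calling check_container_runtime with no arguments checks docker; a non-zero exit status means Docker should be installed or started before continuing.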
Installation
The reproduce.work command line interface can be installed with the following shell command:
Terminal
curl -sSL https://reproduce.work/install | bash
You will be prompted with two options:
- Install to your machine (in /usr/local/bin) for use anywhere in your command line
- Install to your current directory. This creates a folder in your current directory named rw-project; with this choice, the rw command line tools can only be executed at the root of your project directory (and you may need to replace any rw command with ./rw).
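Because the two install options differ in how the tool is invoked, a small helper can pick the right form for the current directory. This is only a sketch: resolve_rw is a hypothetical name, and it assumes that the local install leaves an executable named rw at the root of the project directory, which is what the ./rw note above suggests.

```shell
# Print the rw invocation to use from the current directory.
# Prefers a project-local ./rw (the "install to current directory" option),
# then falls back to an rw found on PATH (the machine-wide install).
# resolve_rw is a hypothetical helper, not part of the reproduce.work CLI.
resolve_rw() {
    if [ -f ./rw ] && [ -x ./rw ]; then
        echo "./rw"
    elif command -v rw >/dev/null 2>&1; then
        echo "rw"
    else
        echo "rw not found: install it or cd to your project root" >&2
        return 1
    fi
}
```

For example, running "$(resolve_rw)" build from a project root would call whichever of the two installs is available there.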
Besides containerization software and the reproduce.work CLI tool, no other software is required. All other dependencies will be installed inside a containerized environment automatically when you run the rw build command. Your reproduce.work projects should not interfere with each other or any other software you have installed on your machine.
Install manually
The binaries for reproduce.work are also available for direct download from GitHub: https://github.com/reproduce-work/reproduce-work-cli/releases
Basic Commands
There are three main commands in the reproduce.work workflow:
- rw init: initialize a new project
- rw build: download dependencies and install/package them in a self-contained environment
- rw launch: launch your project’s scientific environment and begin working
Quick Start
Create a directory for your project:
Terminal
mkdir my_project && cd my_project
Terminal
rw init && rw build && rw launch -o
Suggested usage:
Under the hood, rw quickstart executes the three commands above (rw init, rw build, and rw launch -o) in order.
To get a sense of the reproduce.work workflow, we recommend manually running these three commands yourself when starting a new project, though quickstart was developed for a reason!
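The && chain in the Quick Start stops at the first command that fails, so a failed build never reaches the launch step. The sketch below spells out the same sequence with a message per step. The function name run_workflow and the RW_BIN variable are hypothetical; RW_BIN exists only so the sequence can be tried with a stand-in command, and it defaults to rw.

```shell
# Run init, build, and launch in order, stopping at the first failure.
# RW_BIN is a hypothetical override (defaults to rw); set it to ./rw for a
# project-local install.
run_workflow() {
    rw_bin="${RW_BIN:-rw}"
    "$rw_bin" init || { echo "rw init failed" >&2; return 1; }
    "$rw_bin" build || {
        echo "rw build failed; rerun with --verbose to see the build output" >&2
        return 1
    }
    # -o opens the scientific environment in the default browser.
    "$rw_bin" launch -o
}
```

Run it from inside your new project directory, after the mkdir and cd step shown above.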
Primary commands:
1. Initialize: rw init
By default, the rw init command will initialize a new project in your current directory. It is recommended that you create a new directory for each project, and run rw init from within that directory at the start of each project.
Options:
- -s, --sci-env <env>: Set the scientific environment. The currently supported options are:
  - jupyter (default)
  - python
  - rstudio
- -f, --force: Force new configuration by overwriting existing config.toml file.
Depending on which options you choose, the way you build and launch your scientific environment will vary:
Terminal
# --sci-env=jupyter by default
rw init
After running this command, you should see several files and folders added to your project directory:
Project directory structure
- reproduce/
  - requirements.txt
  - config.toml
  - Dockerfile
- .gitignore
2. Build: rw build
After initializing a project, it must be “built”. This is the process of downloading the software required for running your project and packaging it in a container.
Options:
- --no-cache: Download dependencies from the web without using locally cached versions. Default is false.
- -v, --verbose: Prints to console the output of your project’s build process. Default is false.
3. Develop: rw launch
This command starts your scientific computing environment and allows you to begin writing code and analyzing data.
Options:
- -o, --open: Opens the scientific environment in your default browser. Default is false.
- -p, --port <port>: Set the local port for the jupyter server manually; otherwise, an open port will be found automatically.
Installing packages and dependencies
While in the scientific development environment, you can install packages in one of two ways:
- Persistent: Add your desired packages on separate lines to reproduce/requirements.txt and run rw build again. After “building” your scientific environment, you can stop and restart it and your packages will be installed.
- Temporary: While your dev environment is running, you can use pip install <module>; however, keep in mind that modules/packages installed this way will not persist across sessions by default (i.e. if you stop and restart your scientific environment, you will need to reinstall them). This method is suitable for development/testing, but packages that are core to your project should be added to reproduce/requirements.txt.
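The persistent route can be scripted from the project root. The sketch below appends package names to reproduce/requirements.txt, one per line, and skips names that are already listed. add_requirements is a hypothetical helper rather than an rw command, and it assumes the file already exists because rw init created it.

```shell
# Append packages to reproduce/requirements.txt, one per line, skipping
# names that are already present. Run from the project root.
# add_requirements is a hypothetical helper, not part of the rw CLI.
add_requirements() {
    req_file="reproduce/requirements.txt"
    if [ ! -f "$req_file" ]; then
        echo "$req_file not found; run rw init first" >&2
        return 1
    fi
    for pkg in "$@"; do
        # -x matches the whole line, -F treats the name as a literal string.
        if ! grep -qxF -- "$pkg" "$req_file"; then
            printf '%s\n' "$pkg" >> "$req_file"
        fi
    done
}
```

After something like add_requirements numpy pandas, run rw build again so the packages are installed into the container.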
Terminal
rw init --sci-env=python
After running this command, you should see the same files and folders as for the default jupyter environment above. The build (rw build), launch (rw launch), and package installation steps are also the same as described above.
Terminal
rw init --sci-env=rstudio
After running this command, you should see several files and folders added to your project directory:
Project directory structure
- reproduce/
  - packages.R
  - config.toml
  - Dockerfile
- .gitignore
The build (rw build) and launch (rw launch) steps and their options are the same as described above.
Installing packages and dependencies
While in the scientific development environment, you can install packages in one of two ways:
- Persistent: Add your desired packages on separate lines to reproduce/packages.R and run rw build again. After “building” your scientific environment, you can stop and restart it and your packages will be installed.
- Temporary: While your dev environment is running, you can use install.packages(<pkg>); however, keep in mind that modules/packages installed this way will not persist across sessions by default (i.e. if you stop and restart your scientific environment, you will need to reinstall them). This method is suitable for development/testing, but packages that are core to your project should be added to reproduce/packages.R.