

http://www.r-project.org/

We installed the libraries Rmpi, doMPI and foreach and their dependencies on VSC2 and VSC4. These libraries allow you to parallelize loops across multiple nodes with MPI, using the foreach construct, which is very similar to a for loop.

Given a simple for loop:

for (i in 1:50) {mean(rnorm(1e+07))}
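
Without any parallel backend, the same loop can already be expressed with foreach. The following is a minimal sketch (it assumes only that the foreach package is installed); the %do% operator evaluates the iterations sequentially in the current R session and collects the results in a list:

library(foreach)

# sequential foreach: %do% runs each iteration in the current R session
result <- foreach(i = 1:50) %do% {
  mean(rnorm(1e+07))
}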

Sequential execution

Sequential execution on VSC3 results in an execution time [s] of:

/opt/sw/R/current/bin/R
> system.time(for (i in 1:50) { mean(rnorm(1e+07))})
   user  system elapsed 
 83.690   1.013  84.721 
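
The relevant column is "elapsed", the wall-clock time of the run; this is the number that parallel execution should bring down.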

Parallel execution

Using Rmpi and doMPI, the loop may be parallelized in the following form (script berk-rmpi.R):

# basic example with foreach
# start R as usual: 'R' or via a batch job
library(Rmpi)
library(doMPI)

cl <- startMPIcluster()
registerDoMPI(cl)

result <- foreach(i = 1:50) %dopar% {
  mean(rnorm(1e+07))
}

closeCluster(cl)
mpi.finalize()
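
Note that startMPIcluster() turns one MPI rank into the doMPI master, so with 16 ranks the iterations are distributed over 15 workers. Also, foreach returns its results as a list by default; if a plain vector is more convenient, the .combine argument of foreach can merge the per-iteration results. A minimal sketch:

# collect the 50 means in a numeric vector instead of a list
result <- foreach(i = 1:50, .combine = c) %dopar% {
  mean(rnorm(1e+07))
}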

On VSC2, a batch job is submitted with the following (SGE) script:

#$ -N rstat
#$ -V
#$ -pe mpich 16
#$ -l h_rt=01:00:00

mpirun -machinefile $TMPDIR/machines -np $NSLOTS /opt/sw/R/current/bin/R CMD BATCH berk-rmpi.R
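
Since the script runs R via 'R CMD BATCH', all console output, including the final proc.time() call that R CMD BATCH appends automatically, is written to berk-rmpi.Rout in the submit directory. Assuming the script above is saved as rstat.sge (a placeholder name), it is submitted with:

qsub rstat.sge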

yielding an execution time [s] of

> proc.time()
   user  system elapsed 
  8.495   0.264   9.616 

On VSC3 the script reads:

#!/bin/sh
#SBATCH -J rstat
#SBATCH -N 1
#SBATCH --tasks-per-node=16

module unload intel-mpi/5
module load intel-mpi/4.1.3.048
module load R

export I_MPI_FABRICS=shm:tcp

mpirun R CMD BATCH berk-rmpi.R
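
As above, R's output ends up in berk-rmpi.Rout. Assuming the script is saved as rstat.slrm (a placeholder name), it is submitted to SLURM with:

sbatch rstat.slrm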

yielding an execution time [s] of

> proc.time()
   user  system elapsed 
  4.566   0.156   5.750 