Differences
This shows you the differences between two versions of the page.
doku:matlab [2020/01/16 12:22] – [Example: Local Matlabpool] ir | doku:matlab [2024/01/31 11:17] – katrin | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Matlab | + | ====== Matlab |
- | ==== Run Matlab Task from command line ==== | + | ===== Quickstart: |
- | Generate a matlab package called | + | Generate a matlab package called |
< | < | ||
add(2,3.4) | add(2,3.4) | ||
</ | </ | ||
- | and a function | + | and a function |
< | < | ||
z=x+y</ | z=x+y</ | ||
- | There are two ways to run this from the command line | + | Run this from the command line: |
- | option 1:< | + | < |
- | /opt/ | + | module purge |
- | option 2:<code> | + | module load Matlab/(...Version...) |
- | /opt/sw/matlab-80/bin/matlab -nodisplay -r "add(2,3.4)"</ | + | matlab -nodisplay |
+ | run test.m | ||
+ | </ | ||
+ | |||
+ | ==== More info on MATLAB Programs and batch jobs ==== | ||
+ | For more information on MATLAB programs and batch jobs<html>< | ||
+ | |||
+ | ===== Getting Started with Serial and Parallel MATLAB | ||
+ | |||
+ | This page provides the steps to configure MATLAB to submit jobs to a cluster, retrieve results, and debug errors. | ||
+ | |||
+ | The guide can be followed either on a GUI node or via JupyterHub. | ||
+ | |||
+ | ==== Configuration ==== | ||
+ | |||
+ | After logging into the cluster, start MATLAB. | ||
+ | |||
+ | Jobs will now default to the cluster rather than being submitted to the local machine. | ||
+ | |||
+ | ==== Configuring Jobs ==== | ||
+ | |||
+ | Prior to submitting the job, various parameters can be assigned, such as queue, e-mail, walltime, etc. The following is a partial list of parameters. | ||
- | ===== Batch Matlab job on VSC-3 ===== | ||
- | In order to be able to use matlab, you have to load the program with the ' | ||
< | < | ||
- | [username@l32]$ module avail # select | + | >> % Get a handle to the cluster |
- | [username@l32]$ module load Matlab/v8.5_R2015a # load desired version | + | >> c = parcluster; |
- | [username@l32]$ module list # check loaded modules | + | |
- | Currently Loaded Modulefiles: | + | [REQUIRED] |
- | 1) Matlab/v9.5_R2018b | + | |
+ | >> % Specify memory per core (default: ' | ||
+ | >> c.AdditionalProperties.MemPerCPU = ' | ||
+ | |||
+ | >> % Specify number of nodes for two or more nodes | ||
+ | >> c.AdditionalProperties.NumNodes = 3; | ||
+ | |||
+ | [OPTIONAL] | ||
+ | |||
+ | >> % Specify the account | ||
+ | >> c.AdditionalProperties.AccountName = ' | ||
+ | |||
+ | >> % Specify a constraint | ||
+ | >> c.AdditionalProperties.Constraint = ' | ||
+ | |||
+ | >> % Request email notification of job status | ||
+ | >> c.AdditionalProperties.EmailAddress = ' | ||
+ | |||
+ | >> % Specify number of GPUs (default: 0) | ||
+ | >> c.AdditionalProperties.GPUsPerNode = 1; | ||
+ | |||
+ | >> % Specify the partition | ||
+ | >> c.AdditionalProperties.Partition = ' | ||
+ | |||
+ | >> % Specify cores per node | ||
+ | >> c.AdditionalProperties.ProcsPerNode = 4; | ||
+ | |||
+ | >> % Set node exclusivity (default: false) | ||
+ | >> c.AdditionalProperties.RequireExclusiveNode = true; | ||
+ | |||
+ | >> % Use reservation | ||
+ | >> c.AdditionalProperties.Reservation = ' | ||
+ | |||
+ | >> % Specify the wall time (e.g., 1 day, 5 hours, 30 minutes) | ||
+ | >> c.AdditionalProperties.WallTime = ' | ||
</ | </ | ||
- | (See also the introduction to the [[https:// | ||
- | Now, Matlab can be called by | + | Save changes after modifying AdditionalProperties for the above changes to persist between MATLAB sessions. |
< | < | ||
- | [username@l32]$ matlab | + | >> c.saveProfile |
</ | </ | ||
- | ==== Example: Serial Matlab Task ==== | ||
- | We use the matlab m-file {{doku: | + | To see the values of the current configuration options, display AdditionalProperties. |
< | < | ||
- | #!/bin/bash | + | >> % To view current properties |
- | # | + | >> c.AdditionalProperties |
- | #SBATCH -J test # job name | + | </ |
- | #SBATCH -N 1 # number of nodes | + | |
- | #SBATCH --ntasks-per-node=1 | + | |
- | #SBATCH --ntasks-per-core=1 | + | |
- | #SBATCH --threads-per-core=1 | + | |
- | #SBATCH --time=10 | + | |
- | #SBATCH -L matlab@vsc | + | |
- | module purge | + | Unset a value when no longer needed. |
- | module load Matlab/v9.5_R2018b # load desired version | + | < |
+ | >> % Turn off email notifications | ||
+ | >> c.AdditionalProperties.EmailAddress = ''; | ||
+ | >> c.saveProfile | ||
+ | </ | ||
- | export OMP_NUM_THREADS=1 | + | ==== Interactive Jobs ==== |
+ | To run an interactive pool job on the cluster, continue to use parpool as before. | ||
+ | < | ||
+ | >> % Get a handle to the cluster | ||
+ | >> c = parcluster; | ||
- | time matlab < myplot.m | + | >> % Open a pool of 64 workers on the cluster |
+ | >> pool = c.parpool(64); | ||
</ | </ | ||
+ | |||
+ | Rather than running on the local machine, the pool can now run across multiple nodes on the cluster. | ||
< | < | ||
- | [username@l32]$ sbatch jobSerial.sh | + | >> % Run a parfor over 1000 iterations |
- | [username@l32]$ squeue -u username | + | >> parfor idx = 1:1000 |
+ | a(idx) = rand; | ||
+ | end | ||
</ | </ | ||
- | ==== Example: Local Matlabpool ==== | ||
- | We use the same m-file '' | ||
+ | Delete the pool when it’s no longer needed. | ||
+ | < | ||
+ | >> % Delete the pool | ||
+ | >> pool.delete | ||
+ | </ | ||
+ | |||
+ | ==== Independent Batch Job ==== | ||
+ | |||
+ | Use the batch command to submit asynchronous jobs to the cluster. | ||
< | < | ||
- | # | + | >> % Get a handle to the cluster |
- | # | + | >> c = parcluster; |
- | #SBATCH -J test # job name | + | |
- | #SBATCH -N 1 # number of nodes | + | |
- | #SBATCH --ntasks-per-node=16 | + | |
- | #SBATCH --ntasks-per-core=1 | + | |
- | #SBATCH --threads-per-core=1 | + | |
- | #SBATCH --time=100 | + | |
- | #SBATCH -L matlab@vsc | + | |
- | module purge | + | >> % Submit job to query where MATLAB is running on the cluster |
- | module load Matlab/v9.5_R2018b # load desired version | + | >> job = c.batch(@pwd, 1, {}); |
+ | |||
+ | >> % Query job for state | ||
+ | >> job.State | ||
- | export OMP_NUM_THREADS=1 | + | >> % If state is finished, fetch the results |
+ | >> job.fetchOutputs{: | ||
- | time matlab < main.m | + | >> % Delete the job after results are no longer needed |
+ | >> job.delete | ||
</ | </ | ||
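Instead of checking job.State by hand, the wait function of the Parallel Computing Toolbox can block the client until the job has finished. The following is a minimal sketch, assuming the default cluster profile has already been configured as described in the Configuration section; it is not taken from the site documentation.

```matlab
% Get a handle to the cluster (assumes a configured default profile)
c = parcluster;

% Submit the same single-output job as in the example above
job = c.batch(@pwd, 1, {});

% Block until the job has finished running
wait(job);

% fetchOutputs returns a cell array with one cell per output argument;
% it raises an error if the job itself failed
out = fetchOutputs(job);
disp(out{1})

% Delete the job once the results are no longer needed
delete(job);
```

Blocking with wait is convenient in scripts; in an interactive session, checking job.State leaves the prompt free for other work.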
- | == main.m == | + | To retrieve a list of running or completed jobs, call parcluster to return the cluster object, then read its Jobs property. | ||
< | < | ||
- | matlabpool | + | >> c = parcluster; |
+ | >> jobs = c.Jobs | ||
+ | >> | ||
+ | >> | ||
+ | >> job2 = c.Jobs(2); | ||
+ | </ | ||
- | matlabpool size | + | Once the job has been selected, fetch the results as previously done. |
- | n = 500; K = 50; L = 10; T = zeros(1,L); | + | fetchOutputs is used to retrieve function output arguments; if calling batch with a script, use load instead. |
+ | < | ||
+ | >> % Fetch all results from the second job in the list | ||
+ | >> job2.fetchOutputs{: | ||
+ | </ | ||
+ | |||
+ | ==== Parallel Batch Job ==== | ||
+ | batch can also submit parallel workflows. | ||
+ | |||
+ | < | ||
+ | function [sim_t, A] = parallel_example(iter) | ||
- | % serial inner loop | + | if nargin==0 |
- | for l = 1:L | + | |
- | tic | + | |
- | for k = 1:K | + | |
- | | + | |
end | end | ||
- | T(l) = toc; | + | |
+ | disp('Start sim') | ||
+ | |||
+ | t0 = tic; | ||
+ | parfor idx = 1:iter | ||
+ | A(idx) = idx; | ||
+ | pause(2) | ||
+ | idx | ||
end | end | ||
+ | sim_t = toc(t0); | ||
- | disp(['serial: | + | disp('Sim completed') |
+ | |||
+ | save RESULTS A | ||
- | % parallel inner loop | ||
- | for l = 1:L | ||
- | tic | ||
- | parfor k = 1:K | ||
- | eig(rand(n)); | ||
- | end | ||
- | T(l) = toc; | ||
end | end | ||
+ | </ | ||
- | disp([' | + | This time when using the batch command, also specify a MATLAB Pool argument. |
+ | < | ||
+ | >> % Get a handle to the cluster | ||
+ | >> c = parcluster; | ||
- | matlabpool close | + | >> % Submit a batch pool job using 4 workers for 16 simulations |
+ | >> job = c.batch(@parallel_example, | ||
+ | |||
+ | >> % View current job status | ||
+ | >> job.State | ||
+ | |||
+ | >> % Fetch the results after a finished state is retrieved | ||
+ | >> job.fetchOutputs{: | ||
+ | ans = | ||
+ | 8.8872 | ||
</ | </ | ||
+ | |||
+ | The job ran in 8.89 seconds using four workers. | ||
+ | |||
+ | Run the same simulation but increase the Pool size. This time, to retrieve the results later, keep track of the job ID. | ||
+ | |||
+ | **NOTE**: For some applications, | ||
< | < | ||
- | [username@l32]$ sbatch jobPool.sh | + | >> % Get a handle to the cluster |
- | [username@l32]$ squeue -u username | + | >> c = parcluster; |
+ | |||
+ | >> % Submit a batch pool job using 8 workers for 16 simulations | ||
+ | >> job = c.batch(@parallel_example, | ||
+ | |||
+ | >> % Get the job ID | ||
+ | >> id = job.ID | ||
+ | id = | ||
+ | 4 | ||
+ | >> % Clear job from workspace (as though MATLAB exited) | ||
+ | >> clear job | ||
</ | </ | ||
+ | |||
+ | With a handle to the cluster, the findJob method searches for the job with the specified job ID. | ||
+ | |||
+ | < | ||
+ | >> % Get a handle to the cluster | ||
+ | >> c = parcluster; | ||
+ | |||
+ | >> % Find the old job | ||
+ | >> job = c.findJob(' | ||
+ | |||
+ | >> % Retrieve the state of the job | ||
+ | >> job.State | ||
+ | ans = | ||
+ | finished | ||
+ | >> % Fetch the results | ||
+ | >> job.fetchOutputs{: | ||
+ | ans = | ||
+ | 4.7270 | ||
+ | </ | ||
+ | |||
+ | This time the job ran in 4.73 seconds using eight workers. | ||
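The two timings can be compared with a little arithmetic. The sketch below uses only figures quoted on this page: the measured run times, the 16 simulations, and the pause(2) in each iteration of parallel_example. It assumes the iterations are divided evenly among the workers, so the ideal times are lower bounds from the pauses alone.

```matlab
% Measured run times reported above (seconds)
t4 = 8.8872;   % 4 workers
t8 = 4.7270;   % 8 workers

% Lower bound from the pause(2) calls alone: 16 iterations, 2 s each,
% assuming an even split of iterations across workers
iter = 16;
pausePerIter = 2;
ideal4 = iter * pausePerIter / 4;   % 8 s
ideal8 = iter * pausePerIter / 8;   % 4 s

% Speedup gained by doubling the pool size (about 1.88)
speedup = t4 / t8;

fprintf('Speedup from 4 to 8 workers: %.2f\n', speedup);
fprintf('Time above the ideal: %.2f s (4 workers), %.2f s (8 workers)\n', ...
    t4 - ideal4, t8 - ideal8);
```

Doubling the workers gives slightly less than a twofold speedup because each run carries some fixed overhead on top of the pauses.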
+ | |||
+ | Alternatively, | ||
+ | |||
+ | ==== Helper Functions ==== | ||
+ | |||
+ | | Function | ||
+ | | clusterFeatures | List of cluster features/ | ||
+ | | clusterGpuCards | ||
+ | | clusterPartitionNames | List of cluster partitions | ||
+ | | willRun | ||
+ | |||
+ | ==== Debugging ==== | ||
+ | |||
+ | If a serial job produces an error, call the getDebugLog method to view the error log file. When submitting an independent job, specify the task. | ||
+ | < | ||
+ | >> c.getDebugLog(job.Tasks) | ||
+ | </ | ||
+ | For Pool jobs, only specify the job object. | ||
+ | < | ||
+ | >> c.getDebugLog(job) | ||
+ | </ | ||
+ | |||
+ | When troubleshooting a job, the cluster admin may request the scheduler ID of the job. This can be obtained by calling getTaskSchedulerIDs. | ||
+ | < | ||
+ | >> job.getTaskSchedulerIDs() | ||
+ | ans = | ||
+ | 25539 | ||
+ | </ | ||
+ | |||
+ | ==== External Resources ==== | ||
+ | To learn more about the MATLAB Parallel Computing Toolbox, check out these resources: | ||
+ | |||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[https:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// |