Command: mpirun -n 6 --bind-to-core ./mmult3_c.exe 4608
Resources: 1 node (12 physical, 24 logical cores per node)
Tasks: 6 processes
Machine: mic2
Start time: Fri Feb 20 21:46:04 2015
Total time: 118 seconds (2 minutes)
Full path: /home/allinea/mmult/3_fix
Input file:
Notes:

Summary: mmult3_c.exe is MPI-bound in this configuration
CPU: 37.5%

Time spent running application code. High values are usually good.

This is low; it may be worth improving MPI or I/O performance first.

MPI: 53.7%

Time spent in MPI calls. High values are usually bad.

This is high; check the MPI breakdown for advice on reducing it.

I/O: 8.8%

Time spent in filesystem I/O. High values are usually bad.

This is low; check the I/O breakdown section for optimization advice.

This application run was MPI-bound. A breakdown of this time and advice for investigating further is in the MPI section below.
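
To put the summary percentages in wall-clock terms, the shares can be multiplied by the 118-second total from the report header. The following is a minimal sketch; the names `TOTAL_SECONDS`, `SHARES`, and `seconds_per_category` are illustrative and not part of the report. It gives roughly 44 s of application code, 63 s in MPI calls, and 10 s in filesystem I/O.

```python
# Figures copied from the report header and summary.
TOTAL_SECONDS = 118
SHARES = {"CPU": 37.5, "MPI": 53.7, "I/O": 8.8}  # percent of wall-clock time


def seconds_per_category(total_seconds, shares):
    """Convert percentage shares of the run into wall-clock seconds."""
    return {name: total_seconds * pct / 100.0 for name, pct in shares.items()}


breakdown = seconds_per_category(TOTAL_SECONDS, SHARES)
for name, secs in breakdown.items():
    print(f"{name}: {secs:.1f} s")
```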


CPU
A breakdown of the 37.5% CPU time:
Scalar numeric ops: 16.3%
Vector numeric ops: 10.1%
Memory accesses: 73.6%
The per-core performance is memory-bound. Use a profiler to identify time-consuming loops and check their cache performance.
Little time is spent in vectorized instructions. Check the compiler's vectorization advice to see why key loops could not be vectorized.
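
The CPU breakdown is expressed as a share of the 37.5% CPU time, not of the whole run. A short sketch of the conversion to shares of total runtime follows; `share_of_total` is a hypothetical helper name. Under this reading, memory accesses account for about 27.6% of the whole run, and vector numeric ops for under 4%.

```python
# Figures copied from the CPU section of the report.
CPU_SHARE = 37.5  # percent of total runtime spent in application code
CPU_BREAKDOWN = {"scalar": 16.3, "vector": 10.1, "memory": 73.6}  # percent of CPU time


def share_of_total(parent_pct, child_pct):
    """Scale a nested percentage to a percentage of the whole run."""
    return parent_pct * child_pct / 100.0


total_shares = {
    name: share_of_total(CPU_SHARE, pct) for name, pct in CPU_BREAKDOWN.items()
}
for name, pct in total_shares.items():
    print(f"{name}: {pct:.1f}% of total runtime")
```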
MPI
A breakdown of the 53.7% MPI time:
Time in collective calls: 97.5%
Time in point-to-point calls: 2.5%
Effective process collective rate: 0.00e+00
Effective process point-to-point rate: 4.62e+08
Most of the time is spent in collective calls with a very low transfer rate. This suggests that load imbalance is causing synchronization overhead; use an MPI profiler to investigate further.
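
One common way to quantify load imbalance, once an MPI profiler has provided per-rank compute times, is to compare the slowest rank against the mean and to estimate how long each rank idles in the next collective. The sketch below assumes that approach; the per-rank times in `hypothetical_times` are invented for illustration and are not taken from this run, and the function names are not part of any tool.

```python
def imbalance_factor(compute_times):
    """Return max/mean - 1; 0.0 means perfectly balanced ranks."""
    mean = sum(compute_times) / len(compute_times)
    return max(compute_times) / mean - 1.0


def wait_times(compute_times):
    """Estimate time each rank idles at a collective waiting for the slowest rank."""
    slowest = max(compute_times)
    return [slowest - t for t in compute_times]


# HYPOTHETICAL per-rank compute times in seconds for 6 ranks (not measured data).
hypothetical_times = [7.0, 7.0, 7.0, 7.0, 7.0, 19.0]

print(f"imbalance factor: {imbalance_factor(hypothetical_times):.2f}")
print("estimated wait per rank:", wait_times(hypothetical_times))
```

A factor near zero would point away from load imbalance and towards the collective operations themselves.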
I/O
A breakdown of the 8.8% I/O time:
Time in reads: 0.0%
Time in writes: 100.0%
Effective process read rate: 0.00e+00
Effective process write rate: 4.07e+06
Most of the time is spent in write operations with a very low effective transfer rate. This may be caused by contention for the filesystem or inefficient access patterns. Use an I/O profiler to investigate which write calls are affected.
Threads
A breakdown of how multiple threads were used:
Computation: 0.0%
Synchronization: 0.0%
Physical core utilization: 45.6%
Involuntary context switches per second: 6606.9
No measurable time is spent in multithreaded code.
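
The physical core utilization figure can be compared against a simple upper bound. The sketch below assumes that each of the 6 processes runs a single thread (consistent with no measurable multithreaded time) and occupies its own physical core (consistent with `--bind-to-core` in the command line); how the report itself computes the metric is not stated here, so this is only a plausibility check. The helper name `max_core_utilization` is illustrative.

```python
# Figures copied from the report header and the Threads section.
PHYSICAL_CORES = 12
MPI_PROCESSES = 6
REPORTED_UTILIZATION = 45.6  # percent


def max_core_utilization(processes, threads_per_process, physical_cores):
    """Upper bound on physical core utilization, in percent.

    Assumes each thread keeps at most one physical core busy.
    """
    busy = min(processes * threads_per_process, physical_cores)
    return 100.0 * busy / physical_cores


ceiling = max_core_utilization(MPI_PROCESSES, 1, PHYSICAL_CORES)
print(f"upper bound: {ceiling:.1f}% (reported: {REPORTED_UTILIZATION}%)")
```

Under these assumptions, half of the node's physical cores are idle by construction, so the reported value sits close to the ceiling for this configuration.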
Memory
Per-process memory usage may also affect scaling:
Mean process memory usage: 1.98e+08
Peak process memory usage: 5.55e+08
Peak node memory usage: 14.0%
There is significant variation between peak and mean memory usage. This may be a sign of workload imbalance or a memory leak.
The peak node memory usage is very low. Running with fewer MPI processes and more data on each process may be more efficient.
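
The variation between peak and mean memory usage can be expressed as a unit-free ratio. A minimal sketch using the two values above follows; the names are illustrative. The peak is about 2.8 times the mean.

```python
# Figures copied from the Memory section of the report (units as reported).
MEAN_PROCESS_MEMORY = 1.98e8
PEAK_PROCESS_MEMORY = 5.55e8


def peak_to_mean_ratio(peak, mean):
    """Ratio of peak to mean per-process memory usage; 1.0 means no variation."""
    return peak / mean


ratio = peak_to_mean_ratio(PEAK_PROCESS_MEMORY, MEAN_PROCESS_MEMORY)
print(f"peak/mean memory ratio: {ratio:.2f}")
```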