SimGrid 3.18
Versatile Simulation of Distributed Systems
Visualization and Statistical Analysis

SimGrid comes with extensive support for tracing and recording what happens during a simulation, so that the results can be visualized or statistically analyzed afterwards.

Tracing is widely used to observe and understand the behavior of parallel applications and distributed algorithms. It is usually done in two steps: the user instruments the application, and the resulting traces are analyzed after the execution ends. The analysis can highlight unexpected behaviors and bottlenecks, and can sometimes help correct distributed algorithms. The SimGrid team has instrumented the library so that users can trace their simulations and analyze them. This part of the user manual explains how to enable and use the tracing-related features when developing simulators with the SimGrid library.

Tracing categories functions

The SimGrid library is instrumented so that users can trace platform utilization through the MSG, SimDAG and SMPI interfaces. It records how much power is used on each host and how much bandwidth is used on each link of the platform. This type of tracing gives an overall view of resource utilization, which helps to identify bottlenecks, check load balancing among hosts, and so on.

Another possibility is to trace resource utilization by category. Categorized tracing gives SimGrid users the possibility to classify MSG and SimDAG tasks by category and to trace resource utilization for each category. The functions below let the user declare a category and apply it to tasks. Tasks that are not assigned to a category are not traced. Even if the user does not specify any category, simulations can still be traced in terms of resource utilization by using a special parameter that is detailed below (see section Tracing configuration Options).

Tracing marks functions

Tracing user variables functions

For hosts:

For links:

For links, identified by the source and destination of a route:

Tracing configuration Options

To check which tracing options are available for your simulator, you can just run it with the option

--help-tracing 

to get a detailed and up-to-date explanation of each tracing parameter. The following are some of the options accepted by the tracing system of SimGrid. You can use them by running your simulator with the --cfg= switch:

  • tracing : Safe switch. It activates (or deactivates) the tracing system. No other tracing options take effect if this one is not activated.
    --cfg=tracing:yes
    
  • tracing/categorized : It activates the categorized resource utilization tracing. It should be enabled if tracing categories are used by this simulator.
    --cfg=tracing/categorized:yes
    
  • tracing/uncategorized : It activates the uncategorized resource utilization tracing. Use it if this simulator does not use tracing categories and resource utilization has to be traced.
    --cfg=tracing/uncategorized:yes
    
  • tracing/filename : A file with this name will be created to record the simulation. The file is in the Paje format and can be analyzed using Paje visualization tools. More information can be found on this webpage: http://github.com/schnorr/pajeng/
    --cfg=tracing/filename:mytracefile.trace
    
    If you do not provide this parameter, the trace file will be named simgrid.trace.
  • tracing/smpi : This option only has effect if this simulator is SMPI-based. It traces the MPI interface and generates a trace that can be analyzed using Gantt-like visualizations. Every MPI function (implemented by SMPI) is represented as a state, and point-to-point communications are shown as arrows.
    --cfg=tracing/smpi:yes
    
  • tracing/smpi/group : This option only has effect if this simulator is SMPI-based. The processes are grouped by the hosts where they were executed.
    --cfg=tracing/smpi/group:yes
    
  • tracing/smpi/computing : This option only has effect if this simulator is SMPI-based. The parts external to SMPI are also written to the trace, which makes it easier to analyze the data automatically.
    --cfg=tracing/smpi/computing:yes
    
  • tracing/smpi/internals : This option only has effect if this simulator is SMPI-based. Display internal communications happening during a collective MPI call.
    --cfg=tracing/smpi/internals:yes
    
  • tracing/smpi/display-sizes : This option only has effect if this simulator is SMPI-based. Display the sizes of the exchanged messages in the trace, both on the links and on the states. For collectives, the size is the total amount of data sent by the process.
    --cfg=tracing/smpi/display-sizes:yes
    
  • tracing/smpi/sleeping : TODO
    TODO
    
  • tracing/smpi/format : TODO
    TODO
    
  • tracing/smpi/format/ti-one-file : TODO
    TODO
    
  • tracing/msg/vm : TODO
    TODO
    
  • tracing/msg/process : This option only has effect if this simulator is MSG-based. It traces the behavior of all categorized MSG processes, grouping them by hosts. This option can be used to track process location if this simulator has process migration.
    --cfg=tracing/msg/process:yes
    
  • tracing/buffer : This option puts some events in a time-ordered buffer using the insertion sort algorithm. Acquiring and releasing the locks that protect this buffer, together with the cost of sorting, makes tracing slow. Simulator performance can be severely impacted if this option is activated, but the events in the trace file are guaranteed to be sorted.
    --cfg=tracing/buffer:yes
    
  • tracing/onelink-only : This option changes the way SimGrid registers its platform in the trace file. Normally, the tracing considers all routes (whatever their length) in the platform file to re-create the resource topology. If this option is activated, only the routes with one link are used to register the topology within a netzone. Routes among netzones continue to be traced as usual.
    --cfg=tracing/onelink-only:yes
    
  • tracing/disable-link : TODO
    TODO
    
  • tracing/disable-power : TODO
    TODO
    
  • tracing/disable-destroy : Disable the destruction of containers at the end of simulation. This can be used with simulators that have a different notion of time (different from the simulated time).
    --cfg=tracing/disable-destroy:yes
    
  • tracing/basic : Some visualization tools cannot correctly parse the Paje file format. Use this option if you are using one of these tools to visualize the simulation trace. Keep in mind that the trace might then be incomplete, lacking some of the information that would otherwise be recorded.
    --cfg=tracing/basic:yes
    
  • tracing/comment : Use this to add a comment line to the top of the trace file.
    --cfg=tracing/comment:my_string
    
  • tracing/comment-file : Use this to add the contents of a file to the top of the trace file as comment.
    --cfg=tracing/comment-file:textual_file.txt
    
  • tracing/precision : This option sets the precision of the timings stored in the trace file, as a number of decimal digits. Make sure you set Numerical precision of the platform models to at least the same precision as this option! (Traces cannot be more accurate than the simulation; they can be less accurate, though.) The following example gives a precision of 1e-10 in the trace file:
    --cfg=tracing/precision:10
    
  • tracing/platform : TODO
    TODO
    
  • tracing/platform/topology : TODO
    TODO
    

Please pass

--help-tracing 

to your simulator for the updated list of tracing options.
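To make the note on tracing/precision concrete, the sketch below raises the precision of the simulation together with the precision of the trace. It assumes that the timing precision of the platform models is controlled by an option named surf/precision in your SimGrid version. That name is an assumption here, so check the output of --help before relying on it.

```shell
# surf/precision is assumed to be the option that sets the precision of
# simulated timings; confirm the exact name with ./your_simulator --help.
./your_simulator \
    --cfg=tracing:yes \
    --cfg=tracing/uncategorized:yes \
    --cfg=tracing/precision:10 \
    --cfg=surf/precision:1e-10 \
    --cfg=tracing/filename:mytracefile.trace
```

With these settings, timestamps in the trace are written with 10 decimal digits, and the simulation itself is asked to be at least that precise.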

Case studies

The following scenarios might help you decide which tracing options to use when analyzing your simulator.

  • I want to trace the resource utilization of all hosts and links of the platform, and my simulator does not use the tracing API. For that, you can run an uncategorized trace with the following parameters (it will work with any SimGrid simulator):
    ./your_simulator \
              --cfg=tracing:yes \
              --cfg=tracing/uncategorized:yes \
              --cfg=tracing/filename:mytracefile.trace
    
  • I want to trace only a subset of my MSG (or SimDAG) tasks. For that, you need to declare tracing categories using the TRACE_category (...) function (as explained above), and then assign your tasks to a previously declared category using MSG_task_set_category (...) (or SD_task_set_category (...) for SimDAG tasks). After recompiling, run your simulator with the following parameters:
    ./your_simulator \
              --cfg=tracing:yes \
              --cfg=tracing/categorized:yes \
              --cfg=tracing/filename:mytracefile.trace
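
For an SMPI-based simulator, the same --cfg options would typically be passed through the smpirun launcher. The following is only a sketch: the platform file, host file, process count and application name are hypothetical placeholders, and it assumes that your smpirun forwards --cfg arguments to the simulation.

```shell
# Hypothetical file names; replace them with your own platform, hostfile and binary.
smpirun -np 4 \
        -platform my_platform.xml \
        -hostfile my_hostfile.txt \
        --cfg=tracing:yes \
        --cfg=tracing/smpi:yes \
        --cfg=tracing/filename:mytracefile.trace \
        ./my_mpi_application
```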
    

Example of Instrumentation

A simplified example using the mandatory tracing functions.

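#include <simgrid/msg.h>
#include <simgrid/instr.h>

int main (int argc, char **argv)
{
  MSG_init (&argc, &argv);

  // (... after deployment ...)

  // Note: categories must be declared after MSG_create_environment has been called.
  // The color is a string of three RGB components, each between 0 and 1.
  TRACE_category_with_color ("request", "1 0 0");
  TRACE_category_with_color ("computation", "0.3 1 0.4");
  // A category declared without an explicit color.
  TRACE_category ("finalize");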

  msg_task_t req1 = MSG_task_create("1st_request_task", 10, 10, NULL);
  msg_task_t req2 = MSG_task_create("2nd_request_task", 10, 10, NULL);
  msg_task_t req3 = MSG_task_create("3rd_request_task", 10, 10, NULL);
  msg_task_t req4 = MSG_task_create("4th_request_task", 10, 10, NULL);
  MSG_task_set_category (req1, "request");
  MSG_task_set_category (req2, "request");
  MSG_task_set_category (req3, "request");
  MSG_task_set_category (req4, "request");

  msg_task_t comp = MSG_task_create ("comp_task", 100, 100, NULL);
  MSG_task_set_category (comp, "computation");

  msg_task_t finalize = MSG_task_create ("finalize", 0, 0, NULL);
  MSG_task_set_category (finalize, "finalize");

  //(...)

  MSG_clean();
  return 0;
}

Analyzing SimGrid Simulation Traces

A SimGrid-based simulator, when executed with the correct parameters (see above) creates a trace file in the Paje file format holding the simulated behavior of the application or the platform. You have several options to analyze this trace file:

  • Dump its contents to a CSV-like format using pj_dump (see PajeNG's wiki on pj_dump and more generally the PajeNG suite) and use gnuplot to plot resource usage, time spent in blocking or executing functions, and so on. You can filter the dump with grep and a suitable regular expression to keep only parts of the trace (for instance, only a subset of resources or processes).
  • Derive statistics from trace metrics (the ones built-in with any SimGrid simulation, but also those metrics you injected in the trace using the TRACE module) using the R project and all its modules. You can also combine R with ggplot2 to get a number of high quality plots from your simulation metrics. You need to pj_dump the contents of the SimGrid trace file to use R.
  • Visualize the behavior of your simulation using classic space/time views (Gantt charts) provided by the PajeNG suite and any other tool that supports the Paje file format. Consider this option if you need to understand the causality of your distributed simulation.
  • You can also check our online tutorial section that contains a dedicated tutorial with several suggestions on how to use the tracing infrastructure. Look for the SimGrid User::Visualization 101 tutorial.
  • Ask for help on the simgrid-user@lists.gforge.inria.fr mailing list, giving us a detailed explanation on what your simulator does and what kind of information you want to trace. You can also check the mailing list archive for old messages regarding tracing and analysis.
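
The grep-based filtering of a pj_dump output mentioned in the list above can be sketched as follows. This is an illustration under stated assumptions, not a reference script: it assumes that pj_dump prints variable records as "Variable, container, type, start, end, duration, value" with ", " as separator, and the host names, variable name and values in the sample are made up. The helper functions sample_dump and avg_variable are hypothetical names introduced for this example.

```shell
# Stand-in for "pj_dump mytracefile.trace": prints a few made-up records
# in the column layout assumed above.
sample_dump() {
cat <<'EOF'
Container, 0, HOST, 0, 10, 10, Tremblay
Variable, Tremblay, power_used, 0, 4, 4, 100
Variable, Tremblay, power_used, 4, 10, 6, 50
Variable, Jupiter, power_used, 0, 10, 10, 20
State, Tremblay, MPI_STATE, 0, 2, 2, 0, MPI_Send
EOF
}

# Time-weighted average of one variable on one container:
# sum(duration * value) / sum(duration).
# $1 = container name, $2 = variable type; the dump is read from stdin.
avg_variable() {
  grep '^Variable' | awk -F', ' -v c="$1" -v t="$2" '
    $2 == c && $3 == t { weighted += $6 * $7; total += $6 }
    END { if (total > 0) printf "%.2f\n", weighted / total }'
}

sample_dump | avg_variable Tremblay power_used   # prints 70.00
```

On a real trace, the sample would be replaced by the actual dump, for instance pj_dump mytracefile.trace | avg_variable my_host my_variable, after checking the column layout printed by your version of pj_dump.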