SimGrid  3.17
Versatile Simulation of Distributed Systems
Describing the virtual platform

As explained in the introduction, any SimGrid study must include a description of the platform on which you want to simulate your application. You have to describe each element of your platform, such as computing hosts, clusters, disks, links, etc. You must also define the routing on your platform, i.e., which path is taken between two hosts. Finally, you may also describe an experimental scenario, with quantitative changes (e.g., bandwidth changes representing an external load) and qualitative changes (representing how some elements fail and restart over time).

You should really separate your application from the platform description, as it will ease your experimental campaign afterward. Mixing them is seen as a really bad experimental practice. The easiest way to enforce this split is to put the platform description in an XML file. Many example platforms are provided in the archive, and this page gives all needed details to write such files, as well as some hints and tricks about describing your platform.

On the other hand, XML is sometimes not expressive enough, in particular for large platforms exhibiting repetitive patterns that are not simply expressed in XML. In practice, many users end up generating their XML platform files from some sort of script. It is probably preferable to rewrite your XML platform using the Lua scripting language instead. In the future, it should be possible to describe the platform directly in C++, but this is not possible yet.

As usual, SimGrid is a versatile framework, and you should find the way of describing your platform that best fits your experimental practice.

Describing the platform with XML

Your platform description should follow the specification presented in the simgrid.dtd DTD file. The same DTD is used for both the platform and deployment files.

From time to time, this DTD evolves to introduce possibly backward-incompatible changes. That is why each platform description is enclosed within a platform tag that has a version attribute. The current version is 4.1. The simgrid_update_xml program can upgrade most past platform files to the current format.

First Platform Example

Here is a very simple platform file, containing 3 resources (two hosts and one link), and explicitly giving the route between the hosts.

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid/simgrid.dtd">
<platform version="4.1">
<zone id="first zone" routing="Full">
<!-- the resources -->
<host id="host1" speed="1Mf"/>
<host id="host2" speed="2Mf"/>
<link id="link1" bandwidth="125MBps" latency="100us"/>
<!-- the routing: specify how the hosts are interconnected -->
<route src="host1" dst="host2">
<link_ctn id="link1"/>
</route>
</zone>
</platform>

As we said, the enclosing <platform> tag is used to specify the DTD version used for this file.

Then, every resource (specified with <host>, <link> or others) must be located within a given networking zone. Each zone is in charge of the routing between its resources. It means that when a host wants to communicate with another host of the same zone, it is the zone's duty to find the list of links that are involved in the communication. Here, since the <zone> tag has Full as a routing attribute, all routes must be explicitly given using the <route> and <link_ctn> tags (this routing model is both simple and inefficient). It is OK to not specify the route between two hosts, as long as the processes located on them never try to communicate with each other.
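To make the Full routing model concrete, here is a minimal Python sketch that reads the example platform above and looks up the explicitly declared route between two hosts. It is only an illustration of the file structure, not SimGrid's own parser: the function name explicit_routes is hypothetical, and the XML prolog and DOCTYPE are left out of the embedded string for brevity.

```python
import xml.etree.ElementTree as ET

# The example platform from above, without the XML prolog and DOCTYPE.
PLATFORM = """
<platform version="4.1">
  <zone id="first zone" routing="Full">
    <host id="host1" speed="1Mf"/>
    <host id="host2" speed="2Mf"/>
    <link id="link1" bandwidth="125MBps" latency="100us"/>
    <route src="host1" dst="host2">
      <link_ctn id="link1"/>
    </route>
  </zone>
</platform>
"""

def explicit_routes(platform_xml):
    """Map each declared (src, dst) pair to the list of link ids on its route."""
    root = ET.fromstring(platform_xml)
    routes = {}
    for route in root.iter("route"):
        key = (route.get("src"), route.get("dst"))
        routes[key] = [ctn.get("id") for ctn in route.findall("link_ctn")]
    return routes

routes = explicit_routes(PLATFORM)
print(routes[("host1", "host2")])  # ['link1']
```

A pair of hosts that has no <route> entry simply does not appear in the dictionary, which mirrors the remark that undeclared routes are fine as long as nobody tries to use them.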

A zone can contain several zones itself, leading to a hierarchical decomposition of the platform. This can be more efficient (as the inter-zone routing gets factorized with <zoneRoute>), and lets you have more than one routing model in your platform. For example, you could have a coordinate-based routing for the WAN parts of your platform, a full routing within each datacenter, and a highly optimized routing within each cluster of the datacenter. In this case, determining the route between two given hosts gets somewhat more complex, but SimGrid still computes these routes for you in a time- and space-efficient manner. Here is an illustration of these concepts:

AS_hierarchy.png
A hierarchy of networking zones.

Circles represent processing units and squares represent network routers. Bold lines represent communication links. The zone "AS2" models the core of a national network interconnecting a small flat cluster (AS4) and a larger hierarchical cluster (AS5), a subset of a LAN (AS6), and a set of peers scattered around the world (AS7).

Resource description

Computing Resources

<host>

A host is the computing resource on which an actor can execute.

Attribute | Values | Description
id | String (mandatory) | The identifier of the host. Facilitates referring to this host.
speed | double (mandatory) | Computational power of every core of this host in FLOPS (must be positive)
core | int (defaults to 1) | Number of cores (see How to model multicore machines)
state | optionally "OFF" | If set to OFF, the host is initially turned off.
availability_file | File name (optional) | (Relative or absolute) filename to use as input; must contain availability traces for this host. The syntax of this file is defined below.
state_file | File name (optional) | File to use as a state profile (see How to model churn)
coordinates | String (mandatory when using Vivaldi routing) | The coordinates of this host (see P2P or how to use coordinates).
pstate | Double (defaults to 0) | FIXME: Not yet documented.

Included tags

Examples

<host id="host1" speed="1000000000"/>
<host id="host2" speed="1000000000">
<prop id="color" value="blue"/>
<prop id="rendershape" value="square"/>
</host>

Expressing dynamism

SimGrid provides mechanisms to change a host's availability over time, using the availability_file attribute of the <host> tag and a separate text file whose syntax is exemplified below.

Adding a trace file

<platform version="4">
  <host id="bob" speed="500Gf" availability_file="bob.trace" />
</platform>

Example of "bob.trace" file

PERIODICITY 1.0
0.0 1.0
11.0 0.5
20.0 0.8

Let us begin explaining this example by looking at line 2 (line 1 will become clear soon). The first column describes points in time, in this case time 0. The second column describes the relative amount of power this host is able to deliver (relative to the maximum performance specified in the <host> tag); its values must therefore lie between 0 and 1. In this example, our host will deliver 500 Gflop/s at time 0, as 500 Gflop/s is the maximum performance of this host. At time 11.0, it will deliver half of its maximum performance, i.e., 250 Gflop/s, until time 20.0, when it will start delivering 80% of its power. In this example, this amounts to 400 Gflop/s.

Since the periodicity in line 1 was set to 1.0, i.e., 1 timestep, the trace starts over one timestep after its last event: this host will again provide 500 Gflop/s from time 21, 250 Gflop/s from time 32, and so on.
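The reading of "bob.trace" given above can be sketched in a few lines of Python. This is only an illustration of the behavior described in this section, not SimGrid code: the function name availability_at is hypothetical, the peak speed comes from speed="500Gf", and the looping rule (the trace restarts PERIODICITY time units after its last event, here at time 21) is taken from the explanation above. The sketch also assumes the first event is at time 0.

```python
import bisect

PEAK_SPEED = 500e9   # flop/s, from speed="500Gf" in the <host> tag
PERIODICITY = 1.0    # first line of bob.trace
EVENTS = [(0.0, 1.0), (11.0, 0.5), (20.0, 0.8)]  # (time, fraction of peak)

def availability_at(t, events=EVENTS, periodicity=PERIODICITY):
    """Fraction of the peak speed delivered at time t.

    Assumption taken from the text: the trace loops back to its first
    event `periodicity` time units after its last event.
    """
    cycle = events[-1][0] + periodicity      # 20.0 + 1.0 = 21.0
    local_t = t % cycle                      # position inside the current cycle
    times = [time for time, _ in events]
    index = bisect.bisect_right(times, local_t) - 1  # last event at or before local_t
    return events[index][1]

for t in (0.0, 11.0, 20.0, 21.0, 32.0):
    print(t, availability_at(t) * PEAK_SPEED)
```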

<cluster>

<cluster /> represents a machine-cluster. It is most commonly used when one wants to define many hosts and a network quickly. Technically, cluster is a meta-tag: from the inner SimGrid point of view, a cluster is an AS where some optimized routing is defined. The default inner organization of the cluster is as follows:

                 __________
                |          |
                |  router  |
    ____________|__________|_____________ backbone
      |   |   |              |     |   |
    l0| l1| l2|           l97| l96 |   | l99
      |   |   |   ........   |     |   |
      |                                |
    c-0.me                             c-99.me

Here, a set of hosts is defined. Each of them has a link to a central backbone (the backbone is itself a link, as a link can be used to represent a switch; see the switch / link section below for more details). A router connects the cluster to the outside world. Internally, SimGrid treats a cluster as an AS containing all hosts: the router is the default gateway for the cluster.

There is an alternative organization, which is as follows:

                 __________
                |          |
                |  router  |
                |__________|
                    / | \
                   /  |  \
               l0 / l1|   \l2
                 /    |    \
                /     |     \
            host0   host1   host2

The principle is the same, except that there is no backbone. This representation can be obtained easily: just do not set the bb_* attributes.

Attribute name | Mandatory | Values | Description
id | yes | string | The identifier of the cluster. Facilitates referring to this cluster.
prefix | yes | string | Each node of the cluster has to have a name. This name will be prefixed with this prefix.
suffix | yes | string | Each node of the cluster will be suffixed with this suffix.
radical | yes | string | Regexp used to generate cluster node names. Syntax: "10-20" will give you 11 machines numbered from 10 to 20, "10-20;2" will give you 12 machines, one with the number 2, others numbered as before. The produced number is concatenated between prefix and suffix to form machine names.
speed | yes | int | Same as the speed attribute of the <host> tag.
core | no | int (default: 1) | Same as the core attribute of the <host> tag.
bw | yes | int | Bandwidth for the links between nodes and backbone (if any). See the link section for syntax/details.
lat | yes | int | Latency for the links between nodes and backbone (if any). See the link section for syntax/details.
sharing_policy | no | string | Sharing policy for the links between nodes and backbone (if any). See the link section for syntax/details.
bb_bw | no | int | Bandwidth for the backbone (if any). See the link section for syntax/details. If the bb_bw and bb_lat (see below) attributes are omitted, no backbone is created (alternative cluster architecture described before).
bb_lat | no | int | Latency for the backbone (if any). See the link section for syntax/details. If the bb_lat and bb_bw (see above) attributes are omitted, no backbone is created (alternative cluster architecture described before).
bb_sharing_policy | no | string | Sharing policy for the backbone (if any). See the link section for syntax/details.
limiter_link | no | int | Bandwidth for the limiter link (if any). This adds a specific link for each node, to set the maximum bandwidth reached when communicating in both directions at the same time. In theory this value should be 2*bw for full-duplex links, but in reality it might be less. This value depends heavily on the communication model and on the cluster's hardware, so no default value can be set; it has to be measured. More details can be obtained in "Toward Better Simulation of MPI Applications on Ethernet/TCP Networks".
loopback_bw | no | int | Bandwidth for the loopback (if any). See the link section for syntax/details. If the loopback_bw and loopback_lat (see below) attributes are omitted, no loopback link is created and all intra-node communication will use the main network link of the node. The loopback link is a FATPIPE.
loopback_lat | no | int | Latency for the loopback (if any). See the link section for syntax/details. See loopback_bw for more info.
topology | no | FLAT|TORUS|FAT_TREE|DRAGONFLY (default: FLAT) | Network topology to use. SimGrid currently supports the values FLAT (with or without backbone, as described before), TORUS, FAT_TREE, and DRAGONFLY for this attribute.
topo_parameters | no | string | Specific parameters to pass for the topology defined in the topology attribute. For torus networks, a comma-separated list of the number of nodes in each dimension of the torus. Please refer to the specific documentation for FatTree NetZone, Dragonfly NetZone.

The router name is the string produced by the following line of Java code:

router_name = prefix + clusterId + "_router" + suffix;

Cluster example

Consider the following two independent uses of the cluster tag:

<cluster id="my_cluster_1" prefix="" suffix="" radical="0-262144"
         speed="1e9" bw="125e6" lat="5E-5"/>

<cluster id="my_cluster_2" prefix="c-" suffix=".me" radical="0-99"
         speed="1e9" bw="125e6" lat="5E-5"
         bb_bw="2.25e9" bb_lat="5E-4"/>

The second example creates one router and 100 machines with the following names:

c-my_cluster_2_router.me
c-0.me
c-1.me
c-2.me
...
c-99.me
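The naming scheme can be reproduced with a short Python sketch. It only follows the rules stated in this section (range syntax "a-b", the ";" separator shown in the attribute description, and the router name formula); the function names expand_radical and cluster_names are hypothetical, and SimGrid's own parser may accept more syntax than this.

```python
def expand_radical(radical):
    """Expand a radical such as "10-20;2" into the list of node numbers."""
    numbers = []
    for part in radical.split(";"):
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            numbers.extend(range(first, last + 1))  # both bounds are included
        else:
            numbers.append(int(part))
    return numbers

def cluster_names(cluster_id, prefix, suffix, radical):
    """Return the router name and the host names generated for a cluster."""
    hosts = [f"{prefix}{n}{suffix}" for n in expand_radical(radical)]
    router = f"{prefix}{cluster_id}_router{suffix}"
    return router, hosts

router, hosts = cluster_names("my_cluster_2", "c-", ".me", "0-99")
print(router)                # c-my_cluster_2_router.me
print(hosts[0], hosts[-1])   # c-0.me c-99.me
print(len(hosts))            # 100
```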

<cabinet>

Note
This tag is only available when the routing mode of the AS is set to Cluster.

The <cabinet /> tag is, like the <cluster> tag, a meta-tag. This means that it is simply a shortcut for creating a set of (homogeneous) hosts and links quickly; unsurprisingly, this tag was introduced to set up cabinets in data centers quickly. Unlike <cluster>, however, <cabinet> assumes that you create the backbone and routers yourself; see our examples below.

Attributes

Attribute name | Mandatory | Values | Description
id | yes | string | The identifier of the cabinet. Facilitates referring to this cabinet.
prefix | yes | string | Each node of the cabinet has to have a name. This name will be prefixed with this prefix.
suffix | yes | string | Each node of the cabinet will be suffixed with this suffix.
radical | yes | string | Regexp used to generate cabinet node names. Syntax: "10-20" will give you 11 machines numbered from 10 to 20, "10-20;2" will give you 12 machines, one with the number 2, others numbered as before. The produced number is concatenated between prefix and suffix to form machine names.
speed | yes | int | Same as the speed attribute of the <host> tag.
bw | yes | int | Bandwidth for the links between nodes and backbone (if any). See the link section for syntax/details.
lat | yes | int | Latency for the links between nodes and backbone (if any). See the link section for syntax/details.
Note
Please note that, as of now, it is impossible to change attributes such as the number of cores (always set to 1), the initial state of hosts/links (always set to ON), or the sharing policy of the links (always set to FULLDUPLEX).

Example

The following example was taken from examples/platforms/meta_cluster.xml and shows how to use the cabinet tag.

  <AS  id="my_cluster1"  routing="Cluster">
    <cabinet id="cabinet1" prefix="host-" suffix=".cluster1"
      speed="1Gf" bw="125MBps" lat="100us" radical="1-10"/>
    <cabinet id="cabinet2" prefix="host-" suffix=".cluster1"
      speed="1Gf" bw="125MBps" lat="100us" radical="11-20"/>
    <cabinet id="cabinet3" prefix="host-" suffix=".cluster1"
      speed="1Gf" bw="125MBps" lat="100us" radical="21-30"/>

    <backbone id="backbone1" bandwidth="2.25GBps" latency="500us"/>
  </AS>
Note
Please note that you must specify the <backbone> tag by yourself; this is not done automatically and there are no checks that ensure this backbone was defined.

The hosts generated in the above example are named host-1.cluster1, host-2.cluster1, etc.

<peer> (Vivaldi netzones only)

This tag represents a peer, as in Peer-to-Peer (P2P) networks. This can only be used in Vivaldi NetZones. It adds the following resources to the NetZone:

  • A host
  • Two links: one for download and one for upload. This is convenient for simulating the last-mile model (e.g., ADSL peers).
  • It connects the two links to the host

Attributes

Attribute name | Mandatory | Values | Description
id | yes | string | The identifier of the peer. Facilitates referring to this peer.
speed | yes | int | See the description of the host tag for this attribute.
bw_in | yes | int | Bandwidth of the private downstream link
bw_out | yes | int | Bandwidth of the private upstream link
coordinates | no | string | Coordinates of the gateway for this peer. Example value: 12.8 14.4 6.4
sharing_policy | no | SHARED|FULLDUPLEX (default: FULLDUPLEX) | Sharing policy for links. See link description for details.
availability_file | no | string | Availability file for the peer. Same as host availability file. See host description for details.
state_file | no | string | State file for the peer. Same as host state file. See host description for details.

The communication latency between a host A=(xA,yA,zA) and a host B=(xB,yB,zB) is computed as follows:

latency = sqrt( (xA-xB)² + (yA-yB)² ) + zA + zB

See the documentation of simgrid::kernel::routing::VivaldiZone for details on how the latency is computed from the coordinates, and on how the up and down bandwidths are used.
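As a worked illustration of the latency formula above, the following Python sketch parses a coordinates string of the form shown in the attribute table ("12.8 14.4 6.4") and applies the formula literally. The function names are hypothetical; for the actual behavior, refer to the VivaldiZone documentation mentioned above.

```python
import math

def parse_coordinates(text):
    """Split a coordinates attribute such as "12.8 14.4 6.4" into (x, y, z)."""
    x, y, z = (float(value) for value in text.split())
    return x, y, z

def vivaldi_latency(a, b):
    """latency = sqrt((xA-xB)^2 + (yA-yB)^2) + zA + zB, as given above."""
    xa, ya, za = a
    xb, yb, zb = b
    return math.hypot(xa - xb, ya - yb) + za + zb

host_a = parse_coordinates("0 0 1")
host_b = parse_coordinates("3 4 2")
print(vivaldi_latency(host_a, host_b))  # 8.0, i.e. distance 5 plus 1 plus 2
```

Only the first two coordinates contribute to the Euclidean distance; the third coordinate of each endpoint is added on top, which is why it is always paid regardless of where the other peer is.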

Network equipment

Two tags are always available to represent network entities; several other tags are available only in certain contexts.

  1. <link>: Represents an entity that has a limited bandwidth and a latency, and whose bandwidth can be shared in a way similar to how TCP shares it.
    Remarks
    The concept of links in SimGrid may not be intuitive, as links are not limited to connecting (exactly) two entities; in fact, more than two pieces of equipment can be connected to one link. (In graph-theoretical terms: a link in SimGrid is not an edge, but a hyperedge.)
  2. <router/>: Represents an entity that a message can be routed to, but that is unable to execute any code. In SimGrid, routers also have no impact on performance: routers neither limit bandwidth nor increase latency. As a matter of fact, routers are (almost) ignored by the simulator once the simulation has begun.
  3. <backbone/>: This tag is only available when the containing AS is used as a cluster (i.e., routing="Cluster").
Remarks
If you want to represent an entity like a switch, you must use <link> (see the <link> section). Routers are used to run some routing algorithm and determine routes (see Section Routing for details).

<router/>

As said before, a router is used only to give some information to routing algorithms. So, it does not have any attributes except the following:

Attributes

Attribute name | Mandatory | Values | Description
id | yes | string | The identifier of the router to be used when referring to it.
coordinates | no | string | Must be provided when choosing the Vivaldi, coordinate-based routing model for the AS the router belongs to. More details can be found in the Section P2P or how to use coordinates.

Example

 <router id="gw_dc1_horizdist"/>

<link>

Network links can represent one-hop network connections. They are characterized by their id and their bandwidth; links can, but need not, have a latency.

Attributes

Attribute name | Mandatory | Values | Description
id | yes | string | The identifier of the link to be used when referring to it.
bandwidth | yes | int | Maximum bandwidth for this link, given in bytes/s
latency | no | double (default: 0.0) | Latency for this link.
sharing_policy | no | SHARED|FATPIPE|FULLDUPLEX (default: SHARED) | Sharing policy for the link.
state | no | ON|OFF (default: ON) | Allows you to turn this link on or off (working / not working)
bandwidth_file | no | string | Allows you to use a file as input for bandwidth.
latency_file | no | string | Allows you to use a file as input for latency.
state_file | no | string | Allows you to use a file as input for states.

Possible shortcuts for latency

When using the latency attribute, you can specify the latency by using the scientific notation or by using common abbreviations. For instance, the following three tags are equivalent:

 <link id="LINK1" bandwidth="125000000" latency="5E-6"/>
 <link id="LINK1" bandwidth="125000000" latency="5us"/>
 <link id="LINK1" bandwidth="125000000" latency="0.000005"/>

Here, the second tag uses "us", meaning "microseconds". Other shortcuts are:

Name | Abbreviation | Time (in seconds)
Week | w | 7 * 24 * 60 * 60
Day | d | 24 * 60 * 60
Hour | h | 60 * 60
Minute | m | 60
Second | s | 1
Millisecond | ms | 0.001 = 10^(-3)
Microsecond | us | 0.000001 = 10^(-6)
Nanosecond | ns | 0.000000001 = 10^(-9)
Picosecond | ps | 0.000000000001 = 10^(-12)
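The abbreviations above can be turned into a small converter. The following Python sketch is an illustration only (parse_latency is a hypothetical helper, not a SimGrid function); it assumes that a value without a unit is already in seconds, as in the first and third <link> examples above.

```python
import re

# Multipliers from the table above; "" means the value is already in seconds.
UNIT_SECONDS = {
    "w": 7 * 24 * 60 * 60, "d": 24 * 60 * 60, "h": 60 * 60, "m": 60,
    "s": 1, "": 1, "ms": 1e-3, "us": 1e-6, "ns": 1e-9, "ps": 1e-12,
}

# A decimal number, optionally in scientific notation, followed by an optional unit.
_PATTERN = re.compile(r"^\s*([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)\s*([a-z]*)\s*$")

def parse_latency(text):
    """Convert a latency attribute such as "5us" or "5E-6" to seconds."""
    match = _PATTERN.match(text)
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported latency value: {text!r}")
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]

for value in ("5E-6", "5us", "0.000005"):
    print(value, parse_latency(value))
```

Looking the whole unit up in a dictionary avoids confusing "m" (minute) with "ms" (millisecond), which a naive suffix check could get wrong.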

Sharing policy

By default a network link is SHARED, i.e., if two or more data flows go through a link, the bandwidth is shared fairly among all data flows. This is similar to the sharing policy TCP uses.

On the other hand, if a link is defined as a FATPIPE, each flow going through this link will be provided with the complete bandwidth, i.e., no sharing occurs and the bandwidth is only limiting each flow individually. Please note that this is really on a per-flow basis, not only on a per-host basis! The complete bandwidth provided by this link in this mode is number_of_flows*bandwidth, with at most bandwidth being available per flow.

Using the FATPIPE mode lets you model backbones that won't affect performance (except latency).

The last mode available is FULLDUPLEX. This means that SimGrid will automatically generate two links (one carrying the suffix _UP and the other the suffix _DOWN) for each <link> tag. This models situations when the direction of traffic is important.

Remarks
Transfers from one side to the other will interact similarly to TCP when returning ACK packets circulate in the other direction. More discussion is available in the description of link_ctn.

In other words: The SHARED policy defines a physical limit for the bandwidth. The FATPIPE mode defines a limit for each application, with no upper total limit.
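The difference between the two policies can be summarized with a deliberately simplified Python sketch. It looks at a single link in isolation and assumes all flows are identical; the function names are hypothetical, and the actual simulator also accounts for the other links on each route.

```python
def per_flow_bandwidth(policy, bandwidth, flows):
    """Bandwidth available to each of `flows` identical flows on one link."""
    if flows < 1:
        raise ValueError("at least one flow is required")
    if policy == "SHARED":
        return bandwidth / flows   # fair share of one physical limit
    if policy == "FATPIPE":
        return bandwidth           # each flow gets the full bandwidth
    raise ValueError(f"policy not covered by this sketch: {policy}")

def total_bandwidth(policy, bandwidth, flows):
    """Aggregate bandwidth delivered by the link to all flows."""
    return per_flow_bandwidth(policy, bandwidth, flows) * flows

print(per_flow_bandwidth("SHARED", 125e6, 5))   # 25000000.0
print(per_flow_bandwidth("FATPIPE", 125e6, 5))  # 125000000.0
print(total_bandwidth("FATPIPE", 125e6, 5))     # 625000000.0
```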

Remarks
Tip: By using the FATPIPE mode, you can model big backbones that won't affect performance (except latency).

Example

 <link id="SWITCH" bandwidth="125000000" latency="5E-5" sharing_policy="FATPIPE" />

Expressing dynamism and failures

Similar to hosts, it is possible to declare links whose state, bandwidth or latency changes over time (see the section on expressing dynamism for hosts for details).

In the case of network links, the bandwidth and latency attributes give the initial values, and the bandwidth_file and latency_file attributes name the files describing how these values change. The following XML snippet demonstrates how to use this feature in the platform file. The structure of the files "link1.bw" and "link1.lat" is shown below.

<link id="LINK1" state_file="link1.fail" bandwidth="80000000" latency=".0001" bandwidth_file="link1.bw" latency_file="link1.lat" />
Note
Even if the syntax is the same, the semantics of bandwidth and latency trace files differ from those of host availability files. For bandwidth and latency, the corresponding files do not express availability as a fraction of the available capacity but directly in bytes per second for the bandwidth and in seconds for the latency. This is because most tools that capture traces on real platforms (such as NWS) express their results this way.
Example of "link1.bw" file
PERIODICITY 12.0
4.0 40000000
8.0 60000000

In this example, the bandwidth changes repeatedly, with all changes being repeated every 12 seconds.

At the beginning of the simulation, the link's bandwidth is 80,000,000 B/s (i.e., 80 MB/s); this value was defined in the XML snippet above. After four seconds, it drops to 40 MB/s (line 2), and climbs back to 60 MB/s after another 4 seconds (line 3). The value does not change any more until the end of the period, that is, until 12 seconds have been simulated. At this point, periodicity kicks in and this behavior is repeated: seconds 12-16 will experience 80 MB/s, seconds 16-20 40 MB/s, etc.

Example of "link1.lat" file
PERIODICITY 5.0
1.0 0.001
2.0 0.01
3.0 0.001

In this example, the latency varies with a period of 5 seconds. In the XML snippet above, the latency is initialized to 0.0001s (100µs). This value will be kept during the first second, since the latency_file contains changes to this value at seconds one, two and three. At second one, the value will be 0.001, i.e., 1ms. One second later it will be adjusted to 0.01 (or 10ms), and one second later it will be set again to 1ms. The value will not change until second 5, when the periodicity defined in line 1 kicks in. It then loops back, starting at 100µs (the initial value) for one second.
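The two walkthroughs above ("link1.bw" and "link1.lat") follow the same pattern, which can be sketched in Python. This is an illustration of the behavior as described in this section, not SimGrid code: the function name value_at is hypothetical, and the looping rule (the value returns to the initial XML value at every multiple of the period) is taken from the two explanations above.

```python
def value_at(t, initial, period, events):
    """Value of a link attribute at time t.

    `initial` is the value from the XML attribute, `events` is the list of
    (time, value) lines of the trace file, sorted by time. Following the
    text, the trace restarts from `initial` at every multiple of `period`.
    """
    local_t = t % period
    value = initial
    for time, new_value in events:
        if time <= local_t:
            value = new_value
    return value

# link1.bw: bandwidth in bytes per second
BW = dict(initial=80000000, period=12.0, events=[(4.0, 40000000), (8.0, 60000000)])
# link1.lat: latency in seconds
LAT = dict(initial=0.0001, period=5.0,
           events=[(1.0, 0.001), (2.0, 0.01), (3.0, 0.001)])

for t in (0, 5, 9, 13, 17):
    print(t, value_at(t, **BW))
for t in (0.5, 1.5, 2.5, 4.0, 5.5):
    print(t, value_at(t, **LAT))
```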

The <prop/> tag

Similar to the <host> tag, a link may also contain the <prop/> tag; see the <host> documentation for an example.

<backbone/>

Note
This tag is only available when the containing AS uses the "Cluster" routing mode!

Using this tag, you can designate an already existing link to be a backbone.

Attribute name | Mandatory | Values | Description
id | yes | string | Name of the link that is supposed to act as a backbone.

Storage

Note
This is a prototype version that should evolve quickly, so this documentation is valid only at the time of writing. This section describes storage management in SimGrid; for now it is only usable with MSG. It relies on Linux-like concepts. You may also want to have a look at the corresponding section in File Management Functions; access functions are organized as a POSIX-like interface.

Storage - Main Concepts

The storage facilities implemented in SimGrid help to model (and account for) storage devices, such as tapes, hard-drives, CD or DVD devices etc. A typical situation is depicted in the figure below:

storage_sample_scenario.png

In this figure, two hosts called Bob and Alice are interconnected via a network and each host is physically attached to a disk; it is not only possible for each host to mount the disk they are attached to directly, but they can also mount disks that are in a remote location. In this example, Bob mounts Alice's disk remotely and accesses the storage via the network.

SimGrid provides 3 different entities that can be used to model setups that include storage facilities:

Entity name | Description
storage_type | Defines a template for a particular kind of storage (such as a hard-drive) and specifies important features of the storage, such as capacity, performance (read/write), contents, ... Different models of hard-drives use different storage_types (because the difference between an SSD and an HDD does matter), as they differ in some specifications (e.g., different sizes or read/write performance).
storage | Defines an actual instance of a storage type (disk, RAM, ...); uses a storage_type template (see line above) so that you don't need to re-specify the same details over and over again.
mount | Must be wrapped by a <host> tag; declares which storage(s) this host has mounted and where (i.e., the mountpoint).

Storage Content File

In order to assess exactly how much time is spent reading from the storage, SimGrid needs to know what is stored on the storage device (identified by distinct (file-)name, like in a file system) and what size this content has.

Note
The content file is never changed by the simulation; it is parsed once per simulation and kept in memory afterwards. When the content of the storage changes, only the internal SimGrid data structures change.

Structure of a Storage Content File

Here are excerpts from two storage content files; if you want to see a whole file, check the file examples/platforms/content/storage_content.txt that comes with the SimGrid source code.

SimGrid essentially supports two different formats: UNIX-style filepaths should follow the well known format:

/lib/libsimgrid.so.3.6.2  12710497
/bin/smpicc  918
/bin/smpirun  7292
/bin/smpif2c  1990
/bin/simgrid_update_xml  5018
/bin/graphicator  66986
/bin/simgrid-colorizer  2993
/bin/smpiff  820
/bin/tesh  356434

Windows filepaths, unsurprisingly, use the windows style:

\Windows\avastSS.scr 41664
\Windows\bfsvc.exe 75264
\Windows\bootstat.dat 67584
\Windows\CoreSingleLanguage.xml 31497
\Windows\csup.txt 12
\Windows\dchcfg64.exe 335464
\Windows\dcmdev64.exe 93288
Note
The different file formats come at a cost; in version 3.12 (and most likely in later versions, too), copying files from windows-style storages to unix-style storages (and vice versa) is not supported.
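Both excerpts share the same layout: a path, some whitespace, and a size. A minimal Python sketch for reading such a file could look as follows; parse_content is a hypothetical helper written for illustration (it is not part of SimGrid), and it splits on the last run of whitespace of each line to separate the path from the size.

```python
UNIX_CONTENT = """/lib/libsimgrid.so.3.6.2  12710497
/bin/smpicc  918
/bin/smpirun  7292
"""

WINDOWS_CONTENT = r"""\Windows\avastSS.scr 41664
\Windows\csup.txt 12
"""

def parse_content(text):
    """Return a {path: size} dictionary from the text of a content file."""
    files = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        path, size = line.rsplit(None, 1)  # split on the last run of whitespace
        files[path] = int(size)
    return files

unix_files = parse_content(UNIX_CONTENT)
print(unix_files["/bin/smpicc"])   # 918
print(sum(unix_files.values()))    # 12718707
print(parse_content(WINDOWS_CONTENT)[r"\Windows\csup.txt"])  # 12
```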

Generate a Storage Content File

If you want to generate a storage content file based on your own filesystem (or at least a filesystem you have access to), try running this command (works only on unix systems):

find . -type f -exec ls -1s --block=1 {} \; 2>/dev/null | awk '{ print $2 " " $1}' > ./content.txt

The Storage Entities

These are the entities that you can use in your platform files to include storage in your model. See also the list of our example files; these might also help you to get started.

<storage_type>

Attribute name | Mandatory | Values | Description
id | yes | string | Identifier of this storage_type; used when referring to it
model | no | string | In the future, this will allow changing the performance model to use
size | yes | string | Specifies the amount of available storage space; you can specify storage like "500GiB" or "500GB" if you want. (TODO add a link to all the available abbreviations)
content | yes | string | Path to a Storage Content File on your system. This file must exist.

This tag must contain some predefined model properties, specified via the <model_prop> tag. Here is a list, see below for an example:

Property id | Mandatory | Values | Description
Bwrite | yes | string | Bandwidth for write access; in B/s (but you can also specify e.g. "30MBps")
Bread | yes | string | Bandwidth for read access; in B/s (but you can also specify e.g. "30MBps")
Note
A storage_type can also contain the <prop> tag. The <prop> tag allows you to associate additional information to this <storage_type> and follows the attribute/value schema; see the example below. You may want to use it to give information to the tool you use for rendering your simulation, for example.

Here is a complete example for the storage_type tag:

<storage_type id="single_HDD" size="4000">
  <model_prop id="Bwrite" value="30MBps" />
  <model_prop id="Bread" value="100MBps" />
  <prop id="Brand" value="Western Digital" />
</storage_type>

<storage>

Attribute | Mandatory | Values | Description
id | yes | string | Identifier of this storage; used when referring to it
typeId | yes | string | Here you need to refer to an already existing <storage_type>; the storage entity defined by this tag will then inherit the properties defined there.
attach | yes | string | Name of a host (see the <host> section) to which this storage is physically attached (e.g., a hard drive in a computer)
content | no | string | When specified, overwrites the content attribute of <storage_type>

Here are two examples:

     <storage id="Disk1" typeId="single_HDD" attach="bob" />

     <storage id="Disk2" typeId="single_SSD"
              content="content/win_storage_content.txt"
              attach="alice" />

The first example is straightforward: A disk is defined and called "Disk1"; it is of type "single_HDD" (shown as an example of <storage_type> above) and attached to a host called "bob" (the definition of this host is omitted here).

The second storage is called "Disk2"; it is of type "single_SSD" and specifies its own content file (so the contents will be different from Disk1), whose filesystem uses the windows style; finally, it is attached to a second host, called alice (which is again not defined here).

<mount>

Attribute | Mandatory | Values | Description
storageId | yes | string | Refers to a <storage> entity that will be mounted on that computer
name | yes | string | Path/location to/of the logical reference (mount point) of this disk

This tag must be enclosed by a <host> tag. It then specifies where the mountpoint of a given storage device (defined by the storageId attribute) is; this location is specified by the name attribute.

Here is a simple example, taken from the file examples/platforms/storage.xml:

    <storage_type id="single_SSD" size="500GiB">
       <model_prop id="Bwrite" value="60MBps" />
       <model_prop id="Bread" value="200MBps" />
    </storage_type>

    <storage id="Disk2" typeId="single_SSD"
              content="content/win_storage_content.txt"
              attach="alice" />
    <storage id="Disk4" typeId="single_SSD"
             content="content/small_content.txt"
             attach="denise"/>

    <host id="alice" speed="1Gf">
      <mount storageId="Disk2" name="c:"/>
    </host>

    <host id="denise" speed="1Gf">
      <mount storageId="Disk2" name="c:"/>
      <mount storageId="Disk4" name="/home"/>
    </host>

This example is quite interesting, as the same device, called "Disk2", is mounted by two hosts at the same time! Note, however, that the host called alice is actually attached to this storage, as can be seen in the <storage> tag. This means that denise must access this storage through the network, but SimGrid automatically takes care of that for you.

Furthermore, this example shows that denise has mounted two storages with different filesystem types (unix and windows). In general, a host can mount as many storage devices as required.

Note
Again, the difference between attach and mount is simply that an attached storage is always physically inside (or connected to) that machine; for instance, a USB stick is attached to one and only one machine (where it is plugged in), but it can be mounted on other machines as well, since a mounted storage can also be a remote location.

Example files

Several examples were already discussed above; if you're interested in full examples, check the following platforms:

  1. examples/platforms/storage/storage.xml
  2. examples/platforms/storage/remote_io.xml

If you're looking for some example C code, you may find the source code available in the directory examples/msg/io/ useful.

Modelling different situations

The storage functionality of SimGrid is type-agnostic, that is, the implementation does not presume any type of storage, such as HDDs/SSDs, RAM, CD/DVD devices, USB sticks etc.

This allows the user to apply the simulator for a wide variety of scenarios; one common scenario would be the access of remote RAM.

Modelling the access of remote RAM

How can this be achieved in SimGrid? Let's assume we have a setup where three hosts (HostA, HostB, HostC) need to access remote RAM:

      Host A
    /
RAM -- Host B
    \
      Host C

An easy way to model this scenario is to define the RAM via the storage and storage type entities and attach it to a remote dummy host; then, every host can have its own link to this dummy host (modelling, for instance, PCIe or similar interconnects).

              Host A
            /
RAM - Dummy -- Host B
            \
              Host C

Now, when a host reads from this storage, the host that mounts it communicates with the dummy host, which reads from the RAM and sends the information back.
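
The setup above could be sketched in a platform file as follows. This is only an illustration, not a shipped example: all identifiers (remote_RAM, RAM, Dummy, linkA, ...), the size and the bandwidth and latency figures are made up, and the storage is assumed to need no content file.

    <zone id="AS0" routing="Full">
      <!-- hypothetical storage type standing in for the RAM -->
      <storage_type id="remote_RAM" size="16GiB">
        <model_prop id="Bwrite" value="10GBps" />
        <model_prop id="Bread" value="10GBps" />
      </storage_type>

      <!-- the RAM is attached to the dummy host only -->
      <storage id="RAM" typeId="remote_RAM" attach="Dummy" />

      <host id="Dummy" speed="1Gf"/>

      <!-- every real host mounts the remote RAM -->
      <host id="HostA" speed="1Gf">
        <mount storageId="RAM" name="/ram"/>
      </host>
      <host id="HostB" speed="1Gf">
        <mount storageId="RAM" name="/ram"/>
      </host>
      <host id="HostC" speed="1Gf">
        <mount storageId="RAM" name="/ram"/>
      </host>

      <!-- one private link per host towards the dummy host -->
      <link id="linkA" bandwidth="8GBps" latency="1us"/>
      <link id="linkB" bandwidth="8GBps" latency="1us"/>
      <link id="linkC" bandwidth="8GBps" latency="1us"/>

      <route src="HostA" dst="Dummy"><link_ctn id="linkA"/></route>
      <route src="HostB" dst="Dummy"><link_ctn id="linkB"/></route>
      <route src="HostC" dst="Dummy"><link_ctn id="linkC"/></route>
    </zone>

Giving each host its own link keeps the accesses of HostA, HostB and HostC from competing for network bandwidth; sharing a single link instead would model a common bus.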

Routing

To achieve high performance, the routing tables used within SimGrid are static. This means that routing between two nodes is calculated once and will not change during execution. The SimGrid team chose to use this approach as it is rare to have a real deficiency of a resource; most of the time, a communication fails because the links experience too much congestion and hence, your connection stops before the timeout or because the computer designated to be the destination of that message is not responding.

We also chose to use shortest-path algorithms in order to emulate routing. Doing so is consistent with reality: RIP, OSPF and BGP all calculate shortest paths. They do require some time to converge, but eventually, when the routing tables have stabilized, your packets will follow the shortest paths.

<zone>

Before SimGrid v3.16, networking zones used to be called Autonomous Systems, but this was misleading as zones may include other zones in a hierarchical manner. If you find any remaining reference to ASes, please report this as a bug.

Attribute Value Description
id String (mandatory) The identifier of this zone (must be unique)
routing One of the existing routing algorithms (mandatory) See Section Routing models for details.

Example:

<zone id="AS0" routing="Full">
  <host id="host1" speed="1000000000"/>
  <host id="host2" speed="1000000000"/>
  <link id="link1" bandwidth="125000000" latency="0.000100"/>
  <route src="host1" dst="host2"><link_ctn id="link1"/></route>
</zone>

In this example, AS0 contains two hosts (host1 and host2). The route between the hosts goes through link1.

Routing models

For each AS, you must explicitly define which routing model will be used. There are three categories of routing models:

  1. Shortest-path based models: SimGrid calculates shortest paths and manages them. This behaves more or less like most real-life routing mechanisms.
  2. Manually-entered route models: you have to define all routes manually in the platform description file; this can become tedious very quickly, as it is very verbose. This is consistent with some manually managed real-life routing.
  3. Simple/fast models: those models offer fast, low-memory routing algorithms. You should consider using this type of model if you can make some assumptions about your AS. Routing in this case is more or less ignored.

The router affair

Using routers becomes mandatory when using shortest-path based models or when using the bindings to the ns-3 packet-level simulator instead of the native analytical network model implemented in SimGrid.

For graph-based shortest path algorithms, routers are mandatory, because these algorithms require a graph as input and so we need to have source and destination for each edge.

Routers are naturally an important concept in ns-3, since the way routers run the packet routing algorithms is actually simulated. SimGrid's analytical models, however, simply aggregate the routing time with the transfer time.

So why did we incorporate routers in SimGrid? Rebuilding a graph representation only from the route information turns out to be a very difficult task, because of the missing information about how routes intersect. That is why we introduced routers, which are simply used to express these intersection points. It is important to understand that routers are only used to provide topological information.

To express this topological information, a route has to be defined in order to declare which link is connected to a router.

Shortest-path based models

The following table shows all the models that compute routes using shortest-path algorithms and are currently available in SimGrid. More detail on how to choose the best routing model is given in the Section called "Choosing wisely the routing model to use".

Name Description
Floyd Floyd routing data. Pre-calculates all routes once
Dijkstra Dijkstra routing data. Calculates routes only when needed
DijkstraCache Dijkstra routing data. Handles some cache for already calculated routes.

All those shortest-path models are instantiated in the same way and are completely interchangeable. Here are some examples:

Floyd

Floyd example:

<AS  id="AS0"  routing="Floyd">

  <cluster id="my_cluster_1" prefix="c-" suffix=""
           radical="0-1" speed="1000000000" bw="125000000" lat="5E-5"
           router_id="router1"/>

  <AS id="AS1" routing="None">
    <host id="host1" speed="1000000000"/>
  </AS>

  <link id="link1" bandwidth="100000" latency="0.01"/>

  <ASroute src="my_cluster_1" dst="AS1"
    gw_src="router1"
    gw_dst="host1">
    <link_ctn id="link1"/>
  </ASroute>

</AS>

The ASroute given at the end provides topological information: link1 is between router1 and host1.

Example platform files

This is an automatically generated list of example files that use the Floyd routing model (the path is given relative to SimGrid's source directory)

examples/platforms/cloud.xml
examples/platforms/data_center.xml
examples/platforms/g5k.xml
examples/platforms/bypassRoute.xml
examples/platforms/cloud_vivaldi.xml

Dijkstra

Example platform files

This is an automatically generated list of example files that use the Dijkstra routing model (the path is given relative to SimGrid's source directory)

examples/platforms/bypassRoute.xml

Dijkstra example :

 <AS id="AS_2" routing="Dijkstra">
     <host id="AS_2_host1" speed="1000000000"/>
     <host id="AS_2_host2" speed="1000000000"/>
     <host id="AS_2_host3" speed="1000000000"/>
     <link id="AS_2_link1" bandwidth="1250000000" latency="5E-4"/>
     <link id="AS_2_link2" bandwidth="1250000000" latency="5E-4"/>
     <link id="AS_2_link3" bandwidth="1250000000" latency="5E-4"/>
     <link id="AS_2_link4" bandwidth="1250000000" latency="5E-4"/>
     <router id="central_router"/>
     <router id="AS_2_gateway"/>
     <!-- routes providing topological information -->
     <route src="central_router" dst="AS_2_host1"><link_ctn id="AS_2_link1"/></route>
     <route src="central_router" dst="AS_2_host2"><link_ctn id="AS_2_link2"/></route>
     <route src="central_router" dst="AS_2_host3"><link_ctn id="AS_2_link3"/></route>
     <route src="central_router" dst="AS_2_gateway"><link_ctn id="AS_2_link4"/></route>
  </AS>

DijkstraCache

DijkstraCache example:

<AS id="AS_2" routing="DijkstraCache">
     <host id="AS_2_host1" speed="1000000000"/>
     ...
(platform unchanged compared to upper example)

Example platform files

This is an automatically generated list of example files that use the DijkstraCache routing model (the path is given relative to SimGrid's source directory):

Editor's note: At the time of writing, no platform file used this routing model - so if there are no example files listed here, this is likely to be correct.

Manually-entered route models

Name Description
Full You have to enter all necessary routes manually; that is, every single route. This may consume a lot of memory when the XML is parsed and might be tedious to write; i.e., this is only recommended (if at all) for small platforms.

Full

Full example :

<AS  id="AS0"  routing="Full">
   <host id="host1" speed="1000000000"/>
   <host id="host2" speed="1000000000"/>
   <link id="link1" bandwidth="125000000" latency="0.000100"/>
   <route src="host1" dst="host2"><link_ctn id="link1"/></route>
 </AS>

Example platform files

This is an automatically generated list of example files that use the Full routing model (the path is given relative to SimGrid's source directory):

examples/msg/mc/platform.xml
examples/msg/energy-onoff/platform_onoff.xml
examples/platforms/faulty_host.xml
examples/platforms/cluster_fat_tree.xml
examples/platforms/simulacrum_7_hosts.xml
examples/platforms/bypassASroute.xml
examples/platforms/storage/remote_io.xml
examples/platforms/storage/storage.xml
examples/platforms/energy_platform.xml
examples/platforms/small_platform.xml
examples/platforms/three_multicore_hosts.xml
examples/platforms/blue.xml
examples/platforms/config_tracing.xml
examples/platforms/routing_cluster.xml
examples/platforms/small_platform_fatpipe.xml
examples/platforms/prop.xml
examples/platforms/onelink.xml
examples/platforms/small_platform_with_failures.xml
examples/platforms/dogbone.xml
examples/platforms/cluster_backbone.xml
examples/platforms/include.xml
examples/platforms/small_platform_one_link_routes.xml
examples/platforms/multicore_machine.xml
examples/platforms/meta_cluster.xml
examples/platforms/cloud.xml
examples/platforms/small_platform_with_routers.xml
examples/platforms/data_center.xml
examples/platforms/cluster_torus.xml
examples/platforms/two_hosts_platform_shared.xml
examples/platforms/g5k.xml
examples/platforms/two_hosts_platform_with_availability_included.xml
examples/platforms/config.xml
examples/platforms/bypassRoute.xml
examples/platforms/cluster_and_one_host.xml
examples/platforms/crosstraffic.xml
examples/platforms/cloud_vivaldi.xml
examples/platforms/cluster_dragonfly.xml
examples/platforms/griffon.xml
examples/platforms/two_hosts.xml
examples/platforms/cluster_no_backbone.xml
examples/platforms/two_hosts_platform_with_availability.xml

Simple/fast models

Name Description
Cluster This is specific to the <cluster/> tag and should not be used by the user, as several assumptions are made.
None No routing at all. Unless you know what you're doing, avoid using this mode in combination with a non-constant network model.
Vivaldi Perfect when you want to use coordinates. Also see the corresponding P2P section below.

Cluster

Note
In this mode, the <cabinet/> tag is available.

Example platform files

This is an automatically generated list of example files that use the Cluster routing model (the path is given relative to SimGrid's source directory):

examples/platforms/routing_cluster.xml
examples/platforms/meta_cluster.xml

None

This model does exactly what its name advertises: nothing. There is no routing available within this model, and if you try to communicate within the AS that uses this model, SimGrid will fail unless you have explicitly activated the Constant Network Model (this model charges the same for every single communication). It should be noted, however, that you can still attach an ASroute, as is demonstrated in the example below:

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid/simgrid.dtd">
<platform version="4.1">
  <zone id="AS0" routing="Full">
    <cluster id="my_cluster_1" prefix="c-" suffix=".me" radical="0-1" speed="1Gf" bw="125MBps" lat="50us"
             router_id="router1"/>

    <zone id="AS1" routing="None">
      <host id="host1" speed="1Gf"/>
    </zone>

    <link id="link1" bandwidth="100kBps" latency="10ms"/>

    <zoneRoute src="my_cluster_1" dst="AS1" gw_src="router1" gw_dst="host1">
      <link_ctn id="link1"/>
    </zoneRoute>
  </zone>
</platform>

Example platform files

This is an automatically generated list of example files that use the None routing model (the path is given relative to SimGrid's source directory):

examples/platforms/prop.xml
examples/platforms/routing_none.xml
examples/platforms/cluster_and_one_host.xml
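
As a complement, communications inside a zone that uses the None model could be made possible by activating the Constant Network Model from the platform file itself. The following sketch is an assumption rather than a shipped example: it supposes that the configuration item is network/model (as in the <config> section of this page) and that the model name is spelled Constant; the host names are made up.

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid/simgrid.dtd">
<platform version="4.1">
  <!-- assumed spelling of the constant network model -->
  <config>
    <prop id="network/model" value="Constant"/>
  </config>

  <!-- no route is declared: every communication is charged the same -->
  <zone id="AS0" routing="None">
    <host id="host1" speed="1Gf"/>
    <host id="host2" speed="1Gf"/>
  </zone>
</platform>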

Vivaldi

For more information on how to use the Vivaldi Coordinates, see also Section P2P tags.

Note that it is possible to combine the Vivaldi routing model with other routing models; an example can be found in the file examples/platforms/cloud.xml. This example models a NetZone using Vivaldi that contains other NetZones that use different routing models.

Example platform files

This is an automatically generated list of example files that use the Vivaldi routing model (the path is given relative to SimGrid's source directory):

examples/platforms/vivaldi.xml
examples/platforms/cloud.xml
examples/platforms/data_center.xml
examples/platforms/two_peers.xml
examples/platforms/cloud_vivaldi.xml

Defining routes

There are currently four different ways to define routes:

Name Description
route Used to define a route between hosts/routers
zoneRoute Used to define a route between different zones
bypassRoute Used to supersede normal routes as calculated by the network model between hosts/routers; e.g., can be used to use a route that is not the shortest path for any of the shortest-path routing models.
bypassZoneRoute Used in the same way as bypassRoute, but for zones

Basically, all those tags contain an (ordered) list of references to the links that compose the route you want to define.

Consider the example below:

<route src="Alice" dst="Bob">
        <link_ctn id="link1"/>
        <link_ctn id="link2"/>
        <link_ctn id="link3"/>
</route>

The route here from host Alice to Bob will be first link1, then link2, and finally link3. What about the reverse route? route and ASroute have an optional attribute symmetrical, which can be either YES or NO. YES, which is the default, means that the reverse route is the same route in the inverse order. Note that this is not the case for bypass*Route, as it is more probable that you want to bypass only one default route.
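
If the reverse path differs from the forward one, both directions can be declared separately. The following sketch reuses the route above and assumes a hypothetical link4 (not defined here) that carries the traffic from Bob back to Alice:

<route src="Alice" dst="Bob" symmetrical="NO">
        <link_ctn id="link1"/>
        <link_ctn id="link2"/>
        <link_ctn id="link3"/>
</route>
<!-- the reverse direction is no longer implied, so it is declared explicitly -->
<route src="Bob" dst="Alice" symmetrical="NO">
        <link_ctn id="link4"/>
</route>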

For an ASroute, things are just slightly more complicated, as you also have to give the ids of the gateways, which are inside the ASes you want to connect. It looks like this:

<ASroute src="AS1" dst="AS2"
  gw_src="router1" gw_dst="router2">
  <link_ctn id="link1"/>
</ASroute>

gw stands for gateway: when a message goes from AS1 to AS2, it must pass through router1 to get out of AS1, then pass through link1, and get into AS2 by being received by router2. router1 must belong to AS1 and router2 must belong to AS2.

<link_ctn>

This entity has only one purpose: to refer to an already existing <link/> when defining a route, i.e., it can only occur as a child of one of the route tags, such as <route/> or <zoneRoute/>.

Attribute name Mandatory Values Description
id yes String The identifier of the link that should be added to the route.
direction maybe UP|DOWN If the link referenced by id has been declared as FULLDUPLEX, this indicates which direction the route traverses through this link: UP or DOWN. If you don't use FULLDUPLEX, do not use this attribute or SimGrid will not find the right link.
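
The direction attribute could be used as in the following sketch. It assumes that the link is declared FULLDUPLEX through a sharing_policy attribute of the <link/> tag; the host names bob and alice and the link figures are only illustrative.

<link id="link1" bandwidth="125MBps" latency="50us" sharing_policy="FULLDUPLEX"/>

<!-- traffic from bob to alice uses the UP direction of link1 -->
<route src="bob" dst="alice" symmetrical="NO">
  <link_ctn id="link1" direction="UP"/>
</route>
<!-- traffic from alice to bob uses the DOWN direction -->
<route src="alice" dst="bob" symmetrical="NO">
  <link_ctn id="link1" direction="DOWN"/>
</route>

Both routes are marked as not symmetrical here so that each direction of the link is named explicitly.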

Example Files

This is an automatically generated list of example files that use the <link_ctn/> entity (the path is given relative to SimGrid's source directory):

examples/msg/mc/platform.xml
examples/msg/energy-onoff/platform_onoff.xml
examples/platforms/simulacrum_7_hosts.xml
examples/platforms/bypassASroute.xml
examples/platforms/storage/remote_io.xml
examples/platforms/storage/storage.xml
examples/platforms/energy_platform.xml
examples/platforms/small_platform.xml
examples/platforms/three_multicore_hosts.xml
examples/platforms/config_tracing.xml
examples/platforms/routing_cluster.xml
examples/platforms/small_platform_fatpipe.xml
examples/platforms/prop.xml
examples/platforms/onelink.xml
examples/platforms/small_platform_with_failures.xml
examples/platforms/dogbone.xml
examples/platforms/cluster_backbone.xml
examples/platforms/include.xml
examples/platforms/small_platform_one_link_routes.xml
examples/platforms/meta_cluster.xml
examples/platforms/cloud.xml
examples/platforms/small_platform_with_routers.xml
examples/platforms/data_center.xml
examples/platforms/two_hosts_platform_shared.xml
examples/platforms/g5k.xml
examples/platforms/two_hosts_platform_with_availability_included.xml
examples/platforms/config.xml
examples/platforms/bypassRoute.xml
examples/platforms/cluster_and_one_host.xml
examples/platforms/crosstraffic.xml
examples/platforms/cloud_vivaldi.xml
examples/platforms/griffon.xml
examples/platforms/two_hosts.xml
examples/platforms/two_hosts_platform_with_availability.xml

<zoneRoute>

The purpose of this entity is to define a route between two ASes. This is mainly useful when you're in the Full routing model.

Attributes

Attribute name Mandatory Values Description
src yes String The identifier of the source AS
dst yes String See the src attribute
gw_src yes String The gateway that will be used within the src AS; this can be any Host or Router defined within the src AS.
gw_dst yes String Same as gw_src, but with the dst AS instead.
symmetrical no YES|NO (Default: YES) If this route is symmetric, the opposite route (from dst to src) will also be declared implicitly.

Example

<zone id="AS0" routing="Full">
  <cluster id="my_cluster_1" prefix="c-" suffix=".me"
           radical="0-149" speed="1000000000" bw="125000000" lat="5E-5"
           bb_bw="2250000000" bb_lat="5E-4"/>

  <cluster id="my_cluster_2" prefix="c-" suffix=".me"
           radical="150-299" speed="1000000000" bw="125000000" lat="5E-5"
           bb_bw="2250000000" bb_lat="5E-4"/>

  <link id="backbone" bandwidth="1250000000" latency="5E-4"/>

  <zoneRoute src="my_cluster_1" dst="my_cluster_2"
             gw_src="c-my_cluster_1_router.me"
             gw_dst="c-my_cluster_2_router.me">
    <link_ctn id="backbone"/>
  </zoneRoute>
  <zoneRoute src="my_cluster_2" dst="my_cluster_1"
             gw_src="c-my_cluster_2_router.me"
             gw_dst="c-my_cluster_1_router.me">
    <link_ctn id="backbone"/>
  </zoneRoute>
</zone>

<route>

The principle is the same as for ASroute: The route contains a list of links that provide a path from src to dst. Here, src and dst can both be either a host or router. This is mostly useful for the Full routing model as well as for the shortest-paths based models (as they require topological information).

Attribute name Mandatory Values Description
src yes String The value given to the source's "id" attribute
dst yes String The value given to the destination's "id" attribute.
symmetrical no YES| NO (Default: YES) If this route is symmetric, the opposite route (from dst to src) will also be declared implicitly.

Examples

A route in the Full routing model could look like this:

 <route src="Tremblay" dst="Bourassa">
     <link_ctn id="4"/><link_ctn id="3"/><link_ctn id="2"/><link_ctn id="0"/><link_ctn id="1"/><link_ctn id="6"/><link_ctn id="7"/>
 </route>

A route in the Shortest-Path routing model could look like this:

<route src="Tremblay" dst="Bourassa">
  <link_ctn id="3"/>
</route>
Note
You must only have one link in your routes when you're using them to provide topological information, as the routes here are simply the edges of the (network-)graph and the employed algorithms need to know which edge connects which pair of entities.

bypassASroute

As said before, once you choose a model, it (most likely; the constant network model, for example, doesn't) calculates routes for you. But maybe you want to define some specific routes yourself. You may also want to bypass, at an upper level, some routes defined in a lower-level AS: bypassASroute is the tag you're looking for. It allows you to bypass routes already defined between ASes (if you want to bypass a route for a specific host, you should just use bypassRoute). The principle is the same as for ASroute: bypassASroute contains the list of links that are in the path between src and dst.

Attributes

Attribute name Mandatory Values Description
src yes String The value given to the source AS's "id" attribute
dst yes String The value given to the destination AS's "id" attribute.
gw_src yes String The value given to the source gateway's "id" attribute; this can be any host or router within the src AS
gw_dst yes String The value given to the destination gateway's "id" attribute; this can be any host or router within the dst AS
symmetrical no YES| NO (Default: YES) If this route is symmetric, the opposite route (from dst to src) will also be declared implicitly.

Example

<bypassASroute src="my_cluster_1" dst="my_cluster_2"
  gw_src="my_cluster_1_router"
  gw_dst="my_cluster_2_router">
    <link_ctn id="link_tmp"/>
</bypassASroute>

This example shows that link link_tmp (definition not displayed here) directly connects the router my_cluster_1_router in the source cluster to the router my_cluster_2_router in the destination cluster. Additionally, as the symmetrical attribute was not given, this route is presumed to be symmetrical.

bypassRoute

As said before, once you choose a model, it (most likely; the constant network model, for example, doesn't) calculates routes for you. But maybe you want to define some specific routes yourself. You may also want to bypass, at an upper level, some routes defined in a lower-level AS: bypassRoute is the tag you're looking for. It allows you to bypass routes defined between hosts/routers. The principle is the same as for route: bypassRoute contains the list of references to the links that are in the path between src and dst.

Attributes

Attribute name Mandatory Values Description
src yes String The value given to the source's "id" attribute (a host or a router)
dst yes String The value given to the destination's "id" attribute (a host or a router).
symmetrical no YES | NO (Default: YES) If this route is symmetric, the opposite route (from dst to src) will also be declared implicitly.

Examples

<bypassRoute src="host_1" dst="host_2">
   <link_ctn id="link_tmp"/>
</bypassRoute>

This example shows that link link_tmp (definition not displayed here) directly connects host host_1 to host host_2. Additionally, as the symmetrical attribute was not given, this route is presumed to be symmetrical.

Basic Routing Example

Let's say you have an AS named AS_Big that contains two other ASes, AS_1 and AS_2. If you want a host (h1) from AS_1 to communicate with another one (h2) from AS_2, you have to proceed as follows:

  • First, you have to ensure that a route is defined from h1 to AS_1's exit gateway and from h2 to AS_2's exit gateway.
  • Then, you have to define a route between AS_1 and AS_2. As those ASes are both resources belonging to AS_Big, this has to be done at the AS_Big level. To define such a route, you have to give the source AS (AS_1), the destination AS (AS_2), and their respective gateways (as the route is effectively defined between those two entry/exit points). Elements of this route can only be elements belonging to AS_Big, so the links and routers in this route should be defined inside AS_Big. If you choose a shortest-path model, this route will be computed automatically.

As said before, there are mainly two tags for routing:

  • ASroute: to define routes between two ASes
  • route: to define routes between two hosts/routers

As we are dealing with routes between ASes, some definitions are needed at the AS_Big level. Let's consider that AS_1 contains one host, one link and one router, and that AS_2 contains three hosts, four links and one router acting as a gateway. AS_2 also has a central router, forming a cross-like topology: at the ends of the arms of the cross, you find the three hosts and the gateway router. We have to define routes inside those two ASes. Let's say that AS_1 uses Full routing, and AS_2 uses Floyd routing (as we don't want to bother with defining all routes). As we're using a shortest-path algorithm to route within AS_2, we have to define some routes to give topological information to SimGrid. Here is a file doing it all:

<AS  id="AS_Big"  routing="Dijkstra">
  <AS id="AS_1" routing="Full">
     <host id="AS_1_host1" speed="1000000000"/>
     <link id="AS_1_link" bandwidth="1250000000" latency="5E-4"/>
     <router id="AS_1_gateway"/>
     <route src="AS_1_host1" dst="AS_1_gateway">
            <link_ctn id="AS_1_link"/>
     </route>
  </AS>
  <AS id="AS_2" routing="Floyd">
     <host id="AS_2_host1" speed="1000000000"/>
     <host id="AS_2_host2" speed="1000000000"/>
     <host id="AS_2_host3" speed="1000000000"/>
     <link id="AS_2_link1" bandwidth="1250000000" latency="5E-4"/>
     <link id="AS_2_link2" bandwidth="1250000000" latency="5E-4"/>
     <link id="AS_2_link3" bandwidth="1250000000" latency="5E-4"/>
     <link id="AS_2_link4" bandwidth="1250000000" latency="5E-4"/>
     <router id="central_router"/>
     <router id="AS_2_gateway"/>
     <!-- routes providing topological information -->
     <route src="central_router" dst="AS_2_host1"><link_ctn id="AS_2_link1"/></route>
     <route src="central_router" dst="AS_2_host2"><link_ctn id="AS_2_link2"/></route>
     <route src="central_router" dst="AS_2_host3"><link_ctn id="AS_2_link3"/></route>
     <route src="central_router" dst="AS_2_gateway"><link_ctn id="AS_2_link4"/></route>
  </AS>
    <link id="backbone" bandwidth="1250000000" latency="5E-4"/>

     <ASroute src="AS_1" dst="AS_2"
         gw_src="AS_1_gateway"
         gw_dst="AS_2_gateway">
                <link_ctn id="backbone"/>
     </ASroute>
</AS>

Other tags

The following tags can be used inside a <platform> tag even if they are not directly describing the platform:

  • <config> passes configuration options, e.g. to change the network model;
  • <prop> gives user-defined properties to various elements

<config>

Adding configuration flags into the platform file is particularly useful when the described platform is best used with specific flags. For example, you could finely tune SMPI in your platform file directly.

Attribute Values Description
id String (optional) This optional identifier is ignored by SimGrid

Included tags: <prop> to specify a given configuration item (see Configure SimGrid).

Any such configuration must be given at the very top of the platform file.

Example

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid/simgrid.dtd">
<platform version="4">
<config>
        <prop id="maxmin/precision" value="0.000010" />
        <prop id="cpu/optim" value="TI" />
        <prop id="network/model" value="SMPI" />
        <prop id="smpi/bw-factor" value="65472:0.940694;15424:0.697866;9376:0.58729" />
</config>

<AS  id="AS0"  routing="Full">
...

<prop>

Defines a user-defined property, identified by a name and having a value. You can attach such properties to most kinds of resources: <zone>, <host>, <storage>, <cluster> and <link>. These values can be retrieved at runtime with MSG_zone_property() or simgrid::s4u::NetZone::property(), or similar functions.

Attribute Values Description
id String (mandatory) Identifier of this property. Must be unique for a given property holder, e.g., a host or a link.
value String (mandatory) Value of this property; its semantics are completely up to you.

Included tags: none.

Example

<prop id="Operating System" value="Linux" />
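
In a platform file, the property is placed inside the tag of the resource that holds it. A minimal sketch, with a made-up host and a made-up second property:

<host id="bob" speed="1Gf">
  <!-- both properties belong to the host bob -->
  <prop id="Operating System" value="Linux" />
  <prop id="memory" value="16GiB" />
</host>

Since the value is an arbitrary string, "16GiB" is not interpreted as a size here; the application that retrieves it has to parse it itself.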

trace and trace_connect

Both tags are an alternate way to pass files containing information on availability, state, etc. to an entity. (See also, for instance, Section Churn, as described for the host entity.) Instead of referring to the file directly in the host, link, or cluster tag, you define a trace with an id corresponding to a file, define the host/link/cluster, and finally use trace_connect to say that this trace must be used by that entity.

Example

<AS  id="AS0"  routing="Full">
  <host id="bob" speed="1000000000"/>
</AS>
<trace id="myTrace" file="bob.trace" periodicity="1.0"/>
<trace_connect trace="myTrace" element="bob" kind="SPEED"/>
Note
The order here is important. trace_connect must come after the elements trace and host, as both the host and the trace definition must be known when trace_connect is parsed; the order of trace and host is arbitrary.

trace attributes

Attribute name Mandatory Values Description
id yes String Identifier of this trace; this is the name you pass on to trace_connect.
file no String Filename of the file that contains the information - the path must follow the style of your OS. You can omit this, but then you must specify the values between <trace> and </trace> - see the example below.
periodicity yes String This is the same as for hosts (see there for details)

Here is an example of trace when no file name is provided:

 <trace id="myTrace" periodicity="1.0">
    0.0 1.0
    11.0 0.5
    20.0 0.8
 </trace>

trace_connect attributes

Attribute name Mandatory Values Description
kind no HOST_AVAIL|SPEED|LINK_AVAIL|BANDWIDTH|LATENCY (Default: HOST_AVAIL) Describes the kind of trace.
trace yes String Identifier of the referenced trace (specified by the trace's id attribute)
element yes String The identifier of the referenced entity as given by its id attribute
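
The same mechanism applies to links. The sketch below connects an inline trace to the bandwidth of a link; the names (alice, link1, link1_bw) and the figures are made up, and the second column of the trace is assumed to be a bandwidth in bytes per second.

<zone id="AS0" routing="Full">
  <host id="bob" speed="1Gf"/>
  <host id="alice" speed="1Gf"/>
  <link id="link1" bandwidth="125MBps" latency="50us"/>
  <route src="bob" dst="alice"><link_ctn id="link1"/></route>
</zone>
<!-- inline trace: timestamp, then the new value (assumed to be in bytes per second) -->
<trace id="link1_bw" periodicity="10.0">
  0.0 125000000
  5.0 62500000
</trace>
<!-- must come after both the link and the trace are defined -->
<trace_connect trace="link1_bw" element="link1" kind="BANDWIDTH"/>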

Hints, tips and frequently requested features

Now you should know at least the syntax and be able to create a platform on your own. However, having written some platforms ourselves, we know that there are some best practices you should pay attention to in order to produce good platforms, and some choices you can make in order to have faster simulations. Here are some hints and tips, then.

Finding the platform example that you need

Most platform files that we ship are in the examples/platforms folder. The good old grep tool can find the examples you need when wondering about a specific XML tag. Here is an example session searching for trace_connect:

% cd examples/platforms
% grep -R -i -n --include="*.xml" "trace_connect" .
./two_hosts_platform_with_availability_included.xml:26:<trace_connect kind="SPEED" trace="A" element="Cpu A"/>
./two_hosts_platform_with_availability_included.xml:27:<trace_connect kind="HOST_AVAIL" trace="A_failure" element="Cpu A"/>
./two_hosts_platform_with_availability_included.xml:28:<trace_connect kind="SPEED" trace="B" element="Cpu B"/>
./two_hosts.xml:17:  <trace_connect trace="Tremblay_power" element="Tremblay" kind="SPEED"/>

AS Hierarchy

The AS design allows SimGrid to go fast, because routes are computed only for the set of resources defined in a given AS. If you use a single big AS containing all resources, with no AS nested in it, together with the Full model, you lose this benefit entirely. On the other hand, if you design a binary tree of ASes with only one host at the lowest level, you also lose all the good that the AS hierarchy can bring you. Remember to always be "reasonable" when choosing the hierarchy of your platform. When describing a real-life platform, a good choice is to follow the ASes that exist in reality, since this kind of trade-off works well for real-life platforms.

Exit AS: why and how

Users who have looked at some of our platforms may have noticed a non-intuitive schema, something like this:

<AS id="AS_4" routing="Full">
  <AS id="exitAS_4" routing="Full">
    <router id="router_4"/>
  </AS>
  <cluster id="cl_4_1" prefix="c_4_1-" suffix="" radical="1-20" speed="1000000000" bw="125000000" lat="5E-5" bb_bw="2250000000" bb_lat="5E-4"/>
  <cluster id="cl_4_2" prefix="c_4_2-" suffix="" radical="1-20" speed="1000000000" bw="125000000" lat="5E-5" bb_bw="2250000000" bb_lat="5E-4"/>
  <link id="4_1" bandwidth="2250000000" latency="5E-5"/>
  <link id="4_2" bandwidth="2250000000" latency="5E-5"/>
  <link id="bb_4" bandwidth="2250000000" latency="5E-4"/>
  <ASroute src="cl_4_1"
           dst="cl_4_2"
           gw_src="c_4_1-cl_4_1_router"
           gw_dst="c_4_2-cl_4_2_router">
    <link_ctn id="4_1"/>
    <link_ctn id="bb_4"/>
    <link_ctn id="4_2"/>
  </ASroute>
  <ASroute src="cl_4_1"
           dst="exitAS_4"
           gw_src="c_4_1-cl_4_1_router"
           gw_dst="router_4">
    <link_ctn id="4_1"/>
    <link_ctn id="bb_4"/>
  </ASroute>
  <ASroute src="cl_4_2"
           dst="exitAS_4"
           gw_src="c_4_2-cl_4_2_router"
           gw_dst="router_4">
    <link_ctn id="4_2"/>
    <link_ctn id="bb_4"/>
  </ASroute>
</AS>

In AS_4, an exitAS_4 is defined that contains only one router, and routes to that AS are defined from all the other AS (a cluster is only a shortcut for an AS, see the cluster description for details). If there were an upper AS, it would define routes to and from AS_4 with router_4 as the gateway. The reason is that, for performance reasons, we do not allow routes from an AS to a single host or router. So when your AS contains other AS, you have to enclose your gateway within an AS of its own in order to define routes to it.
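To illustrate what such an upper AS could look like, here is a minimal sketch. It is not taken from the shipped examples: the names top, AS_5, exitAS_5, router_5, cl_5_1, 5_1 and backbone, as well as all speed, bandwidth and latency values, are hypothetical, and the file has not been validated against the DTD. Each site is reduced to a single cluster to keep the example short.

```xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="4">
  <AS id="top" routing="Full">

    <!-- First site: one cluster, plus an exit AS enclosing the gateway router -->
    <AS id="AS_4" routing="Full">
      <AS id="exitAS_4" routing="Full">
        <router id="router_4"/>
      </AS>
      <cluster id="cl_4_1" prefix="c_4_1-" suffix="" radical="1-20" speed="1Gf" bw="125MBps" lat="50us"/>
      <link id="4_1" bandwidth="2.25GBps" latency="50us"/>
      <ASroute src="cl_4_1" dst="exitAS_4" gw_src="c_4_1-cl_4_1_router" gw_dst="router_4">
        <link_ctn id="4_1"/>
      </ASroute>
    </AS>

    <!-- Second site (hypothetical), built the same way -->
    <AS id="AS_5" routing="Full">
      <AS id="exitAS_5" routing="Full">
        <router id="router_5"/>
      </AS>
      <cluster id="cl_5_1" prefix="c_5_1-" suffix="" radical="1-20" speed="1Gf" bw="125MBps" lat="50us"/>
      <link id="5_1" bandwidth="2.25GBps" latency="50us"/>
      <ASroute src="cl_5_1" dst="exitAS_5" gw_src="c_5_1-cl_5_1_router" gw_dst="router_5">
        <link_ctn id="5_1"/>
      </ASroute>
    </AS>

    <!-- The upper AS routes between the two sites through their gateways -->
    <link id="backbone" bandwidth="1.25GBps" latency="500us"/>
    <ASroute src="AS_4" dst="AS_5" gw_src="router_4" gw_dst="router_5">
      <link_ctn id="backbone"/>
    </ASroute>
  </AS>
</platform>
```

The upper AS only names the gateways router_4 and router_5. How traffic reaches each gateway from inside a site is described by the ASroute declared within that site.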

P2P or how to use coordinates

SimGrid allows you to use a coordinate-based system, such as Vivaldi, to describe a platform. The main concept is that your peers are located somewhere: this is the role of the coordinates attribute of the <peer> or <host> tag. Using it is not complicated; here is an example:

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="4">

 <AS  id="AS0"  routing="Vivaldi">
        <host id="100030591" coordinates="25.5 9.4 1.4" speed="1.5Gf" />
        <host id="100036570" coordinates="-12.7 -9.9 2.1" speed="7.3Gf" />
        ...
        <host id="100429957" coordinates="17.5 6.7 18.8" speed="8.3Gf" />
 </AS>
</platform>

Coordinates are then used to compute the latency (in microseconds) between two hosts, as the distance between their coordinates given by the following formula: distance( (x1, y1, z1), (x2, y2, z2) ) = euclidean( (x1,y1), (x2,y2) ) + abs(z1) + abs(z2)

In other words, we take the Euclidean distance on the first two dimensions, and then add the absolute values found on the third dimension. This may seem strange, but it was found to allow better approximations of the latency matrices (see the paper describing Vivaldi).
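As a worked illustration of the formula, take the first two hosts of the example above, with coordinates (25.5, 9.4, 1.4) and (-12.7, -9.9, 2.1). The arithmetic below is only a hand computation of the formula as stated, rounded to two decimals:

```latex
\begin{aligned}
d &= \sqrt{(25.5 - (-12.7))^2 + (9.4 - (-9.9))^2} + |1.4| + |2.1| \\
  &= \sqrt{38.2^2 + 19.3^2} + 3.5 \\
  &= \sqrt{1459.24 + 372.49} + 3.5 \\
  &= \sqrt{1831.73} + 3.5 \\
  &\approx 42.80 + 3.50 = 46.30
\end{aligned}
```

Note how the third coordinate acts as a per-host penalty that is added regardless of where the other host is located.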

Note that the previous example defines a routing directly between hosts, but coordinates can also be used to define a routing between AS. That is for example what is commonly done when using peers (see Section <peer> (Vivaldi netzones only)).

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="4">

 <AS  id="AS0"  routing="Vivaldi">
   <peer id="peer-0" coordinates="173.0 96.8 0.1" speed="730Mf" bw_in="13.38MBps" bw_out="1.024MBps" lat="500us"/>
   <peer id="peer-1" coordinates="247.0 57.3 0.6" speed="730Mf" bw_in="13.38MBps" bw_out="1.024MBps" lat="500us" />
   <peer id="peer-2" coordinates="243.4 58.8 1.4" speed="730Mf" bw_in="13.38MBps" bw_out="1.024MBps" lat="500us" />
</AS>
</platform>

In such a case though, we connect the AS created by the peer tag with the Vivaldi routing mechanism. This means that to route between AS1 and AS2, it will use the coordinates of router_AS1 and router_AS2. This is currently a convention and we may offer to change this convention in the DTD later if needed. You may have noted that conveniently, a peer named FOO defines an AS named FOO and a router named router_FOO, which is why it works seamlessly with the peer tag.

Choosing wisely the routing model to use

Choosing the routing model wisely can significantly speed up your simulation, save you time when writing the platform, and save tremendous disk space. Here is the list of available models and their characteristics (lookup: time to resolve a route):

  • Full: Full routing data (fast, large memory requirements, fully expressive)
  • Floyd: Floyd routing data (slow initialization, fast lookup, lesser memory requirements, shortest path routing only). Calculates all routes at once at the beginning.
  • Dijkstra: Dijkstra routing data (fast initialization, slow lookup, small memory requirements, shortest path routing only). Calculates a route when necessary.
  • DijkstraCache: Dijkstra routing data (fast initialization, fast lookup, small memory requirements, shortest path routing only). Same as Dijkstra, except it handles a cache for latest used routes.
  • None: No routing (usable with Constant network only). Declares that there are no routes, so if you try to determine a route within this AS without the constant network model, SimGrid will raise an exception.
  • Vivaldi: Vivaldi routing, for when you want to use coordinates.
  • Cluster: Cluster routing, specific to cluster tag, should not be used.
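As an illustration of the shortest-path models, the sketch below assumes that you only declare the one-link routes describing the topology and let the Floyd model derive the multi-hop routes between hosts. All identifiers and values are hypothetical, and the file has not been validated against the DTD.

```xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="4">
  <AS id="AS0" routing="Floyd">
    <host id="host1" speed="1Gf"/>
    <host id="host2" speed="1Gf"/>
    <host id="host3" speed="1Gf"/>
    <router id="central_router"/>

    <link id="link1" bandwidth="125MBps" latency="50us"/>
    <link id="link2" bandwidth="125MBps" latency="50us"/>
    <link id="link3" bandwidth="125MBps" latency="50us"/>

    <!-- Topological information only: one route per physical link. -->
    <!-- Host-to-host routes (e.g. host1 to host2 through link1 and link2) -->
    <!-- are expected to be computed by the routing model. -->
    <route src="central_router" dst="host1"><link_ctn id="link1"/></route>
    <route src="central_router" dst="host2"><link_ctn id="link2"/></route>
    <route src="central_router" dst="host3"><link_ctn id="link3"/></route>
  </AS>
</platform>
```

Under the same assumption, switching the routing attribute to Dijkstra or DijkstraCache should keep the platform description unchanged while trading initialization time for lookup time, as listed above. With Full, every host-to-host route would have to be written out explicitly.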

I want to describe a switch but there is no switch tag!

There is indeed no switch tag. But when you simulate a switch with fluid bandwidth models (which SimGrid uses by default unless the ns-3 or constant network models are activated), the limiting factor is the bandwidth of the switch backplane. So, at least from the simulation perspective, a switch is similar to a link: a device that is traversed by flows, with some latency and some maximum bandwidth. Thus, you can simply simulate a switch as a link. Many links can be connected to this "switch", which is then included in routes just like a normal link.
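A minimal sketch of this idea follows, with three hosts plugged into one switch. The identifiers (h1, cable_h1, switch, and so on) and all numeric values are hypothetical, and the file has not been validated against the DTD. It also assumes that each declared route is usable in both directions, so only one route per pair of hosts is written.

```xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="4">
  <AS id="AS0" routing="Full">
    <host id="h1" speed="1Gf"/>
    <host id="h2" speed="1Gf"/>
    <host id="h3" speed="1Gf"/>

    <!-- One cable between each host and the switch -->
    <link id="cable_h1" bandwidth="125MBps" latency="50us"/>
    <link id="cable_h2" bandwidth="125MBps" latency="50us"/>
    <link id="cable_h3" bandwidth="125MBps" latency="50us"/>

    <!-- The switch itself, modeled as a link: its bandwidth stands for the backplane -->
    <link id="switch" bandwidth="1GBps" latency="10us"/>

    <!-- Every route crosses the sender's cable, the switch, then the receiver's cable -->
    <route src="h1" dst="h2">
      <link_ctn id="cable_h1"/><link_ctn id="switch"/><link_ctn id="cable_h2"/>
    </route>
    <route src="h1" dst="h3">
      <link_ctn id="cable_h1"/><link_ctn id="switch"/><link_ctn id="cable_h3"/>
    </route>
    <route src="h2" dst="h3">
      <link_ctn id="cable_h2"/><link_ctn id="switch"/><link_ctn id="cable_h3"/>
    </route>
  </AS>
</platform>
```

Because every route includes the switch link, all flows crossing the switch compete for its bandwidth, which is how the backplane limit shows up in the simulation.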

I want to describe multi-cabinets clusters!

You have several possibilities, as usual when modeling things. If your cabinets are homogeneous and the intercabinet network negligible for your study, you should just create a larger cluster with all hosts at the same layer.

In the rare case where your hosts are not homogeneous between the cabinets, you can create your cluster completely manually. For that, create an AS using the Cluster routing, and then use one <cabinet> for each cabinet. This cabinet tag can only be used in an AS using the Cluster routing schema.

Be warned that creating a cluster manually from the XML with <cabinet>, <backbone> and friends is rather tedious. The easiest way to regain some control over your model without diving into the <cluster> internals is certainly to create one separate <cluster> per cabinet and to interconnect them. This is what we did in the G5K example platform for the Graphene cluster.
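The one-cluster-per-cabinet approach could look like the following sketch, which is not an excerpt from the G5K platform file. The identifiers, the intercab link and all values are hypothetical, and the file has not been validated against the DTD. The gateway names assume the same router naming pattern as in the exit AS example (prefix, then cluster id, then _router, then suffix).

```xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="4">
  <AS id="multi_cabinet" routing="Full">
    <!-- One cluster per cabinet; the cabinets may differ, here in host speed -->
    <cluster id="cab1" prefix="cab1-" suffix="" radical="1-16" speed="1Gf" bw="125MBps" lat="50us"/>
    <cluster id="cab2" prefix="cab2-" suffix="" radical="1-16" speed="2Gf" bw="125MBps" lat="50us"/>

    <!-- The inter-cabinet network, reduced to a single link -->
    <link id="intercab" bandwidth="1.25GBps" latency="100us"/>

    <ASroute src="cab1" dst="cab2" gw_src="cab1-cab1_router" gw_dst="cab2-cab2_router">
      <link_ctn id="intercab"/>
    </ASroute>
  </AS>
</platform>
```

With more than two cabinets under Full routing, you would need one such ASroute per pair of cabinets, possibly sharing a common backbone link.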

I want to express multipath routing in platform files!

It is unfortunately impossible to express the fact that there is more than one routing path between two given hosts. Let's consider the following platform file:

<route src="A" dst="B">
  <link_ctn id="1"/>
</route>
<route src="B" dst="C">
  <link_ctn id="2"/>
</route>
<route src="A" dst="C">
  <link_ctn id="3"/>
</route>

Although it is perfectly valid, it does not mean that data traveling from A to C can either go directly (using link 3) or through B (using links 1 and 2). It simply means that the routing on the graph is not trivial, and that data does not necessarily follow the shortest path in number of hops on this graph. Another way to say it is that nothing is implicit in these routing descriptions. The system will only use the routes you declare (such as <route src="A" dst="C"><link_ctn id="3"/></route>), without trying to build new routes by aggregating the provided ones.

You are also free to declare a platform where the routing is not symmetrical. For example, add the following to the previous file:

<route src="C" dst="A">
  <link_ctn id="2"/>
  <link_ctn id="1"/>
</route>

This makes sure that data from C to A goes through B, whereas data from A to C goes directly. Don't worry about the realism of such settings, since we have seen far weirder situations in real settings (in fact, it is the realism of very regular platforms that is questionable, but that's another story).