Assistant for InfiniBand Communication Management (IB ACM)
Note: The IB ACM should be considered experimental.
Overview
--------
The IB ACM package implements and provides a framework for experimental name,
address, and route resolution services over InfiniBand. It is intended to
address connection setup scalability issues running MPI applications on
large clusters. The IB ACM provides information needed to establish a
connection, but does not implement the CM protocol.
The librdmacm can invoke IB ACM services when built using the --with-ib_acm
option. The IB ACM services tie in under the rdma_resolve_addr,
rdma_resolve_route, and rdma_getaddrinfo routines. For maximum benefit,
the rdma_getaddrinfo routine should be used; however, existing applications
should still see significant connection scaling benefits using the calls
available in librdmacm 1.0.11 and previous releases.
The IB ACM is focused on being scalable and efficient. The current
implementation limits network traffic, SA interactions, and centralized
services. ACM supports multiple resolution protocols in order to handle
different fabric topologies.
This release is limited in its handling of dynamic changes.
The IB ACM package consists of two components: the ib_acm service
and a test/configuration utility, ib_acme. Both are userspace components
and are available for Linux and Windows. Additional details are given below.
Quick Start Guide
-----------------
1. Prerequisites: libibverbs and libibumad must be installed.
The IB stack should be running with IPoIB configured.
These steps assume that the user has administrative privileges.
2. Install the IB ACM package
This installs ib_acm and ib_acme.
3. Run ib_acme -A -O
This will generate IB ACM address and options configuration files.
(acm_addr.cfg and acm_opts.cfg)
4. Run ib_acm and leave running.
ib_acm will eventually be converted to a service/daemon, but for now
is a userspace application. Because ib_acm uses the libibumad
interfaces, it should be run with administrative privileges.
5. Optionally, run ib_acme -s <source_ip> -d <dest_ip> -v
This will verify that the ib_acm service is running.
6. Install librdmacm.
The librdmacm will automatically use the ib_acm service.
On failures, the librdmacm will fall back to normal resolution.
Details
-------
ib_acme:
The ib_acme program serves a dual role. It acts as a utility to test
ib_acm operation and help verify whether the ib_acm service and selected
protocol are usable for a given cluster configuration. Additionally,
it automatically generates ib_acm configuration files to assist with
or eliminate manual setup.
acm configuration files:
The ib_acm service relies on two configuration files.
The acm_addr.cfg file contains name and address mappings for each IB
<device, port, pkey> endpoint. Although the names in the acm_addr.cfg
file can be anything, ib_acme maps the host name and IP addresses to
the IB endpoints.
The acm_opts.cfg file provides a set of configurable options for the
ib_acm service, such as timeout, number of retries, logging level, etc.
ib_acme generates the acm_opts.cfg file using static information. A
future enhancement would adjust options based on the current system
and cluster size.
ib_acm:
The ib_acm service is responsible for resolving names and addresses to
InfiniBand path information and caching such data. It is currently
implemented as an executable application, but is conceptually a service
or daemon that should execute with administrative privileges.
The ib_acm implements a client interface over TCP sockets, which is
abstracted by the librdmacm library. One or more back-end protocols are
used by the ib_acm service to satisfy user requests. Although the
ib_acm supports standard SA path record queries on the back-end, it
provides an experimental multicast resolution protocol in hope of
achieving greater scalability. The latter is not usable on all fabric
topologies, specifically ones that may not have reversible paths.
Users should use the ib_acme utility to verify that the multicast protocol
is usable before running other applications.
Conceptually, the ib_acm service implements an ARP-like protocol and either
uses IB multicast records to construct path record data or queries the
SA directly, depending on the selected route protocol. By default, the
ib_acm service uses and caches SA path record queries.
Specifically, all IB endpoints join a number of multicast groups.
Multicast groups differ based on rates, mtu, sl, etc., and are prioritized.
All participating endpoints must be able to communicate on the lowest
priority multicast group. The ib_acm assigns one or more names/addresses
to each IB endpoint using the acm_addr.cfg file. Clients provide source
and destination names or addresses as input to the service and receive
path record data as output.
The service maps a client's source name/address to a local IB endpoint.
If a client does not provide a source address, then the ib_acm service
will select one based on the destination and local routing tables. If the
destination name/address is not cached locally, the service sends a multicast
request out on the lowest priority multicast group on the local endpoint.
The request carries a list of multicast groups that the sender can use.
The recipient of the request selects the highest priority multicast group
that it can use as well and returns that information directly to the sender.
The request data is cached by all endpoints that receive the multicast
request message. The source endpoint also caches the response and uses
the multicast group that was selected to construct or obtain path record
data, which is returned to the client.
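The exchange described above can be sketched in Python as follows. This is
an illustration only and not the ib_acm implementation: McGroup, Endpoint,
select_group, and the integer priority field are hypothetical names, the
fabric is modeled as a plain list, and the assumption that a receiver caches
the group it would select for the sender is a simplification of "the request
data is cached".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class McGroup:
    """Hypothetical stand-in for a multicast group; higher priority is preferred."""
    name: str
    priority: int


def select_group(offered, supported):
    """Return the highest-priority group found in both lists, or None."""
    common = set(offered) & set(supported)
    return max(common, key=lambda g: g.priority, default=None)


@dataclass
class Endpoint:
    name: str
    groups: list
    cache: dict = field(default_factory=dict)

    def lowest_group(self):
        # Requests go out on the lowest-priority group, which every
        # participating endpoint is required to be able to use.
        return min(self.groups, key=lambda g: g.priority)

    def resolve(self, dest_name, fabric):
        # A locally cached destination needs no network traffic.
        if dest_name in self.cache:
            return self.cache[dest_name]
        request_group = self.lowest_group()
        reply = None
        for peer in fabric:
            if peer is self or request_group not in peer.groups:
                continue
            # Assumption: each receiver caches what it learned about the sender.
            peer.cache.setdefault(self.name, select_group(self.groups, peer.groups))
            if peer.name == dest_name:
                # Only the destination answers, directly to the sender.
                reply = select_group(self.groups, peer.groups)
        if reply is not None:
            self.cache[dest_name] = reply
        return reply
```

In this sketch, a second request in the opposite direction is answered from
the cache that the first request populated, which is the property that
limits multicast traffic as more endpoints resolve each other.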
The current implementation of the IB ACM has several additional restrictions:
- The ibacm is limited in its handling of dynamic changes;
the ibacm should be stopped and restarted if a cluster is reconfigured.
- Support for IPv6 has not been verified.
- The number of addresses that can be assigned to a single endpoint is
limited to 4.
- The number of multicast groups that an endpoint can support is limited to 2.
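The address limit above can be checked before starting the service with a
short script such as the following sketch. It assumes that each
non-comment line of acm_addr.cfg has the form "address device port pkey"
and that "#" starts a comment; that layout is an assumption here and should
be compared with the file that ib_acme -A generates. The function names and
the device name used in the usage note are hypothetical.

```python
from collections import defaultdict

MAX_ADDRS_PER_ENDPOINT = 4  # limit stated in these notes


def addresses_by_endpoint(text):
    """Group addresses by (device, port, pkey).

    Assumed line format: address device port pkey, with '#' comments.
    """
    endpoints = defaultdict(list)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        address, device, port, pkey = fields
        endpoints[(device, port, pkey)].append(address)
    return dict(endpoints)


def endpoints_over_limit(text, limit=MAX_ADDRS_PER_ENDPOINT):
    """Return only the endpoints that carry more addresses than the limit."""
    return {
        endpoint: addresses
        for endpoint, addresses in addresses_by_endpoint(text).items()
        if len(addresses) > limit
    }
```

For example, passing the contents of a file that assigns five addresses to
the endpoint (mlx4_0, 1, default) returns a dictionary with that single
endpoint as its key. An empty dictionary means no endpoint exceeds the limit.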
The ibacm contains several internal caches. These include caches for
GID and LID destination addresses. These caches can be optionally
preloaded. ibacm supports the OpenSM dump_pr plugin "full" PathRecord
format which is used to preload these caches. The file format is specified
in the ibacm_opts.cfg file via the route_preload setting which should
be set to opensm_full_v1 for this file format. The default format is
none, which does not preload these caches. See dump_pr.notes.txt in dump_pr
for more information on the opensm_full_v1 file format and how to configure
OpenSM to generate this file.
Additionally, the name, IPv4, and IPv6 caches can be preloaded by using
the addr_preload option. The default is none, which does not preload these
caches. To preload these caches, set this option to acm_hosts and
configure the addr_data_file appropriately.
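
The two preload settings can be summarized with the following Python
sketch, which reads an options file and reports which caches would be
preloaded. It assumes one "name value" pair per line with "#" comments,
which should be checked against the generated options file. The option
names and values (route_preload, addr_preload, addr_data_file,
opensm_full_v1, acm_hosts, none) come from the text above; the function
names and the path in the usage note are hypothetical.

```python
# Defaults as described in these notes: no preloading unless configured.
DEFAULTS = {"route_preload": "none", "addr_preload": "none"}


def parse_opts(text):
    """Parse 'name value' lines (assumed format) on top of the defaults."""
    opts = dict(DEFAULTS)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'name value'")
        name, value = parts
        opts[name] = value.strip()
    return opts


def preload_plan(opts):
    """Report which cache families the given options would preload."""
    return {
        # GID and LID destination caches, loaded from an OpenSM dump_pr file.
        "routes": opts["route_preload"] == "opensm_full_v1",
        # Name, IPv4, and IPv6 caches, loaded from the file in addr_data_file.
        "addresses": opts["addr_preload"] == "acm_hosts" and "addr_data_file" in opts,
    }
```

With an options file containing "route_preload opensm_full_v1",
"addr_preload acm_hosts", and "addr_data_file /path/to/hosts.data", the plan
reports both cache families as preloaded. With an empty file it reports
neither, matching the default of none.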