Inter My-Internet Protocol

INTRODUCTION:

IMIP implements a simple object request and caching system to support
the distributed, high-utilisation My-Internet version 6 (MI-6).  It
connects MI-server machines together through a hub, using a fixed
network of persistent TCP connections, and so avoids the overhead of
repeatedly establishing new TCP connections.

In My-Internet version 4, a single machine services all customers and
all deployments.  MI-6 is a distributed system, with one or more
MI-servers on a deployment-LAN, sharing a single database.  Although
most MI requests can be handled within the deployment, communication
with other deployments is also necessary.

ARCHITECTURE:

All inter-deployment communication passes through a central hub: a LAN
with one or more hub-machines.  The purpose of the hub is to forward
inter-deployment messages, to cache objects, and to distribute load
evenly.  Each MI-server is connected to only one hub-machine; however,
a hub-machine may service many MI-servers.  A single message may pass
through more than one hub-machine to reach its destination.
Different MI-servers in a single deployment may be connected to
different hub-machines; this minimises the number of hops required to
forward a message.  IMIP also supports an internet-distributed hub,
should that prove necessary in the future.

Five types of daemon are involved in IMIP: mid, mi_cache,
mi_clean_cache, mi_mux and mi_hub.  The first three of these, mid,
mi_cache, and mi_clean_cache, are derived from MI-4.  The latter pair,
mi_mux and mi_hub, have been developed specifically for the
distributed MI-6.

Each MI-server runs a mi_mux multiplexor daemon to manage its single
TCP connection to the hub.  All of the mid processes on the server are
connected to the mux over unix-domain sockets.  The mux daemon has
three responsibilities: to forward requests from each of its local
mids to the hub, and return the hub's responses to the correct mids;
to receive requests from foreign mids via the hub, forward each one to
a free local mid, and return that mid's reply to the originator; and
to monitor the activity of each local mid and inform the hub, so that
load can be distributed evenly amongst a deployment's MI-servers.

Each hub-machine runs a single mi_hub daemon process.  The hub daemon
has three responsibilities: to forward requests from connected
MI-servers and from other hub-machines, either directly to their
destination or indirectly via another hub-machine, and to return the
responses by the same path; to cache objects in the local mi_cache
where appropriate; and to monitor the activity of each MI-server and
inform the other hub-machines, so that load can be distributed evenly.

Both mi_mux and mi_hub are implemented in Perl, using a general-purpose
network-node program that provides buffered non-blocking IO,
connection and session management, and timeouts.

PROTOCOL:

Each node in the network has a unique identifier.

A message consists of a textual header, terminated by a blank line.
The first line of the header consists of a message type followed by
two session identifiers, 'your-ref' and 'my-ref', either or both of
which may be '.' for null (see below).  Each of the other header lines
consists of a key and a value separated by ':'.  Padding whitespace is
ignored.  Some messages attach a binary body after the header, in
which case the 'body_length' key must give the length of this
attachment in bytes.  This is important because a node must be able to
extract a message of an unknown type from the stream and discard it.
Lines in the text header may be terminated by LF or by CR LF.
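The framing rules above are enough to pull messages off a byte stream
without understanding their types.  The following Python sketch shows
one way to do it; `parse_message` and `_ref` are hypothetical names,
and because the text says a blank line ends the header while most of
the examples end it with a lone '.', the sketch assumes either may be
used.

```python
def _ref(token):
    """Map the null session identifier '.' to None."""
    return None if token == "." else token


def parse_message(buf):
    """Extract one message from the front of buf (bytes).

    Returns (message, rest); message is None if buf does not yet hold a
    complete message, in which case the caller should read more data.
    """
    lines, pos = [], 0
    while True:
        nl = buf.find(b"\n", pos)
        if nl < 0:
            return None, buf  # header still incomplete
        # strip() handles both LF and CR LF, and drops padding whitespace.
        line = buf[pos:nl].decode("ascii").strip()
        pos = nl + 1
        if line in ("", "."):  # assumed: either form ends the header
            if lines:
                break
            continue  # skip separators between messages
        lines.append(line)
    msg_type, your_ref, my_ref = lines[0].split()
    fields = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    end = pos + int(fields.get("body_length", "0"))
    if len(buf) < end:
        return None, buf  # body still incomplete
    message = {"type": msg_type, "your_ref": _ref(your_ref),
               "my_ref": _ref(my_ref), "fields": fields,
               "body": buf[pos:end]}
    return message, buf[end:]
```

A receiver can append incoming bytes to a buffer and call
`parse_message` until it returns `None`; a message of unrecognised
type can simply be dropped, since its body has already been consumed.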

The 'my-ref' field is included in any message that may elicit a reply
(even if only an acknowledgement or an error).  The 'your-ref' field
is specified only in a reply to a previous message, not in a message
that initiates a new conversation.  If a message-id is not needed for
a particular message, a period '.' takes its place.

A message-id must be unique for the port (connection) on which the
message is sent.  The easiest way to achieve this is for the sender
process to use a sequence number.  The identifier is quoted in any
response to the message.  The 'your-ref' and 'my-ref' differ on each
hop: each intermediate node allocates a 'my-ref' of its own for every
message that it forwards or sends, and remembers it together with the
'my-ref' it received (which it will quote as 'your-ref' in the reply)
and the connections involved, so that it can route the reply
correctly.  A message-id is therefore not universally unique; it is
unique only for a particular source (or, from the point of view of an
intermediate node, a particular connection).
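The per-hop allocation of references can be pictured as a small table
kept by each intermediate node.  This is an illustrative Python
sketch, not the mi_hub implementation: the class `RefTable` is
hypothetical, connections are represented by any hashable value, and
it assumes each forwarded request receives exactly one reply, so the
entry is dropped once the reply has been routed.

```python
import itertools


class RefTable:
    def __init__(self):
        self._next_ref = itertools.count()  # sequence numbers for my-ref
        # (outgoing conn, our my-ref) -> (incoming conn, sender's my-ref)
        self._pending = {}

    def forward(self, from_conn, their_ref, to_conn):
        """Record a message being forwarded; return the my-ref to put on it."""
        my_ref = str(next(self._next_ref))
        self._pending[(to_conn, my_ref)] = (from_conn, their_ref)
        return my_ref

    def route_reply(self, reply_conn, your_ref):
        """Return (connection, your-ref) to use when passing a reply back.

        Raises KeyError for a reply that matches no pending request.
        """
        return self._pending.pop((reply_conn, your_ref))
```

Keying the table on the outgoing connection as well as the reference
matches the rule that a message-id is unique only per connection.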

Connection Management:

Starting a mi_hub:

When a mi_hub is started, it is given a TCP port on which to listen
for connections, and the TCP/IP addresses and ports of any other
mi_hub machines that are already running, to which it should connect.
It connects to each of these mi_hubs in turn, using TCP, and sends a
'hello_hub' message containing the current system time (seconds since
the epoch, UTC).  The peer mi_hub returns a 'welcome' message if the
system times on the two machines agree within a very small margin
(perhaps 5 seconds), and an 'error' message otherwise.  This
synchronization is important for the transfer of cache expiry times.
The connection is not closed automatically after an 'error', so the
handshake may be retried once the clocks have been synchronized, for
example with 'rdate' or a similar program.


Here are the three types of message:

hello_hub . 0
time:		954119823
.

error 0 .
code:		1
message:	clocks are not synchronized
.

welcome 0 .
.
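The peer's decision reduces to a comparison of the two clocks.  In
this sketch, `answer_hello_hub` and `MAX_SKEW` are hypothetical names,
the 5-second margin is the text's tentative figure, and the reply is
returned as a (type, fields) pair rather than as wire text.

```python
import time

MAX_SKEW = 5  # seconds; the text suggests "perhaps 5 seconds"


def answer_hello_hub(peer_time, now=None):
    """Decide how a mi_hub answers 'hello_hub': ('welcome', {}) if the
    clocks agree within MAX_SKEW, otherwise an 'error' with code 1."""
    if now is None:
        now = int(time.time())  # seconds since the epoch
    if abs(now - int(peer_time)) <= MAX_SKEW:
        return "welcome", {}
    return "error", {"code": "1", "message": "clocks are not synchronized"}
```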

Starting a mi_mux:

When a mi_mux is started, it is given the name of its deployment and
the TCP/IP address and port of its mi_hub (which should already be
running).  It connects to this hub using TCP and sends a 'hello_mux'
message containing the deployment name, a list of the customers in
the deployment, and the current system time.  If it cannot connect,
it consults the list of other hubs that it was given in a previous
session, and connects to one of those instead.  The reply may be an
'error' or a 'welcome'; a 'welcome' is shortly followed by a
'topology' message (see below).

Here is the 'hello_mux' message:

hello_mux . 0
deployment:	vic
customers:	moeps aberfeld charlton adm5 alexsc alfredps altonaps ...
time:		954119830
.


The mi_hub will then send an 'add_server' message to all of the other mi_hubs:

add_server . .
deployment:	vic
server:		adm1
customers:	moeps aberfeld charlton adm5 alexsc alfredps altonaps ...
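The 'add_server' message reuses fields that the hub received in
'hello_mux'.  A serialiser for the general message format might look
like this minimal Python sketch; `format_message` is a hypothetical
name, and it ends the header with a blank line as the PROTOCOL section
specifies (most examples show a lone '.' instead).

```python
def format_message(msg_type, your_ref=None, my_ref=None, fields=None, body=b""):
    """Serialise a message: first line, 'key: value' lines, blank line, body.

    None stands for the null session identifier '.'.
    """
    fields = dict(fields or {})
    if body:
        # Receivers rely on this to skip bodies of unknown message types.
        fields["body_length"] = str(len(body))
    refs = ["." if r is None else str(r) for r in (your_ref, my_ref)]
    lines = [" ".join([msg_type, *refs])]
    lines += [f"{key}:\t{value}" for key, value in fields.items()]
    header = "\n".join(lines) + "\n\n"  # blank line ends the header
    return header.encode("ascii") + body
```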

Starting a mid:

When a mid is started, it is given the name of the socket through
which it should connect to its local mi_mux (which should already be
running).  The procedure is similar to the above: the mid sends a
'hello_mid' message, and the mi_mux responds with 'welcome' or 'error':

hello_mid . 0
time:		954119835

Stopping a daemon:

The system is designed to avoid the need for termination messages,
because this makes it much more stable in the event of the accidental
termination of a machine, or of network failure.

When a mid stops, all it needs to do is disconnect; it need not send
any special termination messages.

When a mi_mux stops, any mid processes remaining on that machine will
no longer be able to communicate with the network (typically, the
parent mid should detect the problem and restart the mi_mux daemon).
The associated mi_hub must detect the situation and send a
'remove_server' message to all other mi_hubs.  The mi_mux need not
send any special termination messages.

remove_server . .
deployment:	vic
server:		adm1
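Each hub needs to know which MI-servers can serve a given deployment
and customer.  The hypothetical `ServerRegistry` class below sketches
how 'add_server' and 'remove_server' messages, or a hub noticing that
one of its own mi_mux connections has dropped, might update that
table; the data layout is an assumption.

```python
from collections import defaultdict


class ServerRegistry:
    def __init__(self):
        self._customers = {}              # (deployment, server) -> [customer]
        self._servers = defaultdict(set)  # (deployment, customer) -> {server}

    def add_server(self, deployment, server, customers):
        self.remove_server(deployment, server)  # drop any stale entry first
        self._customers[(deployment, server)] = list(customers)
        for customer in customers:
            self._servers[(deployment, customer)].add(server)

    def remove_server(self, deployment, server):
        # Called on 'remove_server', or when this hub sees its mux disconnect.
        for customer in self._customers.pop((deployment, server), []):
            self._servers[(deployment, customer)].discard(server)

    def servers_for(self, deployment, customer):
        return sorted(self._servers.get((deployment, customer), ()))
```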

When a mi_hub stops, the mi_mux daemons on its MI-servers must
connect to another hub.  The other hubs must remove entries related to
this hub from their routing tables.  The mi_hub need not send any
special termination messages.


Network Topology:

Sometimes a mi_mux on an MI-server must use a mi_hub machine other
than its default, if the default is overloaded or unreachable.  After
a mi_hub sends 'welcome' to a mi_mux, it sends a 'topology' message
containing a list of the hubs that are currently active, in an
appropriate order of preference for that mi_mux.  This message is
also sent to every mux whenever a hub starts or stops.  The mi_mux
saves the list in a configuration file.  If the mi_mux is ever unable
to connect to its default mi_hub, or loses its connection, it can read
the saved list of known mi_hub servers and attempt to connect to each
in turn.
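The fallback described above might be sketched as follows.
`connect_to_hub` is a hypothetical function; hub addresses are assumed
to be 'host:port' strings as in the 'topology' example, reading the
saved configuration file is left out, and the connector is a parameter
so the logic can be exercised without a network.

```python
import socket


def connect_to_hub(default, saved_hubs, connect=socket.create_connection):
    """Try the default hub, then each saved hub in order of preference.

    Returns (address, connection) for the first hub that accepts.
    """
    candidates = [default] + [hub for hub in saved_hubs if hub != default]
    for address in candidates:
        host, port = address.rsplit(":", 1)
        try:
            return address, connect((host, int(port)))
        except OSError:
            continue  # refused or unreachable: try the next hub
    raise ConnectionError("no mi_hub could be reached")
```

Trying the default first and skipping it in the saved list keeps the
order of preference that the hub supplied, without a wasted retry.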

In this case, or if a hub becomes overloaded, the hub may send a
'reconnect' message asking the mux to disconnect and reconnect to
another (specified) mi_hub.  The mux does not disconnect immediately:
it buffers requests from its mids, and waits until all active
requests have been resolved before continuing.  It will receive no
further requests from the old hub, but will still receive any
outstanding replies.  The mux may be able to connect to both hubs at
once, which would smooth the transition.

topology . .
hubs:		192.168.1.92:2345 192.168.1.93:2345 192.168.1.94:2345
.

reconnect . .
hub:		192.168.1.92:2345
.
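The draining behaviour on 'reconnect' can be sketched as a small state
machine.  `MuxUplink`, `send` and `open_hub` are hypothetical names;
the sketch switches hubs only once the old hub has answered every
outstanding request, and does not attempt the optional overlap of
both connections.

```python
from collections import deque


class MuxUplink:
    def __init__(self, send, open_hub):
        self.send = send          # writes a message to the current hub
        self.open_hub = open_hub  # connects to a hub, returns a new send
        self.outstanding = set()  # my-refs still awaiting a reply
        self.buffer = deque()     # mid requests held while draining
        self.target = None        # hub named by a pending 'reconnect'

    def request(self, my_ref, message):
        if self.target is not None:
            self.buffer.append((my_ref, message))  # hold until switched
        else:
            self.outstanding.add(my_ref)
            self.send(message)

    def on_reply(self, your_ref):
        self.outstanding.discard(your_ref)
        self._maybe_switch()

    def on_reconnect(self, hub):
        self.target = hub
        self._maybe_switch()

    def _maybe_switch(self):
        # Switch only when a reconnect is pending and nothing is in flight.
        if self.target is None or self.outstanding:
            return
        self.send = self.open_hub(self.target)
        self.target = None
        while self.buffer:
            self.request(*self.buffer.popleft())
```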


Request and Response:

When a mid requires a remote object, it sends a request to its mi_mux.
It may specify an optional timeout, after which the attempt should be
aborted.  There are three types of request: 'object', 'user' and
'ACL'.

In a request of type 'object', the 'identifier' field contains the
object's DCMI (the deployment, customer, module and instance
required), and the 'requester' field contains the DCGL (the
deployment, customer, group and login) of the user for whom the
request is being made.

In a request of type 'user', the 'identifier' field contains the DCL:
the deployment, customer and login of the user record.

In a request of type 'ACL', the 'identifier' field contains the DC:
the deployment and customer of the ACL.

The 'timeout' field, in seconds, is optional and defaults to a
generous value (perhaps 30 seconds); this default is also the maximum
timeout.

Here are the three types of request:

request . 1
type: 		object
identifier:	vic moeps MiniForum 1004
requester:	snc mi teaching margo
timeout:	10

request . 2
type: 		user
identifier:	vic moeps bert
timeout:	10

request . 3
type: 		ACL
identifier:	vic charlton (?)
timeout:	10
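A mid might build these requests with a helper along the following
lines.  `make_request` and `MAX_TIMEOUT` are hypothetical names; the
sketch clamps any larger timeout to the 30-second default, following
the statement that the default is the maximum, and ends the header
with a blank line as in the examples above.

```python
MAX_TIMEOUT = 30  # seconds; the text's default, which is also the maximum


def make_request(my_ref, kind, identifier, requester=None, timeout=None):
    """Build the text of a 'request' message (header only, no body)."""
    if kind not in ("object", "user", "ACL"):
        raise ValueError(f"unknown request type: {kind!r}")
    if kind == "object" and requester is None:
        raise ValueError("an 'object' request needs a requester")
    timeout = MAX_TIMEOUT if timeout is None else min(timeout, MAX_TIMEOUT)
    lines = [f"request . {my_ref}",
             f"type:\t{kind}",
             f"identifier:\t{' '.join(identifier)}"]
    if requester is not None:
        lines.append(f"requester:\t{' '.join(requester)}")
    lines.append(f"timeout:\t{timeout}")
    return "\n".join(lines) + "\n\n"  # blank line ends the header
```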

A request is forwarded without change, except for the 'my-ref' field,
to a foreign mid (or to a mi_hub with a matching cache entry) that can
provide the necessary information.  That daemon then sends a
'response' message, which includes a 'cache_until' field if the
response is cacheable.

The form of the responses to these requests is:

response 1 .
cache_until:	954119955
body_length:	411
.
[binary data here, 411 bytes, then the next message starts]

The response is forwarded without change, except for the 'your-ref'
field, to the original mid that made the request.  If the object is
cacheable and the response passes through one or more intermediate
mi_hub machines, it is cached on those machines.  Since the response
does not contain the information needed to key the cache, each
intermediate mi_hub keeps this information associated with the
session.
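Because the response carries no cache key, an intermediate hub must
remember the key per session.  `HubCache` below is a hypothetical
sketch: the choice of (type, identifier, requester) as the key is an
assumption, and 'cache_until' is compared against the local clock,
which is one reason the hubs' clocks must agree.

```python
import time


class HubCache:
    def __init__(self, clock=time.time):
        self.clock = clock
        self.entries = {}       # cache key -> (cache_until, body)
        self.session_keys = {}  # my-ref used when forwarding -> cache key

    def on_request(self, my_ref, fields):
        """Return a cached body, or None if the request must be forwarded."""
        key = (fields["type"], fields["identifier"], fields.get("requester"))
        hit = self.entries.get(key)
        if hit is not None and hit[0] > self.clock():
            return hit[1]
        self.session_keys[my_ref] = key  # remember the key for the reply
        return None

    def on_response(self, your_ref, fields, body):
        """Cache a response if it carries 'cache_until'."""
        key = self.session_keys.pop(your_ref, None)
        if key is not None and "cache_until" in fields:
            self.entries[key] = (int(fields["cache_until"]), body)
```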

If a foreign mi_mux is unable to service a request, because all of its
mids are busy, it does not wait, but returns a 'busy' message
immediately.  The hub can then try other MI-servers, or wait and try
again later.

busy 2 .
.

If the request cannot be serviced at all, the mi_hub returns an
'error' message.
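A hub's handling of 'busy' might be sketched as trying each candidate
server in turn.  `dispatch` and its `send_to` callback are
hypothetical, and `send_to` is treated as a blocking call for clarity,
whereas the real mi_hub uses non-blocking IO; when every server is
busy, the caller decides whether to wait and retry or to return
'error'.

```python
def dispatch(request, servers, send_to):
    """Offer request to each server in turn until one is not busy.

    send_to(server, request) must return (reply_type, reply).
    Returns (server, reply), or (None, None) if every server was busy.
    """
    for server in servers:
        reply_type, reply = send_to(server, request)
        if reply_type != "busy":
            return server, reply  # a 'response' or 'error' from that server
    return None, None
```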


Cache Management:

Under some circumstances it may be necessary to clean some or all
records from the cache.  If a new message is posted to a noticeboard,
all cached images of the noticeboard must be flushed so that remote
users can immediately see the updated version.  The 'cache_clean'
message cleans a range of keys, as specified by the 'key' field, from
every cache in the system, including caches on hubs and MI-servers.
Information could be recorded to avoid sending 'cache_clean' messages
to nodes that have not cached the object in question, although this
would not necessarily improve performance.

cache_clean . 12
key:		MiniForum 1036 * * * *
.
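The pattern semantics of the 'key' field are not spelled out above.
The following sketch assumes that a key is a whitespace-separated
list of fields and that each '*' matches exactly one field;
`key_matches` and `cache_clean` are hypothetical names, and the cache
is a plain dictionary.

```python
def key_matches(pattern, key):
    """True if key has as many fields as pattern and every field either
    equals the pattern's field or is matched by a '*' there."""
    p, k = pattern.split(), key.split()
    return len(p) == len(k) and all(a in ("*", b) for a, b in zip(p, k))


def cache_clean(cache, pattern):
    """Remove matching entries from a dict-based cache; return how many."""
    doomed = [key for key in cache if key_matches(pattern, key)]
    for key in doomed:
        del cache[key]
    return len(doomed)
```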