Inter My-Internet Protocol

INTRODUCTION:

IMIP implements a simple object request and caching system to support the distributed, high-utilisation My-Internet version 6 (MI-6). It connects MI-server machines together through a hub, using a fixed network of TCP connections. In this way, IMIP avoids the overhead of repeatedly establishing TCP connections.

In My-Internet version 4 (MI-4), a single machine services all customers and all deployments. MI-6 is a distributed system, with one or more MI-servers on a deployment LAN, sharing a single database. Although most MI requests can be handled within the deployment, communication with other deployments is also necessary.

ARCHITECTURE:

All inter-deployment communication passes through a central hub, a LAN with one or more hub-machines. The purpose of the hub is to forward inter-deployment messages, to cache objects, and to distribute load evenly.

Each MI-server is connected to only one hub-machine, but a hub-machine may service many MI-servers. A single message may pass through more than one hub-machine in order to reach its destination. Different MI-servers in a single deployment may be connected to different hub-machines; this minimises the number of hops required to forward a message. IMIP also supports an internet-distributed hub, should that prove to be necessary in the future.

Five types of daemon are involved in IMIP: mid, mi_cache, mi_clean_cache, mi_mux and mi_hub. The first three of these, mid, mi_cache and mi_clean_cache, are derived from MI-4. The other two, mi_mux and mi_hub, have been developed specifically for the distributed MI-6.

Each MI-server runs a mi_mux multiplexor daemon to manage its single TCP connection to the hub. All of the mid processes on the server are connected to the mux over unix-domain sockets.
The mux daemon has three responsibilities:

  - to forward requests from each of its local mids to the hub, and to return responses from the hub to the correct mids;
  - to receive requests from foreign mids via the hub, forward those requests to an appropriate free mid, then return the mid's reply to the originator;
  - to monitor the activity of each local mid, and inform the hub so that load can be distributed evenly amongst a deployment's MI-servers.

Each hub-machine runs a single mi_hub daemon process. The hub daemon has three responsibilities:

  - to forward requests from connected MI-servers and from other hub-machines, either directly to their destination or indirectly via another hub-machine, and to return the responses by the same path;
  - to cache objects where appropriate in the local mi_cache;
  - to monitor the activity of each MI-server and inform other hub-machines so that load can be distributed evenly.

Both mi_mux and mi_hub are implemented in Perl, using a general-purpose network-node program that provides buffered, non-blocking IO, connection and session management, and timeouts.

PROTOCOL:

Each node in the network has a unique identifier.

A message consists of a textual header, terminated by a blank line. The first line of the header consists of a message type, followed by two session identifiers, 'your-ref' and 'my-ref', either or both of which may be '.' for null (see below). Each of the other header lines consists of a key and a value separated by ':'. Padding whitespace is ignored. Lines in the text header may be terminated by LF or by CR LF.

Some messages attach a binary body following the header, in which case the 'body_length' key-value pair must indicate the length of this attachment. This is important because it must be possible to extract a message of unknown type from the stream and discard it.

The 'my-ref' field is included in any message which may elicit a reply (even if only an acknowledgement or error).
The 'your-ref' field is only specified in a reply to a previous message, not in a message that initiates a new conversation. If a message-id is not needed for a particular message, a period '.' takes its place.

A message-id must be unique for the port on which the message is sent. The easiest way to achieve this is to use a sequence number generated by the sender process. This identifier is quoted in any response to the message.

Note that 'your-ref' and 'my-ref' are different for each hop: each intermediate node allocates a 'my-ref' of its own for messages that it forwards or sends, and remembers this together with the 'your-ref' it received and the connections involved, so that it can route the reply correctly. A message-id is therefore not universally unique, only unique for that particular source (or connection, from the point of view of the intermediate node).

Connection Management:

Starting a mi_hub:

When a mi_hub is started, it is given a TCP port on which to listen for connections, and the TCP/IP address and port of any other mi_hub machines that are already running, to which it should establish connections. It connects to each of these mi_hubs in turn, using TCP, and sends a 'hello_hub' message, which contains the current system time (seconds since the epoch, UTC).

If the clocks are synchronized, a 'welcome' message is returned. The peer mi_hub returns an 'error' message unless the system times on the two machines agree within a very small margin (perhaps 5 seconds). This synchronization is important for the transfer of cache expiry times. The connection is not closed automatically after such an error, so a 'welcome' message may still be sent once the clock has been corrected with 'rdate' or a similar program.

Here are the three types of message:

    hello_hub . 0
    time: 954119823
    .

    error 0 .
    code: 1
    message: clocks are not synchronized
    .

    welcome 0 .
    .
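
The header format described above can be sketched as a small incremental parser. This is only an illustration, not the Perl network-node program the daemons actually use: the names `Message` and `parse_message` are hypothetical, the blank line is taken as the header terminator as the prose states (the lone '.' lines in the listings are assumed to be a listing convention and are not handled), and 'body-length' and 'body_length' are treated as the same key.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Message:
    msg_type: str
    your_ref: Optional[str]  # None when the wire value is '.'
    my_ref: Optional[str]    # None when the wire value is '.'
    fields: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def parse_message(buf: bytes) -> Optional[Tuple[Message, bytes]]:
    """Extract one message from the front of buf.

    Returns (message, remaining bytes), or None if buf does not yet
    hold a complete message and more data must be read first.
    """
    lines = []
    pos = 0
    while True:
        nl = buf.find(b"\n", pos)
        if nl < 0:
            return None  # header is not complete yet
        # Accept both LF and CR LF; padding whitespace is ignored.
        line = buf[pos:nl].rstrip(b"\r").decode("latin-1").strip()
        pos = nl + 1
        if not line:
            break  # blank line terminates the header
        lines.append(line)
    if not lines:
        raise ValueError("empty header")

    parts = lines[0].split()
    if len(parts) != 3:
        raise ValueError("first line must be: type your-ref my-ref")
    msg_type, your_ref, my_ref = parts

    fields: Dict[str, str] = {}
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("header line without ':': " + line)
        # Assumption: hyphen and underscore spellings name the same key.
        fields[key.strip().replace("-", "_")] = value.strip()

    # The body length lets even unknown message types be skipped cleanly.
    body_len = int(fields.get("body_length", "0"))
    if len(buf) < pos + body_len:
        return None  # body is not complete yet
    msg = Message(
        msg_type,
        None if your_ref == "." else your_ref,
        None if my_ref == "." else my_ref,
        fields,
        buf[pos:pos + body_len],
    )
    return msg, buf[pos + body_len:]
```

A receiver would append incoming bytes to a buffer and call `parse_message` repeatedly until it returns None. Because the function never needs to understand the message type, an unrecognised message can be consumed and dropped without losing stream synchronisation.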
Starting a mi_mux:

When a mi_mux is started, it is given the name of its deployment, and the TCP/IP address and port of its mi_hub (which should already be running). It connects to this hub using TCP, and sends a 'hello_mux' message, which contains the deployment name, a list of the customers in the deployment, and the current system time. If it cannot connect, it consults a list of other hubs, which it was given in a previous session, and connects to one of these. The reply may be an 'error' or a 'welcome'. The 'welcome' is shortly followed by a 'topology' message (see below).

Here is the 'hello_mux' message:

    hello_mux . 0
    deployment: vic
    customers: moeps aberfeld charlton adm5 alexsc alfredps altonaps ...
    time: 954119830
    .

The mi_hub will then send an 'add_server' message to all of the other mi_hubs:

    add_server . .
    deployment: vic
    server: adm1
    customers: moeps aberfeld charlton adm5 alexsc alfredps altonaps ...

Starting a mid:

When a mid is started, it is given the name of the socket through which it should connect to its local mi_mux (which should already be running). The procedure is similar to the above: the mid sends a 'hello_mid' message, and the mi_mux responds with 'welcome' or 'error':

    hello_mid . 0
    time: 954119835

Stopping a daemon:

The system is designed to avoid the need for termination messages, because this makes it much more stable in the event of the accidental termination of a machine, or of network failure.

When a mid stops, all it needs to do is disconnect; it need not send any special termination messages.

When a mi_mux stops, any mid processes remaining on that machine will no longer be able to communicate with the network (typically, the parent mid should detect the problem and restart the mi_mux daemon). The associated mi_hub must detect the situation and send a 'remove_server' message to all other mi_hubs. The mi_mux need not send any special termination messages.

    remove_server . .
    deployment: vic
    server: adm1

When a mi_hub stops, the mi_mux daemons on its MI-servers must connect to another hub. The other hubs must remove the entries related to this hub from their routing tables. The mi_hub need not send any special termination messages.

Network Topology:

Sometimes it is necessary for a mi_mux on an MI-server to use a mi_hub machine other than its default, for example if its default hub is overloaded or unreachable.

After a mi_hub sends 'welcome' to a mi_mux, it then sends a 'topology' message, containing a list of the hubs that are currently active, in an appropriate order of preference for that mi_mux. This message is also sent to every mux whenever a hub starts or stops. The mi_mux saves the list in a configuration file.

If the mi_mux is ever unable to connect to its default mi_hub server, or loses its connection, it uses the previously saved list of known mi_hub servers and attempts to connect to each in turn.

In this case, or if a hub becomes overloaded, the hub may send a 'reconnect' message, asking the mux to disconnect and reconnect to another (specified) mi_hub. The mux does not disconnect immediately; it buffers requests from its mids, and waits until all active requests have been resolved before continuing. It will not receive any further requests from the hub, but will receive any outstanding replies. It may be that the mux can connect to both hubs at once, and thus smooth the transition.

    topology . .
    hubs: 192.168.1.92:2345 192.168.1.93:2345 192.168.1.94:2345
    .

    reconnect . .
    hub: 129.168.1.92:2345
    .

Request and Response:

When a mid requires a remote object, it sends a request to its mi_mux. The request may include an optional timeout, after which the attempt should be aborted.

There are three types of request - 'object', 'user' and 'ACL':

A request of type 'object' has an 'object_DCMI' field, which contains the deployment, customer, module and instance required.
The 'requester_DCGL' field specifies the deployment, customer, group and login of the user for whom the request is being made.
A request of type 'user' has a 'user_DCL' field: the deployment, customer and login of the user record.
A request of type 'ACL' has an 'ACL_DC' field: the deployment and customer of the ACL.

The 'timeout' field is optional, and defaults to a generous value, perhaps 30 seconds; this default is also the maximum timeout.

Here are the three types of request:

    request . 1
    type: object
    identifier: vic moeps MiniForum 1004
    requester: snc mi teaching margo
    timeout: 10

    request . 2
    type: user
    identifier: vic moeps bert
    timeout: 10

    request . 3
    type: ACL
    identifier: vic charlton (?)
    timeout: 10

A request is forwarded without change, except for the 'my-ref' field, to a foreign mid (or to a mi_hub with a matching cache entry) which can provide the necessary information. This daemon then sends a 'response' message, which may include a 'cache_until' field if the object is cacheable. The form of the responses to these requests is:

    response 1 .
    cache_until: 954119955
    body_length: 411
    .
    [binary data here, 411 bytes, then the next message starts]

The reply is forwarded without change, except for the 'your-ref' field, to the original mid that made the request. If the object is cacheable, and it passes through one or more intermediate mi_hub machines, then it will be cached on those machines. Since the response does not contain the information needed to key the cache, this information is kept associated with the session in each intermediate mi_hub.

If a foreign mi_mux is unable to service a request because all of its mids are busy, it does not wait, but returns a 'busy' message immediately. The hub can then try other MI-servers, or wait and try again later.

    busy 2 .
    .

It may be that the request cannot be serviced at all, in which case an 'error' message will be returned from the mi_hub.
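
The per-hop bookkeeping described above (a fresh 'my-ref' for each forwarded request, the original reference and connection remembered for routing the reply, and the cache key kept with the session) can be sketched as follows. The class and method names are hypothetical, connections are represented by arbitrary hashable labels, and the tuple used as a cache key is an assumption; the source does not say how mi_hub stores its sessions.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Hashable, Optional, Tuple


@dataclass
class Session:
    origin_conn: Hashable        # connection the request arrived on
    origin_ref: str              # 'my-ref' chosen by the previous hop
    cache_key: Tuple[str, ...]   # remembered because the response lacks it


class SessionTable:
    """Tracks requests that an intermediate node has forwarded."""

    def __init__(self) -> None:
        # One counter per node is unique on every outgoing port.
        self._next_ref = itertools.count(1)
        self._sessions: Dict[Tuple[Hashable, str], Session] = {}

    def forward_request(self, origin_conn: Hashable, origin_ref: str,
                        dest_conn: Hashable,
                        cache_key: Tuple[str, ...]) -> str:
        """Record a forwarded request; return the 'my-ref' to send onward."""
        new_ref = str(next(self._next_ref))
        self._sessions[(dest_conn, new_ref)] = Session(
            origin_conn, origin_ref, cache_key)
        return new_ref

    def route_reply(self, dest_conn: Hashable,
                    your_ref: str) -> Optional[Session]:
        """Look up and close the session that a reply belongs to.

        The caller rewrites 'your-ref' to session.origin_ref and sends
        the reply on session.origin_conn. Unknown references give None.
        """
        return self._sessions.pop((dest_conn, your_ref), None)
```

In this sketch a session ends as soon as any reply arrives. A hub that wants to retry after a 'busy' reply would call `forward_request` again with the same origin details and a different destination connection.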
Cache Management:

Under some circumstances it may be necessary to clean some or all records from the cache. For example, if a new message is posted to a noticeboard, all cached images of the noticeboard must be flushed so that remote users can immediately see the updated version.

The 'cache_clean' message cleans a range of keys, as specified by the 'key' field, from every cache in the system, including caches on hubs and MI-servers. Information could be recorded to prevent 'cache_clean' messages being sent to nodes that have not cached the object in question, but this would not necessarily improve performance.

    cache_clean . 12
    key: MiniForum 1036 * * * *
    .
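
One way a node might apply a 'cache_clean' key to its local cache is sketched below. The matching rule is an assumption: the source does not define the key syntax, so this sketch supposes that cache keys are whitespace-separated tokens in the same order as the 'key' field, and that '*' matches any single token. The function names and the dictionary cache are hypothetical.

```python
from typing import Dict, List


def key_matches(pattern: str, key: str) -> bool:
    """True if key matches pattern token by token ('*' matches any token)."""
    p_tokens = pattern.split()
    k_tokens = key.split()
    if len(p_tokens) != len(k_tokens):
        return False
    return all(p == "*" or p == k for p, k in zip(p_tokens, k_tokens))


def clean_cache(cache: Dict[str, bytes], pattern: str) -> List[str]:
    """Remove every entry whose key matches pattern; return removed keys."""
    # Collect first, then delete, to avoid mutating the dict while iterating.
    doomed = [key for key in cache if key_matches(pattern, key)]
    for key in doomed:
        del cache[key]
    return doomed
```

For example, under these assumptions the pattern `MiniForum 1036 * * * *` would remove every six-token entry whose first two tokens are `MiniForum` and `1036`, leaving entries for other instances untouched.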