
Fast Multicast IPC on Linux and Unix

IPC frameworks on Linux come in two varieties: complicated and nonexistent. This article presents the design for a very fast IPC framework that uses shared memory and supports multicast messaging. The design is flexible and not complicated.

In response to the complexity and performance of the D-Bus IPC framework, my employer, Collabora, asked me to investigate options for improving it.

As part of that investigation, I created this faster, less complicated design as a proof of concept. I called it IPC.FM - Fast Multicast IPC.

My hope is that this design will be used as a base on which more feature-rich IPC frameworks are built. Possible uses include:

  • as a low-level component of an improved D-Bus
  • as a base for a faster ZeroMQ IPC layer
  • as-is in software that needs only what is provided by this design.

My source code is open, subject to the Apache License, but the system's design does not bind you to any one implementation of the design.

This IPC framework is as simple as I could make it, so it is easy to learn. Implementations should also suffer from fewer bugs and lower maintenance costs than more complicated IPC frameworks.

Features

  • Has a very fast message transfer mechanism based on shared memory.
  • Allows messages to be sent to multiple recipient processes. (Multicast messaging.)
  • Uses (Unix domain) sockets for its meta-message protocol.
  • Not bound to any particular programming language.
  • Has no central daemon to slow message delivery or cause priority inversion.

It does not provide:

  • Guarantees about the ordering of message delivery.
  • A serialization protocol. (Use Google's Protocol Buffers or Apache's Thrift or similar.)
  • Privacy between processes owned by the same user.
  • A way for a process to discover other processes interested in subscribing to its messages.
  • A central daemon to validate message delivery.
  • A protocol for the authentication of the participants in a message exchange.
  • A way to exchange messages across a network.

Design

There is a base directory on the system that is fixed and known in advance, which I call $shmipc_base. The default value could be '/var/run/shmipc/'.

Each participating process ($proc) has its own directory within $shmipc_base which is derived from its process identifier (pid). I call this $proc_base. For example, for pid 1523, the $proc_base would be '/var/run/shmipc/1523/'.

Note that $proc_base need not be readable by anyone. It must, however, have at least Unix execute (search) permission, so that other processes can reach the entries inside it.

Inside the $proc_base, $proc must create a filesystem entry called 'in'. I call this $proc_in. For pid 1523, the $proc_in would be '/var/run/shmipc/1523/in' (to continue the previous example).
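The path layout described so far can be sketched in a few lines of Python. This is only an illustration: the helper names (`proc_base`, `proc_in`, `make_proc_base`) are hypothetical, and the 0o711 mode is one assumed way to make the directory searchable without being listable by others.

```python
import os

SHMIPC_BASE = "/var/run/shmipc"  # the default suggested in the text


def proc_base(pid, base=SHMIPC_BASE):
    """Directory owned by one participating process ($proc_base)."""
    return os.path.join(base, str(pid))


def proc_in(pid, base=SHMIPC_BASE):
    """Filesystem entry used to signal message availability ($proc_in)."""
    return os.path.join(proc_base(pid, base), "in")


def make_proc_base(pid, base=SHMIPC_BASE):
    """Create $proc_base so others can traverse it but not list it.

    Assumption: mode 0o711 (owner rwx, everyone else execute only).
    """
    path = proc_base(pid, base)
    os.makedirs(path, exist_ok=True)
    os.chmod(path, 0o711)  # chmod explicitly, since makedirs honours the umask
    return path
```

For pid 1523, `proc_in(1523)` gives '/var/run/shmipc/1523/in', matching the example above. Passing a different `base` makes it possible to experiment without write access to /var/run.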

$proc_in is used to communicate message availability. It is likely to be a named socket.
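As a rough sketch of what a named socket at $proc_in might look like, the Python below binds a Unix domain socket at a given path and sends one notification to it. The choice of a datagram socket is an assumption (the design only says "named socket"), and `open_proc_in` and `notify` are hypothetical helper names.

```python
import os
import socket


def open_proc_in(path):
    """Bind a Unix datagram socket at `path`, replacing any stale entry."""
    try:
        os.unlink(path)  # a leftover socket file would make bind() fail
    except FileNotFoundError:
        pass
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def notify(path, payload):
    """Send one short datagram to the socket bound at `path`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.sendto(payload, path)
```

Datagrams keep each meta-message as a discrete unit, which suits short, fixed-purpose notifications. A stream socket would also work but would need its own framing.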

Either directly in $proc_base, or in a subdirectory, $proc must create at least one file which will be memory-mapped by other processes. I call this file $proc_shmout. This is the shared memory that is used for messages.
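To make the shared memory file concrete, here is a minimal Python sketch of a sender creating its $proc_shmout and a receiver mapping it read-only. The function names, the fixed size, and the 0o644 file mode are assumptions made for illustration, not part of the design.

```python
import mmap
import os


def create_shmout(path, size):
    """Create (or reuse) the shared file and map it read-write."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.ftruncate(fd, size)      # give the file its full length
        return mmap.mmap(fd, size)  # shared, read-write mapping by default
    finally:
        os.close(fd)                # mmap keeps its own duplicate descriptor


def map_shmout_readonly(path):
    """Map another process's shared file for reading only."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        return mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)
```

A receiver would typically keep the read-only mapping open across messages from the same sender, so that only the first message from a given sender pays the cost of opening and mapping the file.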

To send a message, a process must first place the contents of the message into its own $proc_shmout. Then, the process must notify the recipients via their $proc_ins.

$proc_in is writable. Processes that want to communicate with $proc can write a short meta-message to its $proc_in. The meta-message identifies the sender's pid, its shared memory file (its $proc_shmout), the offset of the sent message within that file, and the length of the message.
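The two-step send (write the payload, then notify each recipient) could look roughly like the Python sketch below. It uses the Msg layout defined further down, and it assumes datagram sockets, an unpadded layout, and that `this_msg_length` counts the whole meta-message. The function name `multicast` and the policy of silently skipping unreachable recipients are hypothetical.

```python
import socket
import struct


def multicast(shm, offset, payload, sender_pid, shmout_name, recipient_ins):
    """Place `payload` in shared memory, then notify each recipient's 'in'.

    `shm` is any writable buffer (an mmap in practice).
    Returns the list of 'in' paths that were successfully notified.
    """
    shm[offset:offset + len(payload)] = payload          # step 1: the message
    name = shmout_name.encode() + b"\0"
    total = 13 + len(name)                               # 1 + 4 + 4 + 4 + name
    if total > 255:
        raise ValueError("meta-message length must fit in one byte")
    meta = struct.pack("=BIII", total, sender_pid, offset, len(payload)) + name
    notified = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        for path in recipient_ins:                       # step 2: the notices
            try:
                s.sendto(meta, path)
                notified.append(path)
            except OSError:
                pass  # recipient is gone or not listening; skip it
    return notified
```

Note that the payload is written once no matter how many recipients there are. Only the small meta-message is repeated per recipient, which is what makes multicast cheap in this design.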

$proc is free to ignore the meta-messages received on $proc_in. A sent message is guaranteed to remain available in the shared memory for only 1 second.

If $proc chooses to receive the message, it must send a (meta-message) reply indicating that it has processed the message. The reply is sent to the $proc_in of the process that sent the message. Failure to send the reply will mean that the sending process no longer notifies $proc of future messages, or some similar penalty.
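One way a sender might keep track of outstanding replies and the 1-second availability window is sketched below in Python. The class name `PendingReplies` and the specific policy (drop any recipient that has not replied by the deadline, and treat the region as reusable once every recipient has replied) are assumptions. The design deliberately leaves the exact penalty open.

```python
import time


class PendingReplies:
    """Track which recipients still owe a reply for each sent message."""

    def __init__(self, lifetime=1.0, clock=time.monotonic):
        self.lifetime = lifetime  # seconds a message is guaranteed to stay
        self.clock = clock
        self.waiting = {}         # shm offset -> (deadline, set of pids)
        self.dropped = set()      # pids that will no longer be notified

    def sent(self, offset, recipient_pids):
        """Record a message just written at `offset` and announced."""
        self.waiting[offset] = (self.clock() + self.lifetime, set(recipient_pids))

    def replied(self, offset, pid):
        """Record a Reply meta-message from `pid` for the message at `offset`."""
        entry = self.waiting.get(offset)
        if entry is None:
            return
        entry[1].discard(pid)
        if not entry[1]:
            del self.waiting[offset]  # everyone is done; region is reusable

    def expire(self):
        """Penalise late recipients; return offsets whose memory is now free."""
        now = self.clock()
        freed = []
        for offset, (deadline, pids) in list(self.waiting.items()):
            if now >= deadline:
                self.dropped |= pids
                del self.waiting[offset]
                freed.append(offset)
        return freed
```

Injecting the clock keeps the sketch testable without sleeping. A real sender would call `expire()` from its event loop before reusing any region of its shared memory.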

Here are definitions of the only meta-messages that can be sent in this protocol.

struct Msg {
   word8  this_msg_length;
   word32 sender_pid;
   word32 shm_msg_offset;
   word32 shm_msg_length;
   bytes0 sender_shmout_name;
}
struct Reply {
   word8 0;
   word32 replier_pid;
   word32 offset;
}

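Here, 'word8' means an 8-bit unsigned type; 1 byte. The 'word32' type means 4 bytes ordered with whatever endianness is native to the platform. 'bytes0' means a 0-terminated sequence of bytes. A Reply always begins with a zero byte, which distinguishes it from a Msg, whose first byte is its (nonzero) length.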
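For illustration, the two meta-messages can be encoded with Python's `struct` module. This is a sketch under stated assumptions: fields are packed without padding (the `=` prefix gives native byte order with no alignment), `this_msg_length` counts every byte of the Msg including itself and the terminator, and `offset` in a Reply echoes the `shm_msg_offset` being acknowledged. The function names are hypothetical.

```python
import struct

MSG_HEAD = struct.Struct("=BIII")  # this_msg_length, pid, offset, length
REPLY = struct.Struct("=BII")      # zero byte, replier_pid, offset


def pack_msg(sender_pid, shm_msg_offset, shm_msg_length, shmout_name):
    """Encode a Msg; the name is stored as 0-terminated bytes."""
    name = shmout_name.encode() + b"\0"
    total = MSG_HEAD.size + len(name)
    if total > 255:
        raise ValueError("this_msg_length is a single byte")
    head = MSG_HEAD.pack(total, sender_pid, shm_msg_offset, shm_msg_length)
    return head + name


def pack_reply(replier_pid, offset):
    """Encode a Reply; its first byte is always 0."""
    return REPLY.pack(0, replier_pid, offset)
```

Under these assumptions the fixed part of a Msg is 13 bytes and a Reply is 9 bytes, so the shared memory file name can be at most 241 bytes long.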

Meta-messages which don't make sense are ignored. This includes truncated meta-messages.
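A receiver following this rule might parse incoming data along the lines of the Python sketch below, returning `None` for anything it cannot make sense of. It assumes one meta-message per datagram and an unpadded field layout. `parse_meta` and the tuple shapes it returns are hypothetical.

```python
import struct

MSG_HEAD = struct.Struct("=BIII")  # this_msg_length, pid, offset, length
REPLY = struct.Struct("=BII")      # zero byte, replier_pid, offset


def parse_meta(data):
    """Decode one meta-message, or return None if it does not make sense."""
    if not data:
        return None
    if data[0] == 0:                       # a leading zero byte marks a Reply
        if len(data) != REPLY.size:
            return None                    # truncated or over-long Reply
        _, pid, offset = REPLY.unpack(data)
        return ("reply", pid, offset)
    total = data[0]
    if total != len(data) or total <= MSG_HEAD.size + 1:
        return None                        # truncated, padded, or nameless Msg
    _, pid, offset, length = MSG_HEAD.unpack_from(data)
    name = data[MSG_HEAD.size:]
    if name[-1:] != b"\0" or b"\0" in name[:-1]:
        return None                        # name must end at its only 0 byte
    return ("msg", pid, offset, length, name[:-1])
```

A receiver would also want to check that the offset and length fall inside the sender's mapped file before touching the memory. That check needs the mapping itself, so it is left out here.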

This design intentionally leaves certain IPC-related problems unsolved. For example, how can a process avoid wasting time reading meta-messages from $proc_in which are written by a malicious process bent on wasting $proc's time? I hope to respond to other concerns in a future blog post.

Also coming in future blog posts: a tour of the source code, a walkthrough of the connection process, and a walkthrough of the (multicast) message delivery process.