An Embedded DSL for Building Mach-O Binaries, Part 1

This article is the first in a series which describes the creation of an elegant domain-specific language (DSL) for manipulating Mach-O files. Mach-O is the native format for executable binaries on Mac OS X. The DSL is embedded in Haskell.

This is Part 1. It introduces the Mach-O format and explains the thought process behind the design of the DSL.

The first step in the creation of a DSL is to understand the problem domain. All good solutions require understanding the problem well. In order to understand the DSL constructed in this article, one must understand the Mach-O file format and how it is used.

I assume that you understand dynamic linking and loading, that is, how an executable file is turned into executing code.

Summary of The Mach-O Format

Mach-O is a file format for executable code. Mac OS X uses the Mach-O format for:

  • kernel extensions,
  • command-line tools,
  • applications,
  • frameworks,
  • shared and static libraries,
  • object (.o) files.

So, the Mach-O file format is important!

The precise format of the Mach-O file is specified in the Mac OS X ABI Mach-O File Format Reference. Here, I summarize and give highlights.

A Mach-O file has the following three regions, in order:

  1. the header
  2. the load commands
  3. the segment data

The header contains information about the file, such as the target architecture. It contains flags which specify how to interpret the remainder of the file. The target architecture determines the byte ordering (endianness) for many of the structures in the file.
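To make the byte-ordering point concrete, here is a minimal sketch of how a reader might classify a file from its first four bytes. The magic values are the MH_MAGIC and MH_MAGIC_64 constants from the Mach-O headers; the type and function names (Endianness, WordSize, classifyMagic) are hypothetical and are not part of the DSL developed in this article.

```haskell
import Data.Word (Word32)

-- Byte order of the multi-byte structures in a Mach-O file.
data Endianness = BigEndian | LittleEndian
  deriving (Eq, Show)

-- Word size of the target architecture.
data WordSize = Bits32 | Bits64
  deriving (Eq, Show)

-- Classify the magic number. The argument is assumed to be the first
-- four bytes of the file, interpreted as a big-endian 32-bit word.
classifyMagic :: Word32 -> Maybe (WordSize, Endianness)
classifyMagic 0xfeedface = Just (Bits32, BigEndian)     -- MH_MAGIC as stored
classifyMagic 0xcefaedfe = Just (Bits32, LittleEndian)  -- MH_MAGIC byte-swapped
classifyMagic 0xfeedfacf = Just (Bits64, BigEndian)     -- MH_MAGIC_64 as stored
classifyMagic 0xcffaedfe = Just (Bits64, LittleEndian)  -- MH_MAGIC_64 byte-swapped
classifyMagic _          = Nothing                      -- not a thin Mach-O file
```

Anything else, including a universal ("fat") binary, falls through to Nothing in this sketch.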

The second region contains the load commands. They describe to the linker/loader how to construct the address space of the executable code, where in the Mach-O file to find the code, and what other files (shared libraries) contain code used by the file.

The third region contains the data described by the load commands. The region is partitioned into one or more segments, which are in turn partitioned into sections.

Beginning the DSL

Already, we have enough information to begin drafting our DSL in Haskell. Of course, this is merely a draft, as we do not yet have a full understanding of the Mach-O file format and its usage patterns.

[Implementation]
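module MachO where

-- Placeholder types for the three regions; their contents are not yet designed.
data Header       = Header
data LoadCommands = LoadCommands
data SegmentData  = SegmentData

data MachO =
   MachO { machoHeader   :: Header
         , machoLCs      :: LoadCommands
         , machoSegments :: SegmentData
         }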

With the code above, we would expect to use the (so far very small) DSL like so:

[DSL use]
MachO
   (header)
   (loadCommands)
   (segmentData)

The above Haskell expression constructs a MachO data value, assuming that its three component values, header, loadCommands, and segmentData, are defined appropriately.
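To illustrate what "defined appropriately" could mean, the following is one possible sketch of the three component types, together with a small example value. Every field and constructor here (hdrCpuType, LoadSegment, LoadDylib, Section, Segment, and so on) is an assumption made for illustration; the real Mach-O structures carry many more fields, and this is not necessarily the shape the finished DSL will take.

```haskell
import Data.Word (Word8, Word32)

-- Hypothetical header: only two of the real header's fields are modeled.
data Header = Header
  { hdrCpuType :: Word32   -- target architecture identifier
  , hdrFlags   :: Word32   -- flags describing the rest of the file
  } deriving (Eq, Show)

-- Hypothetical load commands, reduced to two illustrative cases.
data LoadCommand
  = LoadSegment String     -- map the named segment into the address space
  | LoadDylib FilePath     -- depend on the shared library at this path
  deriving (Eq, Show)

type LoadCommands = [LoadCommand]

-- Segments are partitioned into sections, which hold the raw bytes.
data Section = Section
  { sectionName  :: String
  , sectionBytes :: [Word8]
  } deriving (Eq, Show)

data Segment = Segment
  { segmentName     :: String
  , segmentSections :: [Section]
  } deriving (Eq, Show)

type SegmentData = [Segment]

data MachO = MachO
  { machoHeader   :: Header
  , machoLCs      :: LoadCommands
  , machoSegments :: SegmentData
  } deriving (Eq, Show)

-- One __TEXT segment holding a __text section with a single x86 "ret".
example :: MachO
example =
  MachO
    (Header 7 0)  -- 7 is CPU_TYPE_X86; no flags set
    [LoadSegment "__TEXT", LoadDylib "/usr/lib/libSystem.B.dylib"]
    [Segment "__TEXT" [Section "__text" [0xc3]]]
```

Note that nothing in these types ties a LoadSegment command to the Segment it describes, which is one reason to doubt that plain records are the right abstraction.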

Problems

Note that we are being top-down in our thinking about the MachO type. Is this necessarily a good idea?

What coding patterns will the programmer use to define the three components of the MachO? What will the programmer do with the MachO? Probably, it is only useful to serialize it, perhaps to a file. How will this serialization occur? Indeed, the file format is very strict about the layout of the bytes in the file.
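As a rough illustration of how strict that layout is, here is a minimal sketch of serializing just the 32-bit header, which consists of seven 32-bit fields written in a fixed order and in the target's byte order. The names (putWord32, serializeHeader, and the Header fields) are hypothetical, and a list of bytes is used only to keep the sketch free of library dependencies; a real implementation would more likely build on a package such as bytestring or binary.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8, Word32)

data Endianness = BigEndian | LittleEndian

-- Encode a 32-bit word as four bytes in the requested byte order.
putWord32 :: Endianness -> Word32 -> [Word8]
putWord32 LittleEndian w = [ byteAt s | s <- [0, 8, 16, 24] ]
  where byteAt s = fromIntegral ((w `shiftR` s) .&. 0xff)
putWord32 BigEndian w = reverse (putWord32 LittleEndian w)

-- The seven fields of the 32-bit mach_header structure, in file order.
data Header = Header
  { hMagic, hCpuType, hCpuSubtype, hFileType
  , hNCmds, hSizeOfCmds, hFlags :: Word32
  }

-- Write every field in order, using the given byte order throughout.
serializeHeader :: Endianness -> Header -> [Word8]
serializeHeader e h =
  concatMap (putWord32 e)
    [ hMagic h, hCpuType h, hCpuSubtype h, hFileType h
    , hNCmds h, hSizeOfCmds h, hFlags h ]

-- An x86 executable header with no load commands (illustrative values).
exampleHeader :: Header
exampleHeader = Header 0xfeedface 7 3 2 0 0 0
```

Serializing exampleHeader in little-endian order yields 28 bytes, the first four of which are 0xce, 0xfa, 0xed, 0xfe. The load commands and segment data would have to follow immediately, with the counts and sizes recorded in the header kept in agreement with what is actually written.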

Does the code exhibit any useful features of a DSL, or is it far too Haskell-ish? Isn't controlling what data is placed in what segments and sections, as well as controlling the alignment and address space placement of those sections, what a programmer is most interested in? If so, we might expect our DSL to be segment/section-centric, and not simply to be a set of Haskell data types. Don't we need a more expressive DSL?

I shall answer these questions in the next parts of this article. Continue reading with Part 2, when it is written.

References