4. The DSI Kernel

DSI's kernel differs substantially from conventional operating system and parallel language kernels due to its particular orientation toward fine-grained symbolic processing. The differences in processing requirements are reflected in the emphasis or deemphasis of certain features. For example, a finer process grain size leads to greatly increased emphasis on the efficiency of process manipulation. Suspension grain size is much smaller than process grain size in conventional operating systems; it is smaller than that of kernel-supported threads in conventional processes, and even smaller than that of many symbolic languages such as Multilisp, where grain size is specified by the programmer. Process manipulation overhead (creation, scheduling, synchronization, context switching) is therefore significantly more critical to system performance than in conventional systems, where it is typically a fraction of overall computation effort. This is reflected in DSI's use of context windows and demand-driven scheduling traps in the virtual machine.

Another major difference is in virtual memory support, or lack thereof. Studies of the interaction between virtual memory and heap-based symbolic languages [Moo84] reveal that the large memory requirements of symbolic languages, plus the non-locality of heap allocation and garbage collection, do not mesh well with conventional virtual memory systems, which depend heavily on spatial locality. The development of generational garbage collection has reduced the problem somewhat [PRWM92], but the issue is still a topic of active research. Lazy languages such as Daisy may have even worse virtual-memory locality than applicative-order languages such as Scheme, which can make heavy use of stack frames. In any case, the task granularity of symbolic processes is much too fine to associate with virtual memory table flushes on every context switch.

Another reason for the deemphasis of virtual memory is that conventional systems often use it to support protected address spaces. In symbolic processing the dominant communication paradigm is shared memory; that is, sharing is the norm rather than the exception. There has been some work on using shared virtual memory to support type-checking, memory barriers and other features needed by symbolic languages (e.g. [AL91]). This approach exploits the trapping aspect of memory management units on stock hardware for purposes that might also be handled by appropriate processor modifications.

Finally, symbolic languages do not have visible pointers as traditional languages do. Memory protection is accomplished at the language level by the simple fact that if you do not have a reference to an object you cannot modify it, even in systems with side effects. Of more concern is the issue of conflicting global namespaces. Most systems have some notion of a global namespace for identifiers. On systems with simple shallow binding or similar schemes, this leads to shared identifier references; on a concurrent system this is not always what is desired, especially for concurrent users. This problem might be addressed at the language level with a suitably designed module or package system, as opposed to the use of protected address spaces.

4.2 Kernel Structure

4.2.1 Monolithic vs. Micro Kernel

Many modern operating systems are structured so that most system services are handled outside of the kernel by special user-mode processes. The resulting stripped-down microkernel provides only the absolute core supervisor-mode functionality required to manage the machine resources: processor scheduling, memory management, and interrupt handling. The microkernel itself is implemented as a separate process to which ordinary processes make requests for services, not as a special code layer in processes' own address spaces. This provides certain advantages over the traditional kernel implementation model, such as increased robustness, distributability, and concurrency in the operating system itself. A microkernel design results in a modular decomposition of operating system functionality into a few separate processes that operate interdependently to manage the system as a whole. This modularity greatly simplifies understanding, extending and debugging system-level code.

Many symbolic language kernels share aspects of the monolithic design; they may not be nearly as large and they may use very different resource management techniques, but they still implement the kernel as a low-level code layer interfaced through the stack of every process. In contrast, DSI borrows the microkernel design philosophy in the structure of its own kernel (we will hereafter not distinguish between the terms kernel and microkernel when referring to DSI's kernel). DSI's kernel is implemented as a set of special processes, not as layers of code accessed through a stack interface. Interfacing with the kernel is a kind of interprocess communication. This is not nearly as expensive as in conventional IPC, since

4.2.2 Distributed vs. Non-Distributed

Master-slave arrangements, in which one processor controls the others, are common in systems that do not use symmetric multiprocessing. In this case the "master" processor might run the kernel and distribute work to the slave processors. The advantage of this kind of design is that no locks or special synchronizations are needed for the kernel, since only one processor is running it. In symmetric multiprocessing systems all processors have the same functionality and capabilities. This kind of system encourages a distributed kernel design where all processors have kernel functionality, an approach that requires more care in determining how the processors interact, but pays off in greater parallelism in the kernel itself.

Although this issue is somewhat orthogonal to the issue of macro vs. microkernel, the two issues impact one another. If a distributed kernel is combined with a monolithic kernel design, the processors may need to use shared locks and other measures to arbitrate access to shared kernel structures. A microkernel design allows the kernels to use interprocess communication to inter-operate with each other, resulting in a more loosely-coupled design that is easier to scale and has fewer bottlenecks. Many parallel symbolic processing kernels use a distributed design. The macrokernel approach of most of them is reflected in the use of locks to control access to global allocation pointers, shared schedules, and other shared kernel structures. DSI uses a distributed design in which the kernel processes are distributed across all processor nodes. There are no shared kernel structures; the only synchronization required is on the queues of each processor's message area. The kernels communicate with each other through these queues to manage the machine as a whole.
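The contrast between shared kernel structures and queue-based cooperation can be illustrated with a small sketch. It is not DSI's implementation: the `NodeKernel` class, the `alloc` and `alloc_reply` message shapes, and the bump-pointer allocator are hypothetical names introduced only for illustration. Each node keeps its allocation pointer private, and a remote allocation is obtained by posting a request to the owning node's queue rather than by locking a shared pointer. The sketch is single-threaded, so it omits the synchronization on the queues that the text says is still required.

```python
from collections import deque


class NodeKernel:
    """Hypothetical per-node kernel: private state plus one message queue."""

    def __init__(self, node_id, heap_size):
        self.node_id = node_id
        self.inbox = deque()         # this node's message area
        self._next_free = 0          # private allocation pointer, never shared
        self._heap_size = heap_size
        self.replies = []            # (node_id, base) replies received so far

    def send(self, target, message):
        # The only interaction between kernels: append to the target's queue.
        target.inbox.append(message)

    def request_alloc(self, target, n_cells):
        # Ask another node's kernel to allocate n_cells on our behalf.
        self.send(target, ("alloc", self, n_cells))

    def run_pending(self):
        # Service every message currently in this node's queue.
        while self.inbox:
            kind, *args = self.inbox.popleft()
            if kind == "alloc":
                requester, n_cells = args
                if self._next_free + n_cells <= self._heap_size:
                    base = self._next_free
                    self._next_free += n_cells
                else:
                    base = None      # request cannot be satisfied locally
                self.send(requester, ("alloc_reply", self.node_id, base))
            elif kind == "alloc_reply":
                self.replies.append(tuple(args))
```

Because only the owning kernel ever touches `_next_free`, no lock on the allocator is needed; contention is confined to appending to a node's queue.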

4.2.3 Kernel Organization

4.3 The Kernel Interface

4.3.1 Message Requests

Message communication is handled with streams, a natural choice for communication in DSI. There are two parts to a message request: appending the message to the appropriate stream and (optionally) signaling the processor that a message of the appropriate priority has been sent. The signaling part may not be used for messages that are routinely handled, such as allocation requests. This approach is used both for local and remote kernel requests.

In normal stream communication under DSI there may be many readers but only one writer; the only synchronization required is an atomic store operation, which is provided by the virtual machine. With message requests, there are multiple writers and only one reader; namely, the kernel handling the requests. Thus, some synchronization is required to arbitrate access between processors to the tail of the communication streams for the purposes of appending messages. Note that since message streams are distributed over the processors (each processor has several message streams), synchronization efficiency is not overly critical, and simple spin locks will suffice to arbitrate access among writers.
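The two-part message request (append, then optionally signal) and the multiple-writer, single-reader discipline might be sketched as follows. This is an illustration under stated assumptions, not DSI code: `Cell`, `MessageStream`, `Processor` and `send_request` are hypothetical names, and Python's `threading.Lock` stands in for the spin lock that guards the tail of each stream.

```python
import threading


class Cell:
    """One stream cell: a message (head) and a link to the rest (tail)."""

    def __init__(self, head):
        self.head = head
        self.tail = None


class MessageStream:
    """Stream with many writers and one reader (the owning kernel)."""

    def __init__(self):
        self._first = Cell(None)             # sentinel; reader's position
        self._last = self._first             # writers' position
        self._tail_lock = threading.Lock()   # stand-in for a spin lock

    def append(self, message):
        cell = Cell(message)
        # Writers contend only for the tail of this one stream.
        with self._tail_lock:
            self._last.tail = cell           # this store publishes the message
            self._last = cell

    def take(self):
        # Called only by the single reader, so no lock is taken here.
        nxt = self._first.tail
        if nxt is None:
            return None
        self._first = nxt
        return nxt.head


class Processor:
    """Hypothetical processor with several message streams."""

    def __init__(self, n_streams):
        self.streams = [MessageStream() for _ in range(n_streams)]
        self.pending_signals = set()
        self._signal_lock = threading.Lock()

    def signal(self, signal_name):
        with self._signal_lock:
            self.pending_signals.add(signal_name)


def send_request(processor, stream_index, message, signal_name=None):
    # Part 1: append the message to the appropriate stream.
    processor.streams[stream_index].append(message)
    # Part 2 (optional): signal the processor; skipped for routine requests.
    if signal_name is not None:
        processor.signal(signal_name)
```

Since each processor has several streams, writers rarely collide on the same lock, which is why a simple lock on the tail is adequate in this design.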

4.3.1.1 Implementation

Each queue of a processor's set of message queues is associated with a signal (see table 1). After appending a message, the sender signals the processor with the signal associated with that queue. The priority of the signals assigned to queues establishes the priority of the queues themselves; messages in a higher priority queue will always be serviced before messages in the next lower priority queue, and so forth. This provides a way to prioritize various types of requests; see chapter 6 for an example of how this is used.
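The servicing rule described above, in which a higher-priority queue is always drained before the next lower one, can be sketched as follows. The `QueueSet` class and its methods are hypothetical, and the relative priority of the individual `SIG_QUEUE` signals is not stated here, so the sketch takes the priority order as a parameter supplied by the caller rather than assuming one.

```python
from collections import deque


class QueueSet:
    """Hypothetical set of message queues, each tied to one signal."""

    def __init__(self, signals_high_to_low):
        # Caller supplies signal names ordered from highest to lowest priority.
        self.order = list(signals_high_to_low)
        self.queues = {name: deque() for name in self.order}
        self.pending = set()          # signals raised and not yet cleared

    def send(self, signal_name, message):
        # Append the message, then raise the signal tied to that queue.
        self.queues[signal_name].append(message)
        self.pending.add(signal_name)

    def service_one(self):
        # Scan signals in priority order; service the first pending one.
        for name in self.order:
            if name in self.pending:
                queue = self.queues[name]
                message = queue.popleft()
                if not queue:
                    self.pending.discard(name)   # queue drained
                return name, message
        return None                   # nothing pending
```

With this rule, a message sent to a lower-priority queue waits until every higher-priority queue is empty, regardless of arrival order.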

4.3.2 Traps

4.3.3 Interruption

Signal Name	Type	Description	Handler
`SIG_EXIT`	sw	Exit signal.	Supervisor
`SIG_ABORT`	sw	Abort signal.	Tracer
`SIG_TRACE`	sw	Tracing.	Tracer
`SIG_PROBERR`	hw	Invalid probe.	Supervisor
`SIG_GC`	sw	Garbage collection.	Garbage Collector
`SIG_GC_DUMP`	sw	Dump heap.	Garbage Collector
`SIG_HDC`	hw	Conditional head.	Supervisor
`SIG_TLC`	hw	Conditional tail.	Supervisor
`SIG_RESET`	hw/sw	Boot/reboot.	Supervisor
`SIG_SYNC`	sw	Synchronize nodes.	Supervisor
`SIG_TIMER`	hw	Interval timer.	Supervisor
`SIG_IO`	hw	An input event.	Device Manager
`SIG_IOBLOCK`	hw	I/O blocked.	Supervisor
`SIG_DETACH`	sw	Detach request.	Supervisor
`SIG_QUEUE8`	sw	Message in queue 8.	Supervisor
`SIG_QUEUE7`	sw	Message in queue 7.	Supervisor
`SIG_QUEUE6`	sw	Message in queue 6.	Supervisor
`SIG_QUEUE5`	sw	Message in queue 5.	Supervisor
`SIG_QUEUE4`	sw	Message in queue 4.	Supervisor
`SIG_QUEUE3`	sw	Message in queue 3.	Supervisor
`SIG_QUEUE2`	sw	Message in queue 2.	Supervisor
`SIG_QUEUE1`	sw	Message in queue 1.	Supervisor