EDTs (Engineering Data Types) used in CS734 LM project: LL1, LL2, HT

Betty O'Neil and Pat O'Neil, October, 2005

Data Structures in C that fit together nicely

C is a low-level language, but one that is used for operating system and database system implementation because of its high performance.  Like a racecar, it’s harder to use but what you need for the fastest performance.

 

We have engaged in a project to create C data structures that "fit together like tinker-toys". Thus, for example, we can use an array to hold in its cells the HDR of a Chain-Linked List, or LL (either a 2-way LL, named LL2 as an EDT, or a one-way LL named LL1). These LLs are collection types, and they are identified and anchored by their HDR structs. A pointer to a HDR struct works like an object reference to the whole collection object, that is, and the HDR itself contains the pointer to starting element on the list, etc.  The elements on the list can have any C struct type.

 

Although we do not deal with it in this short note, we also have a HT (hash table) edt that allow us to use hashing to locate the lock manager dids and tids in particular cells of the arrays that themselves contain HDRs of LLs  for wait queues and  lists of locks belonging to a tid.

Header structs and element structs for a list

To repeat, pointers in structs are used to hang structs together in a LL or other data structure.  A LL has a “header” struct (HDR), that holds various information about the overall LL, and a pointer to the first element struct on the LL (and perhaps the final element struct on the LL).  Each element struct has a pointer to the next element on the LL, and the last element has a null pointer or pointer back to the header struct, depending on the implementation details.

Multiple “next” pointers are needed sometimes

An element can be on more than one LL.  Suppose an element is on LLA, identified by a pointer to its header, and listB, similarly identified.  Then the element struct needs to have a pointer to the next element on LLA, and another pointer for the next element on LLB.  For example, each LCB is on a waitq and also a txlist.  So each LCB needs to have a pointer for the next LCB on the waitq for this did, and another for the next LCB on the txlist for this tid.

 

How can we manage multiple next-pointers in an element struct? 

The idea of next-pointer offsets.

We would like to have a general purpose implementation for LLs, and treat LLA and LLB as simply different LL objects.  One solution is to parametrize the position of next pointers in the element struct.  Suppose the next pointer of LLA is at an offset of 0 from the beginning of the element struct and the next pointer of LLB is at byte offset 8.

Next-pointer offsets and portability.

When you port code with pointers between 32 and 64-bit platforms, pointers themselves become fatter (64 bits instead of 32) and that causes the next-pointer byte offsets to change.  Thus it’s best to devise a scheme that makes C compute the offsets from the actual struct layouts.  This can be done, as detailed at the end of this document.  Offsets are measured in pointer-sized units, so an offset of 2 means 16 bytes on a 32-bit platform and 32 bytes on a 64-bit platform.

Ability to “corner” from one LL to another at an element

With the offset information at hand, the EDT code can do things that ordinary collection libraries can’t do.  For example, you can scan a txlist to a particular LCB, and then “corner” onto the waitq that goes through that LCB, and scan down that LL.

Need for a type information struct

Because of the needed offset calculations, there is an additional initialization step needed to use EDTs.  Here is the code from lm.c to make the waitqs.  First we call on maketype_ll2 to set up lm.waitqtype, the type information struct which holds the offset of waitqs in the LCB struct.  maketype_ll2 is actually a macro, not a function, to allow passage of type information, here the type LCB and its member name waitqinfo.  Then lm.waitqtype, the type information struct, is passed back to make_ll2, along with a pointer to a LL-header spot, which gets filled in by make_ll2, itself a function.  We end up with a waitq LL for each did, ready for inserts, etc.

 

  maketype_ll2(LCB,waitqinfo,&lm.waitqtype);

  /* loop thru HDR arrays, initializing them  */

  for (i=0; i<MAXDIDS; i++) {

        make_ll2(&lm.waitqtype, &lm.didHDR[i].waitqhdr);

  }

Data Structure Layout

To set up a particular type of LL, we need to work with

·    one type information struct

·    multiple header structs, one for each LL of that type

·    a struct inside each app element struct, for the next pointer.

 

For LL2 LLs, doubly-linked LLs, these LL2 structs are typedef’d in ll2.h as:

·    LL2type

·    LL2

·    LL2info, inside say MyLLElementStruct

 

(Then to *use* the resulting LL, we need a "handle", of type  LL2hand)

 


 

A minimal example is a LL of ints (set up only):

 

#include “edt.h”

#include “ll2.h”

/* set up an element struct with an LL2info struct inside it */

typedef struct myelement {

  LL2info myinfo;  /* put the “info” struct(s) first in the element */

  int x;

} MyElement;

 

int main()

{

  LL2type mytype;  /* type information struct */

  LL2 myLL;      /* LL header struct */

 

  maketype_ll2(MyElement, myinfo, &mytype); /* fill in mytype */

  make_ll2(&mytype, &myLL);  /* fill in myLL */

  /* now ready to insert an int-holding element into myLL */

}

Working with LLs, once they are set up

A basic idea of LLs is that of position in the LL.  In Java, we use a Iterator to hold the current position in a LL.  Here, the “handle” has the same purpose.  However, some details are different.

 

Each sequence of N elements has N+2 positions:  BOL (beginning-of-LL, before any elements), positions at each element, and EOL (end-of-LL, after all elements.)  The position at the first element is called HOL, for head-of-LL.

 

         first_element  ...    last_element

     ^                  ^^^           ^       ^

    BOL     HOL                              EOL

 

Thus in particular, an empty LL has 2 positions, BOL and EOL. 

(This is different from a Java-library Iterator, which has N+1 positions for N elements: before-first, between first and second, between second and third, ..., after-last.  An empty Java LinkedList has only one Iterator position, both before and after all its elements.)

 

Scans of sequences

To do a scan of a sequence, initialize a handle at BOL and then do a loop of next’s until next returns NULL.  For example, with LL1

 

inithandle_ll1(&ll_hdr, BOL, &ll_hand);

while (p = next_ll1(&ll_hand)) { ... }

 

Each value of p will be a pointer to the element inside the LL.  No element copying occurs here.  This is similar to the Java Collections treatment of LinkList iteration, where object references are provided to objects within the collection.


Inserts in sequences

There are N+1 possible spots in which to insert a new element in a sequence.  If a handle is positioned at a certain element, an insert using it as current will put the new element in the position before the current element.  To insert an element at the end of a sequence, either set up a handle positioned at EOL and then insert at the current position, or specify EOL in the insert itself:

 

inithandle_ll1(&ll_hdr, EOL, &LL_hand);

insert(&LL_hand, &elt, CURR);

 

or  (EOL in insert overrides BOL of handle)

 

inithandle_ll1(&ll_hdr, BOL, &LL_hand); 

insert(&LL_hand, &elt, EOL);

 

Note that insert doesn’t copy the element, but instead, like Java Collections, it copies the element pointer, a lightweight action.  This means that you have to be careful about creating the elements appropriately.  A second element has to reside in different memory from an old first element already in the LL.

Deletes in sequences

There are N possible spots for deletion, one for each element.  The element to be deleted is the one selected by the handle, or in the delete itself (HOL specified in the delete call overrides the position of the handle.)  Just as the insert only copies in a pointer, the delete just copies out the pointer, leaving the job of element memory deallocation to the caller.  Note that if an element is on more than one LL, it shouldn’t be destroyed until it is deleted from all its LLs.  Keeping track of this is up to the application.

Getting C to calculate struct offsets.

Consider the simple struct

 

struct point {

  int x, y, z;

} pt;

 

Then &pt points to the whole struct, and so does &pt.x (this is guaranteed in C), while &pt.y points to the y member inside pt and &pt.z points to the z member.  We can subtract pointers of the same type, so for example &pt.y - &pt.x is 1 and &pt.z - &pt.x is 2, since the types are pointer-to-int.  This is saying that there is one int spot between &pt.x and &pt.y, for example.  To get the distances in bytes, we could cast to char * before subtracting, e.g., (char *)&p.y - (char *)&p.x = 4 on platforms like ours where sizeof(int) = 4.