C is a low-level language,
but one that is used for operating system and database system implementation
because of its high performance. Like a
racecar, it’s harder to use but what you need for the fastest performance.
We
have engaged in a
project to create C data structures that "fit together like
tinker-toys". Thus, for example, we can use an array to hold in its
cells
the HDR of a Chain-Linked List, or LL (either a 2-way LL, named LL2 as
an
EDT, or a one-way LL named LL1). These LLs are collection types, and
they are identified and anchored by their HDR structs. A pointer to a
HDR struct
works like an object reference to the whole collection object, that is,
and the HDR itself contains the pointer to starting element on the
list, etc. The elements on the list can have any C struct type.
Although we do not
deal with it in this short note, we also have a HT (hash table) edt
that allow us to use hashing to locate the lock manager dids and tids in particular cells of the arrays that
themselves contain HDRs of LLs for wait queues and lists of locks belonging to a tid.
To repeat, pointers in
structs are used to hang structs together in a LL or other data
structure. A LL has a “header” struct
(HDR), that holds various information about the overall LL, and a pointer to
the first element struct on the LL (and perhaps the final element struct on
the LL). Each element struct has a
pointer to the next element on the LL, and the last element has a null pointer
or pointer back to the header struct, depending on the implementation details.
An element can be on
more than one LL. Suppose an element is
on LLA, identified by a pointer to its header, and listB, similarly identified. Then the element struct needs to have a
pointer to the next element on LLA, and another pointer for the next element
on LLB. For example, each LCB is on a
waitq and also a txlist. So each LCB
needs to have a pointer for the next LCB on the waitq for this did, and another for the next LCB on the txlist for this
tid.
How can we manage
multiple next-pointers in an element struct?
We would like to have
a general purpose implementation for LLs, and treat LLA and LLB as simply
different LL objects. One solution is
to parametrize the position of next pointers in the element struct. Suppose the next pointer of LLA is at an
offset of 0 from the beginning of the element struct and the next pointer of LLB is at byte offset 8.
When you port code
with pointers between 32 and 64-bit platforms, pointers themselves become fatter (64 bits instead of 32) and that causes the
next-pointer byte offsets to change.
Thus it’s best to devise a scheme that makes C compute the offsets from
the actual struct layouts. This can be
done, as detailed at the end of this document.
Offsets are measured in pointer-sized units, so an offset of 2 means 16
bytes on a 32-bit platform and 32 bytes on a 64-bit platform.
With the offset
information at hand, the EDT code can do things that ordinary collection
libraries can’t do. For example, you can
scan a txlist to a particular LCB, and then “corner” onto the waitq that goes
through that LCB, and scan down that LL.
Because of the needed
offset calculations, there is an additional initialization step needed to use EDTs. Here is the code from lm.c to make the
waitqs. First we call on maketype_ll2 to
set up lm.waitqtype, the type information struct which holds the offset of
waitqs in the LCB struct. maketype_ll2 is actually a macro, not a function, to allow
passage of type information, here the type LCB and its member name
waitqinfo. Then
lm.waitqtype, the type
information struct, is passed back to make_ll2, along with a pointer to
a LL-header spot, which gets filled in by make_ll2, itself a function. We end up with a waitq LL for each did,
ready for inserts, etc.
maketype_ll2(LCB,waitqinfo,&lm.waitqtype);
/* loop thru HDR arrays, initializing them */
for (i=0; i<MAXDIDS; i++) {
make_ll2(&lm.waitqtype, &lm.didHDR[i].waitqhdr);
}
To
set up a particular type of LL, we need to work with
· one type information struct
· multiple header structs, one
for each LL of that type
· a struct inside each app element struct,
for the next pointer.
For LL2 LLs,
doubly-linked LLs, these LL2 structs are typedef’d in ll2.h as:
· LL2type
· LL2
· LL2info,
inside say MyLLElementStruct
(Then to *use* the resulting LL, we need a
"handle", of type
LL2hand)
A minimal example is a LL of ints (set up only):
#include “edt.h”
#include “ll2.h”
/* set up an element struct with an LL2info struct inside it */
typedef struct myelement {
LL2info myinfo; /* put the “info” struct(s) first in the element */
int x;
} MyElement;
int main()
{
LL2type mytype; /* type information struct */
LL2 myLL; /* LL header struct */
maketype_ll2(MyElement, myinfo, &mytype); /* fill in mytype */
make_ll2(&mytype, &myLL); /* fill in myLL */
/* now ready to insert an int-holding element into myLL */
}
A basic idea of LLs
is that of position in the LL. In Java,
we use a Iterator to hold the current position in a LL. Here, the “handle” has the same
purpose. However, some details are
different.
Each sequence of N
elements has N+2 positions: BOL
(beginning-of-LL, before any elements), positions at each element, and EOL
(end-of-LL, after all elements.) The
position at the first element is called HOL, for head-of-LL.
first_element ... last_element
^ ^^^ ^ ^
BOL HOL EOL
Thus in particular, an
empty LL has 2 positions, BOL and EOL.
(This is different
from a Java-library Iterator, which has N+1 positions for N elements: before-first,
between first and second, between second and third, ...,
after-last. An empty Java LinkedList has only one
Iterator position, both before and after all its elements.)
To do a scan of a
sequence, initialize a handle at BOL and then do a loop of next’s
until next returns NULL. For example,
with LL1
inithandle_ll1(&ll_hdr, BOL, &ll_hand);
while (p = next_ll1(&ll_hand)) { ... }
Each value of p will
be a pointer to the element inside the LL.
No element copying occurs here.
This is similar to the Java Collections treatment of LinkList iteration,
where object references are provided to objects within the collection.
There are N+1 possible
spots in which to insert a new element in a sequence. If a handle is positioned at a certain
element, an insert using it as current will put the new element in the position
before the current element. To insert an
element at the end of a sequence, either set up a handle positioned at EOL and
then insert at the current position, or specify EOL in the insert itself:
inithandle_ll1(&ll_hdr, EOL, &LL_hand);
insert(&LL_hand, &elt, CURR);
or (EOL in insert
overrides BOL of handle)
inithandle_ll1(&ll_hdr, BOL, &LL_hand);
insert(&LL_hand, &elt, EOL);
Note that insert
doesn’t copy the element, but instead, like Java Collections, it copies the
element pointer, a lightweight action.
This means that you have to be careful about creating the elements
appropriately. A second element has to
reside in different memory from an old first element already in the LL.
There are N possible
spots for deletion, one for each element.
The element to be deleted is the one selected by the handle, or in the
delete itself (HOL specified in the delete call overrides the position of the
handle.) Just as the insert only copies
in a pointer, the delete just copies out the pointer, leaving the job of
element memory deallocation to the caller.
Note that if an element is on more than one LL, it shouldn’t be
destroyed until it is deleted from all its LLs. Keeping track of this is up to the
application.
Consider the simple
struct
struct point {
int x, y, z;
} pt;
Then &pt points to
the whole struct, and so does &pt.x (this is guaranteed in C), while
&pt.y points to the y member inside pt and &pt.z points to the z
member. We can subtract pointers of the
same type, so for example &pt.y - &pt.x is 1 and &pt.z - &pt.x
is 2, since the types are pointer-to-int.
This is saying that there is one int spot between &pt.x and
&pt.y, for example. To get the
distances in bytes, we could cast to char * before subtracting, e.g., (char *)&p.y - (char *)&p.x = 4 on platforms like ours
where sizeof(int) = 4.