\input texinfo    @c -*-texinfo-*-
@setfilename omalloc.hlp

@comment -*-texinfo-*-
@comment $Id$
@comment Unix

@majorheading An Overview of @code{omalloc}
@cindex omalloc
@cindex omalloc, overview

This document gives a short overview of @code{omalloc}, the internal
memory manager of @sc{Singular}.

@heading Requirements of a memory manager for @sc{Singular}

Most of @sc{Singular}'s computations boil down to primitive polynomial
operations like copying, deleting, adding, and multiplying
polynomials. For example, standard bases computations over finite fields
spend (on average) 90 percent of their time performing the operation
@code{p - m*q}, where @code{m} is a monomial and @code{p,q} are polynomials.

@sc{Singular} uses linked lists of monomials as the data structure for
representing polynomials. A monomial is represented by a memory block
which stores a coefficient and an exponent vector. The size of this
memory block varies: its minimum size is three machine words (one word
each for the pointer to the "next" monomial, the coefficient, and the
exponent vector), its maximal size is (almost) unlimited, and
we can assume that its average size is 4 to 6 machine words (i.e., the
exponent vector is 2 to 4 words long).

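To make this memory layout concrete, a monomial node could be declared
in C roughly as follows. This is only an illustrative sketch, not
@sc{Singular}'s actual declaration; the names @code{monomial},
@code{coeff} and @code{exp} are ours, and the coefficient is assumed to
fit into a single machine word.

@verbatim
/* Sketch of a monomial node: one word for the "next" pointer, one word
   for the coefficient, and a variable-length exponent vector which is
   stored at the end of the same memory block. */
typedef struct monomial monomial;
struct monomial
{
  monomial      *next;    /* pointer to the next monomial in the list       */
  unsigned long  coeff;   /* coefficient, e.g. an element of a finite field */
  unsigned long  exp[1];  /* exponent vector, 2 to 4 words on average       */
};
@end verbatim
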
From a low-level point of view, polynomial operations are operations on
linked lists which proceed as follows (see the sketch after the list):
@enumerate
@item
Do something with the exponent vector (e.g., copy, compare, add), and
possibly do something with the coefficient (e.g., add, multiply, compare
with zero) of a monomial.
@item
Advance to the next monomial and/or allocate a new monomial or free an
existing one.
@end enumerate
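
For illustration, copying a polynomial already interleaves one
allocation with only a handful of word operations per list element. The
following minimal sketch uses the hypothetical @code{monomial} structure
from above and the standard @code{malloc} for the allocation; it is not
@sc{Singular}'s actual code.

@verbatim
#include <stdlib.h>

/* Copy a polynomial: one allocation plus a few word operations per
   monomial of the list (exp_len = length of the exponent vector). */
monomial *copy_poly(const monomial *p, int exp_len)
{
  monomial *result = NULL;
  monomial **tail  = &result;

  while (p != NULL)
  {
    /* one allocation per list element ... */
    monomial *m = malloc(sizeof(monomial)
                         + (exp_len - 1) * sizeof(unsigned long));
    m->coeff = p->coeff;               /* ... one word for the coefficient */
    for (int i = 0; i < exp_len; i++)  /* ... and exp_len words for the    */
      m->exp[i] = p->exp[i];           /*     exponent vector              */
    m->next = NULL;
    *tail = m;
    tail  = &m->next;
    p = p->next;                       /* advance to the next monomial */
  }
  return result;
}
@end verbatim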

Assuming that coefficient operations are (inlined) operations on a
single machine word (as they are, for example, for coefficients from
most finite fields), and observing that the exponent vector operations
are linear w.r.t. the length of the exponent vector, we come to the
following major conclusion:

@quotation
@emph{For most computations, a traversal to the next memory block or an
allocation/deallocation of a memory block is, on average, required after
every 4 to 6 machine word operations.}
@end quotation

The major requirements of a memory manager for @sc{Singular} become
immediately clear from this conclusion:

@table @strong
@item (1) allocation/deallocation of (small) memory blocks must be extremely fast
If the allocation/deallocation of a memory block requires more than a
couple of machine instructions, it would become the bottleneck of the
computations. In particular, even the overhead of a function call
for a memory operation is already too expensive.
@item (2) consecutive memory blocks in linked lists must have a high locality of reference
Compared with the performed operations on list elements, cache misses
(or, even worse, page misses) of memory blocks are extremely expensive:
we estimate that one cache miss of a memory block costs at least ten
or more list element operations. Hence, if the addresses of consecutive
list elements are not physically close to each other (i.e., if
the linked lists have a low locality of reference), then resolving
cache (or page) misses would become the bottleneck of computations.
@end table

Furthermore, there are the following more or less obvious requirements on
a memory manager for @sc{Singular}:

@table @strong
@item (3) the size overhead to maintain small blocks of memory must be small
If managing one memory block requires one or more words
(e.g., to store its size), then the total overhead of the
memory manager could sum up to 25% of the overall used memory, which is
an unacceptable waste of memory.
@item (4) the memory manager must have a clean API and it must support debugging
The API of the memory manager must be similar to the standard APIs of
memory managers, otherwise its usability is greatly limited. Furthermore,
it needs to provide error checking and debugging support (to detect
overwritten, doubly freed, or never-freed memory blocks, etc.), since errors
in the usage of the memory manager are otherwise very hard to find and
correct.
@item (5) the memory manager must be customizable, tunable, extensible and portable
The memory manager should support customizations (like "what to do if no
more memory is available"); its parameters (e.g., allocation policies)
should be tunable; it should be extensible to allow easy
implementations of more far-reaching functionality (like overloading of C++
constructors and destructors, etc.); and it needs to be portable to all
available operating systems.
@end table

To the best of our knowledge, there is currently no memory manager
available which satisfies these (admittedly in part extreme)
requirements. Therefore, we designed and implemented @code{omalloc}.


@heading Principles of @code{omalloc}

@code{omalloc} is a stand-alone, general-purpose memory management
library which can be used by any program. Its major design goals were
derived from the memory management requirements of @sc{Singular} (see above).

@code{omalloc} neither provides nor supports garbage collection -- it
is an extension to, and sits on top of, the standard
@code{malloc/realloc/free} routines provided by most operating systems.

@code{omalloc} specializes in the management of small memory
blocks. The management of large memory blocks, i.e., of memory blocks which
are larger than "one fourth of the page size of the system" (usually
approx. 1 KByte), is (almost) entirely left to the underlying
@code{malloc/realloc/free} implementation.

@code{omalloc} manages small blocks of memory on a @strong{per-page
basis}. That is, each used page is split up into a page-header and
equally-sized memory blocks. The page-header has a size of 6 words
(i.e., 24 bytes on a 32-bit machine), and stores (among other things) a
pointer to the free-list and a counter of the used memory blocks of this
page.

Pages which manage equally-sized memory blocks are collected into
so-called @strong{"Bins"}, which are (more or less) nothing but linked
lists of pages.

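A minimal sketch of these two data structures is given below. It is our
own simplified illustration, not @code{omalloc}'s actual declarations;
the type and field names are hypothetical, and the real structures
contain further bookkeeping fields.

@verbatim
/* Sketch of a page header: it occupies the first 6 words of a page,
   the remainder of the page is split into equally-sized blocks. */
typedef struct om_page om_page;
struct om_page
{
  long      used_blocks;  /* counter of used memory blocks in this page  */
  void     *free_list;    /* free-list of the unused blocks of this page */
  om_page  *prev, *next;  /* links to the other pages of the same Bin    */
  /* ... further header fields up to a total of 6 words ...              */
};

/* Sketch of a Bin: (more or less) a linked list of pages which all
   manage blocks of the same size. */
typedef struct om_bin
{
  om_page  *current_page; /* page from which allocations are served  */
  om_page  *last_page;    /* last page in the page list of this Bin  */
  long      block_size;   /* size (in words) of the managed blocks   */
} om_bin;
@end verbatim
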
On @strong{memory allocation}, an appropriate page (i.e., one which has a
non-empty free-list of the appropriate block size) is determined based
on the memory allocation mechanism used and its arguments (see
below). The counter of the page is incremented, and the provided memory
block is dequeued from the free-list of the page.

On @strong{memory deallocation}, the address of the memory block is used to
determine the page (header) which manages the memory block to be
freed. The counter of the page is decremented, and the memory block is
enqueued into the free-list. If decrementing the counter yields zero,
i.e., if there are no more used memory blocks in this page, then this
page is dequeued from its Bin and put back into a global pool of unused pages.

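In pseudo-C, the two fast paths described above boil down to a few word
operations. The sketch below builds on the hypothetical structures from
the previous sketch, assumes that pages are aligned to the system page
size, and omits the handling of full and empty pages as well as the
selection of the appropriate Bin.

@verbatim
#define SKETCH_PAGE_SIZE 4096UL

/* Allocation fast path: take the first block from the free-list of the
   Bin's current page and bump the page's usage counter. */
void *sketch_alloc_bin(om_bin *bin)
{
  om_page *page  = bin->current_page;    /* page with a non-empty free-list  */
  void    *block = page->free_list;
  page->free_list = *((void **)block);   /* dequeue block from the free-list */
  page->used_blocks++;
  return block;
}

/* Deallocation fast path: the page header is found from the address of
   the block (assuming pages are aligned to the page size). */
void sketch_free(void *block)
{
  om_page *page = (om_page *)((unsigned long)block & ~(SKETCH_PAGE_SIZE - 1));
  *((void **)block) = page->free_list;   /* enqueue block into the free-list */
  page->free_list = block;
  if (--page->used_blocks == 0)
  {
    /* no used blocks left: the page would now be dequeued from its Bin
       and put back into the global pool of unused pages (omitted here) */
  }
}
@end verbatim
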
@code{omalloc}'s @strong{API} provides a three-level interface
(illustrated by the sketch after the following table):
@table @asis
@item 1.) Standard function interface level
This level provides the standard allocation/deallocation functions like
@code{malloc, calloc, realloc, free}, etc. No special @code{#include}'s
are required, only linking with @code{libomalloc} is necessary.

@item 2.) Standard macro interface level
This level provides variants of the standard allocation/deallocation
functions and some extensions thereof in the form of macros. To use this
level, the header file @code{omalloc.h} needs to be included by an
application, and it must be linked with @code{libomalloc}.

@item 3.) Bin-based macro interface level
This level provides allocation (resp. deallocation) macros whose
arguments are not sizes (like those of the standard allocation
functions/macros), but Bins directly. Naturally, to use this level, the
header file @code{omalloc.h} needs to be included by an application,
and it must be linked with @code{libomalloc}.
@end table

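The fragment below illustrates the three levels. The @code{om*} names
follow the conventions used in the @sc{Singular} sources; the exact
macro names and signatures should be taken from @code{omalloc.h}.

@verbatim
#include <stdlib.h>
#include "omalloc.h"   /* needed for levels 2.) and 3.) only */

void api_levels_example(void)
{
  /* 1.) standard function interface: just link with libomalloc */
  void *p = malloc(100);
  free(p);

  /* 2.) standard macro interface: size-based allocation macros */
  void *q = omAlloc(100);
  omFreeSize(q, 100);

  /* 3.) Bin-based macro interface: the argument is a Bin, not a size */
  omBin bin = omGetSpecBin(16);    /* a Bin managing 16-byte blocks */
  void *r = omAllocBin(bin);
  omFreeBin(r, bin);
  omUnGetSpecBin(&bin);
}
@end verbatim
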
@code{omalloc} has a special mode of operation, namely its
@strong{debugging mode}. This mode can be activated for the functional
interface by setting run-time parameters, and, for the macro interface,
by setting debug flags on a per-file basis, before @code{#include}'ing
the file @code{omalloc.h}. Furthermore, the granularity of the
debugging can dynamically be set at run-time using various
parameters. One special feature of @code{omalloc} is that memory
allocated with a debugging allocation routine can safely be deallocated
with a non-debugging deallocation routine (and vice versa). In other
words, the debugging and non-debugging modes of omalloc are not mutually
exclusive. In its debugging mode, @code{omalloc} is designed to
catch more than 20 errors in the usage of its API. And, of course, all
the common and classical errors are among those which are caught by
@code{omalloc}. Error reporting can either be directed to log files or to
@code{stderr}. On some platforms, the error reporting includes not only
line number information, but also stack trace information (i.e., a
description of the routines on the function stack).

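For the macro interface, the per-file debug flags are set before
including @code{omalloc.h}. The sketch below uses the @code{OM_CHECK}
and @code{OM_TRACK} flags as they appear in the @sc{Singular} sources;
the admissible values and their exact meaning should be taken from the
@code{omalloc} sources.

@verbatim
/* Enable omalloc's debugging mode for this file only: OM_CHECK selects
   the amount of consistency checking, OM_TRACK the amount of tracking
   information (e.g. allocation sites) kept for each memory block. */
#define OM_CHECK 3
#define OM_TRACK 5
#include "omalloc.h"

void debug_example(void)
{
  int *a = (int *)omAlloc(10 * sizeof(int));
  omFreeSize(a, 10 * sizeof(int));
  /* A second omFreeSize(a, ...) here would be reported as an error
     (to stderr or to a log file), together with information about
     where the block was allocated and freed. */
}
@end verbatim
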
@heading Requirement evaluation of @code{omalloc}
After explaining the principles of @code{omalloc}, let us evaluate
@code{omalloc} w.r.t. the requirements stated above.

@table @strong

@item (1) efficiency of allocation/deallocation
When used with its Bin-based macro interface level, the
allocation/deallocation of small memory blocks requires (on average)
only a couple (say, 5-10) of assembler instructions. We believe
that this is optimal, i.e., as fast as one possibly could implement
these operations.
@*The other interface levels incur the additional overhead of the
determination of the appropriate Bin (using a table lookup); and, for the
function interface level, the additional overhead of function
calls. Compared with existing memory managers, even these other interface
levels provide allocation/deallocation routines of unmatched efficiency.

@item (2) locality of reference
Based on its design principles, @code{omalloc} provides an
extremely high locality of reference (especially for consecutive elements
of linked-list like structures). To see this, recall that
@itemize
@item
consecutively allocated memory blocks are physically almost always from
the same memory page (the current memory page of a Bin is only changed
when it becomes full or empty);
@item
the probability that consecutively allocated memory blocks are also
physically consecutive (i.e., that their physical addresses are consecutive)
is rather high, since memory pages are spooled back into the
pool of unused pages as often as possible, and, hence, are reinitialized
as often as possible (notice that allocating from a freshly initialized
page ensures that the physical addresses of consecutively allocated
memory blocks are also consecutive).
@end itemize
Moreover, the allocation policy of a Bin can be changed at
run-time such that consecutive allocations from this Bin are guaranteed
to result in physically consecutive memory blocks. However, this
can only be achieved at the price of a slightly higher memory
consumption, and should therefore only be used temporarily for
long-lived, frequently traversed data.

@item (3) small maintenance size overhead

Since @code{omalloc} manages small memory blocks on a per-page basis, there is
no maintenance overhead on a per-block basis, but only on a per-page
basis. This overhead amounts to 24 bytes per memory
page, i.e., we only need 24 bytes of maintenance overhead per 4096 bytes,
or 0.6%.
@*For larger memory blocks, this overhead grows, since there might
be "leftovers" from the division of a memory page into equally-sized
chunks of memory blocks. Furthermore, there is the additional (but
rather small) overhead of keeping Bins and other "global" maintenance
structures.
@*Nevertheless, on average, we observed a maintenance overhead of
approx. 1% of the total memory usage, which, to the best of our knowledge,
is unsurpassed by any existing, general-purpose memory manager.

@item (4) clean API and debugging support
Whether or not @code{omalloc} provides a clean API is certainly a matter
of taste and expectation. However, it is intrinsic that the price one has to
pay for the speed and flexibility of a memory manager like @code{omalloc}
is a more complex API than that of the standard
@code{malloc/realloc/free} functions. Nevertheless, as we believe and
as our experience has shown, the layered interface design outlined above
provides a practical compromise between simplicity and
efficiency/flexibility.

W.r.t. the debugging mode, our experience has shown that it is extremely
useful to be able to fine-tune the granularity of the debugging in both
a lexical, per-file scope and a dynamic scope set at
run-time. It also turned out to be very convenient that the
debugging and non-debugging modes are not mutually exclusive and that the
same library (and header file) can be used for both modes -- i.e., that
no recompilation of @code{omalloc} is required to enable one or the other
mode. In summary, @code{omalloc}'s debugging features are at least
comparable with those of available memory management debugging libraries (like
@uref{http://dmalloc.com/,dmalloc}) -- our experience has furthermore
shown that the features are suitable and appropriate for most practical
purposes. But then again, fitness is also a matter of taste and
expectations and therefore hard to evaluate.

@item (5) customizable, tunable, extensible and portable
As outlined above, @code{omalloc} supports customizations (like the
handling of out-of-memory situations), its allocation policies can be
tuned at run-time, its layered API is designed to be extensible, and,
since it only relies on the standard @code{malloc/realloc/free}
routines, it is portable to virtually all available operating systems.

@end table

@heading Performance evaluation of @code{omalloc}

We have not (yet) thoroughly measured the performance of
@code{omalloc} in comparison to other memory management
systems. Nevertheless, our first experiments yielded the following:
@itemize
@item
@sc{Singular}'s memory consumption is up to 30% lower and its speed is
up to 100% higher when used with @code{omalloc} instead of a standard
memory manager (like the one found in the @code{libc} library on Linux, or
the one underlying @code{perl}).

@item
We expect that for programs/computations with computational
characteristics similar to those of @sc{Singular} (e.g., multivariate
polynomial computations, sparse matrix computations, etc.), the speed
increase and memory consumption decrease are comparable to those reported
for @sc{Singular}.

@item
For other applications (like @code{perl} or @code{gcc}) we observed a
slight speed increase (depending on the input) and an overall decrease
in memory consumption of up to 25%.
@end itemize

@heading Status/Copyright of @code{omalloc}

@code{omalloc} is a fully functional, mature memory management
library. It still has a TODO list, with more or less only one major
point: create a detailed technical reference manual (in texinfo),
because there is none so far.

@code{omalloc} was written by
@uref{http://www.mathematik.uni-kl.de/~obachman,,Olaf Bachmann} who also
holds its copyright.

@code{omalloc} is @strong{not} free software. Its usage is restricted to
@sc{Singular} and it does @strong{not} fall under @sc{Singular}'s
copyright. If you are interested in using @code{omalloc}, please send
email to @email{obachman@@mathematik.uni-kl.de}. See also the file
COPYING_OMALLOC for details.

@bye