source: git/doc/memory/OMALLOC.texi @ 2e553a

\input texinfo    @c -*-texinfo-*-
@setfilename omalloc.hlp

@comment -*-texinfo-*-
@comment Unix

@majorheading An Overview of @code{omalloc}
@cindex omalloc
@cindex omalloc, overview

This document gives a short overview of @code{omalloc}, the internal
memory manager of @sc{Singular}.

@heading Requirements of a memory manager for @sc{Singular}

Most of @sc{Singular}'s computations boil down to primitive polynomial
operations like copying, deleting, adding, and multiplying polynomials.
For example, standard basis computations over finite fields spend (on
average) 90 percent of their time realizing the operation @code{p - m*q},
where @code{m} is a monomial and @code{p,q} are polynomials.

@sc{Singular} uses linked lists of monomials as the data structure for
representing polynomials. A monomial is represented by a memory block
which stores a coefficient and an exponent vector. The size of this
memory block varies: its minimum size is three machine words (one word
each for the pointer to the "next" monomial, the coefficient, and the
exponent vector), its maximal size is (almost) unlimited, and
we can assume that its average size is 4 to 6 machine words (i.e., the
exponent vector is 2 to 4 words long).

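Schematically, such a monomial block can be pictured as the following C
structure; the type and field names and the fixed-length exponent field
are illustrative only and do not reproduce @sc{Singular}'s actual
definitions.

@example
/* Illustrative sketch of a monomial block -- not Singular's actual types. */
typedef struct monomial monomial;
struct monomial
{
  monomial      *next;    /* one word: pointer to the next monomial      */
  long           coeff;   /* one word: coefficient (e.g., an element of  */
                          /* a small finite field, or a pointer)         */
  unsigned long  exp[1];  /* exponent vector: 2 to 4 words on average,   */
                          /* allocated as part of the same memory block  */
};
@end example
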
From a low-level point of view, polynomial operations are operations on
linked lists which go as follows:
@enumerate
@item
Do something with the exponent vector (e.g., copy, compare, add), and
possibly do something with the coefficient (e.g., add, multiply, compare
with zero) of a monomial.
@item
Advance to the next monomial, and/or allocate a new monomial or free an
existing one.
@end enumerate

Assuming that coefficient operations are (inlined) operations on a
single machine word (as they are, for example, for coefficients from
most finite fields), and observing that the exponent vector operations
are linear w.r.t. the length of the exponent vector, we come to the
following major conclusion:

@quotation
@emph{For most computations, a traversal to the next memory block or an
allocation/deallocation of a memory block is, on average, required after
every 4 to 6 machine word operations.}
@end quotation

The major requirements of a memory manager for @sc{Singular} become
immediately clear from this conclusion:

@table @strong
@item (1) allocation/deallocation of (small) memory blocks must be extremely fast
If the allocation/deallocation of a memory block requires more than a
couple of machine instructions, it would become the bottleneck of the
computations. In particular, even the overhead of a function call
for a memory operation is already too expensive.
@item (2) consecutive memory blocks in linked lists must have a high locality of reference
Compared with the performed operations on list elements, cache misses
(or, even worse, page misses) of memory blocks are extremely expensive:
we estimate that one cache miss of a memory block costs at least ten
list element operations. Hence, if the addresses of consecutive
list elements are not physically close to each other (i.e., if
the linked lists have a low locality of reference), then resolving
cache (or page) misses would become the bottleneck of computations.
@end table

Furthermore, there are the following more or less obvious requirements on
a memory manager for @sc{Singular}:

@table @strong
@item (3) the size overhead to maintain small blocks of memory must be small
If managing one memory block requires one or more words
(e.g., to store its size), then the total overhead of the
memory manager could sum up to 25% of the overall used memory, which is
an unacceptable waste of memory.
@item (4) the memory manager must have a clean API and it must support debugging
The API of the memory manager must be similar to the standard APIs of
memory managers, otherwise its usability is greatly limited. Furthermore,
it needs to provide error checking and debugging support (to detect
overwritten, twice-freed, or never-freed memory blocks, etc.) since errors
in the usage of the memory manager are otherwise very hard to find and
correct.
@item (5) the memory manager must be customizable, tunable, extensible and portable
The memory manager should support customizations (like "what to do if no
more memory is available"); its parameters (e.g., allocation policies)
should be tunable; it should be extensible to allow easy
implementations of further functionality (like overloading of C++
constructors and destructors, etc.); and it needs to be portable to all
available operating systems.
@end table

To the best of our knowledge, there is currently no memory manager
available which satisfies these (admittedly in part extreme)
requirements. Therefore, we designed and implemented @code{omalloc}.


@heading Principles of @code{omalloc}

@code{omalloc} is a stand-alone, general-purpose memory management
library which can be used by any program. Its major design goals were
derived from the memory management requirements of @sc{Singular} (see above).

@code{omalloc} neither provides nor supports garbage collection -- it
is an extension to, and sits on top of, the standard
@code{malloc/realloc/free} routines provided by most operating systems.

@code{omalloc} specializes in the management of small memory
blocks. The management of large memory blocks, i.e., of memory blocks
larger than one fourth of the page size of the system (usually
approximately 1 KByte), is (almost) left entirely to the underlying
@code{malloc/realloc/free} implementation.

@code{omalloc} manages small blocks of memory on a @strong{per-page
basis}. That is, each used page is split up into a page-header and
equally-sized memory blocks. The page-header has a size of 6 words
(i.e., 24 Bytes on a 32-bit machine), and stores (among other things) a
pointer to the free-list and a counter of the used memory blocks of this
page.

Pages which manage equally-sized memory blocks are collected into
so-called @strong{"Bins"}, which are (more or less) nothing but linked
lists of pages.

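As a rough illustration of this layout, a page-header and a Bin might be
sketched in C as follows. The type and field names are invented for this
overview and are not @code{omalloc}'s actual declarations; only the
free-list pointer and the used-block counter are taken directly from the
description above.

@example
/* Illustrative sketch only -- not omalloc's actual declarations. */
typedef struct pageHeader pageHeader;
struct pageHeader
{
  long        used_blocks;  /* counter of used blocks in this page     */
  void       *free_list;    /* list of free blocks within this page    */
  pageHeader *prev, *next;  /* neighbouring pages of the same Bin      */
                            /* (remaining words of the 6-word header   */
                            /*  omitted here)                          */
};

typedef struct bin
{
  pageHeader *current_page; /* page currently used for allocations     */
  long        block_size;   /* size of the blocks managed by this Bin  */
} bin;
@end example
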
On @strong{memory allocation}, an appropriate page (i.e., one which has a
non-empty free-list of the appropriate block size) is determined based
on the memory allocation mechanism used and its arguments (see
below). The counter of the page is incremented, and the provided memory
block is dequeued from the free-list of the page.

On @strong{memory deallocation}, the address of the memory block is used to
determine the page (header) which manages the memory block to be
freed. The counter of the page is decremented, and the memory block is
enqueued into the free-list. If decrementing the counter yields zero,
i.e., if there are no more used memory blocks in this page, then this
page is dequeued from its Bin and put back into a global pool of unused
pages.

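In terms of the hypothetical @code{pageHeader} and @code{bin} types
sketched above, the fast paths of both operations amount to a few pointer
updates. The following sketch assumes a page size of 4096 Bytes and omits
the slow paths (empty free-list, returning an empty page to the global
pool); it illustrates the mechanism and is not @code{omalloc}'s code.

@example
/* Sketch of the fast paths, using the pageHeader/bin types from the
   previous example; page size assumed to be 4096 Bytes. */
#define PAGE_MASK (~(unsigned long)(4096 - 1))

void *alloc_block(bin *b)
{
  pageHeader *page = b->current_page;  /* page with a non-empty free-list  */
  void *block = page->free_list;
  page->free_list = *(void **)block;   /* dequeue block from the free-list */
  page->used_blocks++;
  return block;
}

void free_block(void *block)
{
  /* the page-header sits at the start of the page containing the block */
  pageHeader *page = (pageHeader *)((unsigned long)block & PAGE_MASK);
  *(void **)block = page->free_list;   /* enqueue block into the free-list */
  page->free_list = block;
  if (--page->used_blocks == 0)
  {
    /* no used blocks left: dequeue the page from its Bin and return it
       to the global pool of unused pages (omitted here) */
  }
}
@end example
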

@code{omalloc}'s @strong{API} provides a three-level interface:
@table @asis
@item 1.) Standard function interface level
This level provides the standard allocation/deallocation functions like
@code{malloc, calloc, realloc, free}, etc. No special @code{#include}'s
are required; only linking with @code{libomalloc} is necessary.

@item 2.) Standard macro interface level
This level provides variants of the standard allocation/deallocation
functions and some extensions thereof in the form of macros. To use this
level, the header file @code{omalloc.h} needs to be included by an
application, and it must be linked with @code{libomalloc}.

@item 3.) Bin-based macro interface level
This level provides allocation (resp. deallocation) macros whose
arguments are not sizes (like those of the standard allocation
functions/macros), but Bins directly. Naturally, to use this level, the
header file @code{omalloc.h} needs to be included by an application,
and it must be linked with @code{libomalloc}.
@end table

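To illustrate the three levels, the fragment below allocates and frees a
memory block at each level. The macro names (@code{omAlloc},
@code{omFreeSize}, @code{omGetSpecBin}, @code{omAllocBin},
@code{omFreeBin}, @code{omUnGetSpecBin}) are those used throughout
@sc{Singular}'s sources; treat the exact calls here as an illustrative
sketch rather than as a reference.

@example
#include <stdlib.h>
#include "omalloc.h"   /* needed for levels 2 and 3 only */

void three_levels(void)
{
  /* Level 1: standard functions, provided by linking with libomalloc */
  void *p = malloc(100);
  free(p);

  /* Level 2: macro variants of the standard interface; passing the
     size to the deallocation macro saves a size lookup */
  void *q = omAlloc(100);
  omFreeSize(q, 100);

  /* Level 3: Bin-based macros -- the Bin, not a size, is the argument */
  omBin bin = omGetSpecBin(16);   /* a Bin managing 16-Byte blocks */
  void *r = omAllocBin(bin);
  omFreeBin(r, bin);
  omUnGetSpecBin(&bin);
}
@end example
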
@code{omalloc} has a special mode of operation, namely its
@strong{debugging mode}. This mode can be activated for the function
interface by setting run-time parameters, and, for the macro interface,
by setting debug flags on a per-file basis before @code{#include}'ing
the file @code{omalloc.h}. Furthermore, the granularity of the
debugging can be set dynamically at run-time using various
parameters. One special feature of @code{omalloc} is that memory
allocated with a debugging allocation routine can safely be deallocated
with a non-debugging deallocation routine (and vice versa). In other
words, the debugging and non-debugging modes of @code{omalloc} are not
mutually exclusive. In its debugging mode, @code{omalloc} is designed to
catch more than 20 errors in the usage of its API, and, of course, all
the common and classical errors are among those which are caught by
@code{omalloc}. Error reporting can be directed either to log files or to
@code{stderr}. On some platforms, the error reports include not only
line number information, but also stack trace information (i.e., a
description of the routines on the function stack).

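For the macro interface, the per-file debug flags are set before
@code{omalloc.h} is included, roughly as in the fragment below. The flag
names @code{OM_CHECK} and @code{OM_TRACK} and the levels chosen here are
given as an illustration and should be checked against the @code{omalloc}
sources.

@example
/* Per-file debugging (illustrative flag names and levels):
   OM_CHECK -- how thoroughly omalloc checks its data structures,
   OM_TRACK -- how much information is tracked per allocated block. */
#define OM_CHECK 3
#define OM_TRACK 5
#include "omalloc.h"
@end example
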

@heading Requirement evaluation of @code{omalloc}
After explaining the principles of @code{omalloc}, let us evaluate
@code{omalloc} w.r.t. the requirements stated above.

@table @strong

@item (1) efficiency of allocation/deallocation
When used with its Bin-based macro interface level, the
allocation/deallocation of small memory blocks requires (on average)
only a few (say, 5-10) assembler instructions. We believe
that this is optimal, i.e., as fast as one could possibly implement
these operations.
@*The other interface levels incur the additional overhead of
determining the appropriate Bin (using a table lookup) and, for the
function interface level, the additional overhead of function
calls. Compared with existing memory managers, even these other interface
levels provide allocation/deallocation routines of unmatched efficiency.

@item (2) locality of reference
Based on its design principles, @code{omalloc} provides an
extremely high locality of reference (especially for consecutive elements
of linked-list-like structures). To see this, recall that
@itemize
@item
consecutively allocated memory blocks are physically almost always from
the same memory page (the current memory page of a Bin is only changed
when it becomes full or empty);
@item
the probability that consecutively allocated memory blocks are also
physically consecutive (i.e., that their physical addresses are consecutive)
is rather high, since memory pages are spooled back into the
pool of unused pages as often as possible and, hence, are reinitialized
as often as possible (notice that allocating from a freshly initialized
page assures that the physical addresses of consecutively allocated
memory blocks are also consecutive).
@end itemize
Moreover, the allocation policy of a Bin can be changed at
run-time such that it is assured that consecutive allocations from this
Bin result in physically consecutive memory blocks. However, the latter
can only be achieved at the price of a slightly higher memory
consumption, and should therefore only be used temporarily for
long-lived, frequently traversed data.

@item (3) small maintenance size overhead

Since @code{omalloc} manages small memory blocks on a per-page basis, there is
no maintenance overhead on a per-block basis, but only on a per-page
basis. This overhead amounts to 24 Bytes per memory
page, i.e., we only need 24 Bytes of maintenance overhead per 4096 Bytes,
or 0.6%.
@*For larger memory blocks, this overhead grows, since there might
be "leftovers" from the division of a memory page into equally-sized
chunks of memory blocks. Furthermore, there is the additional (but
rather small) overhead of keeping Bins and other "global" maintenance
structures.
@*Nevertheless, on average, we observed a maintenance overhead of
approximately 1% of the total memory usage, which, to the best of our
knowledge, is unsurpassed by any existing general-purpose memory manager.

@item (4) clean API and debugging support
Whether or not @code{omalloc} provides a clean API is certainly a matter
of taste and expectation. However, it is intrinsic that the price one has
to pay for the speed and flexibility of a memory manager like
@code{omalloc} is a more complex API than that of the standard
@code{malloc/realloc/free} functions. Nevertheless, as we believe and
as our experience has shown, the layered interface design outlined above
provides a practical compromise between simplicity and
efficiency/flexibility.

W.r.t. the debugging mode, our experience has shown that it is extremely
useful to be able to fine-tune the granularity of the debugging both
under a lexical, per-file scope and under a dynamic scope set at
run-time. It also turned out that it is very convenient that the
debugging and non-debugging modes are not mutually exclusive and that the
same library (and header file) can be used for both modes -- i.e., that
no recompilation of @code{omalloc} is required to enable one or the other
mode. In summary, @code{omalloc}'s debugging features can at least be
compared with those of available memory management debugging libraries
(like @uref{http://dmalloc.com/,dmalloc}) -- our experience has
furthermore shown that the features are suitable and appropriate for most
practical purposes. But then again, fitness is also a matter of taste and
expectations and therefore hard to evaluate.

@item (5) customizable, tunable, extensible and portable

@end table

@heading Performance evaluation of @code{omalloc}

We have not (yet) thoroughly measured the performance of
@code{omalloc} in comparison to other memory management
systems. Nevertheless, our first experiments yielded the following:
@itemize
@item
@sc{Singular}'s memory consumption is up to 30% lower and its speed is
up to 100% higher when used with @code{omalloc} instead of a standard
memory manager (like the one found in the @code{libc} library on Linux, or
the one underlying @code{perl}).

@item
We expect that for programs/computations with computational
characteristics similar to those of @sc{Singular} (e.g., multivariate
polynomial computations, sparse matrix computations, etc.), the speed
increase and memory consumption decrease are comparable to those reported
for @sc{Singular}.

@item
For other applications (like @code{perl} or @code{gcc}) we observed a
slight speed increase (depending on the input) and an overall decrease
in memory consumption of up to 25%.
@end itemize

@heading Status/Copyright of @code{omalloc}

@code{omalloc} is a fully functional, mature memory management
library. It still has a TODO list, with more or less only one major
point: create a detailed technical reference manual (in texinfo),
because there is none so far.

@code{omalloc} was written by
@uref{http://www.mathematik.uni-kl.de/~obachman,,Olaf Bachmann}, who also
holds its copyright.

@code{omalloc} is @strong{not} free software. Its usage is restricted to
@sc{Singular} and it does @strong{not} fall under @sc{Singular}'s
copyright. If you are interested in using @code{omalloc}, please send
email to @email{obachman@@mathematik.uni-kl.de}. See also the file
COPYING_OMALLOC for details.

@bye