1 | \input texinfo @c -*-texinfo-*- |
---|
2 | @setfilename omalloc.hlp |
---|
3 | |
---|
4 | @comment -*-texinfo-*- |
---|
5 | @comment Unix |
---|
6 | |
---|
7 | @majorheading An Overview of @code{omalloc} |
---|
8 | @cindex omalloc |
---|
9 | @cindex omalloc, overview |
---|
10 | |
---|
11 | This document gives a short overview of @code{omalloc}, the internal |
---|
12 | memory manager of @sc{Singular}. |
---|
13 | |
---|
14 | @heading Requirements of a memory manager for @sc{Singular} |
---|
15 | |
---|
16 | Most of @sc{Singular}'s computations boil down to primitive polynomial |
---|
17 | operations like copying, deleting, adding, and multiplying of |
---|
18 | polynomials. For example, standard bases computations over finite fields |
---|
19 | spent (on average) 90 per cent of their time realizing the operation @code{p - m*q} where @code{m} is a monomial, and @code{p,q} are polynomials. |
---|
20 | |
---|
21 | @sc{Singular} uses linked lists of monomials as data structure for |
---|
22 | representing polynomials. A monomial is represented by a memory block |
---|
23 | which stores a coefficient and an exponent vector. The size of this |
---|
24 | memory block varies: its minimum size is three machine words (one word |
---|
25 | for each, the pointer to the "next" monomial, the coefficient and the |
---|
26 | exponent vector), it's maximal size is (almost) unlimited, and |
---|
27 | we can assume that its average size is 4 to 6 machine words (i.e., the |
---|
28 | exponent vector is 2 to 4 words long). |
---|
29 | |
---|
30 | From a low-level point of view, polynomial operations are operations on |
---|
31 | linked lists which go as follows: |
---|
32 | @enumerate |
---|
33 | @item |
---|
34 | Do something with the exponent vector (e.g., copy, compare, add), and |
---|
35 | possibly do something with the coefficient (e.g., add, multiply, compare |
---|
36 | with zero) of a monomial. |
---|
37 | @item |
---|
38 | Advance to the next monomial and/or allocate a new or free an existing |
---|
39 | monomial. |
---|
40 | @end enumerate |
---|
41 | |
---|
42 | Assuming that coefficient operations are (inlined) operations on a |
---|
43 | single machine word (as they are, for example, for coefficients from |
---|
44 | most finite fields), and observing that the exponent vector operations |
---|
45 | are linear w.r.t. the length of the exponent vector, we come to the |
---|
46 | following major conclusion: |
---|
47 | |
---|
48 | @quotation |
---|
49 | @emph{For most computations, a traversal to the next memory block or a |
---|
50 | allocation/deallocation of a memory block is, on average, required after |
---|
51 | every 4 to 6 machine word operations.} |
---|
52 | @end quotation |
---|
53 | |
---|
54 | The major requirements of a memory manager for @sc{Singular} become |
---|
55 | immediately clear from this conclusion: |
---|
56 | |
---|
57 | @table @strong |
---|
58 | @item (1) allocation/deallocation of (small) memory blocks must be extremely fast |
---|
59 | If the allocation/deallocation of a memory blocks requires more than a |
---|
60 | couple of machine instructions it would become the bottleneck of the |
---|
61 | computations. In particular, even the overhead of a function call |
---|
62 | for a memory operation is already too expensive. |
---|
63 | @item (2) consecutive memory blocks in linked lists must have a high locality of reference |
---|
64 | Compared with the performed operations on list elements, cache misses |
---|
65 | (or, even worse, page misses) of memory blocks are extremely expensive: |
---|
66 | we estimate that one cache miss of a memory block costs at least ten |
---|
67 | or more list element operations. Hence, if the addresses of consecutive |
---|
68 | list elements are not physically closed to each other (i.e., if |
---|
69 | the linked lists have a low locality of reference), then resolving |
---|
70 | cache (or, page) misses would become the bottleneck of computations. |
---|
71 | @end table |
---|
72 | |
---|
73 | Furthermore, there a the following more or less obvious requirements on |
---|
74 | a memory manager for @sc{Singular}: |
---|
75 | |
---|
76 | @table @strong |
---|
77 | @item (3) the size overhead to maintain small blocks of memory must be small |
---|
78 | If managing one memory block requires one or more words |
---|
79 | (e.g., to store its size) then the total overhead of the |
---|
80 | memory manager could sum up to 25% of the overall used memory, which is |
---|
81 | an unnacceptable waste of memory. |
---|
82 | @item (4) the memory manager must have a clean API and it must support debugging |
---|
83 | The API of the memory manager must be similar to the standard API's of |
---|
84 | memory managers, otherwise its usability is greatly limited. Furthermore, |
---|
85 | it needs to provide error checking and debugging support (to detect |
---|
86 | overwriting, twice freed or not-freed memory blocks, etc.) since errors |
---|
87 | in the usage of the memory manager are otherwise very hard to find and |
---|
88 | correct. |
---|
89 | @item (5) the memory manager must be customizable, tunable, extensible and portable |
---|
90 | The memory manager should support customizations (like "what to do if no |
---|
91 | more memory is available"); its parameters (e.g., allocation policies) |
---|
92 | should be tunable, it should be extensible to allow easy |
---|
93 | implementations of furthergoing functionality (like overloading of C++ |
---|
94 | constructors and destructors, etc), and it needs to be portable to all |
---|
95 | available operating systems. |
---|
96 | @end table |
---|
97 | |
---|
98 | To the best of our knowledge, there is currently no memory manager |
---|
99 | available which satisfies these (admittingly in part extreme) |
---|
100 | requirements. Therefore, we desigend and implemented @code{omalloc}. |
---|
101 | |
---|
102 | |
---|
103 | @heading Principles of omalloc |
---|
104 | |
---|
105 | @code{omalloc} is a stand-alone, general-purpose memory management |
---|
106 | library which can be used by any program. Its major design goals were |
---|
107 | derived from the memory management requirements of @sc{Singular} (see above). |
---|
108 | |
---|
109 | @code{omalloc} does neither provide nor support garbage collection -- it |
---|
110 | is an extension to, and sits on top of, the standard |
---|
111 | @code{malloc/realloc/free} routines provided by most operating systems. |
---|
112 | |
---|
113 | @code{omalloc} is specialized in the management of small memory |
---|
114 | blocks. The management of large memory blocks, i.e., of memory blocks which |
---|
115 | are larger than "one fourth of the page size of the system" (usually |
---|
116 | appr. 1KByte) are (almost) left entirely to the underlying |
---|
117 | @code{malloc/realloc/free} implementation. |
---|
118 | |
---|
119 | |
---|
120 | @code{omalloc} manages small blocks of memory on a @strong{per-page |
---|
121 | basis}. That is, each used page is split up into a page-header and |
---|
122 | equally-sized memory blocks. The page-header has a size of 6 words |
---|
123 | (i.e., 24 Byte on a 32 Bit machine), and stores (among others) a pointer |
---|
124 | to the free-list and a counter of the used memory blocks of this page. |
---|
125 | |
---|
126 | Pages which manage equally-sized memory blocks are collected into |
---|
127 | so-called @strong{"Bins"}, which are (more or less) nothing but linked |
---|
128 | lists of pages. |
---|
129 | |
---|
130 | On @strong{memory allocation}, an appropriate page (i.e. one which has a |
---|
131 | non-empty free list of the appropriate block size) is determined based |
---|
132 | on the used memory allocation mechanism and its arguments (see |
---|
133 | below). The counter of the page is incremented, and the provided memory |
---|
134 | block is dequeued from the free-list of the page. |
---|
135 | |
---|
136 | On @strong{memory deallocation}, the address of the memory block is used to |
---|
137 | determine the page (header) which manages the memory block to be |
---|
138 | freed. The counter of the page is decremented, and the memory block is |
---|
139 | enqued into the free-list. If decrementing the counter yields zero, |
---|
140 | i.e., if there are no more used memory blocks in this page, then this |
---|
141 | page is dequeued from its Bin and put back into a global pool of unused pages. |
---|
142 | |
---|
143 | |
---|
144 | @code{omalloc}' @strong{API} provides a three-level interface: |
---|
145 | @table @asis |
---|
146 | @item 1.) Standard function interface level |
---|
147 | This level provides the standard allocation/deallocation functions like |
---|
148 | @code{malloc, calloc, realloc, free}, etc. No special @code{#include}'s |
---|
149 | are required, only linking with @code{libomalloc} is necessary. |
---|
150 | |
---|
151 | @item 2.) Standard macro interface level |
---|
152 | This level provides variants of the standard allocation/deallocation |
---|
153 | functions and some extensions thereof in form of macros. To use this |
---|
154 | level, the header file @code{omalloc.h} needs to be included by an |
---|
155 | application, and it must be linked with @code{libomalloc}. |
---|
156 | |
---|
157 | @item 3.) Bin-based macro interface level |
---|
158 | This level provides allocation (resp. deallocation) macros whose |
---|
159 | arguments are not sizes (like those of the standard allocation |
---|
160 | functions/macros), but directly Bins. Naturally, to use this level, the |
---|
161 | header the file @code{omalloc.h} needs to be included by an application, |
---|
162 | and it must be linked with @code{libomalloc}. |
---|
163 | @end table |
---|
164 | |
---|
165 | @code{omalloc} has a special mode of operation, namely its |
---|
166 | @strong{debugging mode}. This mode can be activated for the functional |
---|
167 | interface by setting run-time parameters, and, for the macro-interface, |
---|
168 | by setting debug-flags on a per-file basis, before @code{#include}'ing |
---|
169 | the file @code{omalloc.h}. Furthermore, the granularity of the |
---|
170 | debugging can dynamically be set at run-time using various |
---|
171 | parameters. One special feature of @code{omalloc} is that memory |
---|
172 | allocated with a debugging allocation routine can safely be deallocated |
---|
173 | with a non-debugging deallocation routine (and vice versa). In other |
---|
174 | words, the debugging and non-debugging mode of omalloc are not mutually |
---|
175 | exclusive. When in its debugging mode, @code{omalloc} is set out to |
---|
176 | catch more than 20 errors in the usage of its API. And, of course, all |
---|
177 | the common and classical errors are among those which are caught by |
---|
178 | @code{omalloc}. Error reporting can either be directe3d to log files or to |
---|
179 | @code{stderr}. On some platforms, the error reporting does not only |
---|
180 | include line number information, but also stack trace (i.e., a |
---|
181 | description of the routines on the function stack), information. |
---|
182 | |
---|
183 | |
---|
184 | @heading Requirement evaluation of @code{omalloc} |
---|
185 | After explaining its princples of @code{omalloc} let us evaluate |
---|
186 | @code{omalloc} w.r.t. the requirements stated above. |
---|
187 | |
---|
188 | @table @strong |
---|
189 | |
---|
190 | @item (1) efficiency of allocation/deallocation |
---|
191 | When used with its Bin-based macro interface level, the |
---|
192 | allocation/deallocation of small memory blocks requires (on average) |
---|
193 | only a couple (say, 5-10) assembler instructions. We believe |
---|
194 | that this is optimal, i.e., as fast as one possibly could implement |
---|
195 | these operations. |
---|
196 | @*The other interface levels incur the additional overhead of the |
---|
197 | determination of the appropriate Bin (using a table lookup); and, for the |
---|
198 | function interface level, the additional overhead of function |
---|
199 | calls. Compared with existing memory managers, even these other interface |
---|
200 | levels provide unmatchingly efficient allocation/deallocation routines. |
---|
201 | |
---|
202 | @item (2) locality of reference |
---|
203 | Based on its design principles @code{omalloc} provides an |
---|
204 | extremely high locality of reference (especially for consecutive elements |
---|
205 | of linked-list like structures). To realize this, recall that |
---|
206 | @itemize |
---|
207 | @item |
---|
208 | consecutively allocated memory blocks are physically almost always from |
---|
209 | the same memory page (the current memory page of a bin is only changed |
---|
210 | when it becomes full or empty). |
---|
211 | @item |
---|
212 | the probability that consecutively allocated memory blocks are also |
---|
213 | physically consecutive (i.e., that their physical addresses are consecutive) |
---|
214 | is rather high, since memory pages are spooled-back into the |
---|
215 | pool of unused pages as often as possible, and, hence, are reinitialized |
---|
216 | as often as possible (notice that allocating from a freshly initialized |
---|
217 | page assures that the physical addresses of consecutively allocated |
---|
218 | memory blocks are also consecutive). |
---|
219 | @end itemize |
---|
220 | Moreover, the allocation policy of a Bin can be changed at |
---|
221 | run-time such that it is assured that consecutive allocations from this |
---|
222 | Bin result in physically consecutive memory blocks. However, the latter |
---|
223 | can only be achieved by paying for with a slightly higher memory |
---|
224 | consumption, and should therefore only be used temporarily for |
---|
225 | long-lived, often traversed-over data. |
---|
226 | |
---|
227 | @item (3) small maintenance size overhead |
---|
228 | |
---|
229 | Since @code{omalloc} manages small memory blocks on a per-page basis, there is |
---|
230 | no maintenance-overhead on a per-block basis, but only on a per-page |
---|
231 | basis. This overhead amounts to 24 Bytes per memory |
---|
232 | page, i.e., we only need 24 Bytes maintenance overhead per 4096 Bytes, |
---|
233 | or 0.6%. |
---|
234 | @*For larger memory blocks, this overhead grows since there might |
---|
235 | be "leftovers" from the division of a memory page into equally-sized |
---|
236 | chunks of memory blocks. Furthermore, there is the additional (but |
---|
237 | rather small) overhead of keeping Bins and other "global" maintenance |
---|
238 | structures. |
---|
239 | @*Nevertheless, on average, we observed a maintenance overhead of |
---|
240 | appr. 1% of the total memory usage, which, to the best of our knowldege, |
---|
241 | is unsurpassed by any existing, general-purpose, memory manager. |
---|
242 | |
---|
243 | @item (4) clean API and debugging support |
---|
244 | Whether or not @code{omalloc} provides a clean API is certainly a matter |
---|
245 | of taste and expectation. However, it is intrinsic that the price one has to |
---|
246 | pay for the speed and flexibiity of a memory manager like @code{omalloc} |
---|
247 | is a more complex API than that of the standard |
---|
248 | @code{malloc/realloc/free} functions. Nevertheless, as we believe and |
---|
249 | as our experience has shown, the above outlined layered interface |
---|
250 | designe provides a practical compromise between simplicity and |
---|
251 | efficienc/flexibility. |
---|
252 | |
---|
253 | W.r.t. the debugging mode has our experience shown that it is extremely |
---|
254 | useful to be able to fine-tune the granularity of the debugging under both, |
---|
255 | a lexical, per-file base scope, and under a dynamic scope set at |
---|
256 | run-time. It also turned out that it is very convenient that the |
---|
257 | debugging and non-debuggung mode are not mutually exclusive and that the |
---|
258 | same library (and, header file) can be used for both modes -- i.e., that |
---|
259 | no recompilations of @code{omalloc} are required to enable one or the other |
---|
260 | mode. As a summary @code{omalloc} debugging features can at least be |
---|
261 | compared with available memory management debugging libraries (like |
---|
262 | @uref{http://dmalloc.com/,dmalloc}) -- our experience has furthermore |
---|
263 | showh that the features are suitable and appropriate for most practical |
---|
264 | purposes. But then again, fittnes is also a matter of taste and |
---|
265 | expectations and therefore hard to evaluate. |
---|
266 | |
---|
267 | @item (5) customizable, tunable, extensible and portable |
---|
268 | |
---|
269 | @end table |
---|
270 | |
---|
271 | @heading Performance evaluation of @code{omalloc} |
---|
272 | |
---|
273 | We have no (yet) thoroughly measured the performance of |
---|
274 | @code{omalloc} in comparison to other memory management |
---|
275 | systems. Nevertheless, our first experiments yielded the following: |
---|
276 | @itemize |
---|
277 | @item |
---|
278 | @sc{Singular}'s memory consumption is up to 30% lower and its speed is |
---|
279 | up to 100% faster when used with @code{omalloc} instead of of a standard |
---|
280 | memory manager (like the one found in the @code{libc} library on Linux, or |
---|
281 | the one underlying @code{perl}). |
---|
282 | |
---|
283 | @item |
---|
284 | We expect that for programs/computations with similar computational |
---|
285 | characteristics like those of @sc{Singular} (e.g., multivariate |
---|
286 | polynomial computations, sparse matrix computations, etc), the speed |
---|
287 | increase and memory consumption decrease is comparable to those reported |
---|
288 | for @sc{Singular}. |
---|
289 | |
---|
290 | @item |
---|
291 | For other applications (like @code{perl} or @code{gcc}) we observed a |
---|
292 | slight speed increase (depending on the input) and an overall decrease |
---|
293 | in the memory consumptions of up to 25%. |
---|
294 | @end itemize |
---|
295 | |
---|
296 | @heading Status/Copyright of @code{omalloc} |
---|
297 | |
---|
298 | @code{omalloc} is a fully-functional, mature memory management |
---|
299 | library. It still has a TODO list, with more or less only one major |
---|
300 | point: Create a detailed technical reference manual (in texinfo), |
---|
301 | because there is none so far. |
---|
302 | |
---|
303 | @code{omalloc} was written by |
---|
304 | @uref{http://www.mathematik.uni-kl.de/~obachman,,Olaf Bachmann} who also |
---|
305 | holds its copyright. |
---|
306 | |
---|
307 | @code{omalloc} is @strong{not} free software. Its usage is restricted to |
---|
308 | @sc{Singular} and it does @strong{not} fall under @sc{Singular}'s |
---|
309 | copyright. If you are interested in using @code{omalloc}, please send |
---|
310 | email to @email{obachman@@mathematik.uni-kl.de}. See also the file |
---|
311 | COPYING_OMALLOC for details. |
---|
312 | |
---|
313 | @bye |
---|
314 | |
---|