[780081] | 1 | \input texinfo @c -*-texinfo-*- |
---|
| 2 | @setfilename omalloc.hlp |
---|
| 3 | |
---|
| 4 | @comment -*-texinfo-*- |
---|
| 5 | @comment $Id$ |
---|
| 6 | @comment Unix |
---|
| 7 | |
---|
| 8 | @majorheading An Overview of @code{omalloc} |
---|
| 9 | @cindex omalloc |
---|
| 10 | @cindex omalloc, overview |
---|
| 11 | |
---|
| 12 | This document gives a short overview of @code{omalloc}, the internal |
---|
| 13 | memory manager of @sc{Singular}. |
---|
| 14 | |
---|
| 15 | @heading Requirements of a memory manager for @sc{Singular} |
---|
| 16 | |
---|
| 17 | Most of @sc{Singular}'s computations boil down to primitive polynomial |
---|
| 18 | operations like copying, deleting, adding, and multiplying of |
---|
| 19 | polynomials. For example, standard bases computations over finite fields |
---|
| 20 | spent (on average) 90 per cent of their time realizing the operation @code{p - m*q} where @code{m} is a monomial, and @code{p,q} are polynomials. |
---|
| 21 | |
---|
| 22 | @sc{Singular} uses linked lists of monomials as data structure for |
---|
| 23 | representing polynomials. A monomial is represented by a memory block |
---|
| 24 | which stores a coefficient and an exponent vector. The size of this |
---|
| 25 | memory block varies: its minimum size is three machine words (one word |
---|
| 26 | for each, the pointer to the "next" monomial, the coefficient and the |
---|
| 27 | exponent vector), it's maximal size is (almost) unlimited, and |
---|
| 28 | we can assume that its average size is 4 to 6 machine words (i.e., the |
---|
| 29 | exponent vector is 2 to 4 words long). |
---|
| 30 | |
---|
| 31 | From a low-level point of view, polynomial operations are operations on |
---|
| 32 | linked lists which go as follows: |
---|
| 33 | @enumerate |
---|
| 34 | @item |
---|
| 35 | Do something with the exponent vector (e.g., copy, compare, add), and |
---|
| 36 | possibly do something with the coefficient (e.g., add, multiply, compare |
---|
| 37 | with zero) of a monomial. |
---|
| 38 | @item |
---|
| 39 | Advance to the next monomial and/or allocate a new or free an existing |
---|
| 40 | monomial. |
---|
| 41 | @end enumerate |
---|
| 42 | |
---|
| 43 | Assuming that coefficient operations are (inlined) operations on a |
---|
| 44 | single machine word (as they are, for example, for coefficients from |
---|
| 45 | most finite fields), and observing that the exponent vector operations |
---|
| 46 | are linear w.r.t. the length of the exponent vector, we come to the |
---|
| 47 | following major conclusion: |
---|
| 48 | |
---|
| 49 | @quotation |
---|
| 50 | @emph{For most computations, a traversal to the next memory block or a |
---|
| 51 | allocation/deallocation of a memory block is, on average, required after |
---|
| 52 | every 4 to 6 machine word operations.} |
---|
| 53 | @end quotation |
---|
| 54 | |
---|
| 55 | The major requirements of a memory manager for @sc{Singular} become |
---|
| 56 | immediately clear from this conclusion: |
---|
| 57 | |
---|
| 58 | @table @strong |
---|
| 59 | @item (1) allocation/deallocation of (small) memory blocks must be extremly fast |
---|
| 60 | If the allocation/deallocation of a memory blocks requires more than a |
---|
| 61 | couple of machine instructions it would become the bottleneck of the |
---|
| 62 | computations. In particular, even the overhead of a function call |
---|
| 63 | for a memory operation is already too expensive. |
---|
| 64 | @item (2) consecutive memory blocks in linked lists must have a high locality of reference |
---|
| 65 | Compared with the performed operations on list elements, cache misses |
---|
| 66 | (or, even worse, page misses) of memory blocks are extremly expensive: |
---|
| 67 | we estimate that one cache miss of a memory block costs at least ten |
---|
| 68 | or more list element operations. Hence, if the addresses of consecutive |
---|
| 69 | list elements are not physically closed to each other (i.e., if |
---|
| 70 | the linked lists have a low locality of reference), then resolving |
---|
| 71 | chache (or, page) misses would become the bottleneck of computations. |
---|
| 72 | @end table |
---|
| 73 | |
---|
| 74 | Furthermore, there a the following more or less obvious requirements on |
---|
| 75 | a memory manager for @sc{Singular}: |
---|
| 76 | |
---|
| 77 | @table @strong |
---|
| 78 | @item (3) the size overhead to maintain small blocks of memory must be small |
---|
| 79 | If managing one memory block requires one or more words |
---|
| 80 | (e.g., to store its size) then the total overhead of the |
---|
| 81 | memory manager could sum up to 25% of the overall used memory, which is |
---|
| 82 | an unnacceptable waste of memory. |
---|
| 83 | @item (4) the memory manager must have a clean API and it must support debugging |
---|
| 84 | The API of the memory manager must be similar to the standard API's of |
---|
| 85 | memory managers, otherwise its usability is greatly limited. Furthermore, |
---|
| 86 | it needs to provide error checking and debugging support (to detect |
---|
| 87 | overwriting, twice freed or not-freed memory blocks, etc.) since errors |
---|
| 88 | in the usage of the memory manager are otherwise very hard to find and |
---|
| 89 | correct. |
---|
| 90 | @item (5) the memory manager must be customizable, tunable, extensible and portable |
---|
| 91 | The memory manager should support customizations (like "what to do if no |
---|
| 92 | more memory is available"); its parameters (e.g., allocation policies) |
---|
| 93 | should be tunable, it should be extensible to allow easy |
---|
| 94 | implementations of furthergoing functionality (like overloading of C++ |
---|
| 95 | constructors and destructors, etc), and it needs to be portable to all |
---|
| 96 | available operating systems. |
---|
| 97 | @end table |
---|
| 98 | |
---|
| 99 | To the best of our knowledge, there is currently no memory manager |
---|
| 100 | available which satisfies these (admittingly in part extreme) |
---|
| 101 | requirements. Therefore, we desigend and implemented @code{omalloc}. |
---|
| 102 | |
---|
| 103 | |
---|
| 104 | @heading Principles of omalloc |
---|
| 105 | |
---|
| 106 | @code{omalloc} is a stand-alone, general-purpose memory managment |
---|
| 107 | library which can be used by any program. Its major design goals were |
---|
| 108 | derived from the memory management requirements of @sc{Singular} (see above). |
---|
| 109 | |
---|
| 110 | @code{omalloc} does neither provide nor support garbage collection -- it |
---|
| 111 | is an extension to, and sits on top of, the standard |
---|
| 112 | @code{malloc/realloc/free} routines provided by most operating systems. |
---|
| 113 | |
---|
| 114 | @code{omalloc} is specialized in the management of small memory |
---|
| 115 | blocks. The managment of large memory blocks, i.e., of memory blocks which |
---|
| 116 | are larger than "one fourth of the page size of the system" (usually |
---|
| 117 | appr. 1KByte) are (almost) left entirely to the underlying |
---|
| 118 | @code{malloc/realloc/free} implementation. |
---|
| 119 | |
---|
| 120 | |
---|
| 121 | @code{omalloc} manages small blocks of memory on a @strong{per-page |
---|
| 122 | basis}. That is, each used page is split up into a page-header and |
---|
| 123 | equally-sized memory blocks. The page-header has a size of 6 words |
---|
| 124 | (i.e., 24 Byte on a 32 Bit machine), and stores (among others) a pointer |
---|
| 125 | to the free-list and a counter of the used memory blocks of this page. |
---|
| 126 | |
---|
| 127 | Pages which manage equally-sized memory blocks are collected into |
---|
| 128 | so-called @strong{"Bins"}, which are (more or less) nothing but linked |
---|
| 129 | lists of pages. |
---|
| 130 | |
---|
| 131 | On @strong{memory allocation}, an appropriate page (i.e. one which has a |
---|
| 132 | non-empty free list of the appropriate block size) is determined based |
---|
| 133 | on the used memory allocation mechanism and its arguments (see |
---|
| 134 | below). The counter of the page is incremented, and the provided memory |
---|
| 135 | block is dequed from the free-list of the page. |
---|
| 136 | |
---|
| 137 | On @strong{memory deallocation}, the address of the memory block is used to |
---|
| 138 | determine the page (header) which manages the memory block to be |
---|
| 139 | freed. The counter of the page is decremented, and the memory block is |
---|
| 140 | enqued into the free-list. If decrementing the counter yields zero, |
---|
| 141 | i.e., if there are no more used memory blocks in this page, then this |
---|
| 142 | page is dequed from its Bin and put back into a global pool of unused pages. |
---|
| 143 | |
---|
| 144 | |
---|
| 145 | @code{omalloc}' @strong{API} provides a three-level interface: |
---|
| 146 | @table @asis |
---|
| 147 | @item 1.) Standard function interface level |
---|
| 148 | This level provides the standard allocation/deallocation functions like |
---|
| 149 | @code{malloc, calloc, realloc, free}, etc. No special @code{#include}'s |
---|
| 150 | are required, only linking with @code{libomalloc} is necessary. |
---|
| 151 | |
---|
| 152 | @item 2.) Standard macro interface level |
---|
| 153 | This level provides variants of the standard allocation/deallocation |
---|
| 154 | functions and some extensions thereof in form of macros. To use this |
---|
| 155 | level, the header file @code{omalloc.h} needs to be included by an |
---|
| 156 | application, and it must be linked with @code{libomalloc}. |
---|
| 157 | |
---|
| 158 | @item 3.) Bin-based macro interface level |
---|
| 159 | This level provides allocation (resp. deallocation) macros whose |
---|
| 160 | arguments are not sizes (like those of the standard allocation |
---|
| 161 | functions/macros), but directly Bins. Naturally, to use this level, the |
---|
| 162 | header the file @code{omalloc.h} needs to be included by an application, |
---|
| 163 | and it must be linked with @code{libomalloc}. |
---|
| 164 | @end table |
---|
| 165 | |
---|
| 166 | @code{omalloc} has a special mode of operation, namely its |
---|
| 167 | @strong{debugging mode}. This mode can be activated for the functional |
---|
| 168 | interface by setting run-time parameters, and, for the macro-interface, |
---|
| 169 | by setting debug-flags on a per-file basis, before @code{#include}'ing |
---|
| 170 | the file @code{omalloc.h}. Furthermore, the granularity of the |
---|
| 171 | debugging can dynamically be set at run-time using various |
---|
| 172 | parameters. One special feature of @code{omalloc} is that memory |
---|
| 173 | allocated with a debugging allocation routine can safely be deallocated |
---|
| 174 | with a non-debugging deallocation routine (and vice versa). In other |
---|
| 175 | words, the debugging and non-debugging mode of omalloc are not mutually |
---|
| 176 | exclusive. When in its debugging mode, @code{omalloc} is set out to |
---|
| 177 | catch more than 20 errors in the usage of its API. And, of course, all |
---|
| 178 | the common and classical errors are among those which are cought by |
---|
| 179 | @code{omalloc}. Error reporting can either be directe3d to log files or to |
---|
| 180 | @code{stderr}. On some platforms, the error reporting does not only |
---|
| 181 | include line number information, but also stack trace (i.e., a |
---|
| 182 | description of the routines on the function stack), information. |
---|
| 183 | |
---|
| 184 | |
---|
| 185 | @heading Requirement evaluation of @code{omalloc} |
---|
| 186 | After explaining its princples of @code{omalloc} let us evaluate |
---|
| 187 | @code{omalloc} w.r.t. the requirments stated above. |
---|
| 188 | |
---|
| 189 | @table @strong |
---|
| 190 | |
---|
| 191 | @item (1) efficiency of allocation/deallocation |
---|
| 192 | When used with its Bin-based macro interface level, the |
---|
| 193 | allocation/deallocation of small memory blocks requires (on average) |
---|
| 194 | only a couple (say, 5-10) assembler instructions. We believe |
---|
| 195 | that this is optimal, i.e., as fast as one possibly could implement |
---|
| 196 | these operations. |
---|
| 197 | @*The other interface levels incure the additional overhead of the |
---|
| 198 | determination of the appropriate Bin (using a table lookup); and, for the |
---|
| 199 | function interface level, the additional overhead of function |
---|
| 200 | calls. Compared with existing memory managers, even these other interface |
---|
| 201 | levels provide unmatchingly efficient allocation/deallocation routines. |
---|
| 202 | |
---|
| 203 | @item (2) locality of reference |
---|
| 204 | Based on its design principles @code{omalloc} provides an |
---|
| 205 | extremly high locality of reference (especially for consecutive elements |
---|
| 206 | of linked-list like structures). To realize this, recall that |
---|
| 207 | @itemize |
---|
| 208 | @item |
---|
| 209 | consecutively allocated memory blocks are physically almost always from |
---|
| 210 | the same memory page (the current memory page of a bin is only changed |
---|
| 211 | when it becomes full or empty). |
---|
| 212 | @item |
---|
| 213 | the probabilty that consecutively allocated memory blocks are also |
---|
| 214 | physically consecutive (i.e., that their physical addresses are consecutive) |
---|
| 215 | is rather high, since memory pages are spooled-back into the |
---|
| 216 | pool of unused pages as often as possible, and, hence, are reinitialized |
---|
| 217 | as often as possible (notice that allocationg from a freshly initialized |
---|
| 218 | page assures that the physical addresses of consecutively allocated |
---|
| 219 | memory blocks are also consecutive). |
---|
| 220 | @end itemize |
---|
| 221 | Moreover, the allocation policy of a Bin can be changed at |
---|
| 222 | run-time such that it is assured that consecutive allocations from this |
---|
| 223 | Bin result in physically consecutive memory blocks. However, the latter |
---|
| 224 | can only be achieved by paying for with a slightly higher memory |
---|
| 225 | consumption, and should therefore only be used temporarily for |
---|
| 226 | long-lived, often traversed-over data. |
---|
| 227 | |
---|
| 228 | @item (3) small maintainance size overhead |
---|
| 229 | |
---|
| 230 | Since @code{omalloc} manages small memory blocks on a per-page basis, there is |
---|
| 231 | no maintainance-overhead on a per-block basis, but only on a per-page |
---|
| 232 | basis. This overhead amounts to 24 Bytes per memory |
---|
| 233 | page, i.e., we only need 24 Bytes maintanance overhead per 4096 Bytes, |
---|
| 234 | or 0.6%. |
---|
| 235 | @*For larger memory blocks, this overhead grows since there might |
---|
| 236 | be "leftovers" from the division of a memory page into equally-sized |
---|
| 237 | chunks of memory blocks. Furthermore, there is the additional (but |
---|
| 238 | rather small) overhead of keeping Bins and other "global" maintance |
---|
| 239 | structures. |
---|
| 240 | @*Nevertheless, on average, we observed a maintanance overhead of |
---|
| 241 | appr. 1% of the total memory usage, which, to the best of our knowldege, |
---|
| 242 | is unsurpassed by any existing, general-purpose, memory manager. |
---|
| 243 | |
---|
| 244 | @item (4) clean API and debugging support |
---|
| 245 | Whether or not @code{omalloc} provides a clean API is certainly a matter |
---|
| 246 | of taste and expectation. However, it is intrinsic that the price one has to |
---|
| 247 | pay for the speed and flexibiity of a memory manager like @code{omalloc} |
---|
| 248 | is a more complex API than that of the standard |
---|
| 249 | @code{malloc/realloc/free} functions. Nevertheless, as we believe and |
---|
| 250 | as our experience has shown, the above outlined layered interface |
---|
| 251 | designe provides a practical compromise between simplicity and |
---|
| 252 | efficienc/flexibility. |
---|
| 253 | |
---|
| 254 | W.r.t. the debugging mode has our experience shown that it is extremly |
---|
| 255 | useful to be able to fine-tune the granularity of the debugging under both, |
---|
| 256 | a lexical, per-file base scope, and under a dynamic scope set at |
---|
| 257 | run-time. It also turned out that it is very convenient that the |
---|
| 258 | debugging and non-debuggung mode are not mutually exclusive and that the |
---|
| 259 | same library (and, header file) can be used for both modes -- i.e., that |
---|
| 260 | no recompilations of @code{omalloc} are required to enable one or the other |
---|
| 261 | mode. As a summary @code{omalloc} debugging features can at least be |
---|
| 262 | compared with available memory managment debugging libraries (like |
---|
| 263 | @uref{http://dmalloc.com/,dmalloc}) -- our experience has furthermore |
---|
| 264 | showh that the features are suitable and appropriate for most practical |
---|
| 265 | purposes. But then again, fittnes is also a matter of taste and |
---|
| 266 | expectations and therefore hard to evaluate. |
---|
| 267 | |
---|
| 268 | @item (5) customizable, tunable, extensible and portable |
---|
| 269 | |
---|
| 270 | @end table |
---|
| 271 | |
---|
| 272 | @heading Performance evaluation of @code{omalloc} |
---|
| 273 | |
---|
| 274 | We have no (yet) thouroughly measured the performance of |
---|
| 275 | @code{omalloc} in comparison to other memory managment |
---|
| 276 | systems. Nevertheless, our first experiments yielded the following: |
---|
| 277 | @itemize |
---|
| 278 | @item |
---|
| 279 | @sc{Singular}'s memory consumption is up to 30% lower and its speed is |
---|
| 280 | up to 100% faster when used with @code{omalloc} instead of of a standard |
---|
| 281 | memory manager (like the one found in the @code{libc} library on Linux, or |
---|
| 282 | the one underlying @code{perl}). |
---|
| 283 | |
---|
| 284 | @item |
---|
| 285 | We expect that for programs/computations with similar compuational |
---|
| 286 | characteristics like those of @sc{Singular} (e.g., multivariate |
---|
| 287 | polynomial computations, sparse matrix computations, etc), the speed |
---|
| 288 | increase and memory consumption decrease is comparable to those reported |
---|
| 289 | for @sc{Singular}. |
---|
| 290 | |
---|
| 291 | @item |
---|
| 292 | For other applications (like @code{perl} or @code{gcc}) we observed a |
---|
| 293 | slight speed increase (depending on the input) and an overall decrease |
---|
| 294 | in the memory consumptions of up to 25%. |
---|
| 295 | @end itemize |
---|
| 296 | |
---|
| 297 | @heading Status/Copyright of @code{omalloc} |
---|
| 298 | |
---|
| 299 | @code{omalloc} is a fully-functional, mature memory managment |
---|
| 300 | library. It still has a TODO list, with more or less only one major |
---|
| 301 | point: Create a detailed technical reference manual (in texinfo), |
---|
| 302 | because there is none so far. |
---|
| 303 | |
---|
| 304 | @code{omalloc} was written by |
---|
| 305 | @uref{http://www.mathematik.uni-kl.de/~obachman,,Olaf Bachmann} who also |
---|
| 306 | holds its copyright. |
---|
| 307 | |
---|
| 308 | @code{omalloc} is @strong{not} free software. Its usage is restricted to |
---|
| 309 | @sc{Singular} and it does @strong{not} fall under @sc{Singular}'s |
---|
| 310 | copyright. If you are interested in using @code{omalloc}, please send |
---|
| 311 | email to @email{obachman@@mathematik.uni-kl.de}. See also the file |
---|
| 312 | COPYING_OMALLOC for details. |
---|
| 313 | |
---|
| 314 | @bye |
---|
| 315 | |
---|