1 | <html> |
---|
2 | <head> |
---|
3 | <title> |
---|
4 | A Tour of NTL: NTL Implementation and Portability </title> |
---|
5 | </head> |
---|
6 | |
---|
7 | <body bgcolor="#fff9e6"> |
---|
8 | <center> |
---|
9 | <a href="tour-tips.html"><img src="arrow1.gif" alt="[Previous]" align=bottom></a> |
---|
10 | <a href="tour.html"><img src="arrow2.gif" alt="[Up]" align=bottom></a> |
---|
11 | <a href="tour-gmp.html"> <img src="arrow3.gif" alt="[Next]" align=bottom></a> |
---|
12 | </center> |
---|
13 | |
---|
14 | <h1> |
---|
15 | <p align=center> |
---|
16 | A Tour of NTL: NTL Implementation and Portability |
---|
17 | </p> |
---|
18 | </h1> |
---|
19 | |
---|
20 | <p> <hr> <p> |
---|
21 | |
---|
22 | NTL is designed to be portable, fast, |
---|
23 | and relatively easy to use and extend. |
---|
24 | |
---|
25 | <p> |
---|
26 | To make NTL portable, no assembly code is used (well, almost none, see below). |
---|
27 | This is highly desirable, as architectures are constantly |
---|
28 | changing and evolving, and maintaining assembly |
---|
29 | code is quite costly. |
---|
30 | By avoiding assembly code, NTL should remain usable, |
---|
31 | with virtually no maintenance, for many years. |
---|
32 | |
---|
33 | <p> |
---|
34 | |
---|
35 | <h3>Minimal platform requirements</h3> |
---|
36 | |
---|
37 | When the configuration flags <tt>NTL_CLEAN_INT</tt> |
---|
38 | and <tt>NTL_CLEAN_PTR</tt> are both <i>on</i> (this is not the default, |
---|
39 | see below), |
---|
40 | NTL makes two requirements |
---|
41 | of its platform, |
---|
42 | neither of which is guaranteed by the <tt>C++</tt> language |
---|
43 | definition, but both of which are essentially universal: |
---|
44 | |
---|
45 | <ol> |
---|
46 | <li> |
---|
47 | <tt>int</tt> and <tt>long</tt> quantities |
---|
48 | are represented using a 2's complement |
---|
49 | representation whose width is equal to the width of <tt>unsigned int</tt> |
---|
50 | and <tt>unsigned long</tt>, respectively. |
---|
51 | <li> |
---|
52 | Double precision floating point |
---|
53 | conforms to the IEEE floating point standard. |
---|
54 | </ol> |
---|
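<p>
The two platform requirements above can be probed at run time.
The following is only a minimal sketch, not part of NTL: the helper names
<tt>long_is_twos_complement</tt>, <tt>double_is_ieee</tt> and
<tt>double_precision_bits</tt> are hypothetical, and the sketch assumes a compiler
that provides the standard <tt>&lt;limits&gt;</tt> header.

```cpp
#include <limits>

// Hypothetical helper (not part of NTL): true if long looks like a
// 2's complement type whose width equals that of unsigned long.
bool long_is_twos_complement() {
    // In 2's complement, -1 has all bits set, so its two low bits are 11.
    // (Ones' complement would give 10, sign-magnitude would give 01.)
    bool all_ones = ((-1L) & 3L) == 3L;
    // Value bits plus the sign bit should fill the whole unsigned width.
    bool same_width = std::numeric_limits<long>::digits + 1
                      == std::numeric_limits<unsigned long>::digits;
    return all_ones && same_width;
}

// Hypothetical helper: true if the implementation claims that double
// follows the IEEE (IEC 559) floating point standard.
bool double_is_ieee() {
    return std::numeric_limits<double>::is_iec559;
}

// Hypothetical helper: number of significand bits in a double.
int double_precision_bits() {
    return std::numeric_limits<double>::digits;
}
```

On a typical desktop platform one would expect both tests to succeed and the
reported precision to be 53 bits, but this is an expectation, not a guarantee.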
55 | |
---|
56 | <p> |
---|
57 | NTL makes very conservative requirements of the <tt>C/C++</tt> compiler: |
---|
58 | <ul> |
---|
59 | <li> |
---|
60 | it is assumed that the <tt>C</tt> compiler conforms to the original |
---|
61 | ANSI <tt>C</tt> standard, |
---|
62 | <li> |
---|
63 | it is assumed that the <tt>C++</tt> compiler supports all of the |
---|
64 | language features described in the <i>second</i> edition of Stroustrup's book, |
---|
65 | minus exceptions, templates, and derived types. |
---|
66 | </ul> |
---|
67 | |
---|
68 | |
---|
69 | <p> |
---|
70 | |
---|
71 | <h3>The <tt>NTL_CLEAN_INT</tt> flag</h3> |
---|
72 | |
---|
73 | <p> |
---|
74 | |
---|
75 | The configuration flag <tt>NTL_CLEAN_INT</tt> |
---|
76 | is currently <i>off</i> by default. |
---|
77 | |
---|
78 | <p> |
---|
79 | When this flag is off, NTL makes another requirement of its platform; |
---|
80 | namely, that arithmetic operations on the type <tt>long</tt> |
---|
81 | do not overflow, but simply "wrap around" modulo the word size. |
---|
82 | This behavior is <i>not</i> guaranteed by the <tt>C++</tt> standard, |
---|
83 | and yet it is essentially universally implemented. |
---|
84 | In fact, most compilers go out of their way to ensure it, |
---|
85 | since it is very reasonable and many programs |
---|
86 | implicitly rely on it. |
---|
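<p>
For comparison, here is one way to obtain "wrap around" arithmetic without
relying on the assumption described above. This is a hedged sketch rather than NTL's own code:
the helper names <tt>wrapping_add</tt> and <tt>wrapping_mul</tt> are hypothetical.
It relies on the fact that <tt>unsigned long</tt> arithmetic is defined by the
language to be modular, and returns the resulting bit pattern as an
<tt>unsigned long</tt>.

```cpp
#include <climits>

// Hypothetical helper (not NTL code): the bit pattern of a + b,
// reduced modulo 2^w, where w is the width of unsigned long.
unsigned long wrapping_add(long a, long b) {
    // Conversion to unsigned long and unsigned addition are both
    // defined to be modular, so no signed overflow can occur here.
    return static_cast<unsigned long>(a) + static_cast<unsigned long>(b);
}

// Hypothetical helper: the bit pattern of a * b, reduced modulo 2^w.
unsigned long wrapping_mul(long a, long b) {
    return static_cast<unsigned long>(a) * static_cast<unsigned long>(b);
}
```

Code that simply writes <tt>a + b</tt> on <tt>long</tt> values and expects the
same bit pattern is relying on the wrap-around assumption.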
87 | |
---|
88 | <p> |
---|
89 | Making this "wrap around" assumption can lead to slightly more efficient code |
---|
90 | on some platforms. |
---|
91 | It seems fairly unlikely that one would ever have to turn the |
---|
92 | <tt>NTL_CLEAN_INT</tt> flag <i>on</i>, but it seems a good idea |
---|
93 | to make this possible, and at the very least |
---|
94 | to identify and isolate the code that |
---|
95 | relies on this assumption. |
---|
96 | |
---|
97 | |
---|
98 | |
---|
99 | |
---|
100 | <p> |
---|
101 | Actually, with <tt>NTL_CLEAN_INT</tt> off, it is also assumed |
---|
102 | that right shifts of signed integers are consistent, |
---|
103 | in the sense that if it is sometimes an arithmetic shift, |
---|
104 | then it is always an arithmetic shift (the installation |
---|
105 | scripts check if right shift appears to be arithmetic, and if so, |
---|
106 | this assumption is made elsewhere). |
---|
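<p>
A check for arithmetic right shift can be as small as the sketch below.
This is an illustration only and not the actual NTL installation script;
the names <tt>right_shift_is_arithmetic</tt> and <tt>floor_div_pow2</tt> are hypothetical.

```cpp
// Hypothetical probe (not the NTL installation script): true if right
// shift of a negative long appears to be an arithmetic shift.
bool right_shift_is_arithmetic() {
    // An arithmetic shift copies the sign bit, so -1 stays -1;
    // a logical shift would produce a large positive value instead.
    return (-1L >> 1) == -1L;
}

// Hypothetical helper: floor(a / 2^k) computed with a shift.
// The result is only meaningful for negative a when the probe
// above returns true; k must be smaller than the width of long.
long floor_div_pow2(long a, int k) {
    return a >> k;
}
```

The consistency assumption in the text is what allows a single probe of this
kind to justify using the shift everywhere else.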
107 | |
---|
108 | <p> |
---|
109 | It is hard to imagine that there is a platform existing today |
---|
110 | (or in the foreseeable future) where these assumptions |
---|
111 | are not met. |
---|
112 | However, |
---|
113 | as of version 5.4 of NTL, all of the most |
---|
114 | performance-critical code now works almost as well |
---|
115 | with <tt>NTL_CLEAN_INT</tt> set as without. |
---|
116 | The differences are not very significant (maybe 10%). |
---|
117 | Therefore, there is hardly any reason to not set this flag. |
---|
118 | Also, note that the only code affected by this flag |
---|
119 | is the traditional long integer package (which, if you use |
---|
120 | GMP as the primary long integer package, is not involved), |
---|
121 | and the single-precision modular multiplication routines |
---|
122 | defined in <tt>ZZ.h</tt>. |
---|
123 | |
---|
124 | <p> |
---|
125 | |
---|
126 | <h3>The <tt>NTL_CLEAN_PTR</tt> flag</h3> |
---|
127 | |
---|
128 | <p> |
---|
129 | |
---|
130 | The configuration flag <tt>NTL_CLEAN_PTR</tt> |
---|
131 | is currently <i>off</i> by default. |
---|
132 | |
---|
133 | <p> |
---|
134 | When this flag is off, NTL makes another requirement of its platform; |
---|
135 | namely, that the address space is "flat", and in particular, |
---|
136 | that one can test if an object pointed to by a pointer <tt>p</tt> |
---|
137 | is located in an array of objects <tt>v[0..n-1]</tt> by testing |
---|
138 | if <tt>p &gt;= v</tt> and <tt>p &lt; v + n</tt>. |
---|
139 | The <tt>C++</tt> standard does not guarantee that such a test will |
---|
140 | work; the only way to perform this test in a standard-conforming way |
---|
141 | is to iteratively test if <tt>p == v</tt>, <tt>p == v+1</tt>, etc. |
---|
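<p>
The two ways of testing membership described above can be written side by side.
This is a sketch with hypothetical function names
(<tt>in_array_flat</tt>, <tt>in_array_clean</tt>), not code taken from NTL,
and it uses an array of <tt>long</tt> purely for illustration.

```cpp
#include <cstddef>

// Test that relies on a "flat" address space: two comparisons.
// For a pointer p that does not point into v[0..n-1] (or one past
// its end), the language does not specify the result.
bool in_array_flat(const long* p, const long* v, std::size_t n) {
    return p >= v && p < v + n;
}

// Standard-conforming test: compare p for equality with the address
// of each element in turn. This takes time proportional to n.
bool in_array_clean(const long* p, const long* v, std::size_t n) {
    for (std::size_t i = 0; i < n; i++) {
        if (p == v + i) return true;
    }
    return false;
}
```

The cost difference between a constant number of comparisons and a linear scan
is the efficiency trade-off that the flag controls.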
142 | |
---|
143 | <p> |
---|
144 | This assumption of a "flat" address space is essentially universally |
---|
145 | valid, and making this assumption leads to some more efficient code. |
---|
146 | For this reason, the <tt>NTL_CLEAN_PTR</tt> flag is <i>off</i> by default, |
---|
147 | but one can always turn it on, and in fact, the overall performance |
---|
148 | penalty should be negligible for most applications. |
---|
149 | |
---|
150 | |
---|
151 | |
---|
152 | <h3>Some floating point issues</h3> |
---|
153 | |
---|
154 | |
---|
155 | <p> |
---|
156 | NTL uses floating point arithmetic in a number of places, |
---|
157 | including a number of exact computations, where one might |
---|
158 | not expect to see floating point. |
---|
159 | Relying on floating point may seem prone to errors, |
---|
160 | but with the guarantees provided by the IEEE standard, |
---|
161 | one can prove the correctness of the NTL code that uses floating point. |
---|
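<p>
As an illustration of an exact computation that uses floating point,
consider computing <tt>a*b mod n</tt> for single-precision integers.
The sketch below is written in the spirit of such routines but is not NTL's actual code:
the name <tt>mulmod_fp</tt> is hypothetical, and it assumes
<tt>0 &lt;= a, b &lt; n</tt> with <tt>n &lt; 2^50</tt> on a platform with 64-bit
<tt>long</tt> (or <tt>n &lt; 2^30</tt> with 32-bit <tt>long</tt>) and 53-bit doubles.

```cpp
#include <climits>

// Hypothetical sketch (not NTL's routine): a*b mod n.
// Assumes 0 <= a, b < n and n < 2^50 with 64-bit long
// (n < 2^30 with 32-bit long), and doubles with 53 bits of precision.
long mulmod_fp(long a, long b, long n) {
    // Floating point estimate of floor(a*b/n). Under the stated
    // assumptions the rounding error is small enough that the
    // estimate differs from the true quotient by at most 1.
    long q = static_cast<long>(static_cast<double>(a)
                               * static_cast<double>(b)
                               / static_cast<double>(n));

    // a*b - q*n computed modulo 2^w with unsigned arithmetic, which
    // is defined to wrap around. The true value lies in [-n, 2n).
    unsigned long r = static_cast<unsigned long>(a) * static_cast<unsigned long>(b)
                    - static_cast<unsigned long>(q) * static_cast<unsigned long>(n);

    if (r > ULONG_MAX / 2) {
        r += static_cast<unsigned long>(n);   // true value was negative
    } else if (r >= static_cast<unsigned long>(n)) {
        r -= static_cast<unsigned long>(n);   // true value was n or more
    }
    return static_cast<long>(r);              // now 0 <= r < n
}
```

The floating point division only has to be accurate enough to get the quotient
within 1 of the truth; the final answer is then fixed up with exact integer arithmetic.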
162 | |
---|
163 | <p> |
---|
164 | Briefly, the IEEE floating point standard says that basic arithmetic operations |
---|
165 | on doubles should work <i>as if</i> the operation were performed with infinite |
---|
166 | precision, and then rounded to <tt>p</tt> bits, |
---|
167 | where <tt>p</tt> is the precision (typically, <tt>p = 53</tt>). |
---|
168 | |
---|
169 | |
---|
170 | <p> |
---|
171 | Throughout most of NTL, correctness follows from weaker assumptions, |
---|
172 | namely |
---|
173 | <p> |
---|
174 | <ul> |
---|
175 | <li> |
---|
176 | basic arithmetic operations and conversion from integral types |
---|
177 | produce results with a <i>relative error</i> of at most |
---|
178 | <tt>2^{-p + 1}</tt> (assuming no overflow), |
---|
179 | <li> |
---|
180 | multiplication by powers of 2 produces <i>exact</i> results (assuming no overflow), |
---|
181 | <li> |
---|
182 | basic arithmetic operations on integers represented as doubles and conversions from integral types |
---|
183 | to doubles produce <i>exact</i> results, provided the inputs and outputs |
---|
184 | are less than <tt>2^p</tt> in absolute value, |
---|
185 | <li> |
---|
186 | if <tt>y/2 &lt;= x &lt;= 2y</tt>, then <tt>x-y</tt> is computed exactly. |
---|
187 | </ul> |
---|
188 | Also, NTL allows the compiler to compute <tt>z = x/y</tt> as |
---|
189 | <tt>t = 1/y</tt>, <tt>z = t*x</tt>. |
---|
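<p>
Three of the exactness assumptions listed above can be spot-checked on sample inputs.
The functions below are hypothetical self-checks written for this illustration,
not NTL code, and they assume doubles with <tt>p = 53</tt>.
Passing them on a few inputs is of course not a proof that the assumptions hold in general.

```cpp
#include <cmath>

// Integers represented as doubles: the product of (2^26 - 1) and
// (2^27 - 1) is below 2^53, so it should be computed exactly.
bool small_integer_product_is_exact() {
    long long x = 67108863LL;     // 2^26 - 1
    long long y = 134217727LL;    // 2^27 - 1
    double p = static_cast<double>(x) * static_cast<double>(y);
    return static_cast<long long>(p) == x * y;
}

// Multiplication by a power of 2 only changes the exponent, so it
// should be exact and reversible (no overflow or underflow here).
bool power_of_two_scaling_is_exact() {
    double x = 0.1;
    double scaled = x * 1024.0;   // 1024 = 2^10
    return scaled == std::ldexp(x, 10) && scaled / 1024.0 == x;
}

// If y/2 <= x <= 2y, then x - y should be computed exactly.
bool nearby_subtraction_is_exact() {
    double y = 1.0;
    double x = 1.0 + std::ldexp(1.0, -52);   // next double above 1
    return x - y == std::ldexp(1.0, -52);
}
```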
190 | |
---|
191 | <p> |
---|
192 | One big problem with the IEEE standard is that it allows intermediate |
---|
193 | quantities to be computed in a higher precision than the standard |
---|
194 | double precision. |
---|
195 | This "looseness" in the standard is a substantial impediment to |
---|
196 | creating portable software. |
---|
197 | Most platforms today implement the "strict" IEEE standard, with no |
---|
198 | excess precision. |
---|
199 | One notable exception -- the 800-pound gorilla, so to speak -- |
---|
200 | is the Intel x86. |
---|
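<p>
Compilers that provide the standard <tt>&lt;cfloat&gt;</tt> macro
<tt>FLT_EVAL_METHOD</tt> report whether intermediate results may be kept in a
wider format. The sketch below is an illustration and not something NTL is
known to use; the function name <tt>double_may_carry_excess_precision</tt> is hypothetical.

```cpp
#include <cfloat>

// Hypothetical helper (not NTL code): true if double expressions may
// be evaluated with more precision than the double format itself.
bool double_may_carry_excess_precision() {
#ifdef FLT_EVAL_METHOD
    // 0: every operation is evaluated in the precision of its type.
    // 1: float operations are evaluated as double; double is unaffected.
    // 2: everything is evaluated as long double ("loose" for double).
    // Negative values mean the behavior is indeterminable.
    return FLT_EVAL_METHOD != 0 && FLT_EVAL_METHOD != 1;
#else
    return true;   // macro unavailable: assume the worst
#endif
}
```

A result of <tt>true</tt> corresponds to what the text calls "loose" floating point.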
201 | |
---|
202 | <p> |
---|
203 | NTL goes out of its way to ensure that its code is correct with |
---|
204 | both "strict" and "loose" IEEE floating point. |
---|
205 | This is achieved in a portable fashion throughout NTL, except |
---|
206 | for the <tt>quad_float</tt> module, where some desperate hacks, |
---|
207 | including assembly code, may be used |
---|
208 | to try to work around problems created by "loose" IEEE floating point |
---|
209 | <a href="quad_float.txt">[more details]</a>. |
---|
210 | But note that even if the <tt>quad_float</tt> package does not work correctly |
---|
211 | because of these problems, the only other routines that are affected |
---|
212 | are the <tt>LLL_QP</tt> routines in the <tt>LLL</tt> module -- the |
---|
213 | rest of NTL should work fine. |
---|
214 | |
---|
215 | |
---|
216 | |
---|
217 | <p> |
---|
218 | Mostly, NTL does not |
---|
219 | require that the IEEE floating point |
---|
220 | special quantities "infinity" |
---|
221 | and "not a number" are implemented correctly. |
---|
222 | This is certainly the case for core code where |
---|
223 | floating point arithmetic is used for exact (but fast) |
---|
224 | computations, as the numbers involved never get too big (or small). |
---|
225 | However, the behavior of |
---|
226 | certain explicit floating point computations |
---|
227 | (e.g., the <tt>xdouble</tt> and <tt>quad_float</tt> classes, |
---|
228 | and the floating point versions of LLL) will be |
---|
229 | much more predictable and reliable if "infinity" |
---|
230 | and "not a number" are implemented correctly. |
---|
231 | |
---|
232 | |
---|
233 | <p> |
---|
234 | <h3>Implementing long integer arithmetic</h3> |
---|
235 | <p> |
---|
236 | There are three basic strategies for implementing long integer arithmetic. |
---|
237 | |
---|
238 | <p> |
---|
239 | The <i>default</i> strategy is implemented in the |
---|
240 | <i>traditional long integer arithmetic package</i>. |
---|
241 | This package is derived from the LIP package originally developed by |
---|
242 | A. K. Lenstra, although it has evolved quite a bit within NTL. |
---|
243 | This package uses no assembly code and is very portable. |
---|
244 | |
---|
245 | <p> |
---|
246 | The <i>second</i> strategy is to use the GNU Multiple Precision Arithmetic Library (GMP) |
---|
247 | as a <i>supplemental long integer arithmetic package</i>. |
---|
248 | In this strategy, the representation of long integers is identical |
---|
249 | to that in the traditional long integer package. |
---|
250 | This representation is incompatible with the GMP representation, |
---|
251 | and on-the-fly conversions are done between the two representations |
---|
252 | (only when this is sensible). |
---|
253 | This strategy typically yields better performance, but requires |
---|
254 | that GMP is installed on your platform. |
---|
255 | |
---|
256 | <p> |
---|
257 | The <i>third</i> strategy is to use GMP as the |
---|
258 | <i>primary long integer arithmetic package</i>. |
---|
259 | In this strategy, the representation of long integers is in a |
---|
260 | form compatible with GMP. |
---|
261 | This strategy typically yields the best performance, |
---|
262 | but requires |
---|
263 | that GMP is installed on your platform, and also |
---|
264 | introduces some minor backward incompatibilities in the programming |
---|
265 | interface. |
---|
266 | |
---|
267 | <p> |
---|
268 | <a href="tour-gmp.html">Go here</a> for more details on the use |
---|
269 | of GMP with NTL. |
---|
270 | |
---|
271 | <p> |
---|
272 | <h3>Algorithms</h3> |
---|
273 | <p> |
---|
274 | NTL makes fairly consistent use of asymptotically fast algorithms. |
---|
275 | |
---|
276 | <p> |
---|
277 | Long integer multiplication is implemented using the classical |
---|
278 | algorithm, crossing over to Karatsuba for very big numbers. |
---|
279 | Long integer division is currently only implemented using |
---|
280 | the classical algorithm -- unless you use NTL with GMP (version 3 or later) |
---|
281 | as either a supplemental or primary long integer package, |
---|
282 | in which case GMP |
---|
283 | employs a division algorithm that is about twice as slow as multiplication |
---|
284 | for very large numbers. |
---|
285 | <p> |
---|
286 | Polynomial multiplication and division are carried out |
---|
287 | using a combination of the classical algorithm, Karatsuba, |
---|
288 | the FFT using small primes, and the FFT using the Sch&ouml;nhage-Strassen |
---|
289 | approach. |
---|
290 | The choice of algorithm depends on the coefficient domain. |
---|
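<p>
To make the idea of "crossing over" from the classical algorithm to Karatsuba concrete,
here is a minimal sketch for polynomials with <tt>long</tt> coefficients.
It is not NTL's implementation: the type name <tt>Poly</tt>, the function names,
and the crossover length of 8 are all assumptions chosen for illustration,
and coefficient overflow is ignored.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<long> Poly;   // Poly[i] is the coefficient of x^i

// Classical algorithm: quadratic number of coefficient products.
Poly mul_classical(const Poly& a, const Poly& b) {
    if (a.empty() || b.empty()) return Poly();
    Poly c(a.size() + b.size() - 1, 0);
    for (std::size_t i = 0; i < a.size(); i++)
        for (std::size_t j = 0; j < b.size(); j++)
            c[i + j] += a[i] * b[j];
    return c;
}

// Karatsuba: three half-size products instead of four.
Poly mul_karatsuba(Poly a, Poly b) {
    if (a.empty() || b.empty()) return Poly();
    const std::size_t out_len = a.size() + b.size() - 1;
    const std::size_t n = a.size() > b.size() ? a.size() : b.size();
    if (n <= 8) return mul_classical(a, b);   // hypothetical crossover

    a.resize(n, 0);                           // pad to a common length
    b.resize(n, 0);
    const std::size_t h = n / 2;              // low part has h coefficients

    // Split a = a0 + x^h * a1 and b = b0 + x^h * b1.
    Poly a0(a.begin(), a.begin() + h), a1(a.begin() + h, a.end());
    Poly b0(b.begin(), b.begin() + h), b1(b.begin() + h, b.end());

    Poly z0 = mul_karatsuba(a0, b0);          // a0 * b0
    Poly z2 = mul_karatsuba(a1, b1);          // a1 * b1

    Poly sa(a1), sb(b1);                      // a0 + a1 and b0 + b1
    for (std::size_t i = 0; i < h; i++) { sa[i] += a0[i]; sb[i] += b0[i]; }
    Poly z1 = mul_karatsuba(sa, sb);          // (a0 + a1) * (b0 + b1)

    // Middle term: z1 - z0 - z2 = a0*b1 + a1*b0.
    for (std::size_t i = 0; i < z0.size(); i++) z1[i] -= z0[i];
    for (std::size_t i = 0; i < z2.size(); i++) z1[i] -= z2[i];

    // Recombine: c = z0 + x^h * z1 + x^(2h) * z2.
    Poly c(2 * n - 1, 0);
    for (std::size_t i = 0; i < z0.size(); i++) c[i] += z0[i];
    for (std::size_t i = 0; i < z1.size(); i++) c[i + h] += z1[i];
    for (std::size_t i = 0; i < z2.size(); i++) c[i + 2 * h] += z2[i];

    c.resize(out_len);                        // drop coefficients added by padding
    return c;
}
```

Below the crossover the classical algorithm is used directly, since the
bookkeeping in Karatsuba only pays off for longer inputs. The best crossover
point depends on the platform and the coefficient domain.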
291 | <p> |
---|
292 | Many algorithms employed throughout NTL are inventions |
---|
293 | of the author (<a href="http://www.shoup.net">Victor Shoup</a>) |
---|
294 | and his colleagues |
---|
295 | <a href="http://math-www.uni-paderborn.de/~aggathen/joachim.html">Joachim von zur Gathen</a> |
---|
296 | and |
---|
297 | <a href="http://www4.ncsu.edu/~kaltofen">Erich Kaltofen</a>, |
---|
298 | as well as <a href="mailto:abbott@dima.unige.it">John Abbott</a> |
---|
299 | and |
---|
300 | <a href="http://www.loria.fr/~zimmerma">Paul Zimmermann</a>. |
---|
301 | |
---|
302 | <p> |
---|
303 | <h3> |
---|
304 | Some of NTL's imperfections |
---|
305 | </h3> |
---|
306 | <p> |
---|
307 | |
---|
308 | NTL is not a "perfect" library. |
---|
309 | Here are some limitations of NTL that a "perfect" library would not have: |
---|
310 | <p> |
---|
311 | <ul> |
---|
312 | <li> |
---|
313 | NTL is neither thread-safe nor re-entrant, and making it so |
---|
314 | would require a fundamental redesign. |
---|
315 | <p> |
---|
316 | |
---|
317 | <li> |
---|
318 | NTL provides only a very crude form of error handling: |
---|
319 | print an error message and abort. |
---|
320 | For most NTL users, this is quite sufficient. |
---|
321 | The alternative would be to have NTL throw exceptions. |
---|
322 | Writing code that handles exceptions correctly is quite difficult. |
---|
323 | The easy part is throwing and catching exceptions. |
---|
324 | The hard part is writing code <i>through which</i> an exception |
---|
325 | can be safely and correctly thrown. |
---|
326 | Retrofitting NTL to throw exceptions at this late date |
---|
327 | would be quite difficult and error prone, and I do not think |
---|
328 | that there is much demand for it. |
---|
329 | |
---|
330 | <p> |
---|
331 | |
---|
332 | <li> |
---|
333 | NTL does not release all of its resources. |
---|
334 | There are some routines which for efficiency reasons will |
---|
335 | allocate some memory and never give it back to the system, |
---|
336 | so as to avoid re-allocations on subsequent calls. |
---|
337 | The amount of memory "stolen" by NTL in this way is fairly reasonable, |
---|
338 | and I have heard no complaints yet about its effects. |
---|
339 | |
---|
340 | </ul> |
---|
341 | |
---|
342 | |
---|
343 | <p> |
---|
344 | |
---|
345 | <center> |
---|
346 | <a href="tour-tips.html"><img src="arrow1.gif" alt="[Previous]" align=bottom></a> |
---|
347 | <a href="tour.html"><img src="arrow2.gif" alt="[Up]" align=bottom></a> |
---|
348 | <a href="tour-gmp.html"> <img src="arrow3.gif" alt="[Next]" align=bottom></a> |
---|
349 | </center> |
---|
350 | |
---|
351 | |
---|
352 | </body> |
---|
353 | </html> |
---|