1 | <html> |
---|
2 | <head> |
---|
3 | <title> |
---|
4 | A Tour of NTL: Some Performance Data </title> |
---|
5 | </head> |
---|
6 | |
---|
7 | <body bgcolor="#fff9e6"> |
---|
8 | |
---|
9 | <center> |
---|
10 | <a href="tour-gf2x.html"><img src="arrow1.gif" alt="[Previous]" align=bottom></a> |
---|
11 | <a href="tour.html"><img src="arrow2.gif" alt="[Up]" align=bottom></a> |
---|
12 | <a href="tour-roadmap.html"> <img src="arrow3.gif" alt="[Next]" align=bottom></a> |
---|
13 | </center> |
---|
14 | |
---|
15 | <h1> |
---|
16 | <p align=center> |
---|
17 | A Tour of NTL: Some Performance Data |
---|
18 | </p> |
---|
19 | </h1> |
---|
20 | |
---|
21 | <p> <hr> <p> |
---|
22 | |
---|
23 | Here are some timing figures from using NTL. |
---|
24 | The figures were obtained using a 700MHz Pentium III running Linux, |
---|
25 | and using gcc version 2.95.2. |
---|
26 | Also, GMP version 3.1.1 was used as the primary long integer package. |
---|
27 | |
---|
28 | <p> |
---|
29 | <h2> |
---|
30 | Basic arithmetic |
---|
31 | </h2> |
---|
32 | <p> |
---|
33 | |
---|
34 | For various values of <var>n</var>, we give the running times |
---|
35 | for the following tasks: |
---|
36 | |
---|
37 | <ul> |
---|
38 | <li> IMUL: multiply two <var>n</var>-bit numbers. |
---|
39 | <li> IREM: compute the remainder upon dividing a <var>2 n</var>-bit number |
---|
40 | by an <var>n</var>-bit number. |
---|
41 | <li> IINV: compute an inverse modulo an <var>n</var>-bit prime number. |
---|
42 | <li> PMUL: multiply two <var>(n-1)</var>-degree polynomials modulo |
---|
43 | an <var>n</var>-bit prime number. |
---|
44 | </ul> |
---|
45 | |
---|
46 | <p> |
---|
47 | |
---|
48 | The time for IMUL, IREM, IINV is stated in units of <var>10^{-6}s</var>, |
---|
49 | and that of PMUL in units of seconds. |
---|
50 | |
---|
51 | <p> |
---|
52 | |
---|
53 | <table border=1> |
---|
54 | <tr align=center> |
---|
55 | <td> <var>n</var> <td> IMUL <td> IREM |
---|
56 | <td> IINV <td> PMUL |
---|
57 | |
---|
58 | <tr align=right> |
---|
59 | <td> 512 <td> 4 <td> 5 <td> 90 <td> 0.11 |
---|
60 | |
---|
61 | <tr align=right> |
---|
62 | <td> 1024 <td> 11 <td> 13 <td> 260 <td> 0.74 |
---|
63 | |
---|
64 | <tr align=right> |
---|
65 | <td> 2048 <td> 33 <td> 53 <td> 580 <td> 2.71 |
---|
66 | |
---|
67 | <tr align=right> |
---|
68 | <td> 4096 <td> 110 <td> 168 <td> 1460 <td> 16.64 |
---|
69 | |
---|
70 | </table> |
---|
71 | |
---|
72 | |
---|
73 | <p> |
---|
74 | The IMUL, IREM, and IINV times essentially reflect the performance |
---|
75 | of the underlying GMP routines (NTL's interface to these routines is |
---|
76 | very "light weight"). |
---|
77 | |
---|
78 | <p> |
---|
79 | The PMUL time reflects the performance of NTL's FFT implementation |
---|
80 | of polynomial arithmetic. |
---|
81 | |
---|
82 | |
---|
83 | |
---|
84 | |
---|
85 | <p> |
---|
86 | <h2> |
---|
87 | Factoring polynomials over finite fields |
---|
88 | </h2> |
---|
89 | <p> |
---|
90 | |
---|
91 | We next consider the problem of factoriing of |
---|
92 | univariate polynomials modulo a prime <var>p</var>. |
---|
93 | As test polynomials, we take the family of polynomials |
---|
94 | defined in [V. Shoup, J. Symb. Comp. 20:363-397, 1995]. |
---|
95 | For every <var>n</var>, we define <var>p</var> |
---|
96 | to be the first prime greater than |
---|
97 | <var>2^{n-2}*PI</var>, and the polynomial is |
---|
98 | <p> |
---|
99 | <var>sum(a[n-i]*X^i, i = 0..n), </var> |
---|
100 | <p> |
---|
101 | where <var>a[0] = 1</var>, and <var>a[i+1] = a[i]^2 + 1.</var> |
---|
102 | |
---|
103 | <p> |
---|
104 | Here are some running times, in seconds, for various values of <var>n</var>: |
---|
105 | |
---|
106 | <p> |
---|
107 | <table border=1> |
---|
108 | <tr align=right> |
---|
109 | <td> <var>n</var> <td> 64 <td> 128 <td> 256 <td> 512 <td> 1024 |
---|
110 | <tr align=right> |
---|
111 | <td> <td> 0.4 <td> 2.8 <td> 25.3 <td> 238.8 <td> 2255.4 |
---|
112 | </table> |
---|
113 | |
---|
114 | <p> |
---|
115 | Also of interest is space usage. |
---|
116 | The <var>n = 512 </var> case used about 5MB main memory, |
---|
117 | and the <var>n = 1024</var> case used 18MB main memory. |
---|
118 | |
---|
119 | <p> |
---|
120 | <h2> |
---|
121 | Factoring polynomials over the integers |
---|
122 | </h2> |
---|
123 | <p> |
---|
124 | |
---|
125 | The second problem considered is factoring univariate polynomials |
---|
126 | over the integers. |
---|
127 | In NTL 5.2, we have implemented |
---|
128 | <a href="http://www.openmath.org/~hoeij">Mark van Hoeij's</a> new |
---|
129 | method for |
---|
130 | polynomial factorization. |
---|
131 | Prior to this, NTL offered an implementation of the classical |
---|
132 | ``Zassenhaus method,'' which can take exponential time |
---|
133 | in the worst case, although the NTL algorithm implements |
---|
134 | several novel ``pruning techniques'' that can subsantially |
---|
135 | speed up the brute-force search stage of the Zassenhaus method. |
---|
136 | However, van Hoeij's algorithm seems to have made all of these |
---|
137 | techniques more or less obsolete. |
---|
138 | |
---|
139 | <p> |
---|
140 | Note that van Hoeij's method is more a general algorithmic technique |
---|
141 | than a specific algorithm. |
---|
142 | To obtain a specific algorithm, several parameters and strategies |
---|
143 | must be chosen. |
---|
144 | We have attempted to make these choices so as to obtain |
---|
145 | a good, general-purpose algorithm that works fairly well |
---|
146 | on a wide variety of input polynomials. |
---|
147 | All of these choices are a bit heuristic, and any feedback |
---|
148 | refarding possible improvements is most welcome. |
---|
149 | |
---|
150 | <p> |
---|
151 | We consider a number of test polynomials, collected and/or suggested |
---|
152 | by <a href="http://www.loria.fr/~zimmerma">Paul Zimmermann</a> |
---|
153 | and Mark van Hoeij. |
---|
154 | |
---|
155 | |
---|
156 | <p> |
---|
157 | <ul> |
---|
158 | <li> P1 |
---|
159 | has degree 156, coefficients up to |
---|
160 | 424 digits, and 36 factors (12 of degree 2, 15 of degree 4, 9 of degree 8). |
---|
161 | <li> |
---|
162 | P2 |
---|
163 | has degree 196, coefficients up to 419 digits |
---|
164 | and 12 factors (2 of degree 2, 4 of degree 12 and 6 of degree 24). |
---|
165 | <li> |
---|
166 | P3 |
---|
167 | has degree 336, coefficients up to 597 digits and 16 |
---|
168 | factors (4 of degree 12 and 12 of degree 24). |
---|
169 | <li> |
---|
170 | P4 |
---|
171 | has degree 462, coefficients up to 756 digits, and two factors (degrees |
---|
172 | 66 and 396). |
---|
173 | <li> |
---|
174 | P5 has degree 64, coefficients up to 40 digits, |
---|
175 | and is irreductible. |
---|
176 | <li> |
---|
177 | P6 |
---|
178 | has degree 144, |
---|
179 | coefficients up to 165 digits and has |
---|
180 | 6 factors (4 of degree 12 and 2 of degree 48). |
---|
181 | <li> |
---|
182 | P7 |
---|
183 | has degree 384 and coefficients |
---|
184 | up to 57 digits, and is irreducible. |
---|
185 | <li> |
---|
186 | P8 |
---|
187 | has degree 972, coefficients up to 213 digits, and is irreducible. |
---|
188 | <li> |
---|
189 | M12_5 |
---|
190 | has degree 792, coefficients up to 2813 digits, and is irreducible. |
---|
191 | <li> |
---|
192 | M12_6 |
---|
193 | has degree 924, coefficients up to 3937 digits, and |
---|
194 | two factors (degrees 132 and 792). |
---|
195 | </ul> |
---|
196 | |
---|
197 | <p> |
---|
198 | |
---|
199 | P1 comes from the Rational |
---|
200 | Univariate Representation (RUR) of the Cyclic 6 system, |
---|
201 | and P2 and P3 come from the RUR of Cyclic 7. |
---|
202 | <p> |
---|
203 | P4 was contributed by A. Hulpke and H. Matzat: it is the 5-set resolvent of |
---|
204 | the polynomial f = x^11 + 101*x^10 + 4151*x^9 + 87851*x^8 + 976826*x^7 + |
---|
205 | 4621826*x^6 - 5948674*x^5 - 113111674*x^4 - 12236299*x^3 + 1119536201*x^2 - |
---|
206 | 1660753125*x - 332150625 and its factorization would prove that f has |
---|
207 | Galois group M11. |
---|
208 | <p> |
---|
209 | P5 is the Swinnerton-Dyer polynomial for 2,3,5,7,11,13, i.e. the |
---|
210 | product of all terms of the form x+/-sqrt(2)+/-...+/-sqrt(13). |
---|
211 | It is a "bad" polynomial for the method of factoring in Fp and |
---|
212 | combining the factors in Z, because all its factors in Fp are of |
---|
213 | degree at most 2. |
---|
214 | <p> |
---|
215 | P6 is the resultant with respect to x of the polynomial |
---|
216 | p = -2566974800*x^2 +134697056*x^4 |
---|
217 | -3312297*x^6 +43109*x^8 -308*x^10 +x^12 +1142440000 with p(y-2*x); |
---|
218 | it was contributed by Frederic Lehobey and |
---|
219 | Nicolas Rennert. |
---|
220 | <p> |
---|
221 | P7 (contributed by John Abbott) is the norm of |
---|
222 | x^6 + (a1+a2+a3+a4+a5+a6)*(x^4-x^2) + 1 with a6^2-2, a5^2-3, a4^2-5, |
---|
223 | a3^2-7, a2^2-11 and a1^2-13, i.e. the resultant of that polynomial with |
---|
224 | a6^2-2 with respect to a6, then with a5^2-3 wrt a5, and so on. |
---|
225 | <p> |
---|
226 | P8 (contributed by Jean-Charles Faugere) comes from the Groebner basis |
---|
227 | of Cyclic 9. |
---|
228 | <p> |
---|
229 | M12_5 and M12_6 are the 5th and 6th resolvents of the polynomial f |
---|
230 | with Galois group M_{12} from the example in van Hoeij's paper. |
---|
231 | These polynomials are non-monic (unlike the others). |
---|
232 | |
---|
233 | <p> |
---|
234 | We also used the polynomials S6, S7, S8, S9, and S10, |
---|
235 | where S<var>i</var> is the Swinnerton-Dyer polynomial |
---|
236 | of degree <var>2^i</var> corresponding to the first <var>i</var> primes. |
---|
237 | These are all irreducible polynomials, |
---|
238 | and are particularly bad for the Zassenhaus method. |
---|
239 | |
---|
240 | <p> |
---|
241 | |
---|
242 | The following two polynomials, H1 and H2, were suggested by Mark van Hoeij |
---|
243 | as examples that may |
---|
244 | be particularly challenging to factor using his algorithm: |
---|
245 | |
---|
246 | <p> |
---|
247 | <ul> |
---|
248 | <li> H1 is the polynomial (x-1)^960-1. |
---|
249 | <li> H2 is a degree 4096 polynomial that is the product |
---|
250 | of a number of cyclotomic polynomials |
---|
251 | (with factors of degrees 128, 128, 256, 512, 1024, and 2048). |
---|
252 | </ul> |
---|
253 | |
---|
254 | <p> |
---|
255 | The polynomial H1 is defined as it is, rather than as x^960-1, |
---|
256 | simply to "hide" the special structure of the polynomial -- many |
---|
257 | factorizers would apply special techniques to factor x^960-1, |
---|
258 | but will most likely not apply these techniques to (x-1)^960-1. |
---|
259 | To factor this polynomial with van Hoeij's method, one has to consider |
---|
260 | very high traces, and thus it is essential that a good implementation |
---|
261 | at some point crosses over to a "dense" strategy (or alternatively, |
---|
262 | implements a brute-force factor search in combination with van Hoeij's |
---|
263 | method). |
---|
264 | |
---|
265 | <p> |
---|
266 | You can <a href="http://www.shoup.net/ntl/testpolys.tar.gz">download</a> |
---|
267 | all of the above polynomials (except H1, which is easy trivial to generate |
---|
268 | from scratch). |
---|
269 | |
---|
270 | <p> |
---|
271 | We collected timing data for both the Zassenhaus and van Hoeij |
---|
272 | algorithms, both with and without the so-called "power hack". |
---|
273 | By setting a run-time switch, the user can choose to employ a "power hack", |
---|
274 | which attempts to speed up the factorization if the polynomial |
---|
275 | f(x) is of the form g(x^m) for some m > 1. |
---|
276 | However, while this "power hack" may sometimes speed things up substantially, |
---|
277 | it can also slow things down -- for the Zassenhaus method, this slowdown |
---|
278 | is usually negligible, but for our implementation of van Hoeij's algorithm, |
---|
279 | this slowdown can be quite substantial. |
---|
280 | Because of this, in our implementation of van Hoeij, we have modified |
---|
281 | the power hack so that if it "seems" to be going slowly, it is abandoned. |
---|
282 | This "early abort" power hack strategy |
---|
283 | is "on" by default, but can be turned off by the |
---|
284 | user by setting a run-time switch. |
---|
285 | <p> |
---|
286 | |
---|
287 | In the table below, running time is presented in seconds |
---|
288 | for each of the 4 methods: |
---|
289 | <p> |
---|
290 | <ul> |
---|
291 | <li> Z-: Zassenhaus without power hack. |
---|
292 | <li> Z+: Zassenhaus with power hack. |
---|
293 | <li> H-: van Hoeij without power hack. |
---|
294 | <li> H+: van Hoeij with power hack (NTL's default strategy). |
---|
295 | |
---|
296 | </ul> |
---|
297 | <p> |
---|
298 | |
---|
299 | We also state for each polynomial the number <var>r</var> of modular |
---|
300 | factors of the polynomial (this is not necessarily relevant for the |
---|
301 | power-hack running times), |
---|
302 | |
---|
303 | <p> |
---|
304 | For comparison, we also state some running times for Maple version 5.1, |
---|
305 | running on a 500MHz Alpha 21264/EV6, |
---|
306 | as reported by Paul Zimmermann. |
---|
307 | |
---|
308 | <p> |
---|
309 | |
---|
310 | <table border=1> |
---|
311 | |
---|
312 | <tr align=right> |
---|
313 | <td align=left> <td> <var>r</var> <td> Z- <td> Z+ <td> H- <td> H+ <td> maple |
---|
314 | |
---|
315 | <tr align=right> |
---|
316 | <td align=left> P1 <td> 60 <td> 2.8 <td> 0.3 <td> 2.8 <td> 0.3 <td> 0.6 |
---|
317 | |
---|
318 | <tr align=right> |
---|
319 | <td align=left> P2 <td> 20 <td> 4.3 <td> 1.5 <td> 4.3 <td> 1.5 <td> 1.8 |
---|
320 | |
---|
321 | <tr align=right> |
---|
322 | <td align=left> P3 <td> 28 <td> 13.4 <td> 3.0 <td> 13.4 <td> 3.0 <td> 2.4 |
---|
323 | |
---|
324 | <tr align=right> |
---|
325 | <td align=left> P4 <td> 42 <td> 40.2 <td> 40.2 <td> 27.2 <td> 27.2 <td> > 5000.0 |
---|
326 | |
---|
327 | <tr align=right> |
---|
328 | <td align=left> P5 <td> 32 <td> 39.2 <td> 37.5 <td> 0.5 <td> 0.5 <td> > 5000.0 |
---|
329 | |
---|
330 | <tr align=right> |
---|
331 | <td align=left> P6 <td> 48 <td> 24.0 <td> 0.9 <td> 2.9 <td> 1.0 <td> 48.0 |
---|
332 | |
---|
333 | <tr align=right> |
---|
334 | <td align=left> P7 <td> 76 <td> <td> <td> 15.5 <td> 13.2 <td> |
---|
335 | |
---|
336 | <tr align=right> |
---|
337 | <td align=left> P8 <td> 54 <td> 3930.0 <td> 3940.0 <td> 44.3 <td> 52.9 <td> > 5000.0 |
---|
338 | |
---|
339 | <tr align=right> |
---|
340 | <td align=left> P4*rev(P4) <td> 84 <td> <td> <td> 826.0 <td> 826.0 <td> |
---|
341 | |
---|
342 | <tr align=right> |
---|
343 | <td align=left> H1 <td> 131 <td> <td> <td> 108.0 <td> 108.0 <td> |
---|
344 | |
---|
345 | <tr align=right> |
---|
346 | <td align=left> H2 <td> 256 <td> <td> <td> <td> 379.0 <td> |
---|
347 | |
---|
348 | <tr align=right> |
---|
349 | <td align=left> S7 <td> 64 <td> <td> <td> 3.4 <td> 3.8 <td> |
---|
350 | |
---|
351 | <tr align=right> |
---|
352 | <td align=left> S8 <td> 128 <td> <td> <td> 53.4 <td> 64.6 <td> |
---|
353 | |
---|
354 | <tr align=right> |
---|
355 | <td align=left> S9 <td> 256 <td> <td> <td> 1200.0 <td> 1360.0 <td> |
---|
356 | |
---|
357 | <tr align=right> |
---|
358 | <td align=left> S10 <td> 512 <td> <td> <td> 31300.0 <td> 31700.0 <td> |
---|
359 | |
---|
360 | <tr align=right> |
---|
361 | <td align=left> S6*S7 <td> 96 <td> <td> <td> 18.8 <td> 7.2 <td> |
---|
362 | |
---|
363 | <tr align=right> |
---|
364 | <td align=left> S7*S9 <td> 320 <td> <td> <td> 3550.0 <td> 3630.0 <td> |
---|
365 | |
---|
366 | <tr align=right> |
---|
367 | <td align=left> S8*S9 <td> 384 <td> <td> <td> 8410.0 <td> 8600.0 <td> |
---|
368 | |
---|
369 | <tr align=right> |
---|
370 | <td align=left> M12_5 <td> 72 <td> <td> <td> 129.0 <td> 129.0 <td> |
---|
371 | |
---|
372 | <tr align=right> |
---|
373 | <td align=left> M12_6 <td> 84 <td> <td> <td> 410.0 <td> 396.0 <td> |
---|
374 | |
---|
375 | </table> |
---|
376 | |
---|
377 | <p> |
---|
378 | Some remarks: |
---|
379 | <p> |
---|
380 | <ul> |
---|
381 | <li> |
---|
382 | A blank entry indicates that the corresponding experiment |
---|
383 | was not performed. |
---|
384 | <li> |
---|
385 | For several of the Maple experiments, the experiment was terminated |
---|
386 | after 5000 seconds, as indicated in the table. |
---|
387 | <li> |
---|
388 | It would be futile to attempt to factor any of the other polynomials |
---|
389 | in the table using the Zassenhaus method, as the running time |
---|
390 | of that method is exponential in <var>r</var>. |
---|
391 | <li> |
---|
392 | We attempted to factor H2 using the H- algorithm, but after several |
---|
393 | hours of computation, the algorithm had not yet terminated. |
---|
394 | Using the H+ algorithm, things were much easier: the hardest part |
---|
395 | was to prove the irreducibility of a degree 2048 polynomial, |
---|
396 | for which the number of modular factors was 128. |
---|
397 | <li> |
---|
398 | Note that the power hack strategy can slow things down a bit, |
---|
399 | but can also speed things up, sometimes dramatically. |
---|
400 | <li> |
---|
401 | In the above table, rev(f) denotes the "reverse" of the polynomial f, |
---|
402 | i.e., x^{deg(f)}f(x^{-1}). |
---|
403 | </ul> |
---|
404 | |
---|
405 | |
---|
406 | <p> |
---|
407 | |
---|
408 | |
---|
409 | <center> |
---|
410 | <a href="tour-gf2x.html"><img src="arrow1.gif" alt="[Previous]" align=bottom></a> |
---|
411 | <a href="tour.html"><img src="arrow2.gif" alt="[Up]" align=bottom></a> |
---|
412 | <a href="tour-roadmap.html"> <img src="arrow3.gif" alt="[Next]" align=bottom></a> |
---|
413 | </center> |
---|
414 | |
---|
415 | |
---|
416 | </body> |
---|
417 | </html> |
---|