source: git/ntl/doc/tour-time.html @ 6ce030f

spielwiese
Last change on this file since 6ce030f was 26e030, checked in by Hans Schönemann <hannes@…>, 15 years ago
*hannes: update to 5.5.1 git-svn-id: file:///usr/local/Singular/svn/trunk@11949 2c84dea3-7e68-4137-9b89-c4e89433aadc
  • Property mode set to 100644
File size: 12.7 KB
Line 
1<html>
2<head>
3<title>
4A Tour of NTL: Some Performance Data  </title>
5</head>
6
7<body bgcolor="#fff9e6">
8
9<center>
10<a href="tour-gf2x.html"><img src="arrow1.gif" alt="[Previous]" align=bottom></a>
11 <a href="tour.html"><img src="arrow2.gif" alt="[Up]" align=bottom></a> 
12<a href="tour-roadmap.html"> <img src="arrow3.gif" alt="[Next]" align=bottom></a>
13</center>
14
15<h1> 
16<p align=center>
17A Tour of NTL: Some Performance Data
18</p>
19</h1>
20
21<p> <hr> <p>
22
23Here are some timing figures from using NTL.
24The figures were obtained using a 700MHz Pentium III running Linux,
25and using gcc version 2.95.2.
26Also, GMP version 3.1.1 was used as the primary long integer package.
27
28<p>
29<h2>
30Basic arithmetic
31</h2>
32<p>
33
34For various values of <var>n</var>, we give the running times
35for the following tasks:
36
37<ul>
38<li> IMUL: multiply two <var>n</var>-bit numbers.
39<li> IREM: compute the remainder upon dividing a <var>2 n</var>-bit number
40     by an <var>n</var>-bit number.
41<li> IINV: compute an inverse modulo an <var>n</var>-bit prime number.
42<li> PMUL: multiply two <var>(n-1)</var>-degree polynomials modulo
43     an <var>n</var>-bit prime number.
44</ul>
45
46<p>
47
48The time for IMUL, IREM, IINV is stated in units of <var>10^{-6}s</var>,
49and that of PMUL in units of seconds.
50
51<p>
52
53<table border=1>
54<tr align=center>
55   <td> <var>n</var> <td> IMUL  <td> IREM
56   <td> IINV  <td> PMUL
57
58<tr align=right>
59   <td>  512 <td> 4 <td> 5 <td> 90 <td> 0.11
60
61<tr align=right> 
62   <td> 1024 <td> 11 <td> 13 <td> 260 <td> 0.74
63
64<tr align=right>
65   <td> 2048 <td> 33 <td> 53 <td> 580 <td> 2.71
66
67<tr align=right>
68   <td> 4096 <td> 110 <td> 168 <td> 1460 <td> 16.64
69
70</table>
71
72
73<p>
74The IMUL, IREM, and IINV times essentially reflect the performance
75of the underlying GMP routines (NTL's interface to these routines is
76very "light weight").
77
78<p>
79The PMUL time reflects the performance of NTL's FFT implementation
80of polynomial arithmetic.
81
82
83
84
85<p>
86<h2>
87Factoring polynomials over finite fields
88</h2>
89<p>
90
91We next consider the problem of factoriing of
92univariate polynomials modulo a prime <var>p</var>.
93As test polynomials, we take the family of polynomials
94defined in [V. Shoup, J. Symb. Comp. 20:363-397, 1995].
95For every <var>n</var>, we define <var>p</var> 
96to be the first prime greater than
97<var>2^{n-2}*PI</var>, and the polynomial is
98<p>
99<var>sum(a[n-i]*X^i, i = 0..n), </var>
100<p>
101where <var>a[0] = 1</var>, and <var>a[i+1] = a[i]^2 + 1.</var>
102
103<p>
104Here are some running times, in seconds, for various values of <var>n</var>:
105
106<p>
107<table border=1>
108<tr align=right>
109<td> <var>n</var> <td> 64 <td> 128 <td> 256 <td> 512 <td> 1024
110<tr align=right>
111<td> &nbsp;  <td> 0.4 <td> 2.8 <td> 25.3 <td> 238.8 <td> 2255.4
112</table>
113
114<p>
115Also of interest is space usage.
116The <var>n = 512 </var> case used about 5MB main memory,
117and the <var>n = 1024</var> case used 18MB main memory.
118
119<p>
120<h2>
121Factoring polynomials over the integers
122</h2>
123<p>
124
125The second problem considered is factoring univariate polynomials
126over the integers.
127In NTL 5.2, we have implemented
128<a href="http://www.openmath.org/~hoeij">Mark van Hoeij's</a> new
129method for
130polynomial factorization.
131Prior to this, NTL offered an implementation of the classical
132``Zassenhaus method,'' which can take exponential time
133in the worst case, although the NTL algorithm implements
134several novel ``pruning techniques'' that can subsantially
135speed up the brute-force search stage of the Zassenhaus method.
136However, van Hoeij's algorithm seems to have made all of these
137techniques more or less obsolete.
138
139<p>
140Note that van Hoeij's method is more a general algorithmic technique
141than a specific algorithm.
142To obtain a specific algorithm, several parameters and strategies
143must be chosen.
144We have attempted to make these choices so as to obtain
145a good, general-purpose algorithm that works fairly well
146on a wide variety of input polynomials.
147All of these choices are a bit heuristic, and any feedback
148refarding possible improvements is most welcome.
149
150<p>
151We consider a number of test polynomials, collected and/or suggested
152by  <a href="http://www.loria.fr/~zimmerma">Paul Zimmermann</a>
153and Mark van Hoeij.
154
155
156<p>
157<ul>
158<li> P1
159has degree 156, coefficients up to
160424 digits, and 36 factors (12 of degree 2, 15 of degree 4, 9 of degree 8).
161<li>
162P2
163has degree 196, coefficients up to 419 digits
164and 12 factors (2 of degree 2, 4 of degree 12 and 6 of degree 24).
165<li>
166P3
167has degree 336, coefficients up to 597 digits and 16
168factors (4 of degree 12 and 12 of degree 24).
169<li>
170P4
171has degree 462, coefficients up to 756 digits, and two factors (degrees
17266 and 396).
173<li>
174P5 has degree 64, coefficients up to 40 digits,
175and is irreductible.
176<li>
177P6
178has degree 144,
179coefficients up to 165 digits and has
1806 factors (4 of degree 12 and 2 of degree 48).
181<li>
182P7
183has degree 384 and coefficients
184up to 57 digits, and is irreducible.
185<li>
186P8
187has degree 972, coefficients up to 213 digits, and is irreducible.
188<li>
189M12_5
190has degree 792, coefficients up to 2813 digits, and is irreducible.
191<li>
192M12_6
193has degree 924, coefficients up to 3937 digits, and
194two factors (degrees 132 and 792).
195</ul>
196
197<p>
198
199P1 comes from the Rational
200Univariate Representation (RUR) of the Cyclic 6 system,
201and P2 and P3 come from the RUR of Cyclic 7.
202<p>
203P4 was contributed by A. Hulpke and H. Matzat: it is the 5-set resolvent of
204the polynomial f = x^11 + 101*x^10 + 4151*x^9 + 87851*x^8 + 976826*x^7 +
2054621826*x^6 - 5948674*x^5 - 113111674*x^4 - 12236299*x^3 + 1119536201*x^2 -
2061660753125*x - 332150625 and its factorization would prove that f has
207Galois group M11.
208<p>
209P5 is the Swinnerton-Dyer polynomial for 2,3,5,7,11,13, i.e. the
210product of all terms of the form x+/-sqrt(2)+/-...+/-sqrt(13).
211It is a "bad" polynomial for the method of factoring in Fp and
212combining the factors in Z, because all its factors in Fp are of
213degree at most 2.
214<p>
215P6 is the resultant with respect to x of the polynomial
216p = -2566974800*x^2 +134697056*x^4
217-3312297*x^6 +43109*x^8 -308*x^10 +x^12 +1142440000 with p(y-2*x);
218it was contributed by Frederic Lehobey and
219Nicolas Rennert.
220<p>
221P7 (contributed by John Abbott) is the norm of
222x^6 + (a1+a2+a3+a4+a5+a6)*(x^4-x^2) + 1 with a6^2-2, a5^2-3, a4^2-5,
223a3^2-7, a2^2-11 and a1^2-13, i.e. the resultant of that polynomial with
224a6^2-2 with respect to a6, then with  a5^2-3 wrt a5, and so on.
225<p>
226P8 (contributed by Jean-Charles Faugere) comes from the Groebner basis
227of Cyclic 9.
228<p>
229M12_5 and M12_6 are the 5th and 6th resolvents of the polynomial f
230with Galois group M_{12} from the example in van Hoeij's paper.
231These polynomials are non-monic (unlike the others).
232
233<p>
234We also used the polynomials S6, S7, S8, S9, and S10,
235where S<var>i</var> is the Swinnerton-Dyer polynomial
236of degree <var>2^i</var> corresponding to the first <var>i</var> primes.
237These are all irreducible polynomials,
238and are particularly bad for the Zassenhaus method.
239
240<p>
241
242The following two polynomials, H1 and H2, were suggested by Mark van Hoeij
243as examples that may
244be particularly challenging to factor using his algorithm:
245
246<p>
247<ul>
248<li> H1 is the polynomial (x-1)^960-1.
249<li> H2 is a degree 4096 polynomial that is the product
250of a number of cyclotomic polynomials
251(with factors of degrees 128, 128, 256, 512, 1024, and 2048).
252</ul>
253
254<p>
255The polynomial H1 is defined as it is, rather than as x^960-1,
256simply to "hide" the special structure of the polynomial  -- many
257factorizers would apply special techniques to factor x^960-1,
258but will most likely not apply these techniques to (x-1)^960-1.
259To factor this polynomial with van Hoeij's method, one has to consider
260very high traces, and thus it is essential that a good implementation
261at some point crosses over to a "dense" strategy (or alternatively,
262implements a brute-force factor search in combination with van Hoeij's
263method).
264
265<p>
266You can <a href="http://www.shoup.net/ntl/testpolys.tar.gz">download</a>
267all of the above polynomials (except H1, which is easy trivial to generate
268from scratch).
269
270<p>
271We collected timing data for both the Zassenhaus and van Hoeij
272algorithms, both with and without the so-called "power hack".
273By setting a run-time switch, the user can choose to employ a "power hack",
274which attempts to speed up the factorization if the polynomial
275f(x) is of the form g(x^m) for some m &gt; 1.
276However, while this "power hack" may sometimes speed things up substantially,
277it can also slow things down -- for the Zassenhaus method, this slowdown
278is usually negligible, but for our implementation of van Hoeij's algorithm,
279this slowdown can be quite substantial.
280Because of this, in our implementation of van Hoeij, we have modified
281the power hack so that if it "seems" to be going slowly, it is abandoned.
282This "early abort" power hack strategy
283is "on" by default, but can be turned off by the
284user by setting a run-time switch.
285<p>
286
287In the table below, running time is presented in seconds
288for each of the 4 methods:
289<p>
290<ul>
291<li> Z-: Zassenhaus without power hack.
292<li> Z+: Zassenhaus with power hack.
293<li> H-: van Hoeij without power hack.
294<li> H+: van Hoeij with power hack (NTL's default strategy).
295
296</ul>
297<p>
298
299We also state for each polynomial the number <var>r</var> of modular
300factors of the polynomial (this is not necessarily relevant for the
301power-hack running times),
302
303<p>
304For comparison, we also state some running times for Maple version 5.1,
305running on a 500MHz Alpha 21264/EV6,
306as reported by Paul Zimmermann.
307
308<p>
309
310<table border=1>
311
312<tr align=right> 
313   <td align=left> &nbsp; <td> <var>r</var> <td> Z- <td> Z+ <td> H- <td> H+  <td> maple
314
315<tr align=right>
316   <td align=left> P1  <td> 60 <td> 2.8 <td> 0.3 <td> 2.8 <td> 0.3 <td> 0.6
317
318<tr align=right>
319   <td align=left> P2  <td> 20 <td> 4.3 <td> 1.5 <td> 4.3 <td> 1.5 <td> 1.8
320
321<tr align=right>
322   <td align=left> P3  <td> 28 <td> 13.4 <td> 3.0 <td> 13.4 <td> 3.0 <td> 2.4
323
324<tr align=right>
325   <td align=left> P4  <td> 42 <td> 40.2 <td> 40.2 <td> 27.2 <td> 27.2 <td> &gt; 5000.0
326
327<tr align=right>
328   <td align=left> P5  <td> 32 <td> 39.2 <td> 37.5 <td> 0.5 <td> 0.5 <td> &gt; 5000.0
329
330<tr align=right>
331   <td align=left> P6  <td> 48 <td> 24.0 <td> 0.9 <td> 2.9 <td> 1.0 <td> 48.0
332
333<tr align=right>
334   <td align=left> P7  <td> 76 <td> &nbsp; <td> &nbsp; <td> 15.5 <td> 13.2 <td> &nbsp;
335
336<tr align=right>
337   <td align=left> P8  <td> 54 <td> 3930.0 <td> 3940.0 <td> 44.3 <td> 52.9 <td> &gt; 5000.0
338
339<tr align=right>
340   <td align=left> P4*rev(P4) <td> 84  <td> &nbsp;  <td> &nbsp;  <td> 826.0 <td> 826.0  <td> &nbsp; 
341
342<tr align=right>
343   <td align=left> H1 <td> 131  <td> &nbsp;  <td> &nbsp;  <td> 108.0 <td> 108.0  <td> &nbsp; 
344
345<tr align=right>
346   <td align=left> H2 <td> 256  <td> &nbsp;  <td> &nbsp;  <td> &nbsp;  <td> 379.0  <td> &nbsp; 
347
348<tr align=right>
349   <td align=left> S7 <td> 64  <td> &nbsp;  <td> &nbsp;  <td> 3.4 <td> 3.8  <td> &nbsp; 
350
351<tr align=right>
352   <td align=left> S8 <td> 128  <td> &nbsp;  <td> &nbsp;  <td> 53.4 <td> 64.6  <td> &nbsp; 
353
354<tr align=right>
355   <td align=left> S9 <td> 256  <td> &nbsp;  <td> &nbsp;  <td> 1200.0 <td> 1360.0  <td> &nbsp; 
356
357<tr align=right>
358   <td align=left> S10 <td> 512  <td> &nbsp;  <td> &nbsp;  <td> 31300.0 <td> 31700.0  <td> &nbsp; 
359
360<tr align=right>
361   <td align=left> S6*S7 <td> 96  <td> &nbsp;  <td> &nbsp;  <td> 18.8 <td> 7.2  <td> &nbsp; 
362
363<tr align=right>
364   <td align=left> S7*S9 <td> 320  <td> &nbsp;  <td> &nbsp;  <td> 3550.0 <td> 3630.0  <td> &nbsp; 
365
366<tr align=right>
367   <td align=left> S8*S9 <td> 384  <td> &nbsp;  <td> &nbsp;  <td> 8410.0 <td> 8600.0  <td> &nbsp; 
368
369<tr align=right>
370   <td align=left> M12_5 <td> 72  <td> &nbsp;  <td> &nbsp;  <td> 129.0 <td> 129.0  <td> &nbsp; 
371
372<tr align=right>
373   <td align=left> M12_6 <td> 84  <td> &nbsp;  <td> &nbsp;  <td> 410.0 <td> 396.0  <td> &nbsp; 
374
375</table>
376
377<p>
378Some remarks:
379<p>
380<ul>
381<li>
382A blank entry indicates that the corresponding experiment
383was not performed.
384<li>
385For several of the Maple experiments, the experiment was terminated
386after 5000 seconds, as indicated in the table.
387<li>
388It would be futile to attempt to factor any of the other polynomials
389in the table using the Zassenhaus method, as the running time
390of that method is exponential in <var>r</var>.
391<li>
392We attempted to factor H2 using the H- algorithm, but after several
393hours of computation, the algorithm had not yet terminated.
394Using the H+ algorithm, things were much easier:  the hardest part
395was to prove the irreducibility of a degree 2048 polynomial,
396for which the number of modular factors was 128.
397<li>
398Note that the power hack strategy can slow things down a bit,
399but can also speed things up, sometimes dramatically.
400<li>
401In the above table, rev(f) denotes the "reverse" of the polynomial f,
402i.e., x^{deg(f)}f(x^{-1}).
403</ul>
404
405
406<p>
407
408
409<center>
410<a href="tour-gf2x.html"><img src="arrow1.gif" alt="[Previous]" align=bottom></a>
411 <a href="tour.html"><img src="arrow2.gif" alt="[Up]" align=bottom></a> 
412<a href="tour-roadmap.html"> <img src="arrow3.gif" alt="[Next]" align=bottom></a>
413</center>
414
415
416</body>
417</html>
Note: See TracBrowser for help on using the repository browser.