Opened 11 years ago
Closed 11 years ago
#463 closed bug (fixed)
Wrong memory allocations in different places
Reported by: | Simon King | Owned by: | somebody |
---|---|---|---|
Priority: | critical | Milestone: | 3-1-6 and higher |
Component: | omalloc | Version: | 3-1-5 |
Keywords: | memory corruption | Cc: | nbruin@… |
Description (last modified by )
In Sage trac ticket #13731, Nils Bruin created a Singular spkg in which omalloc is replaced by malloc. This is in order to make it more easy to detect memory corruptions in libsingular, that become immanent as soon as a proper garbage collection is implemented in Sage.
Nils' spkg does not result in a working Singular executable, but libsingular seems to work, and we were able to analyse one crash. In Sage, it is
$ export MALLOC_CHECK_ 3 $ sage ... sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:-x*y}) sage: x*y *** glibc detected *** /usr/local/sage/5.5b1/local/bin/python: free():
With valgrind, we located one problem in gring.cc, line 503:
p_MemAdd_LengthGeneral(F, G, ExpSize/sizeof(long));
If we understood correctly, ExpSize/sizeof(long)
should be at most (rN+1)*sizeof(int)
, but the definition is
int ExpSize=(((rN+1)*sizeof(int)+sizeof(long)-1)/sizeof(long))*sizeof(long);
If sizeof(int)=4
and sizeof(long)=8
then for rN=2
we get (rN+1)*sizeof(int)=12
and ExpSize=16
(which, by the valgrind report, is probably exactly the case we're in). So p_MemAdd_LengthGeneral
is definitely writing out of bounds.
Change History (18)
comment:1 Changed 11 years ago by
comment:2 follow-up: 3 Changed 11 years ago by
Main problem is the use of p_MemAdd_LengthGeneral which is an (internal) Macro to handle monomials, not general int arrays. It requires its arguments having a length divisible by sizeof(long), while F and G have size (rN+1)*sizeof(int). Substituting all p_MemAdd_LengthGeneral(F,G,ExpSize?/sizeof(long)) calls in gring.cc by { for(int ii=rN;ii>0;ii--) F[i]+=G[i]; } should solve this.
comment:3 follow-up: 4 Changed 11 years ago by
Replying to hannes:
Main problem is the use of p_MemAdd_LengthGeneral which is an (internal) Macro to handle monomials, not general int arrays. It requires its arguments having a length divisible by sizeof(long), while F and G have size (rN+1)*sizeof(int). Substituting all p_MemAdd_LengthGeneral(F,G,ExpSize?/sizeof(long)) calls in gring.cc by { for(int ii=rN;ii>0;ii--) F[i]+=G[i]; } should solve this.
Q1: Would that involve a regression in speed?
Q2: Wouldn't it be more logical to change p_MemAdd_LengthGeneral
so that the sizes only need to be divisible by sizeof(int)
?
comment:4 follow-up: 5 Changed 11 years ago by
Replying to SimonKing:
Replying to hannes:
Main problem is the use of p_MemAdd_LengthGeneral which is an (internal) Macro to handle monomials, not general int arrays. It requires its arguments having a length divisible by sizeof(long), while F and G have size (rN+1)*sizeof(int). Substituting all p_MemAdd_LengthGeneral(F,G,ExpSize?/sizeof(long)) calls in gring.cc by { for(int ii=rN;ii>0;ii--) F[i]+=G[i]; } should solve this.
Q1: Would that involve a regression in speed?
yes, but only a very little: instead of (rN+1)/2 additions of long one has rN additions of int. (And rN is a small number)
Q2: Wouldn't it be more logical to change
p_MemAdd_LengthGeneral
so that the lengths of the arguments only need to be divisible bysizeof(int)
?
No, as the purpose ofp_MemAdd_LengthGeneral is the (multiplication) of monomials, which are arrays of long.
comment:5 Changed 11 years ago by
Replying to hannes:
Q1: Would that involve a regression in speed?
yes, but only a very little: instead of (rN+1)/2 additions of long one has rN additions of int.
Or perhaps rN/2
additions of long and at most one additions of int?
comment:6 Changed 11 years ago by
Next question: What about the problem mentioned in comment:1? Should that be on a different ticket, perhaps?
comment:7 Changed 11 years ago by
Nils has found that changeset 15435 addresses the problems from the original ticket description and from comment:1.
Next problem (just quoting what Nils has found): Valgrind complains
==30816== Conditional jump or move depends on uninitialised value(s) ==30816== at 0x26942A01: hGetmem(int, int**, monrec*)
Nils thinks this one is benign, because rewriting the code to the functionally equivalent
hutil.cc:1020
:
scfmon hGetmem(int lm, scfmon old, monp monmem) { scfmon x = monmem->mo; int lx = monmem->a; if ((x==NULL) || (lm > lx)) { - if ((x!=NULL)&&(lx>0)) omFreeSize((ADDRESS)x, lx * sizeof(scmon)); + if (x!=NULL) if (lx>0) omFreeSize((ADDRESS)x, lx * sizeof(scmon)); monmem->mo = x = (scfmon)omAlloc(lm * sizeof(scmon)); monmem->a = lm; } memcpy(x, old, lm * sizeof(scmon)); return x; }
makes the reports go away. Perhaps gcc optimizes out the shortcutting because it's not a benefit here and valgrind doesn't see through the fact that the undefined value doesn't influence the outcome.
comment:8 follow-up: 12 Changed 11 years ago by
Description: | modified (diff) |
---|---|
Summary: | Wrong memory allocation in gring.cc → Wrong memory allocations in different places |
Next one:
#0 0x00007fffce9fb3f2 in List<CanonicalForm>::isEmpty(this=0x7fffa515b000) at ./templates/ftmpl_list.cc:256 #1 0x00007fffce96a350 in multiFactorize (F=..., v=...) at facFactorize.cc:710
This seems to be a genuine out-of-bounds reference:
facFactorize.cc:688
CFList* bufAeval2= new CFList [A.level() - 2]; ... evaluationWRTDifferentSecondVars (bufAeval2, bufEvaluation, A); for (int j= 0; j < A.level() - 1; j++) { if (!bufAeval2[j].isEmpty()) counter++; }
so this queries all elements bufAeval2[0],...,bufAeval2[A.level()-2]
. However, if this allocates an array as it does in C, then the new command above only creates bufAeval2[0],...,bufAeval2[A.level()-3]
(i.e., A.level()-2
of them, but 0-based).
The initialization by evaluationWRTDifferentSecondVars
seems to corroborate that:
facFqFactorize.cc:1778
void evaluationWRTDifferentSecondVars (CFList*& Aeval, const CFList& evaluation, const CanonicalForm& A) { CanonicalForm tmp; CFList tmp2; CFListIterator iter; for (int i= A.level(); i > 2; i--) { ... if (preserveDegree) Aeval [i - 3]= tmp2; else Aeval [i - 3]= CFList(); } }
So only Aeval[0], ..., Aeval[A.level()-3]
get initialized.
Thus, the reference to bufAeval2[j].isEmpty() with j = A.level() - 2
indeed seems out of bounds.
comment:9 Changed 11 years ago by
Two oddities concerning the function malloc_usable_size
:
- Why is it used? It is not available on OSX, where one should use
malloc_size
instead. src/omalloc/dlmalloc.h:
does#define OM_MALLOC_SIZEOF_ADDR(addr) malloc_usable_size(addr)
. Why isOM_MALLOC_SIZEOF_ADDR
not used everywhere to replace direct calls tomalloc_usable_size
? And why does the definition ofOM_MALLOC_SIZEOF_ADDR
not depend on the system?
comment:10 Changed 11 years ago by
In omalloc.c
, I see
#if defined(sgi) void* memalign(size_t size_1, size_t size_2) #elif (defined(__sun) && (defined(__sparc) || defined(__i386) || defined(__x86_64)) || defined(__CYGWIN__)) void* memalign(size_t size_1, size_t size_2) #else void* memalign(void* alignment, size_t size) #endif
Is the last declaration really what we want? In my /usr/include/malloc.h
, I find
/* Allocate SIZE bytes allocated to ALIGNMENT bytes. */ extern void *memalign (size_t __alignment, size_t __size) __THROW __attribute_malloc__ __wur;
So, shouldn't void* alignment
be replaced with size_t alignment
?
And speaking about memalign: A grep for memalign in the Singular sources gave me the impression that it is actually not used, unless dlmalloc is used. Is my impression correct?
Since memalign does not seem portable and (worse) does not allow to free allocated memory on some systems, I'd rather hope that Singular doesn't use it.
comment:11 Changed 11 years ago by
PS: omMemalignFunc
declared in omalloc/omAllocFunc.h
has a similar flaw, the first argument should be size_t
, not void *
(at least on openSuse
...).
comment:12 follow-up: 13 Changed 11 years ago by
Hi Simon, which Singular version did you use for that? I have changed the lines in question with baadc0f7 to
for (int j= 0; j < lengthAeval2; j++) { if (!bufAeval2[j].isEmpty()) counter++; }
Cheers, Martin
Replying to SimonKing:
Next one:
#0 0x00007fffce9fb3f2 in List<CanonicalForm>::isEmpty(this=0x7fffa515b000) at ./templates/ftmpl_list.cc:256 #1 0x00007fffce96a350 in multiFactorize (F=..., v=...) at facFactorize.cc:710This seems to be a genuine out-of-bounds reference:
facFactorize.cc:688
CFList* bufAeval2= new CFList [A.level() - 2]; ... evaluationWRTDifferentSecondVars (bufAeval2, bufEvaluation, A); for (int j= 0; j < A.level() - 1; j++) { if (!bufAeval2[j].isEmpty()) counter++; }so this queries all elements
bufAeval2[0],...,bufAeval2[A.level()-2]
. However, if this allocates an array as it does in C, then the new command above only createsbufAeval2[0],...,bufAeval2[A.level()-3]
(i.e.,A.level()-2
of them, but 0-based).The initialization by
evaluationWRTDifferentSecondVars
seems to corroborate that:
facFqFactorize.cc:1778
void evaluationWRTDifferentSecondVars (CFList*& Aeval, const CFList& evaluation, const CanonicalForm& A) { CanonicalForm tmp; CFList tmp2; CFListIterator iter; for (int i= A.level(); i > 2; i--) { ... if (preserveDegree) Aeval [i - 3]= tmp2; else Aeval [i - 3]= CFList(); } }So only
Aeval[0], ..., Aeval[A.level()-3]
get initialized.Thus, the reference to
bufAeval2[j].isEmpty() with j = A.level() - 2
indeed seems out of bounds.
comment:13 follow-up: 14 Changed 11 years ago by
Hi Martin!
Replying to mlee:
Hi Simon, which Singular version did you use for that? I have changed the lines in question with baadc0f7 to
for (int j= 0; j < lengthAeval2; j++) { if (!bufAeval2[j].isEmpty()) counter++; }
I'm using the Singular version that we have in Sage: 3-1-5.
What is the changeset? We could backport the fix into our Singular-3-1-5 spkg.
comment:14 follow-up: 15 Changed 11 years ago by
Hi Simon use 9ece45a2 for all the changes I did. Strangely trac says 'No changeset ... ' but simply search for it once your logged in or search for it in our github repository.
Replying to SimonKing:
Hi Martin!
Replying to mlee:
Hi Simon, which Singular version did you use for that? I have changed the lines in question with baadc0f7 to
for (int j= 0; j < lengthAeval2; j++) { if (!bufAeval2[j].isEmpty()) counter++; }I'm using the Singular version that we have in Sage: 3-1-5.
What is the changeset? We could backport the fix into our Singular-3-1-5 spkg.
comment:15 follow-up: 16 Changed 11 years ago by
Replying to mlee:
Hi Simon use 9ece45a2 for all the changes I did.
How to find it? When I am in https://github.com/Singular/Sources/commits/spielwiese then 9ece45a2 is not mentioned (so, apparently it is an older commit), and moreover a search for the string "search" had no result. Hence, I have no idea how I could search https://github.com/Singular/Sources/commits/spielwiese.
comment:16 follow-up: 17 Changed 11 years ago by
Hi Simon, the commit is from Sep, 7th https://github.com/Singular/Sources/commits/master?page=5.
Replying to SimonKing:
Replying to mlee:
Hi Simon use 9ece45a2 for all the changes I did.
How to find it? When I am in https://github.com/Singular/Sources/commits/spielwiese then 9ece45a2 is not mentioned (so, apparently it is an older commit), and moreover a search for the string "search" had no result. Hence, I have no idea how I could search https://github.com/Singular/Sources/commits/spielwiese.
comment:17 Changed 11 years ago by
Replying to mlee:
Hi Simon, the commit is from Sep, 7th https://github.com/Singular/Sources/commits/master?page=5.
Thank you! So, it's a rather big commit. I guess I'll just backport the code snipped that you mentioned in comment:12.
comment:18 Changed 11 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
resulted in several fixes, which are already discussed above and will be include in the release 3-1-6 (to be released within the next days).
Copying Nils' comment from the Sage ticket:
This is like shooting fish in a barrel.
ring.cc:157:
calling ring.cc:127 (the variable order above gets passed as the parameter ord in the code below):
calling ring.cc:3469:
calling ring.cc:2065:
As you see, that last line
...r->ord[2]
is indeed out of bound on a2*sizeof(int)
block. The valgrind session seems to indicate this code path is indeed exercised.