I've never seen this problem. Possibly your MPI is not configured
correctly? Can you run smaller problems fine on 256 procs? It is
also odd that it reports on processes and threads which don't equal
the # of MPI processes (128 or 256). Are you certain that each proc
has its own local memory of sufficient size? E.g. are you sure you are
not running in virtual proc mode?