Memory allocation errors

asked by River (2025/06/12 22:18)

Hello,

I've been running some fairly large models on a compute cluster. After a few crashes due to running out of memory, I tried the suggested memory management command “ulimit -v <number of KB RAM>”. This stopped the program from crashing outright and stopped the scheduler from killing my task, but now I am getting more errors at runtime.

Specifically, one error caught my eye:

Outer loop   4, Number of Determinants:   1433409  26543058 last variance  7.412405896600516E+02
alloc failed WaveFunctionInitCopyBasis 02 Im
to do BlockLanczosGroundStateConserveBasisKrylovRecalculate
Cheap fix needs to be improved

It specifically says this is a cheap fix that needs to be improved. Does anyone know if anything better has been developed?
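
For scale, here is a rough back-of-the-envelope estimate of what the Krylov vectors alone might need at the point where the allocation fails. This is only a sketch under assumptions that are not confirmed by the Quanty documentation: that every Krylov vector stores one double-precision complex amplitude (16 bytes) per determinant, and that the determinant list itself and any other overhead are ignored. The function name krylov_memory_gib is made up for this illustration. The numbers 26543058, 108 and 24 are taken from the log.

```python
# Assumption: one complex double (8-byte Re + 8-byte Im) per determinant per vector.
BYTES_PER_COMPLEX_DOUBLE = 16


def krylov_memory_gib(n_determinants, n_vectors,
                      bytes_per_amplitude=BYTES_PER_COMPLEX_DOUBLE):
    """Rough size in GiB of n_vectors dense vectors over n_determinants.

    Hypothetical helper: ignores storage of the determinants themselves
    and any allocator or bookkeeping overhead.
    """
    total_bytes = n_determinants * n_vectors * bytes_per_amplitude
    return total_bytes / 2**30


if __name__ == "__main__":
    n_det = 26543058  # full basis size in outer loop 4 of the log
    for n_vec in (108, 24):  # Krylov basis before and after the "cheap fix"
        print(f"Krylov basis {n_vec:4d}: ~{krylov_memory_gib(n_det, n_vec):.1f} GiB")
```

Under these assumptions the Krylov basis of 108 would need about 42.7 GiB and the reduced basis of 24 about 9.5 GiB, which may explain why the program falls back to a smaller Krylov basis once the first allocation fails.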

Or is there a better way to limit the amount of RAM Quanty attempts to use? On my local machine it would fill the RAM and then periodically write to disk when it needed more space; I'm not sure why it fails to do this in a server setting.
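
One possible reason for the difference: in bash on Linux, "ulimit -v" caps the virtual address space of the process (RLIMIT_AS), so an allocation beyond the cap is refused immediately instead of being backed by swap. The sketch below reproduces that behaviour with a small child process. It assumes Linux and Python 3.7 or later; the helper name try_alloc and the sizes used are made up for the illustration and have nothing to do with Quanty itself.

```python
import subprocess
import sys

# Code run in a child process: cap the address space the way "ulimit -v" does,
# then attempt one large allocation and report whether it succeeded.
CHILD_CODE = """
import resource, sys
limit_bytes = int(sys.argv[1]) * 1024          # ulimit -v takes a value in KB
alloc_bytes = int(sys.argv[2]) * 1024 * 1024   # requested allocation in MB
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
if hard != resource.RLIM_INFINITY:
    limit_bytes = min(limit_bytes, hard)       # cannot exceed an existing hard limit
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
try:
    buf = bytearray(alloc_bytes)
    print("ok")
except MemoryError:
    print("alloc failed")
"""


def try_alloc(limit_kb, alloc_mb):
    """Hypothetical helper: run the child with the given cap and allocation size."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(limit_kb), str(alloc_mb)],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    one_gib_in_kb = 1024 * 1024
    print(try_alloc(one_gib_in_kb, 16))    # small request, well under the cap
    print(try_alloc(one_gib_in_kb, 4096))  # request larger than the cap
```

If this is what happens on the cluster, the "alloc failed" lines would be the program seeing refused allocations rather than the system paging to disk. Whether the cluster nodes have swap at all is a separate question for the administrators.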

See below the raw output for more error messages.

Code Output:

Lmod is automatically replacing "gcc/12.3" with "intel/2023.2.1".

Lmod Warning:
-------------------------------------------------------------------------------
The following dependent module(s) are not currently loaded: gcccore/.12.3
(required by: intel/2023.2.1)
-------------------------------------------------------------------------------




Due to MODULEPATH changes, the following have been reloaded:
  1) flexiblas/3.3.1     2) openmpi/4.1.5

=============================================================
====    written by Maurits W. Haverkort                  ====
====    with contributions from:                         ====
====    Yi Lu, Robert Green, Sebastian Macke             ====
====    Marius Retegan, Martin Brass, and Simon Heinze   ====
====    (C) 1995-2018   All rights reserved              ====
====    www.quanty.org                                   ====
====    Beta version, be critical and report errors!!!   ====
=============================================================
====    Version 0.6 Autumn 2018                          ====
====            compiled at: Nov 25 2018 at 23:37:47     ====
=============================================================
====    When used in scientific publications please cite ====
====    one of the following papers as appropriate with  ====
====    respect to the methods used in your publication: ====
====    Phys. Rev. B 85, 165113 (2012)                   ====
====    Phys. Rev. B 90, 085102 (2014)                   ====
====    Euro Phys. Lett. 108, 57004 (2014)               ====
====    J. of Phys.: Conf. Series 712, 012001 (2016)     ====
=============================================================
Program executed on: Thu Jun 12 12:29:01 2025
Running on host    : platocpu010
number of available processors              : 40
maximum number of threads in parallel region: 40
Smallest positive float  : 2.225074E-308 
Smallest deviation from 1: 2.220446E-16 

Start of BlockGroundState. Converge 8 states to an energy with relative variance smaller than  1.490116119384766E-06

Start of BlockOperatorPsiSerialRestricted
Outer loop   1, Number of Determinants:        45        45 last variance  2.190014106090412E+00
Start of BlockOperatorPsiSerialRestricted
Start of BlockGroundState. Converge 8 states to an energy with relative variance smaller than  1.490116119384766E-06

Start of BlockOperatorPsiSerial
Outer loop   1, Number of Determinants:        45      2021 last variance  5.754242953567713E+00
  Restart loop 1 with a Krylov basis of 108 and a full basis of 2021
Start of BlockOperatorPsiSerial
Outer loop   2, Number of Determinants:      2021     63239 last variance  1.090220143151499E+02
  Restart loop 1 with a Krylov basis of 108 and a full basis of 63239
Start of BlockOperatorPsiSerial
Outer loop   3, Number of Determinants:     63239   1433409 last variance  2.797107634518841E+02
  Restart loop 1 with a Krylov basis of 108 and a full basis of 1433409
Start of BlockOperatorPsiSerial
Outer loop   4, Number of Determinants:   1433409  26543058 last variance  7.412405896600516E+02
alloc failed WaveFunctionInitCopyBasis 02 Im
to do BlockLanczosGroundStateConserveBasisKrylovRecalculate
Cheap fix needs to be improved
  Restart loop 1 with a Krylov basis of 24 and a full basis of 26543058
alloc failed WaveFunctionInitCopyBasis 02 Im
  Restart loop 2 with a Krylov basis of 24 and a full basis of 26543058
alloc failed WaveFunctionInitCopyBasis 02 Im
Start of BlockOperatorPsiSerial
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC
 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC
 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC
 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC
 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC
 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC
 01OperatorPsi failed in BlockOperatorPsiSerial
Start of BlockOperatorPsiSerial
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC
 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC
 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC