Memory allocation errors
asked by River (2025/06/12 22:18)
Hello,
I've been running some fairly large models on a compute cluster, and after a few crashes due to running out of memory, I tried implementing the suggested memory management command “ulimit -v <number of KB RAM>”. This stopped the program from crashing outright / stopped the scheduler from killing my task, but now I am getting more errors during runtime.
Specifically, one error caught my eye:
Outer loop 4, Number of Determinants: 1433409 26543058 last variance 7.412405896600516E+02
alloc failed WaveFunctionInitCopyBasis 02 Im
to do BlockLanczosGroundStateConserveBasisKrylovRecalculate Cheap fix needs to be improved
as it specifically says it's a cheap fix and needs to be improved. Does anyone know if anything better has been developed?
Or is there a better way to limit the amount of RAM Quanty attempts to use? On my local machine it would fill the RAM and then periodically write to disk when it needed more space. I'm not sure why it fails to do this in a server setting.
See below the raw output for more error messages.
Code Output:
Lmod is automatically replacing "gcc/12.3" with "intel/2023.2.1".
Lmod Warning:
-------------------------------------------------------------------------------
The following dependent module(s) are not currently loaded: gcccore/.12.3 (required by: intel/2023.2.1)
-------------------------------------------------------------------------------
Due to MODULEPATH changes, the following have been reloaded:
1) flexiblas/3.3.1 2) openmpi/4.1.5
=============================================================
==== written by Maurits W. Haverkort ====
==== with contributions from: ====
==== Yi Lu, Robert Green, Sebastian Macke ====
==== Marius Retegan, Martin Brass, and Simon Heinze ====
==== (C) 1995-2018 All rights reserved ====
==== www.quanty.org ====
==== Beta version, be critical and report errors!!! ====
=============================================================
==== Version 0.6 Autumn 2018 ====
==== compiled at: Nov 25 2018 at 23:37:47 ====
=============================================================
==== When used in scientific publications please cite ====
==== one of the following papers as appropriate with ====
==== respect to the methods used in your publication: ====
==== Phys. Rev. B 85, 165113 (2012) ====
==== Phys. Rev. B 90, 085102 (2014) ====
==== Euro Phys. Lett. 108, 57004 (2014) ====
==== J. of Phys.: Conf. Series 712, 012001 (2016) ====
=============================================================
Program executed on: Thu Jun 12 12:29:01 2025
Running on host : platocpu010
number of available processors : 40
maximum number of threads in parallel region: 40
Smallest positive float : 2.225074E-308
Smallest deviation from 1: 2.220446E-16
Start of BlockGroundState.
Converge 8 states to an energy with relative variance smaller than 1.490116119384766E-06
Start of BlockOperatorPsiSerialRestricted
Outer loop 1, Number of Determinants: 45 45 last variance 2.190014106090412E+00
Start of BlockOperatorPsiSerialRestricted
Start of BlockGroundState.
Converge 8 states to an energy with relative variance smaller than 1.490116119384766E-06
Start of BlockOperatorPsiSerial
Outer loop 1, Number of Determinants: 45 2021 last variance 5.754242953567713E+00
Restart loop 1 with a Krylov basis of 108 and a full basis of 2021
Start of BlockOperatorPsiSerial
Outer loop 2, Number of Determinants: 2021 63239 last variance 1.090220143151499E+02
Restart loop 1 with a Krylov basis of 108 and a full basis of 63239
Start of BlockOperatorPsiSerial
Outer loop 3, Number of Determinants: 63239 1433409 last variance 2.797107634518841E+02
Restart loop 1 with a Krylov basis of 108 and a full basis of 1433409
Start of BlockOperatorPsiSerial
Outer loop 4, Number of Determinants: 1433409 26543058 last variance 7.412405896600516E+02
alloc failed WaveFunctionInitCopyBasis 02 Im
to do BlockLanczosGroundStateConserveBasisKrylovRecalculate Cheap fix needs to be improved
Restart loop 1 with a Krylov basis of 24 and a full basis of 26543058
alloc failed WaveFunctionInitCopyBasis 02 Im
Restart loop 2 with a Krylov basis of 24 and a full basis of 26543058
alloc failed WaveFunctionInitCopyBasis 02 Im
Start of BlockOperatorPsiSerial
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01
OperatorPsi failed in BlockOperatorPsiSerial
Start of BlockOperatorPsiSerial
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC
Answers
Dear River,
We have several routines in Quanty that can calculate spectra and/or ground states. In several cases one can trade off memory usage against speed. Whenever I need more memory I call the C function malloc or calloc and check whether this call succeeds. If not, we try to continue with an algorithm that is less memory hungry.
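The pattern described above (check whether an allocation succeeds, and otherwise continue with a slower routine that needs less memory) can be sketched roughly as follows. This is only an illustration in Python, not Quanty's C code: `try_alloc`, `weight` and `neighbour_sum` are hypothetical names, and the "expensive" quantity is a stand-in.

```python
import math


def try_alloc(n):
    """Return a zero-filled list of n floats, or None if the allocation fails.

    Hypothetical helper: plays the role of a checked malloc/calloc call.
    """
    try:
        return [0.0] * n
    except (MemoryError, OverflowError):
        return None


def weight(i):
    """Stand-in for a quantity that is expensive to compute."""
    return math.sin(i) ** 2


def neighbour_sum(n, alloc=try_alloc):
    """Sum weight(i) * weight(i + 1) for i in range(n - 1).

    Returns (value, path), where path names the branch that was taken.
    """
    table = alloc(n)
    if table is not None:
        # Fast path: compute every weight once and keep it in memory.
        for i in range(n):
            table[i] = weight(i)
        total = sum(table[i] * table[i + 1] for i in range(n - 1))
        return total, "cached"
    # Fallback: no table, so each weight is recomputed when it is needed.
    # Slower (about twice as many calls to weight), but no extra memory.
    total = sum(weight(i) * weight(i + 1) for i in range(n - 1))
    return total, "recomputed"


if __name__ == "__main__":
    print(neighbour_sum(1000))
    # Simulate a failed allocation by injecting an allocator that returns None.
    print(neighbour_sum(1000, alloc=lambda n: None))
```

Both branches return the same value; only the run time and the memory footprint differ, which is why a failed allocation of this kind does not have to be fatal.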
On modern machines malloc and calloc almost never fail unless you explicitly set a limit, which is what the command “ulimit -v <number of KB RAM>” does for you. Modern operating systems assume that they will be able to give you the memory at the moment you actually use it. If the allocation did not fail and the hardware memory is not there, the code will crash (sometimes hard). On your local machine you are probably using your hard disk as additional memory (swap).
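To see the effect of such a limit in isolation, here is a minimal sketch that starts a child process under a virtual-memory cap, comparable to typing `ulimit -v` in a shell before launching the program. It assumes Linux and Python's standard `resource` module; `gib_to_kb` and `run_with_vmem_limit` are hypothetical helpers, and the child is just a Python one-liner rather than Quanty.

```python
import resource
import subprocess
import sys


def gib_to_kb(gib):
    """Convert GiB to the KB units that `ulimit -v` expects."""
    return int(gib * 1024 * 1024)


def run_with_vmem_limit(cmd, limit_kb):
    """Run cmd in a child process whose virtual memory is capped at limit_kb.

    Hypothetical helper, Linux/POSIX only. Only the soft limit is lowered.
    """
    limit_bytes = limit_kb * 1024  # RLIMIT_AS is given in bytes, not KB

    def apply_limit():
        # Runs in the child just before cmd starts; the parent is unaffected.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = limit_bytes
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)  # soft limit may not exceed the hard limit
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_limit,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # The child asks for 8 GiB while capped at 2 GiB: the request is refused
    # at allocation time (Python reports this as MemoryError) instead of the
    # process being killed later, when the memory is actually touched.
    child = [sys.executable, "-c", "buf = bytearray(8 * 1024**3); print('ok')"]
    result = run_with_vmem_limit(child, gib_to_kb(2))
    print(result.returncode, result.stderr.strip().splitlines()[-1:])
```

For example, a 64 GB cap corresponds to `gib_to_kb(64)`, i.e. 64 × 1024 × 1024 = 67108864 KB as the argument to `ulimit -v`.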
The error message that you see indicates that the slower routine we switch to when we run out of memory can be optimised. I also know (knew) how this can be done (you can probably find a hint on how to do this in the source code), but I have not found the time to do the optimisation. I have a list of optimisations I want to make, but I am limited by time at the moment.
For now I see 4 options to continue
Best wishes, Maurits
Hello Maurits,
Thank you for the reply!
I am not very familiar with C programming, so thank you for clarifying. As long as these errors do not make the results incorrect, I think my best option is to continue with the ulimit command. I have also implemented some Restrictions; the documentation on this site is not amazing for this, so I didn't realize how they worked earlier. With a few other optimizations I believe things are working more smoothly now.
Thanks again, River