====== Memory allocation errors ======
;;#
asked by [[mailto:alk365@usask.ca|River]] (2025/06/12 22:18)
;;#

== ==
Hello,

I've been running some fairly large models on a compute cluster. After a few crashes caused by running out of memory, I tried the suggested memory management command "ulimit -v". This stopped the program from crashing outright (or rather, stopped the scheduler from killing my task), but now I get other errors at runtime. One error in particular caught my eye, since it says outright that it is a cheap fix that needs to be improved:

Outer loop 4, Number of Determinants: 1433409 26543058 last variance 7.412405896600516E+02
alloc failed WaveFunctionInitCopyBasis 02 Im
to do BlockLanczosGroundStateConserveBasisKrylovRecalculate Cheap fix needs to be improved

Does anyone know whether anything better has been developed? Or is there a better way to limit the amount of RAM Quanty attempts to use? On my local machine it would fill the RAM and then periodically write to disk when it needed more space; I'm not sure why it fails to do this in a server setting.

The raw output below contains more error messages.

Code Output:

Lmod is automatically replacing "gcc/12.3" with "intel/2023.2.1".

Lmod Warning:
-------------------------------------------------------------------------------
The following dependent module(s) are not currently loaded: gcccore/.12.3 (required by: intel/2023.2.1)
-------------------------------------------------------------------------------

Due to MODULEPATH changes, the following have been reloaded:
1) flexiblas/3.3.1 2) openmpi/4.1.5

=============================================================
==== written by Maurits W. Haverkort ====
==== with contributions from: ====
==== Yi Lu, Robert Green, Sebastian Macke ====
==== Marius Retegan, Martin Brass, and Simon Heinze ====
==== (C) 1995-2018 All rights reserved ====
==== www.quanty.org ====
==== Beta version, be critical and report errors!!! ====
=============================================================
==== Version 0.6 Autumn 2018 ====
==== compiled at: Nov 25 2018 at 23:37:47 ====
=============================================================
==== When used in scientific publications please cite ====
==== one of the following papers as appropriate with ====
==== respect to the methods used in your publication: ====
==== Phys. Rev. B 85, 165113 (2012) ====
==== Phys. Rev. B 90, 085102 (2014) ====
==== Euro Phys. Lett. 108, 57004 (2014) ====
==== J. of Phys.: Conf. Series 712, 012001 (2016) ====
=============================================================
Program executed on: Thu Jun 12 12:29:01 2025
Running on host : platocpu010
number of available processors : 40
maximum number of threads in parallel region: 40
Smallest positive float : 2.225074E-308
Smallest deviation from 1: 2.220446E-16
Start of BlockGroundState. Converge 8 states to an energy with relative variance smaller than 1.490116119384766E-06
Start of BlockOperatorPsiSerialRestricted
Outer loop 1, Number of Determinants: 45 45 last variance 2.190014106090412E+00
Start of BlockOperatorPsiSerialRestricted
Start of BlockGroundState. Converge 8 states to an energy with relative variance smaller than 1.490116119384766E-06
Start of BlockOperatorPsiSerial
Outer loop 1, Number of Determinants: 45 2021 last variance 5.754242953567713E+00
Restart loop 1 with a Krylov basis of 108 and a full basis of 2021
Start of BlockOperatorPsiSerial
Outer loop 2, Number of Determinants: 2021 63239 last variance 1.090220143151499E+02
Restart loop 1 with a Krylov basis of 108 and a full basis of 63239
Start of BlockOperatorPsiSerial
Outer loop 3, Number of Determinants: 63239 1433409 last variance 2.797107634518841E+02
Restart loop 1 with a Krylov basis of 108 and a full basis of 1433409
Start of BlockOperatorPsiSerial
Outer loop 4, Number of Determinants: 1433409 26543058 last variance 7.412405896600516E+02
alloc failed WaveFunctionInitCopyBasis 02 Im
to do BlockLanczosGroundStateConserveBasisKrylovRecalculate Cheap fix needs to be improved
Restart loop 1 with a Krylov basis of 24 and a full basis of 26543058
alloc failed WaveFunctionInitCopyBasis 02 Im
Restart loop 2 with a Krylov basis of 24 and a full basis of 26543058
alloc failed WaveFunctionInitCopyBasis 02 Im
Start of BlockOperatorPsiSerial
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01OperatorPsi failed in BlockOperatorPsiSerial
Start of BlockOperatorPsiSerial
alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC 01alloc failed RealWaveFunctionAddElement 11 Re
ComplexWaveFunctionAddElement failed in ComplexWaveFunctionAddElementOMPMiniFlush
ComplexWaveFunctionAddElementOMPMiniFlush failed in OperatorPsiMC

~~DISCUSSION|Answers~~