I like my job, so no screenshots. Sorry.
Notes:
- sbatch is a command for submitting jobs on high performance compute nodes
- the huge-n128-512g node uses 128 cores and has 512GiB of memory
- This is occurring in a medical research nonprofit
User: Hello everyone, this is the first time I’m using GCP. I’m trying to run a job, but it keeps failing. These are the sbatch headers I’m using:
#SBATCH --partition=huge-n128-512g
#SBATCH --nodes=8
#SBATCH [email protected]
#SBATCH --mail-type=FAIL
#SBATCH --mem-per-cpu=32G
IT: Please make sure you need to use that node, each one costs $4500/month to use. Can you describe the job you’re trying to do?
User: I’m doing high-depth genetic sequencing using 3gb bam files.
(additional note: there’s usually only 1 bam file per chromosome, so 69gb total. Nice.)
IT: Those bam files are pretty small. I’d recommend starting with the med-n16-64g node and moving up if needed. We’re only billed for run time. If the jobs take the same amount of time, it would be 13% of the cost.
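For anyone curious what a right-sized submission along the lines of IT's suggestion might look like, here's a sketch (partition name and email from the story; the memory figure is my own guess at a sane starting point, not anything from the actual ticket):

```shell
#!/bin/bash
#SBATCH --partition=med-n16-64g   # 16 cores, 64GiB per node
#SBATCH --nodes=1                 # start small, scale up only if needed
#SBATCH --mem-per-cpu=3G          # 16 cores x 3GiB = 48GiB, fits in 64GiB
#SBATCH [email protected]
#SBATCH --mail-type=FAIL
```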
The astute among you will notice that an 8-node swarm requesting 32GiB of memory per core, at 128 cores per node, works out to 32TiB total. The job was failing because the --mem-per-cpu flag pushed the per-node request (128 × 32GiB = 4TiB) far above the 512GiB each node actually has. Even without that flag, grabbing 8 full nodes would have claimed 4TiB of memory. Holy overallocation, Batman!
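Spelling out the arithmetic, using the node specs from the notes at the top:

```python
# Specs from the story: huge-n128-512g = 128 cores, 512GiB per node.
cores_per_node = 128
nodes = 8
mem_per_cpu_gib = 32      # from the --mem-per-cpu=32G flag
node_memory_gib = 512     # physical memory per node

# Per-node request: 128 cores x 32GiB = 4096GiB = 4TiB,
# eight times what each node actually has -> job can never be scheduled.
requested_per_node = cores_per_node * mem_per_cpu_gib
print(requested_per_node)                  # 4096

# Total request across the swarm: 8 x 4TiB = 32TiB.
print(nodes * requested_per_node / 1024)   # 32.0 (TiB)

# Even with no --mem-per-cpu flag, 8 full nodes claim 8 x 512GiB = 4TiB.
print(nodes * node_memory_gib / 1024)      # 4.0 (TiB)
```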


Also, memory per core is an interesting way to allocate that. Total? Absolutely. Per node? Yeah. But per CPU?