I like my job, so no screenshots. Sorry.
Notes:
- sbatch is the Slurm command for submitting batch jobs to high-performance compute nodes
- the huge-n128-512g node uses 128 cores and has 512 GiB of memory
- This is occurring at a medical research nonprofit
User: Hello everyone, this is the first time I’m using GCP. I’m trying to run a job, but it keeps failing. These are the sbatch headers I’m using:
#SBATCH --partition=huge-n128-512g
#SBATCH --nodes=8
#SBATCH [email protected]
#SBATCH --mail-type=FAIL
#SBATCH --mem-per-cpu=32G
IT: Please make sure you need to use that node; each one costs $4500/month to use. Can you describe the job you're trying to run?
User: I'm doing high-depth genetic sequencing using 3 GB BAM files.
(additional note: there's usually only 1 BAM file per chromosome, so 69 GB total. Nice.)
IT: Those BAM files are pretty small. I'd recommend starting with the med-n16-64g node and moving up if needed. We're only billed for run time. If the jobs take the same amount of time, it would be 13% of the cost.
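For reference, a scaled-down set of headers along the lines IT suggests might look like this (this is a sketch, not what was actually submitted; the partition name comes from IT's message, and the memory figure assumes the med node's 64 GiB minus a little headroom for the OS):

```shell
#SBATCH --partition=med-n16-64g
#SBATCH --nodes=1
#SBATCH [email protected]
#SBATCH --mail-type=FAIL
# Request memory per node rather than per CPU, and stay under the
# node's physical 64 GiB so the job can actually be scheduled:
#SBATCH --mem=60G
```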
The astute among you will notice that an 8-node swarm at 32 GiB of memory per core, 128 cores per node, is 32 TiB total. The job was failing because the --mem-per-cpu flag pushed the request above the available memory on each node. Even without that flag, the swarm would have used 4 TiB of memory. Holy overallocation, Batman!
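The arithmetic, spelled out using the figures from the notes above:

```python
# Back-of-envelope check of the original memory request.
nodes = 8
cores_per_node = 128       # huge-n128-512g
mem_per_cpu_gib = 32       # from --mem-per-cpu=32G
node_mem_gib = 512         # physical memory per node

# Per-node request: 128 cores x 32 GiB = 4096 GiB, on a 512 GiB node.
per_node_request = cores_per_node * mem_per_cpu_gib

# Total across the swarm: 8 x 4096 GiB = 32768 GiB = 32 TiB.
total_request_gib = nodes * per_node_request

# Even without --mem-per-cpu, 8 full nodes is 8 x 512 GiB = 4 TiB.
total_physical_gib = nodes * node_mem_gib

print(per_node_request)          # 4096 (8x what a node actually has)
print(total_request_gib // 1024)  # 32 (TiB requested)
print(total_physical_gib // 1024) # 4 (TiB physically present)
```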
Was this used as an eye-opener to maybe put some restrictions in place on resource allocation? I don't see any real need for a newbie to have access to that level of allocation in the first place.
Like, this time it was only caught because the nodes physically didn't have the resources requested. Y'all got lucky that time; next time that might not be the case.
Granted, it seems it wouldn't have been a $36,000 job, since you mentioned billing is based on run time, but it's still wasteful of resources and cost-prohibitive.
Also, memory per core is an interesting way to allocate that. Total? Absolutely. Per node? Yeah. But per CPU?
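For what it's worth, Slurm supports both granularities; --mem-per-cpu exists so a request scales automatically with the number of CPUs granted, which is exactly why it multiplies out so fast. A sketch of the two directives (values illustrative, not from the original job):

```shell
# Per allocated CPU: the effective request is 4G x (CPUs granted),
# so it grows with --ntasks / --cpus-per-task.
#SBATCH --mem-per-cpu=4G

# Per node: a fixed request regardless of how many CPUs you get.
# Note: --mem and --mem-per-cpu are mutually exclusive in Slurm.
#SBATCH --mem=64G
```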
Lol, unfortunately not. All nodes are still freely accessible
I worked in medical research not too long ago as an ETL guy using Netezza for holding data and making cohorts. I don’t envy having to also spin up / down compute based nodes these days. Good luck out there.
(A job that probably would run ok on a MacBook Pro)
Some people don’t math so good.
In CFD, that wouldn't even give you 1 litre of physics-correct simulation, to the best of my knowledge…
(I read in one paper that you need 11-micron cells in your mesh for physics-correctness: bigger didn't work right. And there are one HELL of a lot of 11-micron cells in an aircraft's boundary layer.
Which explains why airliner simulation runs can be priced in the $0.1B+ range, from what I've read…)