I like my job, so no screenshots. Sorry.

Notes:

  • sbatch is the Slurm command for submitting batch jobs to high-performance compute nodes
  • the huge-n128-512g node has 128 cores and 512 GiB of memory
  • This took place at a medical research nonprofit

User: Hello everyone, this is the first time I’m using GCP. I’m trying to run a job, but it keeps failing. These are the sbatch headers I’m using:

#SBATCH --partition=huge-n128-512g
#SBATCH --nodes=8
#SBATCH [email protected]
#SBATCH --mail-type=FAIL
#SBATCH --mem-per-cpu=32G

IT: Please make sure you actually need that node; each one costs $4,500/month to use. Can you describe the job you’re trying to run?

User: I’m doing high-depth genetic sequencing using 3 GB BAM files.

(Additional note: there’s usually only one BAM file per chromosome, so 69 GB total. Nice.)

IT: Those BAM files are pretty small. I’d recommend starting with the med-n16-64g node and moving up if needed. We’re only billed for run time, so if the jobs take the same amount of time, it would be 13% of the cost.
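Headers matching that recommendation might look something like the sketch below. The med-n16-64g partition name is taken from the thread; the --mem-per-cpu value is illustrative, chosen so 16 cores × 4 GiB fits the node’s 64 GiB (in practice you’d request slightly less, since Slurm reserves some memory for the OS):

```shell
#SBATCH --partition=med-n16-64g
#SBATCH --nodes=1
#SBATCH [email protected]
#SBATCH --mail-type=FAIL
#SBATCH --mem-per-cpu=4G    # 16 cores x 4 GiB = 64 GiB, the whole node's RAM
```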

The astute among you will notice that an 8-node swarm at 32 GiB of memory per core, with 128 cores per node, is a 32 TiB request. The job was failing because the --mem-per-cpu flag pushed the per-node request above the memory actually available on each node. Even without that flag, the swarm would have claimed the full 4 TiB across all eight nodes. Holy overallocation, Batman!
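Spelling out the arithmetic, using the numbers from the headers and node spec above:

```shell
# What the original headers asked for:
# 8 nodes x 128 cores/node x 32 GiB per core
echo $(( 8 * 128 * 32 ))   # 32768 GiB requested = 32 TiB

# What the hardware actually has:
# 8 nodes x 512 GiB per node
echo $(( 8 * 512 ))        # 4096 GiB available = 4 TiB
```

Per node, that’s 128 × 32 = 4096 GiB requested against 512 GiB installed, which is why Slurm could never place the job.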

  • Paragone@lemmy.world · 5 days ago

    In CFD, that wouldn’t even give you 1 litre of physics-correct simulation, to the best of my knowledge…

    ( I read in one paper that you need 11-micron cells in your mesh for physics-correctness: bigger didn’t work right. And there are one HELL of a lot of 11-micron cells in an aircraft’s boundary layer.

    Which explains why airliner simulation runs can be priced in the $0.1B+ range, from what I’ve read… )

    _ /\ _