diff --git a/README.md b/README.md new file mode 100644 index 0000000..4c05238 --- /dev/null +++ b/README.md @@ -0,0 +1,88 @@ +# Vagrant Slurm + +**Warning: For demonstration/testing purposes only, not suitable for use in production** + +This repository contains a `Vagrantfile` and the necessary configuration for +automating the setup of a Slurm cluster using Vagrant's shell provisioning on +Debian 12 x86_64 VMs. + +### Prerequisites + +This setup was developed using vagrant-libvirt with NFS for file sharing, +rather than the more common VirtualBox configuration which typically uses +VirtualBox's Shared Folders. However, VirtualBox should work fine. + +The core requirements for this setup are: +- Vagrant (with functioning file sharing) +- (Optional) Make (for convenience commands) + +### Cluster Structure +- `node1`: Head Node (runs `slurmctld`) +- `node2`: Login/Submit Node +- `node3` / `node4`: Compute Nodes (runs `slurmd`) + +By default, each node is allocated: +- 2 threads/cores (depending on architecture) +- 2 GB of RAM + +**Warning: 8 vCPUs and 8 GB of RAM is used in total resources** + +## Quick Start + +1. To build the cluster, you can use either of these methods + + Using the Makefile (recommended): + + make + + Using Vagrant directly: + + vagrant up + +2. Login to the Login Node (node2) as the submit user: + + vagrant ssh node2 -c "sudo -iu submit" + + +3. Run the example prime number search script: + + /vagrant/primes.sh + + By default, this script searches for prime numbers from `1-10,000` and `10,001-20,000` + + You can adjust the range searched per node by providing an integer argument, e.g.: + + /vagrant/primes.sh 20000 + + The script will then drop you into a `watch -n0.1 squeue` view so you can see + each job computing on `nodes[3-4]`. You may `CTRL+c` out of this view, and + the jobs will continue in the background. The home directory for the `submit` + user is in the shared `/vagrant` directory, so the results from each node are + shared back to the login node. + +4. View the resulting prime numbers found, check `ls` for exact filenames + + less slurm-1.out + less slurm-2.out + +### Configuration Tool + +On the Head Node (`node1`), you can access the configuration tools specific to +the version distributed with Debian. Since this may not be the latest Slurm +release, it's important to use the configuration tool that matches the +installed version. To access these tools, you can use Python to run a simple +web server: + + python3 -m http.server 8080 --directory /usr/share/doc/slurm-wlm/html/ + +You can then access the HTML documentation via the VM's IP address at port 8080 +in your web browser on the host machine. + +### Cleanup +To clean up files placed on the host through Vagrant file sharing: + + make clean + +This command is useful when you want to remove all generated files and return +to a clean state. The Makefile is quite simple, so you can refer to it directly +to see exactly what's being cleaned up. diff --git a/primes.sh b/primes.sh new file mode 100755 index 0000000..588b639 --- /dev/null +++ b/primes.sh @@ -0,0 +1,41 @@ +#!/bin/bash + +# This script finds prime numbers using the Slurm workload manager. +# It operates in two modes: +# +# 1. When run without a Slurm job ID: +# - It accepts an optional argument to set the range (default: 10000). +# - It submits two Slurm jobs: +# a) First job searches for primes from 1 to RANGE. +# b) Second job searches for primes from (RANGE + 1) to (RANGE * 2). +# - After submission, it watches the Slurm queue, updating every 0.1 seconds. +# +# 2. When run as a Slurm job: +# - It uses the 'factor' command to identify prime numbers within the given range. +# - It prints each prime number found to stdout. +# - It logs the job ID and the range being searched. +# +# Usage: +# Without arguments: ./primes.sh +# With custom range: ./primes.sh 20000 + +RANGE=${1:-10000} + +function find_primes() { + local START="$1" + local END="$2" + echo "INFO: Job $SLURM_JOB_ID looking for prime numbers from $START to $END" + for ((i=START;i<=END;i++)); do + if [ "$(factor "$i")" == "$i: $i" ]; then + echo "$i" + fi + done +} + +if [ -z "$SLURM_JOB_ID" ]; then + sbatch -N1 --wrap="$0 1 $RANGE" + sbatch -N1 --wrap="$0 $((RANGE + 1)) $((RANGE * 2))" + watch -n0.1 squeue +else + find_primes "$1" "$2" +fi