We employ both internal and external resources to perform scientific calculations.
Our own computing infrastructure consists of:
- Theory Cluster : A heterogeneous cluster with 48 nodes, 116 cores, 232 threads and an InfiniBand network.
- SOM1 Server : A server with 2 processors, 12 cores, 192 GB RAM, 1 NVIDIA TESLA M2090 GPU and an InfiniBand card.
- SOM2 Server : A server with 4 processors, 32 cores, 512 GB RAM and an InfiniBand card.
- SOM3 Server : A server with 4 processors, 32 cores, 512 GB RAM, 24 TB of disk and an InfiniBand card.
- SOM4 Server : A server with 2 processors, 32 cores, 512 GB RAM and 1 high-performance I/O disk: 1.4 TB PCIe SSD Micron P420m (seq. read: 3 GB/s).
Theory is a heterogeneous cluster composed of 48 nodes interconnected through a dedicated InfiniBand network.
|                  | Front end (Theory) | Nodes 01-30 | Nodes 31-36 | Nodes 37-39 | Nodes 40-48 |
|------------------|--------------------|-------------|-------------|-------------|-------------|
| Operating System | Red Hat Enterprise Linux Server release 5.1 (Tikanga) | Red Hat Enterprise Linux Server release 5.1 (Tikanga) | Red Hat Enterprise Linux Server release 5.1 (Tikanga) | Red Hat Enterprise Linux Server release 5.1 (Tikanga) | Red Hat Enterprise Linux Server release 5.1 (Tikanga) |
| Kernel version   | 2.6.18-53.el5 x86_64 | 2.6.18-53.el5 x86_64 | 2.6.18-53.el5 x86_64 | 2.6.18-53.el5 x86_64 | 2.6.18-53.el5 x86_64 |
| CPU              | 2 x Intel(R) Xeon(R) 5150 @ 2.66GHz (2 cores) | 2 x Intel(R) Xeon(R) 5150 @ 2.66GHz (2 cores) | 2 x Intel(R) Xeon(R) 5150 @ 2.66GHz (2 cores) | 2 x Intel(R) Xeon(R) E5440 @ 2.83GHz (4 cores) | 2 x Intel(R) Xeon(R) E5440 @ 2.83GHz (4 cores) |
| Number of cores  | 4 | 4 | 4 | 8 | 8 |
| Memory size      | 4 GB | 4 GB | 8 GB | 8 GB | 16 GB |
| Disk size        | 2.5 TB | 2.5 TB (NFS) | 2.5 TB (NFS) | 2.5 TB (NFS) | 2.5 TB (NFS) |
| Network          | InfiniBand 20 Gbps | InfiniBand 20 Gbps | InfiniBand 20 Gbps | InfiniBand 20 Gbps | InfiniBand 20 Gbps |
Theory is a cluster composed of a front-end node and 48 compute nodes, all connected by an InfiniBand network. The cluster has 240 cores available for parallel and sequential jobs. Theory has a queue system that imposes limits on the maximum execution time of a job and the number of cores used.
Below is a summary of the constraints that you should consider before submitting a job to the queue system.
| Execution time limit | 1 hour | 1 week |
| Max queued jobs per user | 1000 | 1000 |

| Execution time limit | 24 hours | 1 week | 1 month |
| Max queued jobs per user | 1000 | 1000 | 1000 |
It is very important that you specify the number of cores and the estimated execution time for your job. If you omit them, the system assigns the minimum resources (1 hour and 1 core) and will kill the job once that time expires.
Quick Start Instructions
Submit a Job
Whenever you want to launch a serial or parallel job through the queuing system you must use the command: clusterlauncher
It is a simple command that allows you to specify some features of your job. The next sections explain the full set of options in detail.
With the clusterlauncher command you only need to specify some basic arguments: the number of processes, the estimated execution time, and the executable name with its parameters.
If you forget to specify the estimated execution time, your job will be treated as a short sequential job, so you should read the following lines before submitting any job.
[user@frontend]$ clusterlauncher -h
clusterlauncher [-N <JOBNAME>] [-m <MEMSIZE>] [-w <WALLTIME>] [--openmp] [--mpi_mp] [--fast] [--exclusive] -n number_of_processes [-p processes_per_node] EXECUTABLE params_executable
-n : number of required processes. Default is 1.
-p : number of required processes per node.
-m : estimated memory size that the job needs.
-w : estimated execution time that your job needs to finish. Example: 01:00:00 (1 hour) or 00:30:00 (30 minutes).
--openmp : submit an OpenMP-only job.
--mpi_mp : submit a hybrid MPI + OpenMP job. You must give the number of MPI tasks in the -n option.
--fast : launch only to the 8-core nodes (node37 to node48).
--exclusive : no other job will run in the same node.
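The -w option takes a time in HH:MM:SS format. A small helper (a sketch; not part of clusterlauncher itself) can build that value from a number of minutes:

```shell
# convert a number of minutes into the HH:MM:SS walltime format used by -w
walltime() { printf '%02d:%02d:00\n' "$(( $1 / 60 ))" "$(( $1 % 60 ))"; }

walltime 90    # prints 01:30:00
# e.g.: clusterlauncher -N MYJOBNAME -w "$(walltime 90)" -n 1 ./seqjob 10000 0
```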
I have a sequential job with 1 hour of estimated execution time:
[user@frontend]$ clusterlauncher -N MYJOBNAME -w 01:00:00 -n 1 ./seqjob 10000 0
If you do not know how long your job will run, you can specify a long time (for instance, 48 hours):
[user@frontend]$ clusterlauncher -N MYJOBNAME -w 48:00:00 -n 1 ./seqjob 10000 0
I have an MPI job of 16 processes and an estimated time of 4 days (96 hours):
[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 -n 16 ./parjob 10000 0
Theory has 12 nodes with 8 cores per node and 1 GB per core. If you want the system to use only this type of node, add the --fast option. Without --fast, the system submits jobs to any type of node (4- and 8-core nodes) depending on the system load.
[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 --fast -n 16 ./parjob 10000 0
Also, if you do not want any other job to run on the node assigned to your job, specify the --exclusive option:
[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 --fast --exclusive -n 16 ./parjob 10000 0
The same, but allowing any type of node:
[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 --exclusive -n 16 ./parjob 10000 0
I have an MPI job of 16 processes and each process requires 1 GB of memory.
The -m option lets you specify how much memory your job requires. You must specify the memory size per process, not for the complete job.
For example: an MPI job with 16 MPI processes where each process requires 1 GB -> 1 * 16 = 16 GB for the complete job.
[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 -m 1g -n 16 ./parjob 10000 0
or the equivalent in megabytes:
[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 -m 1000m -n 16 ./parjob 10000 0
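The total reservation is just the per-process size multiplied by the number of processes; a quick shell check before choosing the -m value:

```shell
# per-process memory (the -m value) times process count (the -n value)
per_proc_mb=1000   # corresponds to -m 1000m
nprocs=16          # corresponds to -n 16
echo "total: $(( per_proc_mb * nprocs )) MB"   # prints total: 16000 MB
```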
The -m option is optional, but specifying it ensures that your job will run with enough resources.
When a job has finished, the system writes files with the standard output and the error output. You can find these files, named *.o<jobid> and *.e<jobid>, in the directory from which you submitted your job.
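As an illustration of the naming convention (simulated here with dummy files; the job name MYJOBNAME and job id 5156 are hypothetical):

```shell
# simulate the files the queue system leaves behind after job 5156 finishes
cd "$(mktemp -d)"
touch MYJOBNAME.o5156 MYJOBNAME.e5156   # standard output and error output
ls MYJOBNAME.o5156 MYJOBNAME.e5156      # both sit in the submission directory
```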
Monitoring Your Jobs
Show Status, Check and Delete a Job
Use these commands to monitor and delete your jobs in the queue system:
|qstat||Shows the status of your jobs: R = running, Q= queued|
Shows more detailed information about the status of your job (start time, remaining time, etc.):
[user@frontend colas]$ showq
ACTIVE JOBS--------------------
JOBNAME  USERNAME  STATE    PROC  REMAINING   STARTTIME
5156     user      Running  50    1:06:00:00  Thu Feb 21 09:59:47
1 Active Job   50 of 240 Processors Active (20.83%)   8 of 48 Nodes Active (16.67%)
IDLE JOBS----------------------
JOBNAME  USERNAME  STATE  PROC  WCLIMIT  QUEUETIME
5145     user      Idle   108   1:00:00  Thu Feb 21 09:55:53
5148     user      Idle   108   1:00:00  Thu Feb 21 09:56:16
5150     user      Idle   108   1:00:00  Thu Feb 21 09:56:30
Shows the total resources available at this moment.
It tells you how much time remains for the beginning of your job:
[user@frontend colas]$ showstart 5142
job 5142 requires 108 procs for 10:00:00
Earliest start in 10:00:00 on Thu Feb 21 19:51:30
Earliest completion in 20:00:00 on Fri Feb 22 05:51:30
Gives the reasons for why your job is not running:
[user@frontend colas]$ checkjob 5145
checking job 5145
State: Idle
Creds: user:usergroup:teo class:short qos:short
WallTime: 00:00:00 of 1:00:00
SubmitTime: Thu Feb 21 09:55:53
(Time Queued Total: 00:08:32 Eligible: 00:08:32)
Total Tasks: 108
Req TaskCount: 108
Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
NodeSet=ONEOF:FEATURE:parallel
IWD: [NONE] Executable: [NONE]
Bypass: 7 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
Reservation '5145' (00:00:00 -> 1:00:00 Duration: 1:00:00)
PE: 108.00
StartPriority: 1009
job cannot run in partition DEFAULT (idle procs do not meet requirements : 104 of 108 procs found)
idle procs: 240 feasible procs: 104
Rejection Reasons: [State : 30]
Deletes a job that is running or queued in the system:
[user@frontend colas]$ qdel 5145
SOM1 is a shared-memory high-performance computation server with a total of 24 threads and 192 GB of memory.
It is intended for scientific calculations with large memory and computation requirements per process. Using all 24 threads, each process can use up to 8 GB of memory.
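The per-process figure is simply the server's total memory divided by its thread count:

```shell
# 192 GB of RAM shared evenly across 24 threads
echo "$(( 192 / 24 )) GB per process"   # prints 8 GB per process
```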
SOM1 also has a TESLA M2090 GPU card with 512 CUDA cores and a peak performance of 1331 gigaflops.
SOM2 and SOM3 are connected via the InfiniBand network, so it is possible to increase computing capacity to a total of 88 threads.
The support tab contains a guide on how to submit parallel jobs from som1.
SOM4 provides high-performance I/O through a PCIe SSD disk with an estimated bandwidth of 3 GB/s in sequential read operations.
|                       | SOM1 | SOM2 | SOM3 | SOM4 |
|-----------------------|------|------|------|------|
| Model                 | Supermicro SYS-6016GT-TF-TM2 | Supermicro SYS-8017R-TF+ | Supermicro SYS-8027-TRF+ | Bull R423-E4i |
| Operating System      | Scientific Linux release 6.1 (Carbon) | Scientific Linux release 6.2 (Carbon) | Scientific Linux release 6.5 (Carbon) | Scientific Linux release 7.0 (Nitrogen) |
| CPU                   | 2 x Intel(R) Xeon(R) E5645 @ 2.40GHz (6 cores) | 4 x Intel(R) Xeon(R) E5 (8 cores) | 4 x Intel(R) Xeon(R) E5 (8 cores) | 2 x Intel(R) Xeon(R) E5 (16 cores) |
| Memory size           | 192 GB RAM DDR3 1066 ECC Registered | 512 GB RAM DDR3 1333 ECC Registered | 512 GB RAM DDR3 1333 ECC Registered | 512 GB RAM DDR3 2300 ECC Registered |
| Disk                  | 2 x HD SATA III 3 TB (6 TB) + 1 x HD SATA III 1 TB | 3 x HD SATA III 3 TB (9 TB) | 6 x HD SATA III 4 TB, RAID 5 (24 TB) | 1 x SSD 480 GB + 2 x HD SATA III 6 TB, RAID 1 (6 TB) |
| High-performance disk | None | None | None | 1 x PCIe SSD 1.4 TB Micron RealSSD P420m (seq. read: 3.3 GB/s, seq. write: 630 MB/s) |
| RAID HW controller    | None | None | Adaptec 6805 SAS/SATA | MegaRaid SAS 2208 |
| Network               | InfiniBand Mellanox 4xQDR MHQH19B-XTR | InfiniBand Mellanox 4xQDR MHQH19B-XTR | InfiniBand Mellanox 4xQDR MHQH19B-XTR | None |
| GPU                   | 1 x NVIDIA TESLA M2090 (512 cores) | None | None | None |
| Intel Linpack bench   | 112.0384 GFLOPS | 184.9807 GFLOPS | 277.4470 GFLOPS | 836.1690 GFLOPS |
You can see the list of installed software by opening a console session and going to the /software path.
You need to load each software module before you can use it. You can get more help from the support tab, but these are the basic commands.
- module avail: shows the list of all packages available on the system.
- module list: shows the modules loaded in your session.
- module load/unload: loads/unloads a package or library before using it.
Documentation and Support
Change your password.
You should change your user password the first time you log in to the system. For subsequent changes use:
[user@som1 ~]$ yppasswd
Shared account system.
SOM1 and SOM2 have a shared account system, so you can connect to both using the same password. You should only change your password from som1.
Your storage space and directories.
When you access the som1 server you have 100 GB of storage for primary data at /home.
If you require more storage capacity, you should contact the som1 administrator.
To check the space you have available at any time:
[user@som1 ~]$ quota -s
Disk quotas for user user (uid 503):
Filesystem        blocks  quota   limit   grace  files  quota  limit  grace
volgroup-lv_home  5120M   20480M  25600M         11     0      0
/dev/sdb1         250G    300G    2G             1      0      0
In this example the user has used 5120 MB (5 GB) of the 20 GB quota on /home/user, and 250 GB of the 300 GB quota on data1. If you exceed your available quota, the system will warn you to reduce your storage usage, but will still allow you to store data for 7 more days. For larger disk requirements please contact the administrator.
Where is it?
The /software directory contains all the software that is available for all user accounts.
The system automates the management of installed software so that you do not have to configure every library you need to use.
How do you use it?
Before you can use the available software, you have to enable each module in your session. To do this, use the following commands:
- To list all the software loaded and available to use in your current session:
[user@som1 ~]$ module list
Currently Loaded Modulefiles:
  1) intel/12.1/icc_ifort_mkl   3) torque/3.0.3/torque
  2) openmpi/1.4.4/intel
- To show all the software that you can use by loading the corresponding module:
[user@som1 ~]$ module avail
------------------------ /software/Modules/3.2.9/modulefiles -----------------------
CCfits-2.4           fftw/3.3/intel     intel/12.1/icc_ifort_mkl              openmpi/1.6.2/intel
CLHEP-188.8.131.52   galprop/54.1.984   mathematica/8.04/mathematica          powerspectrum
amiga                gsl/1.15/gnu       matlab/R2011b/matlab                  root/5.32.00/root
cfitsio/3.290/gnu    gsl/1.15/intel     mpi4py/1.3/mpi4py                     scipy/0.10.0/scipy
cmake-2.8.8          hdf5/1.8.8/gnu     mpi4py/1.3/mpi4py_with_openmpi_1.6.2  torque/3.0.3/torque
cuda/4.0/cuda        hdf5/1.8.8/intel   numpy/1.6.1/numpy                     vncserver
fftw/2.1.5/gnu       healpix_2.20a      openmpi/1.4.3/gnu
fftw/2.1.5/intel     idl/8.1/idl        openmpi/1.4.4/gnu
fftw/3.3/gnu         idl/8.2/idl        openmpi/1.4.4/intel
- To load a required software into your session:
[user@som1 ~]$ module load cuda/4.0/cuda (you can press the Tab key to autocomplete)
- To unload a software package:
[user@som1 ~]$ module unload cuda/4.0/cuda
If you want your software to be available every time you log into your session, add the module load lines to your .bashrc file.
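For example (a sketch; the module names are taken from the module list output shown above):

```shell
# ~/.bashrc (fragment): load your usual modules on every login
module load intel/12.1/icc_ifort_mkl
module load openmpi/1.4.4/intel
```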
If you require any additional software, you can fill in this support request form.
Compile and Submit Jobs
How to compile your program.
You can compile your source code with these two compilers:
- gcc, gfortran 4.4.5 20110214 (x86_64)
- icc, ifort 12.1.2 20111128 (x86_64)
- If you have a sequential source:
[user@som1 ]$ icc -o hello hello.c
or
[user@som1 ]$ gcc -o hello hello.c
[user@som1 ]$ ifort -o hello hello.f
or
[user@som1 ]$ gfortran -o hello hello.f
- If you have a MPI parallel source:
[user@som1 ]$ mpicc -o mpi_pong mpi_pong.c
[user@som1 ]$ mpif90 -o mpi_pong mpi_pong.f
- If you have an OpenMP parallel source (recommended given the features of this host):
[user@som1 ]$ icc -openmp -o omp_hello omp_hello.c
or
[user@som1 ]$ gcc -fopenmp -o omp_hello omp_hello.c
How to submit your job to som1/2/3.
SOM1/2/3 are joined by a dedicated InfiniBand network, forming a cluster. To prevent users' processes from interfering with each other, a queuing system has been installed that allows launching and monitoring your jobs. Any job you submit must be launched from SOM1.
Below are the steps you should follow to submit jobs on SOM1/2/3.
- Compile your source and check that there are no compilation errors. If you have a binary file, check that your executable is compatible with the SOM1 platform.
- Make a very short execution to test that your program runs well. You must stop the execution if it takes more than a minute (Ctrl+C). Some examples of running your job without the queue system:
Sequential jobs: [user@som1 test]$ ./my_sequential_code param1 param2
MPI jobs: [user@som1 test]$ mpirun -np 2 ./my_mpi_code
OpenMP jobs: [user@som1 test]$ export OMP_NUM_THREADS=2; ./my_openmp_code param1 param2
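Instead of watching the clock, you can cap the trial run automatically with the timeout command (from GNU coreutils, available on most Linux systems); here sleep stands in for your program:

```shell
# kill the command after 2 seconds; exit status 124 means the time cap was hit
timeout 2 sleep 10
echo "exit status: $?"   # prints exit status: 124
```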
- Submit your job to the system queue. You can see some examples below, but it is recommended that you review the "System Queue" section to learn more about it.
- Sequential jobs:
[user@som1 test]$ clusterlauncher -n 1 ./my_sequential_code param1 param2
- MPI jobs:
[user@som1 test]$ clusterlauncher -n 80 ./my_mpi_code param1 param2
- OpenMP jobs:
[user@som1 test]$ clusterlauncher -n 20 ./my_openmp_code param1 param2
Submit to the system queue.
SOM1 and SOM2 are connected through an InfiniBand network, so you can scale up to 84 processors by launching to both servers. SOM1 and SOM2 each have a shared-memory architecture, so if you plan to use only one server it is recommended that your codes be written in OpenMP. However, you can run sequential, MPI or OpenMP codes within a single machine.
clusterlauncher is the script that allows you to submit a job. You only need to specify how many processors you require and the estimated execution time your job will take. To get more help about clusterlauncher you can type:
clusterlauncher [-v] [-N <JOBNAME>] [-s som1|som2|som3|mix] [-q short|medium|long|gpu] [-m <MEMSIZE>] [-w <WALLTIME>] [--scratch] [--intranode] [--seq] [--openmp] -n number_processes EXECUTABLE params_executable
The queues available in the system are:
| Queue | Execution time limit | Processor limit per user |
|-------|----------------------|--------------------------|
| gpu | 01:00:00 (1 hour) | -- |
| short | 24:00:00 (24 hours) | 72 |
| medium | 168:00:00 (7 days) | 64 |
| long | 360:00:00 (15 days) | 32 |
| eternity | -- (requires permission) | -- |
Below are some examples of how to submit different types of jobs.
I have a sequential job that requires an execution time of 40 minutes:
[user@som2 ~]# clusterlauncher -n 1 -w 00:40:00 ./my_sequential_job p1 p2
I have a sequential job that takes 5 hours and requires 1 GB of memory:
[user@som2 ~]# clusterlauncher -n 1 -w 05:00:00 -m 1g ./my_sequential_job p1 p2
I have a sequential job and I don't know how long it will take:
[user@som2 ~]# clusterlauncher -n 1 -q long ./my_sequential_job p1 p2
I have an MPI parallel job that requires 32 processors and I don't know how long it will take:
[user@som2 ~]# clusterlauncher -n 32 -q long ./my_MPI_job p1 p2
I have an MPI parallel job that requires 32 processors and I want to launch it on som2:
[user@som2 ~]# clusterlauncher -n 32 -s som2 -q long ./my_MPI_job p1 p2
I have an OpenMP job with a maximum of 32 threads and 2 hours of execution time:
[user@som2 ~]# clusterlauncher -n 32 --openmp -s som2 -w 02:00:00 ./my_openMP_job p1 p2
Monitoring your jobs.
You can get the status of your jobs by typing:
[user@som1 test]$ qstat
Job id     Name   User  Time Use  S  Queue
---------- ------ ----- --------- -- -----
4290.som1  STDIN  user  00:00:26  C  short
or
[user@som1 test]$ showq
ACTIVE JOBS--------------------
JOBNAME  USERNAME  STATE    PROC  REMAINING  STARTTIME
4292     user      Running  20    1:00:00    Mon Nov 5 12:47:50
1 Active Job   20 of 84 Processors Active (26.44%)   1 of 2 Nodes Active (50.00%)
If your job is queued and you need more information about when your job will start then type:
[user@som1 test]$ showstart 4294
job 4294 requires 20 procs for 1:00:00
Earliest start in 00:00:00 on Mon Nov 5 12:49:41
Earliest completion in 1:00:00 on Mon Nov 5 13:49:41
Best Partition: DEFAULT
If your job is running and you need to see its partial standard output, you can type:
[user@som1 test]$ qtail 4294 or [user@som1 test]$ qcat 4294
Deleting a job from the queue.
If you need to delete a job, you can use:
[user@som1 test]$ qdel 4294