Infrastructure

We employ both internal and external resources to perform scientific calculations.

Our own computing infrastructure consists of:

Theory Cluster : A heterogeneous cluster with 48 nodes, 116 cores, 232 threads and an InfiniBand network.
SOM1 Server : A server with 2 processors, 12 cores, 192 GB RAM, 1 NVIDIA GPU M2090 TESLA and Infiniband card.
SOM2 Server : A server with 4 processors, 32 cores, 512 GB RAM and Infiniband card.
SOM3 Server : A server with 4 processors, 32 cores, 512 GB RAM, 24TB DISK and Infiniband card.
SOM4 Server : A server with 2 processors, 32 cores, 512 GB RAM and 1 high performance I/O disk: 1.4 TB PCIe SSD Micron P420m (Seq. read: 3GB/s).

Theory

Hardware Description

Theory is a heterogeneous cluster composed of 48 nodes interconnected through a dedicated InfiniBand network.

	Front end (Theory)	Nodes01-30	Nodes31-36	Nodes37-39	Nodes40-48
Hostname	theory.ific.uv.es	node01..node30	node31..node36	node37..node39	node40..node48
Operating System	Red Hat Enterprise Linux Server release 5.1 (Tikanga)	Red Hat Enterprise Linux Server release 5.1 (Tikanga)	Red Hat Enterprise Linux Server release 5.1 (Tikanga)	Red Hat Enterprise Linux Server release 5.1 (Tikanga)	Red Hat Enterprise Linux Server release 5.1 (Tikanga)
Version Kernel	2.6.18-53.el5 x86_64	2.6.18-53.el5 x86_64	2.6.18-53.el5 x86_64	2.6.18-53.el5 x86_64	2.6.18-53.el5 x86_64
CPU	2 x Intel(R) Xeon(R) CPU 5150@ 2.66GHz (2 Cores)	2 x Intel(R) Xeon(R) CPU 5150@ 2.66GHz (2 Cores)	2 x Intel(R) Xeon(R) CPU 5150@ 2.66GHz (2 Cores)	2 x Intel(R) Xeon(R) CPU E5440 @ 2.83GHz (4 Cores)	2 x Intel(R) Xeon(R) CPU E5440 @ 2.83GHz (4 Cores)
Number of Cores:	4	4	4	8	8
Memory Size:	4 GB	4 GB	8 GB	8 GB	16 GB
Disk Size	2.5 TB	2.5 TB (NFS)	2.5 TB (NFS)	2.5 TB (NFS)	2.5 TB (NFS)
Network	Infiniband 20 Gbps	Infiniband 20 Gbps	Infiniband 20 Gbps	Infiniband 20 Gbps	Infiniband 20 Gbps
Linpack Bench

Installed Software

Documentation

Theory is a cluster composed of a frontend node and 48 compute nodes, all connected by an InfiniBand network. The cluster has 240 cores available for launching parallel and sequential jobs. Theory has a queue system that limits the maximum execution time of a job and the number of cores it uses.
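The 240-core figure follows from the hardware table above: node01 to node36 have 4 cores each and node37 to node48 have 8 cores each. A quick check in shell arithmetic (the variable names are illustrative only):

```shell
# Node counts per group, taken from the hardware table.
small_nodes=36   # node01..node36, 4 cores each
large_nodes=12   # node37..node48, 8 cores each

total_cores=$(( small_nodes * 4 + large_nodes * 8 ))
echo "Total compute cores: $total_cores"
```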

Below is a summary of the constraints that you should consider before submitting a job to the queue system.

Sequential jobs:

Execution time limit	1 hour	1 week
Available cores	144	64
Max queued jobs per user	1000	1000

Parallel jobs:

Execution time limit	24 hours	1 week	1 month
Available cores	240	100	64
Max queued jobs per user	1000	1000	1000

It is very important that you specify the number of cores and the estimated execution time for your job. If you omit them, the system assigns the minimum resources to your job (1 hour and 1 core) and kills the job once that time has elapsed.

Quick Start Instructions

Submit a Job

Whenever you want to submit a serial or parallel job to the queuing system, you must use the command: clusterlauncher

It is a simple command that allows you to specify some features of your job. The full set of options is explained in detail below.

With clusterlauncher you only need to specify a few basic arguments, such as the number of processes, the estimated execution time, and the executable name with its parameters.

If you forget to specify the estimated execution time, your job will be treated as a short sequential job, so you should read the following lines before submitting any job.

[user@frontend]$ clusterlauncher -h

clusterlauncher [-N <JOBNAME>] [-m <MEMSIZE>] [-w <WALLTIME>] [--openmp] [--mpi_mp] [--fast] [--exclusive] -n number_of_processes [-p processes_per_node] EXECUTABLE params_executable
-n: Number of required processes. Default is 1.
-p: Number of required processes per node.
-m: Estimated memory size that the job needs.
-w: Estimated execution time that your job needs to finish. Example: 01:00:00 (1 hour) or 00:30:00 (30 minutes).
--openmp: If you want to submit an OpenMP-only job.
--mpi_mp: If you want to submit an MPI + OpenMP hybrid job. You must give the number of MPI tasks in the -n option.
--fast: Only launch on the 8-core nodes (node37 to node48).
--exclusive: No other job will run on the same node.

Some examples:

I have a sequential job with an estimated execution time of 1 hour:

[user@frontend]$ clusterlauncher -N MYJOBNAME -w 01:00:00 -n 1 ./seqjob 10000 0

If you do not know how long your job will run, you can specify a long time (for instance, 48 hours):

[user@frontend]$ clusterlauncher -N MYJOBNAME -w 48:00:00 -n 1 ./seqjob 10000 0

I have an MPI job with 16 processes and an estimated time of 4 days (96 hours):

[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 -n 16 ./parjob 10000 0
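Since -w expects the time as HH:MM:SS, estimates given in days have to be converted to hours first. The sketch below shows one way to do that conversion; walltime_from is a hypothetical helper written for this illustration and is not part of clusterlauncher. It assumes plain integers without leading zeros.

```shell
# Convert a duration in days (and optional extra hours) to the
# HH:MM:SS form used by the -w option.
# walltime_from is a hypothetical helper, not part of clusterlauncher.
walltime_from() {
    days=$1
    hours=${2:-0}
    printf '%02d:00:00\n' $(( days * 24 + hours ))
}

walltime_from 4      # prints 96:00:00
walltime_from 0 1    # prints 01:00:00
walltime_from 2 12   # prints 60:00:00
```

The result can then be passed to the -w option, for example -w 96:00:00 for a 4-day estimate.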

Theory has 12 nodes with 8 cores per node and 1 GB per core. If you want the system to use only this type of node, add the --fast option. Without the --fast option, the system submits jobs to any type of node (4-core or 8-core) depending on the system load.

[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 --fast -n 16 ./parjob 10000 0

Also, if you do not want any other job to run on the node assigned to your job, specify the --exclusive option:

[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 --fast --exclusive -n 16 ./parjob 10000 0

The same, using any type of node:

[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 --exclusive -n 16 ./parjob 10000 0

I have an MPI job with 16 processes and each process requires 1 GB of memory.

The -m option lets you specify how much memory your job requires. You have to specify the memory per process, not for the complete job.

For example, an MPI job with 16 MPI processes where each process requires 1 GB needs 1 x 16 = 16 GB for the complete job.

[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 -m 1g -n 16 ./parjob 10000 0

or approximately the same in megabytes:

[user@frontend]$ clusterlauncher -N MYJOBNAME -w 96:00:00 -m 1000m -n 16 ./parjob 10000 0

The -m option is optional, but setting it ensures that your job runs with enough resources.
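Because -m is given per process on Theory, it can help to compute the total a job will reserve before submitting it. A minimal sketch of that arithmetic follows; total_mem_mb is a hypothetical helper used only for illustration.

```shell
# Total memory a job reserves when -m is given per process.
# total_mem_mb is a hypothetical helper, not part of clusterlauncher.
total_mem_mb() {
    nprocs=$1
    mem_per_proc_mb=$2
    echo $(( nprocs * mem_per_proc_mb ))
}

total_mem_mb 16 1000   # prints 16000 (MB), about 16 GB for the whole job
```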

When a job has finished, the system writes files with its standard output and standard error. You can find these files, named *.o<jobid> and *.e<jobid>, in the directory from which you submitted your job.
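To locate those files for a given job, a small shell function can match on the job id suffix. This is only a sketch: job_output_files is a hypothetical helper, and it relies solely on the *.o<jobid> and *.e<jobid> naming described above.

```shell
# List the standard-output and standard-error files of a finished job.
# job_output_files is a hypothetical helper; the second argument is the
# directory to search (defaults to the current directory).
job_output_files() {
    jobid=$1
    dir=${2:-.}
    for f in "$dir"/*.o"$jobid" "$dir"/*.e"$jobid"; do
        # Skip patterns that matched nothing.
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

job_output_files 5156    # lists matching files in the current directory, if any
```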

Monitoring Your Jobs

Show Status, Check and Delete a Job

Use these commands to monitor and delete your jobs in the queue system:

qstat	Shows the status of your jobs: R = running, Q = queued.

showq	Shows more detailed information about the status of your jobs (start time, remaining time, etc.):

[user@frontend colas]$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

5156               user          Running    50  1:06:00:00  Thu Feb 21 09:59:47

     1 Active Job       50 of  240 Processors Active (20.83%)
                         8 of   48 Nodes Active      (16.67%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

5145               user          Idle     108     1:00:00  Thu Feb 21 09:55:53
5148               user          Idle     108     1:00:00  Thu Feb 21 09:56:16
5150               user          Idle     108     1:00:00  Thu Feb 21 09:56:30

showbf	Shows the total resources available for use at this moment.

showstart <jobid>	Tells you how long it will be until your job starts:

[user@frontend colas]$ showstart 5142
job 5142 requires 108 procs for 10:00:00
Earliest start in 10:00:00 on Thu Feb 21 19:51:30
Earliest completion in 20:00:00 on Fri Feb 22 05:51:30

checkjob <jobid>	Gives the reasons why your job is not running:

[user@frontend colas]$ checkjob 5145
checking job 5145

State: Idle
Creds: user:usergroup:teo class:short qos:short
WallTime: 00:00:00 of 1:00:00
SubmitTime: Thu Feb 21 09:55:53
  (Time Queued Total: 00:08:32 Eligible: 00:08:32)

Total Tasks: 108

Req[0] TaskCount: 108 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
NodeSet=ONEOF:FEATURE:parallel

IWD: [NONE] Executable: [NONE]
Bypass: 7 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE

Reservation '5145' (00:00:00 -> 1:00:00 Duration: 1:00:00)
PE: 108.00 StartPriority: 1009
job cannot run in partition DEFAULT (idle procs do not meet requirements : 104 of 108 procs found)
idle procs: 240 feasible procs: 104

Rejection Reasons: [State : 30]

qdel <jobid>	Deletes a job that is running or queued in the system:

[user@frontend colas]$ qdel 5145

Request an Account/Support

Click here to request an account.

Click here to request support

SOM1/SOM2/SOM3/SOM4

Hardware Description

SOM1 is a high-performance shared-memory computation server with a total of 24 threads and 192 GB of memory.

It is intended for scientific calculations with large memory and computation requirements per process. When all 24 threads are used, each process can use up to 8 GB of memory.

SOM1 also has a Tesla M2090 GPU card with 512 CUDA cores and a peak performance of 1331 gigaflops.

SOM2 and SOM3 are connected via the InfiniBand network, so it is possible to increase computing capacity up to a total of 88 threads.
The support tab has a guide on how to submit parallel jobs from som1.

SOM4 provides high-performance I/O through a PCIe SSD disk with an estimated bandwidth of 3 GB/s in sequential read operations.

Hostname	som1.ific.uv.es	som2.ific.uv.es	som3.ific.uv.es	som4.ific.uv.es
Model	Supermicro SYS-6016GT-TF-TM2	Supermicro SYS-8017R-TF+	Supermicro SYS-8027-TRF+	Bull R423-E4i
Operating System	Scientific Linux release 6.1 (Carbon)	Scientific Linux release 6.2 (Carbon)	Scientific Linux release 6.5 (Carbon)	Scientific Linux release 7.0 (Nitrogen)
Version Kernel	2.6.32-131.0.15.el6.x86_64	2.6.32-220.4.1.el6.x86_64	2.6.32-358.2.1.el6.x86_64	3.10.0-123.20.1.el7.x86_64
CPU	2x Intel(R) Xeon(R) E5645@2.40GHz (6 cores)	4x Intel(R) Xeon(R) E5-4620@2.20GHz (8 cores)	4x Intel(R) Xeon(R) E5-4620@2.20GHz (8 cores)	2 x Intel(R) Xeon(R) E5-2698v3@2.30GHz(16 cores)
N. Cores/Threads:	12/24	32/64	32/64	32/64
Memory Size:	192 GB RAM DDR3 1066 ECC Registered	512 GB RAM DDR3 1333 ECC Registered	512 GB RAM DDR3 1333 ECC Registered	512 GB RAM DDR3 2300 ECC Registered
Disk Size	2 x HD SATA III 3TB = 6 TB; 1 x HD SATA III 1TB = 1 TB	3 x HD SATA III 3TB = 9 TB	6 x HD SATA III 4TB (RAID 5) = 24 TB	1 x HD SSD 480GB; 2 x HD SATA III 6TB (RAID 1) = 6TB
Disk High Performance	None	None	None	1 x PCIe SSD HD 1.4TB: Micron Technology Inc RealSSD P420m; sequential read speed: 3.3GB/s; sequential write speed: 630 MB/s
RAID HW controller	None	None	Adaptec 6805 SAS/SATA	MegaRaid SAS 2208
Network	Infiniband Mellanox 4xQDR MHQH19B-XTR	Infiniband Mellanox 4xQDR MHQH19B-XTR	Infiniband Mellanox 4xQDR MHQH19B-XTR	None
GPU	1 x GPU NVIDIA TESLA M2090 (512 cores)	None	None	None
Intel Linpack Bench	112.0384 GFLOPS	184.9807 GFLOPS	277.4470 GFLOPS	836.1690 GFLOPS

Installed Software

You can see the list of installed software by opening a console session and going to the /software path.

You need to load each software module before you can use it. More help is available from the support tab, but these are the basic commands:

module avail: shows the list of all packages available on the system.
module list: shows the modules loaded in your session.
module load/unload: loads or unloads a package or library; load it before you use it.

Documentation and Support

User Accounts

Change your password.

You should change your user password the first time you log in to the system. To change it again later, use:

[user@som1 ~]$ yppasswd

Shared account system.

SOM1 and SOM2 have a shared account system, so you can connect to both using the same password. You should only change your password from som1.

Storage Space

Your storage space and directories.

When you access the som1 server, you have 100 GB of storage for primary data at /home.

If you require more storage capacity, you should contact the som1 administrator.

Disk quota:

To check how much space you have available at any time:

[user@som1 ~]$ quota -s

 Disk quotas for user user (uid 503):
 Filesystem blocks quota limit grace files quota limit grace
 volgroup-lv_home 5120M 20480M 25600M 11 0 0
 /dev/sdb1         250G   300G     2G  1 0 0
 * The user has used 5120 MB (5 GB) of the 20 GB quota on /home/user.
 * The user has used 250 GB of the 300 GB quota on data1.
 If you exceed your quota, the system warns you to reduce your usage and lets you keep storing data
 for 7 more days.

 For larger disk requirements, please contact the administrator.

Backup System

Which files are saved.

Every night the system performs a backup of the files stored in your account.

It is important to know that the only files that are saved are located under your primary directory /home.

Installed Software

Where is it?

The /software directory contains all the software that is available for all user accounts.

The system automates the management of software installed to avoid having to configure every library you need to use.

How do you use it?

Before you can use the available software, you have to enable each module in your session with the following commands.

To list the software currently loaded in your session:

[user@som1 ~]$ module list
Currently Loaded Modulefiles:
       1) intel/12.1/icc_ifort_mkl   3) torque/3.0.3/torque
       2) openmpi/1.4.4/intel

To show all the software that you could use by loading its module:

[user@som1 ~]$ module avail

------------------------  /software/Modules/3.2.9/modulefiles  -----------------------
CCfits-2.4                           fftw/3.3/intel                      intel/12.1/icc_ifort_mkl                                   openmpi/1.6.2/intel
CLHEP-2.1.2.2                  galprop/54.1.984            mathematica/8.04/mathematica                   powerspectrum
amiga                                 gsl/1.15/gnu                     matlab/R2011b/matlab                                   root/5.32.00/root
cfitsio/3.290/gnu               gsl/1.15/intel                    mpi4py/1.3/mpi4py                                          scipy/0.10.0/scipy
cmake-2.8.8                       hdf5/1.8.8/gnu                 mpi4py/1.3/mpi4py_with_openmpi_1.6.2  torque/3.0.3/torque
cuda/4.0/cuda                     hdf5/1.8.8/intel            numpy/1.6.1/numpy                                        vncserver
fftw/2.1.5/gnu                       healpix_2.20a                openmpi/1.4.3/gnu
fftw/2.1.5/intel                     idl/8.1/idl                          openmpi/1.4.4/gnu
fftw/3.3/gnu                         idl/8.2/idl                          openmpi/1.4.4/intel

To load a required software package into your session:

 [user@som1 ~]$ module load cuda/4.0/cuda (you can press the Tab key to autocomplete)

To unload a software package:

 [user@som1 ~]$ module unload cuda/4.0/cuda

If you want your software to be available every time you come into your session, then add the load module lines to your .bashrc file.
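As a sketch, the lines added to .bashrc could look like the following. The two module names are examples copied from the module avail listing above, and the guard around them is an assumption: it only skips the loads in shells where the module command is not defined.

```shell
# Lines to append to ~/.bashrc (sketch).
# The module names are examples from the "module avail" listing;
# replace them with the packages you actually need.
if type module >/dev/null 2>&1; then
    module load intel/12.1/icc_ifort_mkl
    module load openmpi/1.4.4/intel
fi
```

After opening a new session, module list should show the packages loaded this way.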

If you require any additional software, you can fill in this support request form.

Compile and Submit Jobs

How to compile your program.

You can compile your source code with these two compilers:

- gcc, gfortran 4.4.5 20110214 (x86_64)
- icc, ifort 12.1.2 20111128 (x86_64)

If you have a sequential source:

[user@som1 ]$ icc -o hello hello.c   or  [user@som1 pruebas]$ gcc -o hello hello.c

[user@som1 ]$ ifort -o hello hello.f   or  [user@som1 pruebas]$ gfortran -o hello hello.f

If you have an MPI parallel source:

[user@som1 ]$ mpicc -o mpi_pong mpi_pong.c

[user@som1 ]$ mpif90 -o mpi_pong mpi_pong.f

If you have an OpenMP parallel source (recommended given the features of this host):

[user@som1 ]$ icc -openmp -o omp_hello omp_hello.c or [user@som1 ]$ gcc -fopenmp -o omp_hello omp_hello.c

How to submit your job to som1/2/3.

SOM1/2/3 are joined by a dedicated InfiniBand network as a two node cluster. To keep users' processes from interfering with each other, a queuing system has been installed that allows you to launch and monitor your jobs. Any job you submit must be launched from SOM1.

Below are the steps you should follow to submit jobs on SOM1/2/3.

Compile your source and check that there are no compilation errors. If you have a binary file, check that your executable is compatible with the SOM1 platform.

Run a very short execution to test that your program works. You must stop the execution if it takes more than a minute (Ctrl + C). Some examples of running your job without the queue system:

Sequential jobs: [user@som1 test]$ ./my_sequential_code param1 param2

MPI jobs: [user@som1 test]$ mpirun -np 2 ./my_mpi_code

OpenMP jobs: [user@som1 test]$ export OMP_NUM_THREADS=2; ./my_openmp_code param1 param2
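One way to enforce the one-minute limit automatically, instead of pressing Ctrl + C by hand, is the timeout utility from GNU coreutils. This guide does not mention it, so its availability on the server is an assumption. In the sketch below, sleep stands in for your program and a 1-second limit stands in for the 60 seconds recommended above.

```shell
# Run a short test and stop it automatically when the limit is reached.
# "sleep 3" stands in for ./my_sequential_code param1 param2;
# use "timeout 60" for the one-minute limit.
status=0
timeout 1 sleep 3 || status=$?

# GNU timeout exits with status 124 when it had to stop the command.
if [ "$status" -eq 124 ]; then
    echo "Test run stopped: time limit reached"
else
    echo "Test run finished with status $status"
fi
```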

Submit your job to the system queue. Some examples are shown below, but you should review the "System Queue" section to learn more about it.

Sequential jobs:

[user@som1 test]$ clusterlauncher  -n 1 ./my_sequential_code param1 param2

MPI jobs:

[user@som1 test]$ clusterlauncher -n 80 ./my_mpi_code param1 param2

OpenMP jobs:

[user@som1 test]$ clusterlauncher -n 20 ./my_openmp_code param1 param2

Submit to the system queue.

SOM1 and SOM2 are connected through an InfiniBand network, so you can scale up to 84 processors by launching on both servers. SOM1 and SOM2 have a shared-memory architecture, so if you plan to use only one server, it is recommended that your codes be programmed in OpenMP. However, you can run sequential, MPI or OpenMP codes within a single machine.

clusterlauncher is the script that allows you to submit a job. You only need to specify how many processors you require and the estimated execution time of your job. To obtain more help about clusterlauncher you can type:

clusterlauncher -h

clusterlauncher [-v] [-N <JOBNAME>] [-s som1|som2|som3|mix] [-q short|medium|long|gpu] [-m <MEMSIZE>] [-w <WALLTIME>] [--scratch] [--intranode] [--seq] [--openmp] -n number_processes EXECUTABLE params_executable

Options:

-v: Verbose mode prints the command launched to the queue system.

-n NPROC: number of parallel processes required by the task. Default is 1.

-s server_name: hostname of the server to launch the job on (by default, jobs are launched first to som3, then som2 and then som1, depending on the -n requirement).

-m: Total estimated memory size that the job needs.

-q: Name of the queue to launch the job in.

-i: path of an input file or directory to copy into the scratch. Only use with the --scratch option.

-o: name of the generated output file or directory that should be copied to the current path. Only use with the --scratch option.

-w: Estimated execution time that your job needs to finish. Format example: 01:00:00 (1 hour) or 00:60:00 (60 minutes).

--scratch: Use scratch mode. Input/output files will be copied from your current path to a local disk of each compute node (use with the -i and -o options).

--intranode: Jobs are launched on a single node. This option avoids network communication between nodes.

--openmp: If you want to submit an OpenMP job. You should give the number of threads through the -n option.

--seq: Launch 1 sequential job. The job is launched without MPI support.

Examples:

clusterlauncher -n 1 -w 00:20:00 /bin/sleep 240 (launch 1 sequential process to the queue system with an estimated execution time of 20 minutes).

clusterlauncher -n 10 -q long ./my_parallel_exec param1 (launch 10 parallel processes to the long queue).

clusterlauncher -n 1 -m 1g ./my_sequential_exec param1 (launch 1 sequential process that requires 1 GB of memory).

clusterlauncher -n 10 -m 1500m ./my_parallel_exec param1 (launch 10 parallel processes that require 1.5 GB of memory).

clusterlauncher -n 23 --openmp ./my_parallel_openmp_exec param1 (launch 1 OpenMP job that uses 23 threads).

clusterlauncher -n 1 --scratch -i ./myfile_input1 -i ./mydir_input1/ -o generated_dir_results/ ./my_sequential_scratch_exec param1 (launch 1 job using the scratch mode).

The queues available in the system are:

gpu -- -- 01:00:00 (1 hour)

short -- -- 24:00:00 (24 hours and limit of 72 procs by user)

medium -- -- 168:00:00(7 days and limit of 64 procs by user)

long -- -- 360:00:00 (15 days and limit of 32 procs by user)

eternity -- -- -- (require permission)
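The time limits above can be used to pick the shortest queue that still covers an estimated execution time. The sketch below is one way to do that; queue_for_walltime is a hypothetical helper, it treats the limits as inclusive (an assumption), and it leaves out the gpu queue on purpose.

```shell
# Pick the shortest queue whose time limit covers a walltime in HH:MM:SS.
# queue_for_walltime is a hypothetical helper; the limits are the ones
# listed above, assumed to be inclusive.
queue_for_walltime() {
    echo "$1" | awk -F: '{
        secs = $1 * 3600 + $2 * 60 + $3
        if      (secs <= 24 * 3600)  print "short"
        else if (secs <= 168 * 3600) print "medium"
        else if (secs <= 360 * 3600) print "long"
        else                         print "eternity"
    }'
}

queue_for_walltime 00:40:00    # prints short
queue_for_walltime 100:00:00   # prints medium
queue_for_walltime 300:00:00   # prints long
```

The printed name could then be given to the -q option. Anything above 15 days maps to eternity, which requires permission.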

Below are some examples of how to submit different types of jobs.

I have a sequential job that requires an execution time of 40 minutes :

[user@som2 ~]# clusterlauncher  -n 1 -w 00:40:00 ./my_sequential_job p1 p2

I have a sequential job that takes 5 hours and 1 GB of memory :

[user@som2 ~]# clusterlauncher  -n 1 -w 05:00:00 -m 1g ./my_sequential_job p1 p2

I have a sequential job and I don't know how long it will take :

[user@som2 ~]# clusterlauncher  -n 1 -q long ./my_sequential_job p1 p2

I have an MPI parallel job that requires 32 processors and I don't know how long it will take :

[user@som2 ~]# clusterlauncher  -n 32 -q long ./my_MPI_job p1 p2

I have an MPI parallel job that requires 32 processors and I want to launch it on som2 :

[user@som2 ~]# clusterlauncher -n 32 -s som2 -q long ./my_MPI_job p1 p2

I have an openMP job with a maximum of 32 threads and 2 hours of execution time :

[user@som2 ~]# clusterlauncher -n 32 --openmp -s som2 -w 02:00:00 ./my_openMP_job p1 p2

Monitoring your jobs.

You can get the status of your job by typing:

[user@som1 test]$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
4290.som1                  STDIN            user           00:00:26 C short

or

[user@som1 test]$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

4292               user          Running    20     1:00:00  Mon Nov  5 12:47:50

     1 Active Job       20 of   84 Processors Active (26.44%)
                         1 of    2 Nodes Active      (50.00%)

If your job is queued and you need more information about when your job will start then type:

[user@som1 test]$ showstart 4294
job 4294 requires 20 procs for 1:00:00
Earliest start in         00:00:00 on Mon Nov  5 12:49:41
Earliest completion in     1:00:00 on Mon Nov  5 13:49:41
Best Partition: DEFAULT

If your job is running and you need to see its partial standard output, you can type:

[user@som1 test]$ qtail 4294
or
[user@som1 test]$ qcat  4294

Deleting a job from the queue.

If you need to delete a job, you can use:

[user@som1 test]$ qdel 4294
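When several jobs need to be deleted, their ids can be extracted from qstat-style output by state. The following is a sketch only: jobs_in_state is a hypothetical helper, it assumes the six-column layout of the qstat example above (state in column 5), and it runs on sample text in which job 4294 is invented for the demonstration.

```shell
# Print the numeric ids of jobs in a given state from qstat-style output.
# jobs_in_state is a hypothetical helper; it skips the two header lines
# and strips the ".som1" suffix from the job id.
jobs_in_state() {
    awk -v want="$1" 'NR > 2 && $5 == want { sub(/\..*/, "", $1); print $1 }'
}

# Sample text in the layout of the qstat example (job 4294 is invented).
sample='Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
4290.som1                  STDIN            user           00:00:26 C short
4294.som1                  STDIN            user           00:00:00 Q short'

echo "$sample" | jobs_in_state Q    # prints 4294
```

On the server, the same filter could read the real qstat output, and each printed id could be passed to qdel.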

Request an Account/Support

Click here to request an account.

Click here to request support
