Presentation of the computing cluster at CERMICS



The computing cluster of CERMICS is currently composed of 31 machines. They are named 'clusternXX', where XX is 01 for the master node and ranges between 02 and 40 for the computing nodes (with some gaps due to history or machine failures).

To access the cluster, you need to be a registered user with a specific computer account, different from your CERMICS account. To ask for an account, contact Laurent MONASSE (monassel.AT.cermics.enpc.fr). The creation of the account has to be validated by the head of CERMICS (currently Jean-Francois DELMAS). The initial password to access the cluster is the same as your password on titan. You can change it with the command "yppasswd".


Composition of the cluster and performances

Performance (the computing times reported below, in minutes) has been tested with a simple sequential code available here.

clustern01: access node to the other machines (DO NOT LAUNCH COMPUTATIONS ON IT)
PROC: 4 x Intel Xeon, 2.40GHz
RAM: 2 GB

ORDER OF PRESENTATION: FROM THE NEWEST TO THE OLDEST MACHINES

clustern14 to 25 (fall 2014):
PROC: 32 x Intel Xeon E5-2667, 3.30GHz (hyperthreading)
RAM: 192 GB
TIME: 2.7 minutes

clustern11, 12, 13 (september 2013):
PROC: 32 x Intel Xeon E5-2667, 3.30GHz (hyperthreading)
RAM: 192 GB
TIME: 2.6 minutes

clustern09, 10 (september 2013):
PROC: 32 x Intel Xeon E5-2690, 2.90GHz (hyperthreading)
RAM: 192 GB
TIME: 3.0 minutes

clustern08 (july 2013):
PROC: 16 x Intel Xeon E5-2643, 3.30GHz (hyperthreading)
RAM: 762 GB
TIME: 2.9 minutes

clustern06 and clustern07 (april 2012):
PROC: 24 x Intel Xeon X5690, 3.47GHz (hyperthreading)
RAM: 192 GB
TIME: 3.15 minutes

clustern04 and clustern05 (october 2011):
PROC: 24 x Intel Xeon X5690, 3.47GHz (hyperthreading)
RAM: 24 GB
TIME: 3.1 minutes

clustern02 (november 2010):
PROC: 32 x AMD Opteron 6134, 2.3 GHz
RAM: 64 GB
TIME: 4.3-4.6 minutes

clustern03 (november 2010):
PROC: 24 x Intel Xeon X5680, 3.33GHz (hyperthreading)
RAM: 64 GB
TIME: test not performed

clustern33, clustern39 and clustern40 (bought in 2009):
PROC: 16 x Intel Xeon X5550, 2.67GHz
RAM: 48 GB
TIME: 4.0-4.2 minutes

clustern31 and clustern32 (bought in 2008):
PROC: 8 x Intel Xeon E5420, 2.50GHz
RAM: 32 GB
TIME: 3.9 minutes

Network configuration:
Nodes are interconnected through a gigabit switch.

Access:
The entry point is clustern.enpc.fr, which can be reached from titan with "ssh clustern" (with the standard ssh options). The computing nodes are then reached with "ssh clusternXX" from the master node clustern01.
To avoid retyping your password when switching between nodes, set up an SSH key: run "ssh-keygen -t rsa", leave the passphrase empty, and append the file ".ssh/id_rsa.pub" to ".ssh/authorized_keys" in your home directory on the cluster.
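For instance, a minimal first session from titan could look as follows (clustern08 is just an arbitrary example of a computing node); the key setup has to be done only once, since your home directory is shared by NFS across the nodes:

    ssh clustern                   # from titan: log on the master node clustern01
    ssh-keygen -t rsa              # one-time setup: accept the defaults, leave the passphrase empty
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    ssh clustern08                 # hop to a computing node without retyping your password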

Running jobs:
There is no queueing system: jobs are run interactively by logging on the desired machine. Make sure that the machine you plan to work on is not already overloaded! If all the machines are overloaded and you cannot find room, send an email to monassel.AT.cermics.enpc.fr. In this case, Laurent Monasse will act as a referee and encourage some of the users to renice/kill some of their jobs in order to balance the load among the users. DO NOT HESITATE TO ASK FOR SOME SPACE!
It is better to launch jobs with commands such as "nohup [executable file] > [output file] &", or to use "screen", so that the jobs are not killed when you log out of the computing node (see the example below).
Also plan to write periodic checkpoints and to include restart mechanisms in your computations, in case the node crashes.
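As an illustration, a long computation can be launched as follows and left running after logout (the executable and file names are placeholders):

    nohup nice -n 10 ./my_simulation > run.log 2>&1 &   # placeholder names; nice lowers the priority
    echo $! > run.pid                                    # keep the PID to monitor or kill the job later

Starting the job under "nice" from the beginning makes it easier to share the node with the other users without having to renice it afterwards.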

Use of disk space:
The home directory (/home/login) resides on the hard drive of clustern01 and is accessible on all the other machines through an NFS export. This space is backed up every 15 days, but there is no archiving.
Since the space in /home is limited, you are strongly encouraged to use the /libre space on each machine (local hard drive). You can then send your data from the local disk back to your CERMICS account using "scp" (see the example below).
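For instance (the login, paths and destination machine are placeholders), results written in /libre can be sent back with:

    scp -r /libre/mylogin/results mylogin@titan:results/   # adapt the login, paths and target machine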

Monitoring tools:
multitop.py: copy it from "/home/monassel/multitop.py" and run it as "./multitop.py". This Python script connects to all the machines and lists the running jobs together with the name of their owner.
CPU load and memory history: see the dedicated page.
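If multitop.py is not at hand, a rough equivalent can be improvised with a loop over the nodes (a sketch only, assuming the passwordless ssh setup described above; the node list and ps options are purely illustrative):

    for n in 02 03 04 05 06 07 08 09 10 11 12 13 14 15; do
        echo "== clustern$n =="
        ssh clustern$n "ps -eo user,pcpu,pmem,etime,comm --sort=-pcpu | head -n 5"
    done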

System information:
cat /proc/cpuinfo: information on the processors
cat /proc/meminfo: information on RAM and swap
hardinfo: full information on the system configuration
/usr/sbin/hwinfo | grep system.hardware.serial: obtain the serial number, which can be used to find the age of the machine on the DELL website
top: real-time information on CPU load, RAM, swap and many other properties!

Software and libraries:
Installation upon request. Submit a ticket to the DSI assistance. Feel free to inform Laurent Monasse of any urgent need.

Contacts:
All the users of the cluster are on the email alias users-cluster.AT.cermics.enpc.fr. Feel free to communicate with the other users through this address.