Presentation of the computing cluster at CERMICS
The computing cluster of CERMICS is currently composed of 31 machines. They are named 'clusternXX', where
XX is 01 for the master node and ranges from 02 to 40 for the computing nodes (with some gaps due to history or machine failures).
To access the cluster, you need to be a registered user with a specific computer account, different from your CERMICS account. To request an account, contact Laurent MONASSE (monassel.AT.cermics.enpc.fr). The creation of the account has to be validated by the head of CERMICS (currently Jean-Francois DELMAS). The initial password to access the cluster is the same as your password on titan. You can change it with the command "yppasswd".
Composition of the cluster and performance
Performance (computing times reported in minutes) has been tested with a simple sequential code, available
here.
clustern01: access to other machines
(DO NOT LAUNCH COMPUTATIONS ON IT)
PROC: 4 x Intel Xeon, 2.40GHz
RAM: 2 GB
THE MACHINES ARE LISTED FROM NEWEST TO OLDEST
clustern14 to 25 (fall 2014):
PROC: 32 x Intel Xeon E5-2667, 3.30GHz (hyperthreading)
RAM: 192 GB
TIME: 2.7 minutes
clustern11, 12, 13 (September 2013):
PROC: 32 x Intel Xeon E5-2667, 3.30GHz (hyperthreading)
RAM: 192 GB
TIME: 2.6 minutes
clustern09, 10 (September 2013):
PROC: 32 x Intel Xeon E5-2690, 2.90GHz (hyperthreading)
RAM: 192 GB
TIME: 3.0 minutes
clustern08 (July 2013):
PROC: 16 x Intel Xeon E5-2643, 3.30GHz (hyperthreading)
RAM: 762 GB
TIME: 2.9 minutes
clustern06 and clustern07 (April 2012):
PROC: 24 x Intel Xeon X5690, 3.47GHz (hyperthreading)
RAM: 192 GB
TIME: 3.15 minutes
clustern04 and clustern05 (October 2011):
PROC: 24 x Intel Xeon X5690, 3.47GHz (hyperthreading)
RAM: 24 GB
TIME: 3.1 minutes
clustern02 (November 2010):
PROC: 32 x AMD Opteron 6134, 2.3 GHz
RAM: 64 GB
TIME: 4.3-4.6 minutes
clustern03 (November 2010):
PROC: 24 x Intel Xeon X5680, 3.33GHz (hyperthreading)
RAM: 64 GB
TIME: test not performed
clustern33, clustern39 and clustern40 (bought in 2009):
PROC: 16 x Intel Xeon X5550, 2.67GHz
RAM: 48 GB
TIME: 4.0-4.2 minutes
clustern31 and clustern32 (bought in 2008):
PROC: 8 x Intel Xeon E5420, 2.50GHz
RAM: 32 GB
TIME: 3.9 minutes
Network configuration:
Nodes are interconnected through a gigabit switch.
Access:
The primary machine is clustern.enpc.fr, which can be reached from titan with "ssh clustern" (using the standard
ssh options). The computing machines are then reached with "ssh clusternXX" from the primary machine clustern01.
To avoid retyping your password when switching between nodes, set up an SSH key
(type "ssh-keygen -t rsa", do not set a passphrase, and append the contents of ".ssh/id_rsa.pub" to ".ssh/authorized_keys" in your home directory on the cluster).
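A minimal sketch of this key setup, to be run once on clustern01 (it assumes RSA keys and the NFS-shared home directory described below, so a single authorized_keys file covers all nodes):

```shell
# Generate an RSA key pair without a passphrase (-N "") in the default location.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Authorize the new public key: since the home directory is shared by all nodes,
# this one step enables password-less ssh between every clusternXX.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

Note that ssh-keygen will ask before overwriting an existing key, so this is safe to retry.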
Running jobs:
There is no queueing system; jobs are run interactively by logging on to the desired machine.
Beware that the machine you are working on may already be overloaded! If all the machines are
overloaded and you cannot find free resources, send an email to monassel.AT.cermics.enpc.fr.
In this case, Laurent Monasse will act as a referee and encourage some of the users
to renice/kill some of their jobs in order to balance the load among the users.
DO NOT HESITATE TO ASK FOR SOME SPACE!
It is better to use commands such as "nohup [executable file] > [output file] &",
or to resort to "screen", so that your jobs are not killed
when you log out of the computing node.
Also plan to incorporate periodic checkpointing in your computations,
together with restart mechanisms in case the node crashes.
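As a concrete sketch of the nohup pattern above ("./my_simulation", "run.log" and "run.pid" are placeholder names):

```shell
# Start a long computation that survives logout; stdout and stderr go to run.log.
nohup ./my_simulation > run.log 2>&1 &
# Keep the PID so the job can be reniced or killed later.
echo "$!" > run.pid

# Later, for instance, lower the job's priority if the node is shared:
#   renice +10 -p "$(cat run.pid)"
# or stop it:
#   kill "$(cat run.pid)"
```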
Use of disk space:
The home directory (/home/login) is on the hard drive of clustern01. Through an NFS export, it is accessible
on all other machines. This space is backed up every 15 days, but there is no archiving.
Since the /home directory is limited in size, you are strongly encouraged to use the /libre space
on each machine (local hard drive). You can then send your data from the local disk
to your CERMICS account using "scp".
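A typical workflow sketch under these conventions (the run directory, "login" and the destination path are illustrative, to be adapted to your own account):

```shell
# Work in the node-local /libre space rather than the NFS-mounted /home.
SCRATCH=/libre/$USER/run01      # illustrative directory on the local hard drive
mkdir -p "$SCRATCH"
cd "$SCRATCH"
# ... run your computation here, writing its output to the local disk ...

# Afterwards, copy the results back to your CERMICS account with scp:
scp -r "$SCRATCH" login@titan.enpc.fr:results/
```

Remember that /libre is neither NFS-exported nor backed up, so transfer anything you want to keep.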
Monitoring tools:
multitop.py
Copy it from "/home/monassel/multitop.py" and execute it as "./multitop.py". The Python script connects to all
machines and lists all running jobs together with the names of their users.
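In the same spirit, a quick hand-rolled check can be done with a small ssh loop (the node list is illustrative, and this is not the multitop.py script itself):

```shell
# Print the five most CPU-hungry processes on a few nodes.
for n in 02 03 04; do
    echo "== clustern$n =="
    ssh "clustern$n" "ps -eo user,pid,pcpu,comm --sort=-pcpu | head -5"
done
```

This relies on the password-less ssh setup described in the Access section.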
CPU load and memory history
see the page.
System information:
cat /proc/cpuinfo
information on the processor
cat /proc/meminfo
information on RAM and swap
hardinfo
full information on system configuration
/usr/sbin/hwinfo | grep system.hardware.serial
obtain serial number to find the age of the machine on the DELL website (link)
top
real-time information on the CPU load, RAM, swap and many other properties!
Software and libraries:
Installation upon request. Submit a ticket to the DSI assistance. Feel free to inform Laurent Monasse of any urgent action needed.
Contacts:
All users of the cluster are in the email alias
users-cluster.AT.cermics.enpc.fr.
Feel free to communicate with the other users through this address.