Comp:clusters

From Theochem

List of cluster machines (only works with VPN)

Runs on these machines must go through a queue system. First read the CNCZ wiki page about its usage.

Our partition is

 thchem.

How you submit a job to the cluster depends on how data-heavy it is. For simulations with large trajectory files (LAMMPS, VASP), you typically send the job to a specific cluster node (with the flag -w). The restart files and input data for such jobs can also be rather large. These simulations are run from the /scratch or /scratch2 partition. These partitions are local to each cluster node and are only accessible from that node, which makes writing faster and less dependent on the network. Always make a directory named after your login name and work from there (with subdirectories). Look here for more information on using scratch.

For other jobs, you would like Slurm to decide which cluster node is best to use. In that case, do not use the -w flag. Make sure that all cluster nodes have access to your files, and start your script from a subdirectory of your home directory.
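As an illustration of the data-heavy case, here is a minimal sketch of a batch script that works from a personal directory on the node-local scratch partition. Only the partition name thchem, the -w flag and the /scratch path come from this page. The wall time, the job name, the `SCRATCH_ROOT` variable and the `make_workdir` helper are assumptions made for the example, so adapt them to your own job.

```shell
#!/bin/bash
#SBATCH --partition=thchem        # our partition (from this page)
#SBATCH --ntasks=1                # assumed: a single task
#SBATCH --time=01:00:00           # assumed wall-time limit
#SBATCH --job-name=scratch-demo   # hypothetical job name
# Submit to a specific node with:  sbatch -w cnXX job.sh
# (cnXX is a placeholder for the node you want to use)

# Root of the node-local scratch area; can be overridden for testing.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch}"

# Create (if needed) and print a per-user, per-job working directory.
make_workdir() {
    local root="$1"
    local user="${USER:-$(id -un)}"
    local jobid="${SLURM_JOB_ID:-manual}"
    local dir="$root/$user/$jobid"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Only run the job body when Slurm actually started this script.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    workdir="$(make_workdir "$SCRATCH_ROOT")" || exit 1
    cd "$workdir" || exit 1
    echo "Running on $(hostname) in $workdir"
    # ... start your simulation here ...
fi
```

For a job where Slurm should pick the node, the same header can be used without -w. The scratch staging is then not needed: submit the script from a subdirectory of your home directory instead.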

Since we have many cluster nodes with a small number of cores each, jobs that require many processors on the same node sometimes have trouble starting. This can happen when many single-core jobs are spread over many different nodes. For this reason, it is possible to claim a cluster node for some time.

We have a new, automated system to reserve cluster nodes.

Follow this link: cluster node reservation system

The Slack/Yoink system is no longer in use.

The Slack yoink-app that we no longer use

Before you submit a job, check under #cluster-usage which nodes are claimed or not. You can only communicate with yoinkbot in the #cluster-usage channel and by direct message. To claim a resource:

 /yoink cn** [message] [lock duration]

To release a resource:

 /release cn**

To get a list of all claimed resources, send yoinkbot the direct message "resources list", or type

 "/dm @yoinkbot resources list"

in #cluster-usage

The first time a cluster node is claimed, you will be asked whether you want to create it. Just confirm.

For electronic structure calculations, the output is typically small (it can stay in your home directory), but the program needs to store large intermediate files (put these on scratch, and make sure they are deleted at the end of the simulation).
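One way to make sure the intermediate files are deleted is to run the program inside a temporary scratch directory that is removed afterwards. The following is a minimal sketch under that assumption. The `run_in_scratch` helper and the `SCRATCH_ROOT` variable are hypothetical names, not part of any existing script here.

```shell
#!/bin/bash
# Sketch: keep large intermediate files on scratch and remove them afterwards.

# Node-local scratch area; can be overridden for testing.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch}"

# Run a command inside a fresh scratch directory and delete that directory
# once the command has finished, whether it succeeded or failed.
# Usage: run_in_scratch <scratch-root> <command> [args...]
run_in_scratch() {
    local root="$1"; shift
    local user="${USER:-$(id -un)}"
    local tmpdir
    local status=0
    mkdir -p "$root/$user" || return 1
    tmpdir="$(mktemp -d "$root/$user/job.XXXXXX")" || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$tmpdir" && "$@" ) || status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Example call (program and file names are placeholders):
#   run_in_scratch "$SCRATCH_ROOT" my_program "$HOME/calc/input.inp" > "$HOME/calc/output.log"
```

Because the command runs inside the scratch directory, input files should be given with absolute paths, and the small output is redirected to your home directory. This sketch does not clean up if the job is killed before the command returns, so check scratch by hand after a cancelled job.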

Ask someone who runs similar jobs for example Slurm scripts.