Particle Physics and Particle Astrophysics: Internal

PPPA Computing


The 'HEP' Linux Cluster

Node              User                Cores  CPU                 Arch.   HEP-SPEC06
hep0              Gateway             4      Opteron 2400 MHz    x86_64
hephome           Primary Server      8      Opteron 2400 MHz    x86_64
hepdata1          Disk Server         4      Opteron 2400 MHz    x86_64
hepatlas1         Disk Server         4      Opteron 2400 MHz    x86_64
hepatlas2         Disk Server         4      Opteron 2400 MHz    x86_64
hepatlas3         Disk Server         16     Opteron 2600 MHz    x86_64
heprobot1         Tape Server         2      Athlon 2600 MHz     x86_64
hepwww            Web Server          2      Xeon 2800 MHz       i386    9.54 (4.77/core)*
wn001-wn032       Worker Nodes        4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
wn033-wn048       Worker Nodes        8      Phenom 3600 MHz     x86_64  96 (12/core)*
wn049-wn064       Worker Nodes        6      Phenom 3200 MHz     x86_64  72 (12/core)*
wn065-wn096       Worker Nodes        8      Phenom 3600 MHz     x86_64  96 (12/core)*
gw001-gw095       Worker Nodes        2      Opteron 2400 MHz    x86_64  13.4 (6.7/core)*
hep1              Jon Perkin          4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep2              Kerry Parker        4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep3              Callum Wilkinson    2      Athlon 64 3000 MHz  x86_64
hep4              Matthew Lawe        4      Phenom 2400 MHz     x86_64  33.55 (8.39/core)*
hep5              Ali Ahmed           4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep6              Leonid Yuriev       2      Athlon 64 2600 MHz  x86_64  15.72 (7.86/core)*
hep7              Tim Blackwell       2      Athlon 64 3000 MHz  x86_64
hep8              Matt Robinson       4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep9              Susan Cartwright    4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep10             Dan Walker          2      Athlon 64 3000 MHz  x86_64
hep11             Clive Tomlinson     4      Phenom 2500 MHz     x86_64  33.55 (8.39/core)*
hep12             Spare               4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep13             Brais Lopez         4      Phenom 2500 MHz     x86_64  33.55 (8.39/core)*
hep14             Spare               2      Athlon 64 3000 MHz  x86_64
hep15             Steve Sadler        4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep16             Leon James Pickard  4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep17             Ed Overton          4      Phenom 2500 MHz     x86_64  33.55 (8.39/core)*
hep18             Jonathan Shimwell   2      Athlon 64 3000 MHz  x86_64
hep19             Sam Telfer          4      Phenom 2500 MHz     x86_64  33.55 (8.39/core)*
hep20             Josh McFayden       4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep21             Paul Smith          4*     Intel i7 3600 MHz   x86_64
hep22             Paul Miyagawa       2      Athlon 64 3000 MHz  x86_64
hep23             Dan Tovey           4      Phenom 2500 MHz     x86_64  33.55 (8.39/core)*
hep24             Stathes Paganis     4      Phenom 2500 MHz     x86_64  33.55 (8.39/core)*
hep25             Kerim Surliz        4      Phenom 2500 MHz     x86_64  33.55 (8.39/core)*
hep26             Chris Booth         4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep27             Martin Richardson   4      Phenom 2500 MHz     x86_64  33.55 (8.39/core)*
hep28             Davide Costanzo     4      Phenom 2500 MHz     x86_64  33.55 (8.39/core)*
hep29             Ian Dawson          4      Phenom 2500 MHz     x86_64  33.55 (8.39/core)*
hep30             Nicola Doyle        4      Phenom 2400 MHz     x86_64  33.55 (8.39/core)*
hep31             Spare               2      Athlon 64 3000 MHz  x86_64
hep32             Matthew Lawe        2      Athlon 64 3000 MHz  x86_64
hep33             Paul Hodgson        4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep34             Ed Daw              4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hep35             Elena Korolkova     2      Athlon 2200 MHz     x86_64
hep36             Matthew Taggart     4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
hepdm1            Vitaly Kudryavtsev  2      Athlon 64 3200 MHz  x86_64
shatlasa.cern.ch  CERN                4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
shatlasb.cern.ch  CERN                4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*
shatlasc.cern.ch  CERN                4      Phenom 3200 MHz     x86_64  47.82 (11.96/core)*

*Tested with Scientific Linux 4, but now running Scientific Linux 5

External Login

It is possible to use ssh to log into hep0 from outside the university network, except from IP addresses which are blocked due to the risk of ssh brute force attacks.

It is also possible to use ssh to mount the hep0 filesystems in Linux or Mac OS X using the FUSE system. Most modern Linux distributions supply this. Contact me for help setting it up. Mac OS X users should install MacFuse-Core (Tiger) (Leopard) (Snow Leopard) (Lion: the project was renamed OSXFUSE, so be sure to install the MacFUSE compatibility layer) and then run MacFusion (Tiger) (Leopard) (Snow Leopard) (Lion, Matt Hack), which will set up FUSE and a graphical interface to it.

Note that on Lion, it is necessary to set "Extra Options" to "-o allow_other" in order to make drag and drop file transfers work.

Preliminary tests indicate the Lion instructions and applications also work on Mountain Lion.
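
As a rough sketch of the ssh mount described above on a Linux machine: this assumes the sshfs client, which is one common FUSE front end for ssh (the text above does not say which tool is intended). The username bob, the mount point ~/hep0 and the two function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes the sshfs FUSE client is installed.
# "bob" and ~/hep0 are hypothetical; substitute your own username and directory.
HEP_REMOTE="bob@hep0.shef.ac.uk:"   # trailing colon = your home directory on hep0
HEP_MOUNT="$HOME/hep0"

hep0_mount() {
    # Create the mount point if needed, then attach the remote filesystem.
    mkdir -p "$HEP_MOUNT" && sshfs "$HEP_REMOTE" "$HEP_MOUNT"
}

hep0_unmount() {
    # fusermount is the Linux tool; on Mac OS X use: umount "$HEP_MOUNT"
    fusermount -u "$HEP_MOUNT"
}
```

After loading these definitions into your shell, hep0_mount should prompt for your password and make your hep0 home directory appear under ~/hep0, and hep0_unmount detaches it again.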

Disk Space and Backups

Disk space is mostly in the form of network shares. Your default home directory /home/your_user_name (use ~ as a short-cut) is shared from hep0 across the rest of the cluster. This is where you should keep your normal day-to-day files. This has a quota system. Type 'quota -s' to see how much of your quota you are currently using.

If you need more, there are a couple of options. /data1/your_user_name is a network file-system (visible from the whole cluster). This is much larger and does not have automatic quotas. In addition /scratch/your_user_name is the local disk space on each machine.

To see a summary of the availability and usage of disk space on your machine, type 'df -h'
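
The checks above can be gathered into one small script. This is only a sketch: the /data1 and /scratch paths are taken from the text and exist only on cluster machines, so the script skips any that are missing.

```shell
#!/bin/sh
# Report free space for each storage area mentioned above.
# The /data1 and /scratch paths only exist on cluster machines.
user=$(id -un)
for dir in "$HOME" "/data1/$user" "/scratch/$user"; do
    if [ -d "$dir" ]; then
        df -h "$dir" || echo "df failed for $dir"
    else
        echo "$dir: not present on this machine"
    fi
done

# Show usage against your /home quota, where the quota tool is installed.
if command -v quota >/dev/null 2>&1; then
    quota -s || echo "quota -s reported a problem"
fi
```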

The /home filesystem is fully backed up, as is e-mail delivered to hep0. That's it. All other filesystems should be considered volatile. Files can be backed up to tape on request.

Submitting Batch Jobs

Jobs may be submitted with 'qsub', cancelled with 'qdel' and queried with 'qstat'. All 3 commands have man pages e.g. 'man qsub'. A typical job submission consists of 'qsub myjob.sh' where myjob.sh is some kind of script. Such scripts may be bash, tcsh, perl, etc. A simple script would be '#!/bin/sh' on the first line, followed by whatever you would type on the command line to run the job in question.
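
A minimal sketch of the whole cycle follows. The contents of the job script are made up for illustration, and the script is written to a temporary directory so that nothing of yours is overwritten. Where qsub is not installed, the command is printed rather than run.

```shell
#!/bin/sh
# Create a minimal job script; "myjob.sh" matches the name used in the text.
jobdir=$(mktemp -d)
cat > "$jobdir/myjob.sh" <<'EOF'
#!/bin/sh
# Whatever you would type at the command line goes here.
echo "Job started on $(hostname) at $(date)"
EOF

# Submit it where qsub is available; otherwise just show the command.
if command -v qsub >/dev/null 2>&1; then
    qsub "$jobdir/myjob.sh"
else
    echo "qsub not found; on the cluster run: qsub $jobdir/myjob.sh"
fi
```

On submission, qsub prints a job identifier. Pass that identifier to qstat to check on the job, or to qdel to cancel it.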

Memory Limits

By default all jobs are 'limited' to 1000 MB RAM. If your job exceeds this it will not be terminated, since there is a better than even chance that it will succeed despite itself.

If you know (or have a decent guess at) how much RAM your job will use, please indicate this when you submit. An overestimate does less harm than an underestimate.

To request a specific amount of RAM use 'qsub -lmem=XXXM' where XXX is the memory you require in MB.

Jobs requesting more than 1000 MB of RAM may be delayed because this request is harder for the cluster to fulfil. The probability and extent of this delay increase the more memory is requested. Jobs requesting more than 8000 MB of RAM will not run at all since no worker node can accommodate such a request.
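
As an illustration, the snippet below builds a qsub command with a memory request and refuses anything above the 8000 MB ceiling quoted above. The 2000 MB figure and the variable names are hypothetical; the command is only printed, not run.

```shell
#!/bin/sh
# Build a qsub command line with a memory request, using the limits quoted above.
mem_mb=2000          # hypothetical estimate of the job's peak RAM, in MB
max_mb=8000          # above this, no worker node can run the job

if [ "$mem_mb" -gt "$max_mb" ]; then
    echo "Refusing: ${mem_mb} MB exceeds the ${max_mb} MB a worker node can offer" >&2
    cmd=""
else
    cmd="qsub -lmem=${mem_mb}M myjob.sh"
    echo "$cmd"
fi
```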

Time Limits and Queues

In the event of high load, jobs submitted to a queue with a shorter time limit will receive higher priority.

The amount of cpu time that you have used in the last day or so is also considered in determining which jobs to run first.

The queues are as follows

infinite:
No time limit, lowest priority, no more than 40 cpus from the cluster may ever be used even if more are free
long:
4 day limit, all cpus available, low priority
medium:
1 day limit, all cpus available, higher priority
short:
8 hour limit, all cpus available, still higher priority
vshort:
2 hour limit, all cpus available, highest priority

Select your queue using the -q flag to qsub

e.g. qsub -q short myjob.sh

If this is not specified, your job will go into the medium queue which has a 1 day limit.

Any job exceeding its time limit for any reason will be unceremoniously killed.
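
One way to choose a queue is to take the shortest one whose limit covers your estimated run time. The helper below is a sketch of that rule using the limits listed above; the function name pick_queue is hypothetical, and the qsub command is printed rather than run.

```shell
#!/bin/sh
# Pick the shortest queue whose time limit covers an estimated run time in hours.
# Limits are the ones listed above; the function name is hypothetical.
pick_queue() {
    hours=$1
    if   [ "$hours" -le 2 ];  then echo vshort
    elif [ "$hours" -le 8 ];  then echo short
    elif [ "$hours" -le 24 ]; then echo medium
    elif [ "$hours" -le 96 ]; then echo long
    else                           echo infinite
    fi
}

queue=$(pick_queue 6)
echo "qsub -q $queue myjob.sh"
```

Since a job that overruns is killed, it is sensible to round your estimate up generously before choosing.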

To see the priority (intended running order) of jobs queued on the system, run 'diagnose -p'. Running order is based on requested cpu time, fair share amongst users and how long the jobs have been waiting.

Interactive Batch Jobs

An interactive batch job may be started with 'qsub -I -X'. It is most useful if you do not have your own hep cluster machine and/or are logging in from outside the hep cluster. You may also choose to use it because the worker nodes are, on the whole, more powerful than the desktop machines.

A worker node is assigned just as with a normal batch cluster job, but rather than a pre-determined set of commands being executed, your terminal is attached directly to the worker node. You can then use this terminal as normal and take advantage of the worker node's cpu and memory.

Printing

Printing configuration throughout the Linux clusters is the same

Available Printers

To see the list of printers available through hep0, direct a web browser to http://hep0.shef.ac.uk:631/printers.

Printing from Linux

To see what printers are available on your hep cluster node, direct a web browser to localhost:631/printers

Printing from Mac

Mac users should find that they can print through hep0 using the normal print management tools. The hep0 printers are advertised through the network as 'NAME @ HEP'. It is best to select the generic postscript driver if you are offered a choice.

You can print to the new Phaser directly if you prefer. The drivers are included in Leopard but not Tiger. Download here. The printer has Bonjour and all the usual advertising turned on, and appears in the list as 'Phaser 8560DN (00:00:aa:d4:ba:76)'. If you want/need to configure it manually, the ip address is 143.167.7.207 and the easiest protocol to use is JetDirect. Note that a manual JetDirect configuration is advantageous for laptops since it functions through both wired and wireless ethernet without modification.

The same is true of the departmental printer. However without the post-processing performed by hep0, this older printer has difficulty with some print jobs, for example those sent by Adobe Reader 8. The ip address is 143.167.7.247, and once again, jetdirect will serve you well. The printer model is HP LaserJet 8150 PS.

There is also a combined printer/scanner/photocopier in e18a. The model is Samsung SCX-4825. Mac software is here. Note that the installation requires exit of all applications and may require a reboot. The ip address is 143.167.6.28 and again JetDirect will serve you well. The printer in D36 is similar (SCX-4828) and has ip address 143.167.4.137.

The printer in D38 is also network capable and can be printed to directly. It is a Brother HL-3040CN. A mac should automatically find and install the needed drivers. The ip address is 143.167.4.12 and JetDirect is, once again, recommended.

Printing from Windows

An easy way to print from windows is using the Mac method.

Install Bonjour for Windows. Download here. This will add an item 'Bonjour Printer Wizard' to the start menu. This wizard should find all the hep0 printers advertised as 'NAME @ HEP'. It should not be necessary to install drivers, as the conversion from Postscript to the proper language for the printer is done by the attached Linux machine.

The Phaser is a special case in that it has an independent network connection and you can print to it directly. First download and install this driver.

In the add printer wizard, you need to specify 'Local' printer (printer not attached to another windows machine) and define a TCP/IP port to 143.167.7.207. When the driver dialog comes up, hit 'Have Disk' and navigate to the directory in C:\WINDOWS\Xerox where the driver inf file was dropped. After installation, you can go into the advanced properties in the usual way and enable double-sided printing.

The same is true of the departmental printer. The ip address is 143.167.7.247 and the model is HP LaserJet 8150 PS. As mentioned above in the mac description, printing to the departmental printer directly has disadvantages.

There is also a combined printer/scanner/photocopier in e18a. The model is Samsung SCX-4825. Windows software is here. Note that the installation requires exit of all applications and may require a reboot. The ip address is 143.167.6.28 and again JetDirect will serve you well. The printer in D36 is similar (SCX-4828) and has ip address 143.167.4.137.

The printer in D38 is also network capable and can be printed to directly. It is a Brother HL-3040CN. The Windows 7 driver is here. The ip address is 143.167.4.12 and JetDirect is, once again, recommended.

A Beginner's Guide to Linux and the 'HEP' Cluster

The first thing to understand is that Linux is a command line based operating system. Whilst many common operations can be achieved by clicking icons on the desktop, this is not where the power resides.

Since the graphical desktop is rather straightforward to use, I will confine myself to important command line operations.

Getting Started

On the interactive 'HEP' cluster, you will find yourself staring at a graphical login prompt

Login Window

Enter your username and password. You may also like to take a look at the 'Session type' drop-down list.

There are 2 standard desktop managers to choose from, KDE and Gnome. They are rather similar in operation, but people tend to develop a preference. The default is Gnome. I suggest you give them both a try

Gnome desktop

KDE desktop

Whilst the desktop manager will always be running in normal operation, the real power of Linux is the command line. The first thing to do is start a terminal. There will be a button in the menu (bottom left button) and various other shortcuts to launching a terminal, but the easiest way is to right-click on the desktop and find the item on the pop-up menu which launches the terminal.

In addition to the terminal button, you will find many other application buttons for applications, utilities, system administration (don't touch) and so on. It's best not to depend on these because they all have their limits. In particular, I don't want you to get too fond of any file system managers you might find, since file management is, like everything else, much better done from the command line.

KDE Terminal

Terminal

Basic Commands

Most of the programs under Linux are simple command line. To find out how to use a command, bring up the man page. e.g. man cp will tell you how to use the cp command to copy files within the filesystem

pwd: This command tells you where you are (what directory) in the filesystem. When you first log in, you'll find yourself in your home area. This is where you should keep day to day files.

cd: This is the command to change directory. To change into a subdirectory of your current location: cd mydir. To ascend one level in the directory structure: cd .. (two dots). You may also specify a full path: cd /home/bob/tmp/test. You'll get used to it.

ls: List the contents of the current directory. Most useful. This command has several useful options. ls -l will show details of each item in the directory. ls -A shows files and directories starting with . which are normally hidden. ls -rt shows files in reverse order of modification time (newest at the bottom). This is one command it's worth playing with and getting to know.

cp: This command copies files. You can specify one or more files or directories to be copied, the final argument is the destination. If you specify a directory which already exists as the destination, all files will be copied into that directory, using the same names as the original files. Note that if you specify more than one file or directory to copy, the destination must be an existing directory. Note that . indicates the current directory, and .. indicates one directory up. Filesystem commands tend to respond to the -r (recursive) flag which causes the command to descend into directories. This is often appropriate.

mv: Functions in much the same way as cp except that the original files are not preserved. This is also the normal way to rename files. If the new location for the files is on the same filesystem, the command is very quick, because the files are simply renamed and not copied then deleted.

rm: Remove file. Very dangerous command as files cannot be restored once removed. With the -r flag, especially dangerous. For example, rm -r * can promptly remove every file you own.

mkdir: create a directory with the specified name

tar: Possibly the most useful program ever devised. Rather similar to (and predating) windows zip programs. It creates one large file containing several smaller files and/or directories. It is common for software to be distributed in the form of tar files. A tar may be compressed with the standard Linux compression tool gzip. Such compressed tars are often named file.tar.gz or file.tgz. To create a gzipped tar file use tar cvzf archive.tar.gz files. To extract, use tar xvzf archive.tar.gz. There are many options for tar which are listed in the man pages.
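
The commands above can be tried out safely in a throwaway directory. The session below is a sketch with made-up file and directory names; it uses the same tar options as in the text, minus v, to keep the output short.

```shell
#!/bin/sh
# Practice run in a throwaway directory, so nothing of yours is at risk.
# All file and directory names here are made up for the example.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

mkdir results                          # create a directory
echo "run 1" > results/run1.txt        # make a small file to work with
cp results/run1.txt results/run2.txt   # copy a file
cp -r results backup                   # copy a whole directory
mv backup/run2.txt backup/second.txt   # rename a file
ls -lrt backup                         # details, newest at the bottom

tar czf results.tar.gz results         # pack and gzip the directory
mkdir unpacked
tar xzf results.tar.gz -C unpacked     # unpack it somewhere else
ls -A unpacked/results

rm -r backup                           # remove a directory and its contents
pwd                                    # confirm where we are
```

Note that rm -r is given an explicit directory name here rather than *, which keeps the damage contained if you are not where you think you are.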

There are many other commands, hundreds in fact. If you come across a task and you don't know which is the proper program to use, please ask. There are programs for everything, including difficult jobs like drawing Feynman diagrams.

Web and Mail

You're not going to get far without a web browser, so let's start there. Type firefox on your terminal. This will bring up the standard web-browser, Firefox. If you fancy something different, there is also Konqueror (which I'm not impressed with) or, more usefully, Opera, which is pretty nice and has e-mail and various other functionality thrown in.

There is no official build of Google Chrome for Scientific Linux 5 and no safe third party build I have been able to find.

The university will have created an e-mail account for you which is available via webmail and imap. The webmail (either via the Portal or Google) is convenient because it requires no set-up, but it is also slow and can be unreliable.

More popular long-term is thunderbird in imap mode. This allows you to access the mail directly on the server from a number of computers, just as with webmail, but the work is done by your computer. Instructions are available on the cics web-pages. For laptops or home machines, you'll want to follow the instructions for "Sending Mail Securely" and "Receiving Mail Securely" or it won't work.

Remote Login

Once you have an account on the system, you have access to the entire cluster. It is possible to log in from one machine to another by using ssh and specifying the hostname. e.g. ssh hep8 will (after prompting for a password) log your terminal into hep8. If you then run an application (you could try firefox again), it will run on hep8 but its window (if it has one) will be directed back to the machine at which you sit. This is called X-window forwarding and can be extremely useful

To log in to another machine or to log into hep0 from outside, a similar command is used. In this case, you should specify the fully qualified domain name (fqdn) and, if it differs from your username on the machine you are using, your cluster username. e.g. ssh bob@hep0.shef.ac.uk. Use this command to access your files when you're not at the university.

If you need to transfer files, use scp.
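
A few typical ssh and scp invocations are sketched below as a dry run: each command is printed rather than executed, so it is safe to try anywhere. The username bob comes from the example above; the file and directory names are hypothetical.

```shell
#!/bin/sh
# Dry run: print each command instead of executing it.
# "bob", the file names and the directory names are hypothetical.
show() { echo "+ $*"; }

remote="bob@hep0.shef.ac.uk"

show ssh "$remote"                              # log in from outside
show scp thesis.pdf "$remote:"                  # copy a file to your hep0 home
show scp "$remote:plots/mass.eps" .             # fetch a file into the current directory
show scp -r "$remote:analysis" ./analysis-copy  # -r copies a whole directory
```

To run any of these for real, type the part after the + sign. The general pattern for scp is source first and destination last, just as with cp.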

Frequently Asked Questions

Firefox won't start / bookmarks gone

If firefox dies or a crash occurs while it is in use, it gets very upset. You may see this window when you try to start it.

If you have firefox running on another cluster machine, then you should quit it. Most likely, this is not the case and firefox has simply become confused.

In a terminal, cd to '~/.mozilla/firefox/*.*', this is your profile directory and contains all the firefox settings etc. Using rm, delete the files 'lock' and '.parentlock'. This is enough to get firefox running, but not enough to properly fix it. You might try starting firefox at this point, but most likely you will find that your bookmarks are gone, your 'Back' button doesn't work and the thing is generally crippled. Quit firefox and in the same directory as before remove the files 'places.sqlite' and 'places.sqlite-journal' (the latter is usually there, but not always). Firefox should then go back to working properly.
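
The two stages above can be wrapped in a pair of shell functions. This is a sketch: the function names are hypothetical, and each takes the profile directory as its argument. Quit firefox before using either.

```shell
#!/bin/sh
# Helpers for the two-stage fix described above. Function names are hypothetical.
# Quit firefox before running either of them.

# Stage 1: remove stale lock files from one profile directory.
firefox_unlock() {
    rm -f "$1/lock" "$1/.parentlock"
}

# Stage 2: remove the places database files, as the text instructs.
firefox_reset_places() {
    rm -f "$1/places.sqlite" "$1/places.sqlite-journal"
}

# Example use on every profile found (left commented out on purpose):
# for profile in "$HOME"/.mozilla/firefox/*.*/; do
#     firefox_unlock "$profile"
# done
```

Try stage 1 on its own first, and only move to stage 2 if firefox starts but is crippled as described.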

Evo won't run

Evo often refuses or fails to run, typically 2 minutes before your conference is due to start. The error message it produces is either misleading or flat-out false.

Usually this is down to Evo fighting with Java. It can often be solved by clearing the java webstart cache, which forces the system to download Evo from scratch again: http://evo.caltech.edu/evoGate/Documentation/faq/uninstall/index.html

Sometimes it is also necessary to remove the directory ~/.Koala which holds your Evo settings. Note that this will cause it to forget your password.
