PPPA Computing
- External Login
- Submitting Batch Jobs
- Interactive Batch Jobs
- Disk Space and Backups
- Printing
- Beginner's Guide
- F.A.Q.
The 'HEP' Linux Cluster

| Node | User | Cores | CPU | Arch. | HEP-SPEC06 | |
|---|---|---|---|---|---|---|
| hep0 | Gateway | 4 | Opteron 2400 MHz | x86_64 | ||
| hephome | Primary Server | 8 | Opteron 2400 MHz | x86_64 | ||
| hepdata1 | Disk Server | 4 | Opteron 2400 MHz | x86_64 | ||
| hepatlas1 | Disk Server | 4 | Opteron 2400 MHz | x86_64 | ||
| hepatlas2 | Disk Server | 4 | Opteron 2400 MHz | x86_64 | ||
| hepatlas3 | Disk Server | 16 | Opteron 2600 MHz | x86_64 | ||
| heprobot1 | Tape Server | 2 | Athlon 2600 MHz | x86_64 | ||
| hepwww | Web Server | 2 | Xeon 2800 MHz | i386 | 9.54 (4.77/core)* | |
| wn001-wn032 | Worker Nodes | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| wn033-wn048 | Worker Nodes | 8 | Phenom 3600 MHz | x86_64 | 96 (12/core)* | |
| wn049-wn064 | Worker Nodes | 6 | Phenom 3200 MHz | x86_64 | 72 (12/core)* | |
| wn065-wn096 | Worker Nodes | 8 | Phenom 3600 MHz | x86_64 | 96 (12/core)* | |
| gw001-gw095 | Worker Nodes | 2 | Opteron 2400 MHz | x86_64 | 13.4 (6.7/core)* | |
| hep1 | Jon Perkin | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep2 | Kerry Parker | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep3 | Callum Wilkinson | 2 | Athlon 64 3000 MHz | x86_64 | ||
| hep4 | Matthew Lawe | 4 | Phenom 2400 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep5 | Ali Ahmed | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep6 | Leonid Yuriev | 2 | Athlon 64 2600 MHz | x86_64 | 15.72 (7.86/core)* | |
| hep7 | Tim Blackwell | 2 | Athlon 64 3000 MHz | x86_64 | ||
| hep8 | Matt Robinson | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep9 | Susan Cartwright | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep10 | Dan Walker | 2 | Athlon 64 3000 MHz | x86_64 | ||
| hep11 | Clive Tomlinson | 4 | Phenom 2500 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep12 | Spare | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep13 | Brais Lopez | 4 | Phenom 2500 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep14 | Spare | 2 | Athlon 64 3000 MHz | x86_64 | ||
| hep15 | Steve Sadler | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep16 | Leon James Pickard | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep17 | Ed Overton | 4 | Phenom 2500 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep18 | Jonathan Shimwell | 2 | Athlon 64 3000 MHz | x86_64 | ||
| hep19 | Sam Telfer | 4 | Phenom 2500 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep20 | Josh McFayden | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep21 | Paul Smith | 4* | Intel i7 3600 MHz | x86_64 | ||
| hep22 | Paul Miyagawa | 2 | Athlon 64 3000 MHz | x86_64 | ||
| hep23 | Dan Tovey | 4 | Phenom 2500 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep24 | Stathes Paganis | 4 | Phenom 2500 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep25 | Kerim Surliz | 4 | Phenom 2500 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep26 | Chris Booth | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep27 | Martin Richardson | 4 | Phenom 2500 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep28 | Davide Costanzo | 4 | Phenom 2500 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep29 | Ian Dawson | 4 | Phenom 2500 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep30 | Nicola Doyle | 4 | Phenom 2400 MHz | x86_64 | 33.55 (8.39/core)* | |
| hep31 | Spare | 2 | Athlon 64 3000 MHz | x86_64 | ||
| hep32 | Matthew Lawe | 2 | Athlon 64 3000 MHz | x86_64 | ||
| hep33 | Paul Hodgson | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep34 | Ed Daw | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hep35 | Elena Korolkova | 2 | Athlon 2200 MHz | x86_64 | ||
| hep36 | Matthew Taggart | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| hepdm1 | Vitaly Kudryavtsev | 2 | Athlon 64 3200 MHz | x86_64 | ||
| shatlasa.cern.ch | CERN | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| shatlasb.cern.ch | CERN | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
| shatlasc.cern.ch | CERN | 4 | Phenom 3200 MHz | x86_64 | 47.82 (11.96/core)* | |
*Tested with Scientific Linux 4, but now running Scientific Linux 5
External Login
It is possible to use ssh to log into hep0 from outside the university network, except from IP addresses which are blocked due to the risk of ssh brute-force attacks.
It is also possible to use ssh to mount the hep0 filesystems in Linux or Mac OS X using the FUSE system. Most modern Linux distributions supply this. Contact me for help setting it up. Mac OS X users should install MacFuse-Core (Tiger) (Leopard) (Snow Leopard) (Lion, project renamed OSXFUSE; be sure to install the MacFUSE compatibility layer) and then run MacFusion (Tiger) (Leopard) (Snow Leopard) (Lion, Matt Hack), which will set up FUSE and a graphical interface to it.
Note that on Lion, it is necessary to set "Extra Options" to "-o allow_other" in order to make drag and drop file transfers work.
Preliminary tests indicate the Lion instructions and applications also work on Mountain Lion.
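On Linux, the same kind of mount can be made from the command line with sshfs, the usual FUSE client for ssh. The following is only a sketch: the function names hep0_mount and hep0_umount and the mount point ~/hep0 are invented for this example, and it assumes sshfs is installed on your machine and that your address is not blocked.

```shell
# hep0_mount [USER]: mount your hep0 home directory on ~/hep0 over ssh.
# USER defaults to your local user name; pass it if it differs on hep0.
hep0_mount() {
    user=${1:-$(id -un)}
    mkdir -p "$HOME/hep0" && sshfs "$user@hep0.shef.ac.uk:" "$HOME/hep0"
}

# hep0_umount: detach the mount again.
# (This is the Linux form; on a Mac use 'umount ~/hep0' instead.)
hep0_umount() {
    fusermount -u "$HOME/hep0"
}
```

After pasting these into a terminal, `hep0_mount bob` should make your hep0 files appear under ~/hep0, and `hep0_umount` releases them when you are done.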
Disk Space and Backups
Disk space is mostly in the form of network shares. Your default home directory /home/your_user_name (use ~ as a short-cut) is shared from hep0 across the rest of the cluster. This is where you should keep your normal day-to-day files. This has a quota system. Type 'quota -s' to see how much of your quota you are currently using.
If you need more, there are a couple of options. /data1/your_user_name is a network file-system (visible from the whole cluster). This is much larger and does not have automatic quotas. In addition /scratch/your_user_name is the local disk space on each machine.
To see a summary of the availability and usage of disk space on your machine, type 'df -h'
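The checks above can be gathered into one small report. This is a sketch rather than an official tool: disk_report is a name made up for this example, and it assumes your directories under /data1 and /scratch are named after your user name, as described above. Areas that do not exist on the machine you are on are simply skipped.

```shell
# disk_report DIR...: print 'df -h' for each DIR that exists on this machine.
disk_report() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            df -h "$dir"
        fi
    done
}

me=$(id -un)    # your user name
disk_report "$HOME" "/data1/$me" "/scratch/$me"

# Show your /home quota, if the quota tool is installed on this node.
if command -v quota >/dev/null 2>&1; then
    quota -s || true
fi
```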
The /home filesystem is fully backed up, as is e-mail delivered to hep0. That's it. All other filesystems should be considered volatile. Files can be backed up to tape on request.
Submitting Batch Jobs
Jobs may be submitted with 'qsub', cancelled with 'qdel' and queried with 'qstat'. All 3 commands have man pages, e.g. 'man qsub'. A typical job submission consists of 'qsub myjob.sh' where myjob.sh is some kind of script. Such scripts may be bash, tcsh, perl, etc. A simple script would be '#!/bin/sh' on the first line, followed by whatever you would type on the command line to run the job in question.
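As a minimal sketch of the above, the following writes a two-command job script, tries it locally, and then submits it. The contents of myjob.sh are only an illustration; the submission step is skipped automatically on machines where qsub is not installed.

```shell
# Write a minimal job script: a shebang line followed by ordinary commands.
cat > myjob.sh <<'EOF'
#!/bin/sh
# Everything below is what you would otherwise type at the prompt.
echo "Job running on $(hostname)"
date
EOF
chmod +x myjob.sh

# Try it locally first; a script that fails here will also fail in the queue.
sh myjob.sh

# Submit only where the batch system is installed (i.e. on the cluster).
if command -v qsub >/dev/null 2>&1; then
    qsub myjob.sh    # prints the job identifier
    qstat            # lists queued and running jobs
fi
```

The identifier printed by qsub is what you pass to qdel if you need to cancel the job.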
Memory Limits
By default all jobs are 'limited' to 1000 MB RAM. If your job exceeds this it will not be terminated, since there is a better than even chance that it will succeed despite itself.
If you know (or have a decent guess at) how much RAM your job will use, please indicate this when you submit. An overestimate does less harm than an underestimate.
To request a specific amount of RAM use 'qsub -lmem=XXXM' where XXX is the memory you require in MB.
Jobs requesting more than 1000 MB of RAM may be delayed because this request is harder for the cluster to fulfil. The probability and extent of this delay increases the more memory is requested. Jobs requesting more than 8000 MB of RAM will not run at all since no worker node can accommodate such a request.
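To avoid typing the option wrongly, the memory request can be built by a small helper. This is a sketch: mem_flag is a made-up name, and the 8000 MB ceiling is simply the figure quoted above.

```shell
# mem_flag MB: print the qsub memory option for a request of MB megabytes.
# Returns non-zero for requests that could never run (over 8000 MB).
mem_flag() {
    mb=$1
    case $mb in
        ''|*[!0-9]*)
            echo "mem_flag: need a whole number of MB" >&2
            return 2 ;;
    esac
    if [ "$mb" -gt 8000 ]; then
        echo "mem_flag: $mb MB exceeds what any worker node can offer" >&2
        return 1
    fi
    echo "-lmem=${mb}M"
}

mem_flag 2500    # prints -lmem=2500M
```

On the cluster this would be used as `qsub $(mem_flag 2500) myjob.sh`.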
Time Limits and Queues
In the event of high load, jobs submitted to a queue with a shorter time limit will receive higher priority.
The amount of cpu time that you have used in the last day or so is also considered in determining which jobs to run first.
The queues are as follows
- infinite: no time limit, lowest priority, no more than 40 cpus from the cluster may ever be used even if more are free
- long: 4 day limit, all cpus available, low priority
- medium: 1 day limit, all cpus available, higher priority
- short: 8 hour limit, all cpus available, still higher priority
- vshort: 2 hour limit, all cpus available, highest priority
Select your queue using the -q flag to qsub, e.g. qsub -q short myjob.sh
If this is not specified, your job will go into the medium queue, which has a 1 day limit.
Any job exceeding its time limit for any reason will be unceremoniously killed.
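Since shorter queues get higher priority, it pays to pick the shortest queue that still covers your job. The helper below is a sketch of that choice; pick_queue is a made-up name and the limits are the ones listed above, expressed in hours. Leave yourself a margin, because a job that overruns is killed.

```shell
# pick_queue HOURS: print the shortest queue whose limit covers HOURS.
pick_queue() {
    hours=$1
    if   [ "$hours" -le 2  ]; then echo vshort
    elif [ "$hours" -le 8  ]; then echo short
    elif [ "$hours" -le 24 ]; then echo medium
    elif [ "$hours" -le 96 ]; then echo long
    else                           echo infinite
    fi
}

pick_queue 6    # prints short
```

On the cluster this would be used as `qsub -q "$(pick_queue 6)" myjob.sh`.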
To see the priority (intended running order) of jobs queued on the system, run 'diagnose -p'. Running order is based on requested cpu time, fair share amongst users and how long the jobs have been waiting.
Interactive Batch Jobs
An interactive batch job may be started with 'qsub -I -X'. It is most useful if you do not have your own hep cluster machine and/or are logging in from outside the hep cluster. You may also choose to use it because the worker nodes are, on the whole, more powerful than the desktop machines.
A worker node is assigned just as with a normal batch cluster job, but rather than a pre-determined set of commands being executed, your terminal is attached directly to the worker node. You can then use this terminal as normal and take advantage of the worker node's cpu and memory.
Printing
Printing configuration is the same throughout the Linux clusters.
Available Printers
To see the list of printers available through hep0, direct a web browser to http://hep0.shef.ac.uk:631/printers.
Printing from Linux
To see what printers are available on your hep cluster node, direct a web browser to localhost:631/printers
Printing from Mac
Mac users should find that they can print through hep0 using the normal print management tools. The hep0 printers are advertised through the network as 'NAME @ HEP'. It is best to select the generic postscript driver if you are offered a choice.
You can print to the new Phaser directly if you prefer. The drivers are included in Leopard but not Tiger. Download here. The printer has Bonjour and all the usual advertising is turned on, and it appears in the list as 'Phaser 8560DN (00:00:aa:d4:ba:76)'. If you want/need to configure it manually, the ip address is 143.167.7.207 and the easiest protocol to use is JetDirect. Note that a manual JetDirect configuration is advantageous for laptops since it functions through both wired and wireless ethernet without modification.
The same is true of the departmental printer. However without the post-processing performed by hep0, this older printer has difficulty with some print jobs, for example those sent by Adobe Reader 8. The ip address is 143.167.7.247, and once again, jetdirect will serve you well. The printer model is HP LaserJet 8150 PS.
There is also a combined printer/scanner/photocopier in e18a. The model is Samsung SCX-4825. Mac software is here. Note that the installation requires exit of all applications and may require a reboot. The ip address is 143.167.6.28 and again JetDirect will serve you well. The printer in D36 is similar (SCX-4828) and has ip address 143.167.4.137.
The printer in D38 is also network capable and can be printed to directly. It is a Brother HL-3040CN. A mac should automatically find and install the needed drivers. The ip address is 143.167.4.12 and JetDirect is, once again, recommended.
Printing from Windows
An easy way to print from windows is using the Mac method.
Install Bonjour for Windows. Download here. This will add an item 'Bonjour Printer Wizard' to the start menu. This wizard should find all the hep0 printers advertised as 'NAME @ HEP'. It should not be necessary to install drivers, as the conversion from Postscript to the proper language for the printer is done by the attached Linux machine.
The Phaser is a special case in that it has an independent network connection and you can print to it directly. First download and install this driver.
In the add printer wizard, you need to specify 'Local' printer (printer not attached to another windows machine) and define a TCP/IP port to 143.167.7.207. When the driver dialog comes up, hit 'Have Disk' and navigate to the directory in C:\WINDOWS\Xerox where the driver inf file was dropped. After installation, you can go into the advanced properties in the usual way and enable double-sided printing.
The same is true of the departmental printer. The ip address is 143.167.7.247 and the model is HP LaserJet 8150 PS. As mentioned above in the mac description, printing to the departmental printer directly has disadvantages.
There is also a combined printer/scanner/photocopier in e18a. The model is Samsung SCX-4825. Windows software is here. Note that the installation requires exit of all applications and may require a reboot. The ip address is 143.167.6.28 and again JetDirect will serve you well. The printer in D36 is similar (SCX-4828) and has ip address 143.167.4.137.
The printer in D38 is also network capable and can be printed to directly. It is a Brother HL-3040CN. The Windows 7 driver is here. The ip address is 143.167.4.12 and JetDirect is, once again, recommended.
A Beginner's Guide to Linux and the 'HEP' Cluster
The first thing to understand is that Linux is a command line based operating system. Whilst many common operations can be achieved by clicking icons on the desktop, this is not where the power resides.
Since the graphical desktop is rather straightforward to use, I will confine myself to important command line operations.
Getting Started
On the interactive 'HEP' cluster, you will find yourself staring at a graphical login prompt.
Enter your username and password. You may also like to take a look at the 'Session type' drop-down list.
There are 2 standard desktop managers to choose from, KDE and Gnome. They are rather similar in operation, but people tend to develop a preference. The default is Gnome. I suggest you give them both a try.
Gnome desktop
KDE desktop
Whilst the desktop manager will always be running in normal operation, the real power of Linux is the command line. The first thing to do is start a terminal. There will be a button in the menu (bottom left button) and various other shortcuts to launching a terminal, but the easiest way is to right-click on the desktop and find the item on the pop-up menu which launches the terminal.
In addition to the terminal button, you will find many other application buttons for applications, utilities, system administration (don't touch) and so on. It's best not to depend on these because they all have their limits. In particular, I don't want you to get too fond of any file system managers you might find, since file management is, like everything else, much better done from the command line.
KDE Terminal
Basic Commands
Most of the programs under Linux are simple command line. To find out how to use a command, bring up the man page. e.g. man cp will tell you how to use the cp command to copy files within the filesystem
pwd: This command tells you where you are (what directory) in the filesystem. When you first log in, you'll find yourself in your home area. This is where you should keep day to day files.
cd: This is the command to change directory. To change into a subdirectory of your current location: cd mydir. To ascend one level in the directory structure: cd .. (two dots). You may also specify a full path: cd /home/bob/tmp/test. You'll get used to it.
ls: List the contents of the current directory. Most useful. This command has several useful options. ls -l will show details of each item in the directory. ls -A shows files and directories starting with . which are normally hidden. ls -rt shows files in reverse order of modification time (newest at the bottom). This is one command it's worth playing with and getting to know.
cp: This command copies files. You can specify one or more files or directories to be copied, the final argument is the destination. If you specify a directory which already exists as the destination, all files will be copied into that directory, using the same names as the original files. Note that if you specify more than one file or directory to copy, the destination must be an existing directory. Note that . indicates the current directory, and .. indicates one directory up. Filesystem commands tend to respond to the -r (recursive) flag which causes the command to descend into directories. This is often appropriate.
mv: Functions in much the same way as cp except that the original files are not preserved. This is also the normal way to rename files. If the new location for the files is on the same filesystem, the command is very quick, because the files are simply renamed and not copied then deleted.
rm: Remove file. Very dangerous command as files cannot be restored once removed. With the -r flag, especially dangerous. For example, rm -r * can promptly remove every file you own.
mkdir: create a directory with the specified name
tar: Possibly the most useful program ever devised. Rather similar to (and predating) windows zip programs. It creates one large file containing several smaller files and/or directories. It is common for software to be distributed in the form of tar files. A tar may be compressed with the standard Linux compression tool gzip. Such compressed tars are often named file.tar.gz or file.tgz. To create a gzipped tar file use tar cvzf archive.tar.gz files. To extract, use tar xvzf archive.tar.gz. There are many options for tar which are listed in the man pages.
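The commands above can be tried safely in a throwaway directory. The following walk-through is only an illustration; the file and directory names (project, notes.txt and so on) are invented for the example.

```shell
# Work in a throwaway directory so nothing of yours is touched.
work=$(mktemp -d)
cd "$work"
pwd                                        # shows where you are

mkdir project                              # create a directory
echo "first line" > project/notes.txt
cp project/notes.txt project/copy.txt      # copy a file
mv project/copy.txt project/renamed.txt    # rename (move) it
ls -l project                              # detailed listing
cp -r project backup                       # -r copies the whole directory

tar cvzf project.tar.gz project            # pack and compress
mkdir unpacked
cd unpacked
tar xvzf ../project.tar.gz                 # extract into the current directory
cd ..                                      # back up one level
ls -A unpacked/project                     # lists notes.txt and renamed.txt

rm backup/renamed.txt                      # remove one file
rm -r backup                               # remove a directory and its contents
```

Note that the rm commands here are given explicit names rather than *, which is a good habit to get into.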
There are many other commands, hundreds in fact. If you come across a task and you don't know which is the proper program to use, please ask. There are programs for everything, including difficult jobs like drawing Feynman diagrams.
Web and Mail
You're not going to get far without a web browser, so let's start there. Type firefox on your terminal. This will bring up the standard web-browser, Firefox. If you fancy something different, there is also Konqueror (which I'm not impressed with) or, more usefully, Opera, which is pretty nice and has e-mail and various other functionality thrown in.
There is no official build of Google Chrome for Scientific Linux 5 and no safe third party build I have been able to find.
The university will have created an e-mail account for you which is available via webmail and imap. The webmail (either via the Portal or Google) is convenient because it requires no set-up, but it is also slow and can be unreliable.
More popular long-term is thunderbird in imap mode. This allows you to access the mail directly on the server from a number of computers, just as with webmail, but the work is done by your computer. Instructions are available on the cics web-pages. For laptops or home machines, you'll want to follow the instructions for "Sending Mail Securely" and "Receiving Mail Securely" or it won't work.
Remote Login
Once you have an account on the system, you have access to the entire cluster. It is possible to log in from one machine to another by using ssh and specifying the hostname, e.g. ssh hep8 will (after prompting for a password) log your terminal into hep8. If you then run an application (you could try firefox again), it will run on hep8 but its window (if it has one) will be directed back to the machine at which you sit. This is called X-window forwarding and can be extremely useful.
To log in to another machine, or to log into hep0 from outside, a similar command is used. In this case, you should specify the fully qualified domain name (fqdn) and, if it differs from your username where you are, your username, e.g. ssh bob@hep0.shef.ac.uk. Use this command to access your files when you're not at the university.
If you need to transfer files, use scp.
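If you log in from outside often, a small ssh configuration file saves typing the full name each time, and the same file works for scp. The following is a sketch: 'bob' is a placeholder user name, hep_ssh_config is an arbitrary file name chosen for this example, and results.tar.gz and analysis are hypothetical files. The ssh and scp lines are shown as comments because they only work from a machine that can reach hep0.

```shell
# Write a small ssh configuration file describing hep0.
cat > hep_ssh_config <<'EOF'
Host hep0
    HostName hep0.shef.ac.uk
    User bob
    ForwardX11 yes
EOF

# Without the file, the long forms are:
#   scp results.tar.gz bob@hep0.shef.ac.uk:     # copy a file to your home area
#   scp bob@hep0.shef.ac.uk:results.tar.gz .    # copy it back
#
# With the file, 'hep0' stands for all of the above settings:
#   ssh -F hep_ssh_config hep0                  # log in
#   scp -F hep_ssh_config results.tar.gz hep0:  # copy a file to hep0
#   scp -F hep_ssh_config -r hep0:analysis ./   # copy a whole directory back
```

If you put the same lines in ~/.ssh/config instead, the -F option is no longer needed.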
Frequently Asked Questions
Firefox won't start / bookmarks gone
If firefox dies or a crash occurs while it is in use, it gets very upset. You may see this window when you try to start it.
If you have firefox running on another cluster machine, then you should quit it. Most likely, this is not the case and firefox has simply become confused.
In a terminal, cd to '~/.mozilla/firefox/*.*'. This is your profile directory and contains all the firefox settings etc. Using rm, delete the files 'lock' and '.parentlock'. This is enough to get firefox running, but not enough to properly fix it. You might try starting firefox at this point, but most likely you will find that your bookmarks are gone, your 'Back' button doesn't work and the thing is generally crippled. Quit firefox and in the same directory as before remove the files 'places.sqlite' and 'places.sqlite-journal' (the latter is usually there, but not always). Firefox should then go back to working properly.
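The two clean-up steps can be written as a pair of shell functions. This is a sketch: the names unlock_profile and reset_places are made up for the example, and the functions do nothing until you call them on a profile directory. Quit every running firefox first.

```shell
# unlock_profile DIR: remove stale lock files from a Firefox profile directory.
unlock_profile() {
    rm -f "$1/lock" "$1/.parentlock"
}

# reset_places DIR: remove the places files named in the text above.
# rm -f does not complain if the journal file is absent.
reset_places() {
    rm -f "$1/places.sqlite" "$1/places.sqlite-journal"
}

# Typical use:
#   for profile in "$HOME"/.mozilla/firefox/*.*/; do
#       unlock_profile "$profile"
#       reset_places "$profile"    # only if firefox is still crippled
#   done
```

The trailing slash in the pattern makes it match only directories, so ordinary files in ~/.mozilla/firefox are left alone.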
Evo won't run
Evo often refuses or fails to run 2 minutes before your conference is due to start. The error message it produces is either misleading or flat-out false.
Usually this is down to Evo fighting with Java. It can often be solved by clearing the java webstart cache, which forces the system to download Evo from scratch again: http://evo.caltech.edu/evoGate/Documentation/faq/uninstall/index.html
Sometimes it is also necessary to remove the directory ~/.Koala which holds your Evo settings. Note that this will cause it to forget your password.
