- The LIM on each server host monitors its host’s load and exchanges load information with other LIMs. The LIM on one host in the cluster acts as the master, collects information for all hosts and provides that information to the applications.
For further assistance, please contact helpdesk@serc.iisc.in by E-mail or contact system administrators in SERC#109.
LSF stands for Load Sharing Facility. LSF manages, monitors, and analyzes the workload for a heterogeneous network of computers, and it unites a group of computers into a single system to make better use of the resources on a network. Hosts from various vendors can be integrated into a seamless system. LSF is based on clusters; a cluster is a group of hosts. Each cluster is configured so that LSF uses some of its hosts as batch server hosts and others as client hosts. At SERC, LSF is installed on Compaq AlphaServer ES40 systems.
Configuration Information of LSF at SERC
LSF Version : LSF 4.1
lsf-common-compalpes40 : Includes four Compaq AlphaServer ES40 systems.
To use LSF on the Compaq AlphaServer ES40 systems, users have to log on to the server alphas4 and then submit jobs through LSF.
The paths to be included to access the binaries and the man pages for LSF on the Compaq AlphaServer ES40 systems are:
PATH : /usr/lsf4.1/bin
MANPATH : /usr/lsf4.1/mnt/man
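One way to add the two directories listed above is sketched below. It assumes a Bourne-style login shell (sh, ksh or bash) and that the lines go into a startup file such as ~/.profile; neither is stated by SERC, and csh users would need setenv instead. The variable names LSF_BIN and LSF_MAN are only illustrative.

```shell
# Paths as listed above for the Compaq AlphaServer ES40 systems.
LSF_BIN=/usr/lsf4.1/bin
LSF_MAN=/usr/lsf4.1/mnt/man

# Append each directory only if it is not already present, so that
# sourcing this file more than once does not duplicate entries.
case ":${PATH:-}:" in
    *":$LSF_BIN:"*) ;;
    *) PATH="${PATH:+$PATH:}$LSF_BIN" ;;
esac
# Note: on some systems, setting a previously unset MANPATH replaces the
# default man page search path instead of extending it.
case ":${MANPATH:-}:" in
    *":$LSF_MAN:"*) ;;
    *) MANPATH="${MANPATH:+$MANPATH:}$LSF_MAN" ;;
esac
export PATH MANPATH
```

After a fresh login, the LSF commands and their man pages should then be reachable without typing full paths.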
Queue Information
Jobs are submitted through queues. The queues configured on the Compaq AlphaServer ES40 systems are:
8hr : Jobs that require 8 hours or less of CPU time can be submitted to this queue.
LSF Tools
LSF provides a set of tools for users to get information about the system.
Some Basic LSF Commands
bhosts
bsub "<job to be submitted>"
bsub -q <name of the queue> "<job to be submitted>"
bqueues
bjobs
bjobs -a
bkill <JOBID>
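The commands above are typically used together: submit a job, watch it, and cancel it if necessary. The following is a minimal sketch of that workflow, not a SERC-provided script. The function submit_to_queue, the DRY_RUN variable and the program ./my_program are all hypothetical names introduced for illustration; 8hr is the queue described earlier.

```shell
# Hypothetical helper: submit a command to a named LSF queue.
# With DRY_RUN=1 it only prints the bsub command line, which is useful
# for checking the arguments before anything is actually queued.
submit_to_queue() {
    if [ "$#" -lt 2 ]; then
        echo "usage: submit_to_queue <queue> <command> [args...]" >&2
        return 2
    fi
    queue=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bsub -q $queue $*"
    else
        bsub -q "$queue" "$@"
    fi
}

# Typical session (commented out because it needs a live LSF cluster):
#   submit_to_queue 8hr ./my_program   # bsub reports the job ID it assigned
#   bjobs                              # list your unfinished jobs
#   bjobs -a                           # also include recently finished jobs
#   bkill <JOBID>                      # cancel a job, using the ID from bsub
```

Running it with DRY_RUN=1 first is a cheap way to confirm the queue name and command before the job is submitted.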
User Manuals
Detailed Information on LSF
LSF stands for Load Sharing Facility. It is a suite of workload management products and a general-purpose distributed computing system that requires a UNIX operating system with Internet Protocol (IP) networking. LSF manages, monitors, and analyses the workload for a heterogeneous network of computers. It unites a group of computers into a single system to make better use of the resources on a network.
Load sharing in LSF is based on clusters. A cluster is a group of hosts that provide shared computing resources, and it can contain a mixture of host types. Each cluster has at least one LSF administrator who has permission to change the LSF configuration and perform other maintenance functions. LSF allows the user to use these hosts transparently, so applications that run on only one host type are available to the entire cluster. LSF is designed for networks where all hosts have shared file systems. It can also be used in networks without file sharing, but with reduced fault tolerance.
LSF can automatically select hosts in a heterogeneous environment based on the current load conditions and the resource requirements of the applications. It can run batch jobs automatically when the required resources become available or when systems are lightly loaded. LSF maintains full control over the jobs, including the ability to suspend and resume them based on load conditions. LSF supports sequential and parallel applications running as batch jobs. It allows new distributed applications to be developed through a C programming library and a tool kit of programs for writing shell scripts.
LSF treats each UNIX process queue as a separate machine. A multiprocessor computer with a single process queue is considered a single machine. A box full of processors that each have their own process queue is treated as a group of separate machines.
LSF allows fair share policies to be defined at the queue level, so that different queues may have different sharing policies. The policy applies to all hosts used by the queue. Fair share scheduling is an alternative to the default first-come, first-served scheduling. It divides the processing power of the LSF cluster among users and groups to provide access to resources for all jobs in a queue.
Most applications can use the load sharing utilities to access LSF. They do not communicate directly with LSF and do not need to be modified to work with it. Nearly all UNIX commands and third-party applications can be load shared using LSF utilities.
With LSF, users can submit their jobs and leave the system to find the best host to run their programs. Users are no longer limited to the resources on their own workstations. They only need to learn a few simple commands to have the resources of the entire network within their reach, without rewriting or changing their programs. Users can transparently run software that is not available on their local hosts. For example, a CAD tool available on an HP host can be run by a user on a Sun workstation without any difficulty. Users can write their own load sharing applications, both as shell scripts using the lstools programs and as compiled programs using the LSF application programming libraries.
LSF provides comprehensive resource and load information about all hosts in the network.
Resource Information:
Dynamic Load Information:
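Both kinds of information can be inspected from the command line with the standard LSF tools lshosts (static resource information) and lsload (dynamic load information), along with lsid for the cluster name and master host. The sketch below wraps them in a hypothetical function, show_cluster_info, which is not part of LSF; it assumes the tools are on PATH and reports when one is missing.

```shell
# Run the standard LSF information tools if they are on PATH, and say so
# when one is missing instead of failing.
#   lsid    - LSF version, cluster name and current master host
#   lshosts - static resources of each host (type, model, CPUs, memory)
#   lsload  - dynamic load indices of each host (run queue, CPU use, ...)
show_cluster_info() {
    for tool in lsid lshosts lsload; do
        echo "== $tool =="
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" || echo "$tool exited with an error"
        else
            echo "$tool not found in PATH"
        fi
    done
}

show_cluster_info
```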
LSF divides jobs into two kinds – interactive and batch.
LSF has a number of features to support fault tolerance. It is designed to continue operating even if some of the hosts in the cluster are unavailable. LSF services are available as long as any host in the cluster is up. When a host crashes, all jobs running on that host are lost, but no other jobs are affected. When the host comes up again, the jobs that were running on it are assumed to have exited and an email is sent to the user. Pending jobs are left as they are and are scheduled as hosts become available. Important jobs can be submitted to lsbatch with an option to restart automatically if the job is lost because of host failure.
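The automatic restart option mentioned above most likely corresponds to the -r flag of bsub, which marks a job as rerunnable; that mapping is an assumption, since the text does not name the flag. The wrapper below is a minimal sketch: submit_rerunnable and the LSF_SUBMIT variable are hypothetical names, not LSF settings.

```shell
# Hypothetical wrapper around "bsub -r", which marks a job as rerunnable
# so that LSF can run it again if the execution host fails.
# LSF_SUBMIT is an illustrative variable (not an LSF setting): set it to
# "echo bsub" to rehearse the command line without a cluster.
submit_rerunnable() {
    # Left unquoted on purpose so that "echo bsub" splits into two words.
    ${LSF_SUBMIT:-bsub} -r "$@"
}

# Example (needs a live cluster, so it is left commented out):
#   submit_rerunnable -q 8hr ./my_program
```

A rerunnable job starts again from the beginning, so this suits programs that can safely be repeated.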
A server host is a host that runs load-shared jobs. The Load Information Manager (LIM) runs on every server host. The LIM interfaces directly with the underlying operating system and provides users with a uniform, host-independent environment.