Introduction
The Canadian Advanced Network for Astronomical Research (CANFAR) is a computing infrastructure for astronomers. CANFAR aims to give its users easy access to very large storage and processing resources through a cloud-based framework. CANFAR allows astronomers to run processing jobs on a set of computing clusters and to store data at a set of data centres. For help getting a CANFAR account, email canfarhelp@nrc.gc.ca. These wiki pages provide help on getting started with CANFAR. If you are a CANFAR user and want to make changes to this manual, you can: just log in to the wiki using your CADC username and password. You must be a CANFAR member to get access to this system.
Imagine you have a processing-intensive analysis that must be run on imaging data as observations are acquired, but whose computing needs decrease once the processing is completed. The CANFAR system allows you to create a computing environment (a Virtual Machine, or VM) that we then deploy on a computer cluster. The number of running instances of your VM grows to meet your current need. When the computing is done, the VMs are shut down and the computing resources are freed for other users. The VM approach ensures that the software you want to run is available regardless of the grid or cluster installation where your analysis is conducted.
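The elastic behaviour described above can be pictured as a simple scaling rule: start enough VM instances to cover the work waiting to be done, up to what the cluster can offer, and drop to zero when the backlog is empty. This is a hypothetical sketch only; the `vms_needed` function, its `jobs_per_vm` and `max_vms` parameters, and the ceiling rule are illustrative assumptions, not CANFAR's actual scheduling logic.

```python
import math


def vms_needed(queued_jobs: int, jobs_per_vm: int, max_vms: int) -> int:
    """Hypothetical scaling rule: enough VM instances to cover the queued
    jobs, capped at the capacity the cluster can provide."""
    if jobs_per_vm <= 0:
        raise ValueError("jobs_per_vm must be positive")
    if queued_jobs <= 0:
        return 0  # nothing left to do: every instance can be shut down
    return min(max_vms, math.ceil(queued_jobs / jobs_per_vm))


# A burst of new observations arrives, then the backlog drains away.
backlogs = [0, 5, 40, 400, 12, 0]
print([vms_needed(b, jobs_per_vm=4, max_vms=50) for b in backlogs])
# [0, 2, 10, 50, 3, 0]
```

The cap reflects that the clusters are shared: during the 400-job burst the analysis gets at most 50 instances, and the resources return to other users as soon as the queue empties.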
A major advantage of grid computing comes when the networks between the computing resources and the storage have enough capacity that the large volumes of data (gigabytes) used in many astronomy processing steps are not a bottleneck. Tuning the network between an astronomer's desktop and the storage facility can be an intractable problem. Tuning the network between the computing grid and the data storage centre, while still a challenge, is a more solvable one.
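A rough worked example shows why link capacity matters. The dataset size (100 GB) and the link speeds (10 Mbit/s for a desktop connection, 10 Gbit/s between a cluster and a data centre) are hypothetical figures chosen for illustration, and the calculation is idealized: it ignores protocol overhead, latency and contention with other traffic.

```python
def transfer_seconds(size_gb: float, link_mbit_s: float) -> float:
    """Idealized time to move `size_gb` gigabytes over a link of
    `link_mbit_s` megabits per second (decimal units, no overhead)."""
    bits = size_gb * 8e9                    # 1 GB = 8 x 10^9 bits
    return bits / (link_mbit_s * 1e6)       # link speed in bits per second


# Hypothetical 100 GB dataset over two very different links.
print(f"{transfer_seconds(100, 10) / 3600:.1f} h")   # 22.2 h over 10 Mbit/s
print(f"{transfer_seconds(100, 10_000):.0f} s")      # 80 s over 10 Gbit/s
```

The same data that takes most of a day to reach a desktop moves in about a minute over a well-provisioned link, which is why placing the computing close to the storage pays off.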
There are two main areas to understand before you start using CANFAR: processing and storage. Each is described below.
Processing
Processing data on CANFAR is done in 2 steps: configuration and the processing itself.
- Configuration: create the computing environment in which the processing will run, a Virtual Machine (VM). The VM includes all the software required to run the processing jobs. Two guides describe how to create and manage a VM:
- The Quick Start Configuration guide covers CANFAR configuration concisely.
- The In Depth Configuration guide goes more deeply into the internals of VM configuration and management.
Originally, configuration was done using the prototype configuration system. The prototype documentation is still available:
- The Quick Start Configuration Prototype guide covers CANFAR configuration concisely.
- The In Depth Configuration Prototype guide goes more deeply into the internals of VM configuration and management.
- Processing: the next step is to process the data using the newly created VM. The VM is booted on the clusters, which are shared with other users, and runs the software installed on it. Processing jobs run in batch mode and must be queued.
- The Quick Start Processing guide shows how to submit jobs on CANFAR in a few quick steps.
- The In Depth Processing guide covers the finer points of CANFAR processing.
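The idea of queued batch processing can be illustrated with a small simulation. This is a hypothetical sketch, not CANFAR's batch system: it assumes a first-in, first-out queue, a fixed number of VM slots, and known job durations, and it reports when the last job finishes.

```python
import heapq
from collections import deque


def run_batch(job_durations, slots):
    """Simulate FIFO batch dispatch: each queued job waits until one of
    `slots` VM instances is free, then runs to completion. Returns the
    time at which every job has finished (the makespan)."""
    if slots < 1:
        raise ValueError("need at least one VM slot")
    queue = deque(job_durations)        # jobs waiting, in submission order
    free_at = [0.0] * slots             # min-heap of times each slot frees up
    heapq.heapify(free_at)
    finish = 0.0
    while queue:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + queue.popleft()   # run the next job in line
        finish = max(finish, end)
        heapq.heappush(free_at, end)    # slot is busy until `end`
    return finish


print(run_batch([3, 3, 3, 3], slots=2))  # 6.0
print(run_batch([5, 1, 1, 1], slots=2))  # 5.0
```

A min-heap keeps the "which VM frees up next" lookup cheap; with more slots the same queue drains sooner, which is the benefit of running many VM instances at once.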
VOSpace Storage
CANFAR uses a network storage solution called VOSpace (an International Virtual Observatory Alliance standard) and maintains the community-supported VOSpace User Guide. Processing results can optionally be saved in VOSpace. VOSpace also allows users to share data with others, either publicly or with a restricted set of users.
- The Quick Start VOSpace guide covers the minimum needed to manage a VOSpace storage area.
- The In Depth VOSpace guide describes VOSpace usage and features within CANFAR in more detail.
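The two sharing modes mentioned above, public access and access restricted to a chosen set of users, can be modelled with a simple read check. This is a hypothetical sketch: the `Node` class, the group-based rule, and the user and group names are illustrative assumptions, not the VOSpace service's API; the VOSpace guides describe how permissions are actually set.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified sharing settings for one stored file or directory."""
    owner: str
    public: bool = False                                # readable by anyone
    read_groups: set[str] = field(default_factory=set)  # groups granted read access


def can_read(node: Node, user: str, user_groups: set[str]) -> bool:
    """Return True if `user`, a member of `user_groups`, may read `node`."""
    if node.public or user == node.owner:
        return True
    # Restricted sharing: allowed if the user is in any group granted read access.
    return bool(node.read_groups & user_groups)


results = Node(owner="alice", read_groups={"survey-team"})
print(can_read(results, "bob", {"survey-team"}))  # True: member of a shared group
print(can_read(results, "carol", set()))          # False: not shared with carol
results.public = True
print(can_read(results, "carol", set()))          # True: node is now public
```

Granting access through groups rather than individual users means a collaboration can be added or removed in one place instead of editing every file.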