Unylogix HSM - Hierarchical Storage Migration software

Overview, Features and Benefits


The Beginnings of HSM

HSM technology is not new. It has been around in mainframe environments since the 1970s, where the management of large, centralized data repositories was a major problem. (Imagine, if you will, the size of the disk farms that had to exist on these computing mammoths: the data of a full enterprise!) At that time, disk storage was very expensive, and space was at a premium. Accordingly, HSM was developed to automate the "freeing up" of disk space, so that files (data, records, etc.) would be "retired" to other storage technologies, primarily reel-to-reel magnetic tape, for longer-term holding. The HSM software, through its own decision process, selected which pieces of data had become inactive and automatically migrated that data out to the tape transports.

But Unylogix HSM takes data migration to a more intelligent level. It uses multiple migration parameters, meaning that it weighs several factors in deciding which pieces of information are the best candidates for migration. This makes for more accurate migration decisions. Unylogix HSM is a true rules-based software tool that was one of the first to have a graphical user interface, making it one of the most robust HSMs on the market.

HSM automates the storage management of all data under its control by monitoring hard disk space usage. When disk usage, measured against predefined water-level marks, calls for it, the HSM software scans the contents of the disk and selects candidates among that data for migration out to another storage medium. When the space is finally needed, the HSM software pushes the data out so that the disk does not fill up. At the same time, an entry recording the actual location of the data is made in the HSM's database, and a stub (placeholder) is left behind on the disk layer. To users, their files appear to be on-line (on the hard disk), whereas in reality they may be near-line (stored in a jukebox) or off-line on a shelf.
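The water-level mechanism described above can be sketched in a few lines of Python. This is only an illustration, not the product's actual logic: the class names, the 90% and 70% marks, and the rule of picking the least recently accessed file first are all assumptions made for the example, and a stub is treated as taking no space.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedFile:
    name: str
    size: int            # space taken on the hard disk layer
    last_access: float   # seconds since the epoch
    migrated: bool = False


@dataclass
class HsmDisk:
    capacity: int
    high_mark: float = 0.90   # hypothetical: start migrating above 90% full
    low_mark: float = 0.70    # hypothetical: stop once usage is back at 70%
    files: list = field(default_factory=list)
    catalog: dict = field(default_factory=dict)  # file name -> data location

    def used(self) -> int:
        # Migrated files leave only a stub, counted here as zero space.
        return sum(f.size for f in self.files if not f.migrated)

    def migrate_if_needed(self, target: str = "jukebox") -> list:
        """Push out least recently used files once the high mark is passed."""
        moved = []
        if self.used() <= self.high_mark * self.capacity:
            return moved
        for f in sorted(self.files, key=lambda f: f.last_access):
            if self.used() <= self.low_mark * self.capacity:
                break
            if f.migrated:
                continue
            f.migrated = True              # only the stub remains on disk
            self.catalog[f.name] = target  # record where the data now lives
            moved.append(f.name)
        return moved


disk = HsmDisk(capacity=100)
disk.files = [
    ManagedFile("a", 40, 1.0),
    ManagedFile("b", 30, 2.0),
    ManagedFile("c", 25, 3.0),
]
print(disk.migrate_if_needed(), disk.catalog)
```

In this toy run the disk starts at 95% full, so the oldest file is pushed out and its location is recorded in the catalog; the other two stay on the disk layer because usage has already dropped below the low mark.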

All of this is transparent to users: when they or their applications request a file, the HSM software fetches it and provides it for use. This process occurs seamlessly, without user or operator intervention. In general, the data is migrated back up to the disk layer, where it remains available for use (refreshes, appending, etc.) until it falls into disuse and again becomes a candidate for migration. Again, this process is fully automatic, based upon preset parameters selected by the system administrator. All the user sees, if anything, is a slight delay in the delivery of data by the HSM system.
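As a rough illustration of this transparent recall, the toy model below keeps a stub on the disk layer and fetches the data back on the first read. The class and method names are hypothetical, and in-memory dictionaries stand in for the disk and the slower tier.

```python
class StubbedStore:
    """Toy model: a disk layer plus a slower tier, with stubs left on disk."""

    def __init__(self):
        self.disk = {}      # name -> bytes on the hard disk layer, None = stub
        self.nearline = {}  # name -> bytes held on the slower tier
        self.recalls = 0    # how many reads had to fetch from the slower tier

    def write(self, name, data):
        self.disk[name] = data   # all writes land on the disk layer

    def migrate(self, name):
        self.nearline[name] = self.disk[name]
        self.disk[name] = None   # leave a stub behind

    def listing(self):
        return sorted(self.disk)  # stubs still show up in a directory listing

    def read(self, name):
        if self.disk[name] is None:  # stub: bring the data back up to disk
            self.disk[name] = self.nearline.pop(name)
            self.recalls += 1
        return self.disk[name]


store = StubbedStore()
store.write("report.txt", b"quarterly figures")
store.migrate("report.txt")
print(store.listing())           # the file still appears to be on-line
print(store.read("report.txt"))  # first read triggers a recall
print(store.recalls)
```

A second read of the same file is served straight from the disk layer, which is why only the first access after migration sees the delay.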

How the HSM Prioritizes Data

Unylogix HSM is a highly intelligent, robust software tool because it uses four parameters to select candidates for migration: file size, date last accessed, date last modified, and priority level. Priority is a number assigned to a file or directory that increases or decreases the probability of its remaining on the hard disk layer. It is adjustable only by the super user, who can use it to "lock" onto the hard disk layer any data that should not be migrated outward.
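One way to picture how the four parameters might combine is a simple scoring function. The sketch below is not the product's formula, which is not documented here: the weights, the priority scale, and the LOCKED threshold are invented for illustration.

```python
from dataclasses import dataclass

DAY = 86400.0  # seconds in a day
LOCKED = 100   # hypothetical priority value that pins a file to the disk layer


@dataclass
class Candidate:
    name: str
    size: int             # bytes
    last_access: float    # seconds since the epoch
    last_modified: float  # seconds since the epoch
    priority: int = 0     # hypothetical scale: higher means "keep on disk"


def migration_score(c, now):
    """Higher score = better migration candidate. Weights are illustrative."""
    idle_days = (now - c.last_access) / DAY
    unmodified_days = (now - c.last_modified) / DAY
    size_mb = c.size / 1_000_000
    return idle_days + 0.5 * unmodified_days + 0.1 * size_mb - 10 * c.priority


def rank_candidates(files, now):
    """Return unlocked files, best migration candidate first."""
    eligible = [c for c in files if c.priority < LOCKED]
    return sorted(eligible, key=lambda c: migration_score(c, now), reverse=True)


now = 100 * DAY
files = [
    Candidate("recent.doc", 1_000_000, 99 * DAY, 99 * DAY),
    Candidate("old.dat", 50_000_000, 10 * DAY, 10 * DAY),
    Candidate("pinned.db", 50_000_000, 10 * DAY, 10 * DAY, priority=LOCKED),
]
print([c.name for c in rank_candidates(files, now)])
```

Here the large, long-idle file ranks first, the recently used file ranks last, and the file whose priority was set to the lock value never appears in the list at all.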

Unlike some HSM packages, Unylogix HSM is a true rules-based software tool that allows you to extend disk storage to huge capacities without requiring any intervention. It is used in conjunction with optical and tape jukeboxes, either together or alone. Better HSM packages, like ours, have the ability to back themselves up, independent of the network. It makes the disk self-grooming and keeps track of every piece of data it ever touches. And, best of all, it is seamless to the users.

Different Uses of HSM

As a Buffer

Because all reads and writes to an HSM-controlled disk go through the hard disk layer, one can do a "disk-to-disk" backup of all the disks on a network, rather than backing up to a tape device, which may be slower. Because the HSM ensures that the receiving disk never fills up, backups are expedited by the improvement in speed across the network.

As a Self-Managing Repository

When a network has a large number of users who have a lot of local disk space, the network administrator is burdened with keeping full and incremental backups on a network that is too slow to allow the backups in the time windows allotted. One solution, then, is to reduce the amount of data out on users' workstations, and have them place the data on the HSM server, where the software can do backups of its own contents, independent of the network and independent of the network backup package. Let's say a network has 60 GB of data spread across the disks of an Ethernet network, which makes it a formidable problem to complete backups within any time window. If the users agree to place 40 GB of the data on the HSM server, then the network backups would only have to cover 20 GB of storage, as the HSM would handle the other 40 GB, automatically and independently of the network.
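The arithmetic of this example can be worked through as follows. The 60 GB and 40 GB figures come from the text; the throughput value is a made-up placeholder, since the real rate depends on the network and the backup package.

```python
MB_PER_GB = 1000  # decimal gigabytes, for simplicity


def backup_hours(data_gb, throughput_mb_per_s):
    """Hours needed to move data_gb across a link at the given rate."""
    return data_gb * MB_PER_GB / throughput_mb_per_s / 3600


total_gb = 60         # data spread across the network's disks
moved_to_hsm_gb = 40  # data the users agree to place on the HSM server
remaining_gb = total_gb - moved_to_hsm_gb

RATE = 1.0  # hypothetical effective throughput in MB/s, not a measured value

before = backup_hours(total_gb, RATE)
after = backup_hours(remaining_gb, RATE)
print(f"network backup before: {before:.1f} h, after: {after:.1f} h")
```

Whatever rate is assumed, the network backup then covers one third of the original data, so the time it needs shrinks in the same proportion.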

As a Restore Server

A network administrator could closely link the network backup and HSM packages so that they both speak the same language (file system), allowing the backup package to make use of the effectively unlimited capacity provided by the HSM. That way, all of the backups would be accounted for and on-line. In the event of a disaster, users could restore their own files to their disks from the most recent backup event, without operator intervention. All of the data would be there; all the user has to do is request the information and it would be copied over the network automatically - no searching for files or pieces of media (tapes).

Performance of HSM Systems

All other things being equal, the performance of all HSM systems may be slower than that of a server that uses only hard disks as its data holder, but only on reads, not on writes. The HSM will hold a lot more data, at a lower cost per MB than the all-hard-disk alternative, and it will cost far less to administer. But it will be slower on reads if the data requested is not on the hard disk layer. If you are doing a dir or ls or checking for folders, all of those files look as if they are on-line, so the response time for these operations is the same as on the all-disk server; and if the file is already on the hard disk layer, the performance for reading the data is also the same. As tape devices are serial in nature (not random access like hard disk drives), they add a little more delay in finding a file on top of the delay caused by the robotics intervention. Optical media, though, will have less delay because it is random access.
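The read-delay behavior described above can be expressed as a small model. Every number below is a hypothetical placeholder chosen only to show the ordering between tiers, not a measurement of any device.

```python
# All figures are hypothetical placeholders in seconds, not measurements.
ROBOT_LOAD = 10.0  # jukebox robotics fetching and mounting a piece of media
POSITIONING = {"disk": 0.0, "optical": 1.0, "tape": 60.0}  # locating the file


def first_byte_delay(tier):
    """Extra delay before a read can start, by the tier holding the data."""
    if tier == "disk":
        return 0.0  # same as an all-disk server
    return ROBOT_LOAD + POSITIONING[tier]


def expected_delay(disk_hit_ratio, tier):
    """Average extra read delay when some reads are served from disk."""
    return (1.0 - disk_hit_ratio) * first_byte_delay(tier)


for tier in ("disk", "optical", "tape"):
    print(tier, first_byte_delay(tier), expected_delay(0.95, tier))
```

The point of the model is the shape rather than the values: the more reads that find their data on the hard disk layer, the closer the average read comes to all-disk performance, and serial tape pays a positioning cost that random-access optical media largely avoids.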



HSM Tech Specs

Solaris Version

Supported Platforms

Solaris (2.5, 2.6, 7, and 8) on both SPARC and Intel platforms, including 64-bit SPARC

Hardware Compatibility

Tested and certified with libraries or jukeboxes from all major manufacturers, including STK, Quantum ATL, Qualstar, Overland, HDS, SONY, Plasmon, etc.
Can handle libraries for all standard tape media, including: DLT, LTO, Exabyte, AIT, DAT, etc.
Can handle libraries for all standard optical media including: MO, CDR, CDRW, DVD, etc.

File System Support

Supports both UFS and NFS


Supported Devices

The following is a list of manufacturers whose robotic devices are supported. The list is constantly changing, so if you do not find your device, please contact us and we will let you know if it is supported.

Optical Jukeboxes
DISC, Fujitsu, HP, IBM, Maxoptix, Plasmon, Sony

Tape Libraries
ADIC, ATL, Breece Hill Technologies, Exabyte, IBM, M4 Data, Overland Data, Plasmon, QualStar, Spectra Logic, Storage Tek, StraightLine, Sun, Tandberg Data

 
