

File Archive Storage System FAQ



Welcome to the File Archive Frequently Asked Questions (FAQ) page. Questions and comments about this FAQ should be mailed to help@hsm.uky.edu



What you need to know to get started.

What is the File Archival system? It is a file manager, currently Tivoli Space Manager. It can move files between disk and tape, as well as other media, automatically per defined rules regarding how old a file is, how much free disk space is required to be available, how long since the last time new files were moved off of disk, etc. It works in conjunction with TSM (Tivoli Storage Manager) to store data (sftp'd or scp'd to the server) on tape.

The basics of how File Archival works are as follows. There are three parts of the process. One is you. You either put a file in the system, or request to get one back from the system. Next is the place that you either put a file, or the system places the file for you to retrieve. This is called the disk cache. Think of it as a large temporary working area. All files must pass through the disk cache on their way to tape, or on their way back to the user from tape. The disk cache is the limiting factor in how big a file can be. Files can be written over many tapes, but the whole file must fit on the disk cache. If this area is full, no new files can be created, and no old ones can be brought back from tape. The last component is the permanent storage, which in our case is tape located in robotic tape libraries. There may be various levels (fast tape for primary, slower tape for offsite, very high density for archives, etc), and different types of storage (tape for primary, optical for extremely long term, etc).

The File Archive system makes this all work as follows. First, a user puts new files into the disk cache, generally by SFTP. Periodically the software writes these files to tape and, if there is enough free space on the disk cache, leaves the files on disk as well. As the disk cache fills up and passes a preset level, the tape archive software removes older, larger files that have already been written to tape from the disk cache. For files that you have requested to be retrieved, the system finds the tape, requests that it be mounted, and then reads your file back to the disk cache so you can access it. During this process, if the disk cache gets too full, older, larger files are once again removed to make room for the file being brought back. Once your file is back on disk you can access it.
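The purge behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual Tivoli Space Manager policy: the 90% trigger level, the 70% target level, the CachedFile record, and the exact "oldest, then largest" ordering are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size: int        # bytes held in the disk cache
    age_days: int    # days since the file was last used
    on_tape: bool    # True once a copy has been written to tape


def purge_candidates(files, capacity, high_water=0.90, low_water=0.70):
    """Return names of files to drop from the cache, oldest and largest first.

    high_water and low_water are hypothetical thresholds, not site settings.
    """
    used = sum(f.size for f in files)
    if used <= capacity * high_water:
        return []  # below the trigger level, nothing is removed

    # Only files that already have a tape copy may leave the disk cache.
    eligible = sorted(
        (f for f in files if f.on_tape),
        key=lambda f: (f.age_days, f.size),
        reverse=True,
    )
    removed = []
    for f in eligible:
        if used <= capacity * low_water:
            break
        removed.append(f.name)
        used -= f.size
    return removed
```

Note that a file with no tape copy is never selected, which matches the statement that only files already written to tape are removed from disk.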

Files in the File Archival system can be accessed in various ways. The main way is via SFTP.

If you have basic knowledge of FTP commands in order to log on, get or put a file, change directories, and log off, you can use the File Archival system. There are performance considerations that can make your use of the system easier and more efficient, but the basic commands will get you started.

One thing we ask is that you do not place a lot of small files (less than 200 KBytes in size) in the File Archival System. Doing so wastes tape space and processing time, and can become an organizational nightmare for you one day. If you have lots of small files that constitute a project, you should first create an archive with a utility such as tar or winzip on YOUR system. Then FTP that archive to the File Archival System.
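As a minimal sketch of the bundling step, the following uses Python's standard tarfile module in place of the tar command. The function name bundle_directory and the constant SMALL_FILE_LIMIT are made up for this example, and the 200,000-byte value is just one reading of "200 KBytes".

```python
import tarfile
from pathlib import Path

# One reading of the FAQ's "less than 200 KBytes" guideline (assumption).
SMALL_FILE_LIMIT = 200_000


def bundle_directory(src_dir, archive_path):
    """Pack every regular file under src_dir into one tar archive.

    Returns how many of the bundled files were below SMALL_FILE_LIMIT.
    Write the archive outside src_dir so it does not try to include itself.
    """
    src = Path(src_dir)
    small = 0
    with tarfile.open(archive_path, "w") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                if path.stat().st_size < SMALL_FILE_LIMIT:
                    small += 1
                # Store paths relative to src_dir so the archive unpacks cleanly.
                tar.add(str(path), arcname=str(path.relative_to(src)))
    return small
```

The resulting single archive is what you would then transfer to the File Archival System.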

WARNING: Be very careful when deleting files from the File Archival System. Files are stored on tape, and provisions are taken to provide recovery in case of system failure or disaster, but recovering a file deleted by a user is a somewhat different case. In general, user-deleted files can be recovered for seven (7) days. After that period they ARE GONE.

Terms and Definitions.

Your userid, your access to the File Archive system.

To access files in the File Archive system you need a userid. This userid can be requested from the Information Technology Customer Service Center in McVey Hall.

The userid will be the same as your userid on other IT maintained systems, such as a campus email account. Your password is the same default password as for these other systems. You must change your password the first time you log on. This account is NOT your linkblue account; you have to maintain your password separately.

To grant yourself initial access to the tape archives, and to do other things such as change your password, you need to ssh to the File Archival server (hsm.uky.edu). Log in with your userid and default password, and follow the directions. For a new userid you MUST set an email address so we can reach you. After that you should change your password. If you need to change your email address or you want to change your password again, ssh back to the archive and follow the instructions.

Note that email addresses will be verified from time to time, and if yours does not exist/work, your userid will be suspended until you update your email address.

There are no storage quotas, or other limitations on your File Archival userid at this time.

Displaying your data, and what does it all mean.

To display files in the FTP archive you use the same commands you can use elsewhere: ls or dir.

The information you see will be very similar to FTP elsewhere.

Permissions/security of your data.

Most unix type commands work the same with the File Archival SFTP server; for example chmod. Note however that most users do not have permission to change the owner, nor do they have alternate groups to change to. For either of these situations you should send mail to request ownership changes, or a new group for users to share data with.

If you would like to set up permission defaults for your current session you can use the umask command, just as you would for an interactive Unix session. Remember that umask takes away the permissions you set with it.

The combinations are built from the following:

                0400  ( a=rwx,u-r)  Read by owner
                0200  ( a=rwx,u-w)  Write by owner
                0100  ( a=rwx,u-x)  Execute (search in directory) by owner
                0040  ( a=rwx,g-r)  Read by group
                0020  ( a=rwx,g-w)  Write by group
                0010  ( a=rwx,g-x)  Execute/search by group
                0004  ( a=rwx,o-r)  Read by others
                0002  ( a=rwx,o-w)  Write by others
                0001  ( a=rwx,o-x)  Execute/search by others

So, to give the owner all rights and no one else any, use umask 077. To give everyone all access, use 000. And of course, to give no access to anyone, use 777.
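The arithmetic behind these values can be checked with a short sketch: a umask clears the bits it names from the permissions that would otherwise be granted. The helper names effective_mode and describe are made up for this illustration.

```python
def effective_mode(requested, umask):
    """Return the permission bits left after the umask is applied."""
    return requested & ~umask & 0o777


def describe(mode):
    """Render permission bits in the familiar rwxrwxrwx form."""
    flags = "rwxrwxrwx"
    return "".join(
        flag if mode & (1 << (8 - i)) else "-"
        for i, flag in enumerate(flags)
    )


# The three examples from the text, starting from full access (0777).
for mask in (0o077, 0o000, 0o777):
    print(f"umask {mask:03o} -> {describe(effective_mode(0o777, mask))}")
```

Ordinary files are usually created with a requested mode of 0666 rather than 0777, so with umask 022, for instance, effective_mode(0o666, 0o022) gives 0644.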

Here are the help listings as they existed on Oct 4, 2006. These are subject to change from revision to revision!

sftp> help
Available commands:
cd path                       Change remote directory to 'path'
lcd path                      Change local directory to 'path'
chgrp grp path                Change group of file 'path' to 'grp'
chmod mode path               Change permissions of file 'path' to 'mode'
chown own path                Change owner of file 'path' to 'own'
help                          Display this help text
get remote-path [local-path]  Download file
lls [ls-options [path]]       Display local directory listing
ln oldpath newpath            Symlink remote file
lmkdir path                   Create local directory
lpwd                          Print local working directory
ls [path]                     Display remote directory listing
lumask umask                  Set local umask to 'umask'
mkdir path                    Create remote directory
progress                      Toggle display of progress meter
put local-path [remote-path]  Upload file
pwd                           Display remote working directory
exit                          Quit sftp
quit                          Quit sftp
rename oldpath newpath        Rename remote file
rmdir path                    Remove remote directory
rm path                       Delete remote file
symlink oldpath newpath       Symlink remote file
version                       Show SFTP version
!command                      Execute 'command' in local shell
!                             Escape to local shell
?   

Manipulating files.

The basic FTP commands, get (to get a file from the archive) and put (to put or store a file in the archive), together with many other simple ftp commands are available in sftp, but the more advanced commands such as mget and mput are not available. If you want to send or retrieve multiple files in one command, consider using scp.
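As a rough sketch of the scp suggestion, the helper below builds a single scp command line that uploads several files at once. The userid "myuserid", the file names, and the remote directory "project" are placeholders. Only the host name comes from this FAQ.

```python
import shlex

ARCHIVE_HOST = "hsm.uky.edu"  # server name taken from the FAQ


def scp_upload_command(userid, local_files, remote_dir="."):
    """Build an argument list that uploads several files in one scp call."""
    if not local_files:
        raise ValueError("at least one local file is required")
    target = f"{userid}@{ARCHIVE_HOST}:{remote_dir}"
    return ["scp", *local_files, target]


# Placeholder userid, file names, and directory, for illustration only.
cmd = scp_upload_command("myuserid", ["jan.tar", "feb.tar", "mar.tar"], "project")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run if you prefer to launch scp from the script itself.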

Batch use of the File Archive system.

Many users run large (long) jobs on other systems that require cycling through many data files. These files, either individually or in combination, may require more disk space than is available on the system running the job.

One easy way to avoid this problem is, if possible, to get only the files needed for the current execution of the program, run the job, remove those files, then get the next set and repeat.

Example:

The job MYJOB needs three input files to process a month's worth of data. Each of these files is 1 GB (1,000,000,000 bytes) in size, and you want to run the past 5 years' worth. That is going to be 180 GB of data. You're likely to upset a system administrator somewhere if you try to do that all at once.

Instead, wrap your executable in a script that first gets the 3 files you need for the first month's processing, executes the job, removes those 3 files, and gets the next three. This can continue until you have done all 5 years.
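The wrapper idea can be sketched as below. The fetch, run_job, and cleanup callables are hypothetical hooks that you would supply yourself (for example, an sftp or scp get, a launch of MYJOB, and a local delete). Nothing here is specific to the File Archive software.

```python
FILES_PER_MONTH = 3
FILE_SIZE_GB = 1
MONTHS = 5 * 12


def total_gb():
    """Disk needed if every input file were fetched at once."""
    return FILES_PER_MONTH * FILE_SIZE_GB * MONTHS


def peak_gb():
    """Disk needed when only one month's inputs are on disk at a time."""
    return FILES_PER_MONTH * FILE_SIZE_GB


def run_in_batches(months, fetch, run_job, cleanup):
    """Fetch, process, and remove one month's files before starting the next."""
    for month in months:
        files = fetch(month)       # retrieve this month's input files
        try:
            run_job(month, files)  # process them
        finally:
            cleanup(files)         # free local disk even if the job fails
```

With the numbers from the example, total_gb() is 3 x 1 x 60 = 180, while peak_gb() is only 3, which is the point of processing one month at a time.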


Common Problems and Pitfalls.

As with most computer systems, problems can be grouped into one of the following types:

We will try to address some of the more common items we have seen over the past decade of file management operations.


Q:I get an "I/O" error msg...what is that?
A:You can get this type of error when the File Archival software is unable to fulfill a request to bring a file back from tape. This does not mean your file is damaged, only that the system cannot access it at this time. This could be a system problem, or the system could be very busy. You should check the server's status (see below) and report the error to help@hsm.uky.edu if you see that the system is fairly quiet.


Q:I asked the system to get a file, but it did not, and I saw no error msg, why?
A1:More than likely the system is very busy and you have not waited long enough for your request to be fulfilled.

A2:If you have waited a long time, say overnight, you may have waited too long! If your file was brought back from tape and the system then had a lot of work to do and needed to clean out the disk cache to make room for new files, your file may have been purged from the disk cache again.

A3:Another possibility is that a system problem occurred which required restarting Tivoli Storage Manager or the system. In this case all outstanding requests for files to be staged back from tape are lost and must be requested again. This condition, as well as a busy system that does not respond quickly, can be determined from the status web page (see below).


Q:Why are the permissions not the same for a file I downloaded from the File Archival system?
A:Remember, the File Archival system is just a large SFTP server as far as your local system is concerned. It works just like any other SFTP process, and uses permission settings on the system where you issued the SFTP/GET session from.

Miscellaneous Questions.

Q:How do I bring data to the File Archival site? How do I send data elsewhere?
A:The simplest way to move data to or from our site is to use FTP. However, for very large data files this may be impractical. Contacting the department on campus that you will be working with to find out what type of media they support (DAT, CDROM, 8mm, etc.) is the only way to be assured that what you bring can be read. Then the data can be SFTP'd to the File Archival system. The same is true for sending data offsite. Find out what the other site can read, and what you can write, then SFTP the data to the system with that type of media and create your tape or other media.

Q:Why can't I just take the tape that has my data on it?
A:The system doesn't store a single person's data on a tape by itself. Files from all types of sources and various people may be on a single tape. If your data is large, it may be spread over multiple tapes, with another person's data on the first and last tapes as well. In addition, the files are stored on the tape using an internal id rather than your filename, so the HSM database is required to identify which file is wanted. Finally, tapes are known to the database and must be accounted for.

Q:Can I make my own tape in the tape library and then take it with me?
A:No. Users do not have that type of access. However, an administrator may be able to assist you with this request. Contact us for more information.

Q:Should I compress my data or not?
A:Compressing your data is not absolutely required. However, with today's fast computers, compressing data can be a great benefit: you won't have to FTP as much data to the server, which speeds up the transfer. In general, don't compress small files, but it is strongly recommended that you create tar or zip archives of many small files rather than store files smaller than 10 MB on the archive.

Q:How can I move all my files to another directory?
A:There is no way to move an entire directory of files to another location with SFTP. You can, however, move them one at a time using the rename command (see above).
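If there are many files to move, typing one rename per file gets tedious. The sketch below generates the rename lines for you. The function name rename_batch and the directory names are placeholders. It is assumed that the target directory already exists (mkdir) and that no file name contains a double quote.

```python
import posixpath


def rename_batch(filenames, old_dir, new_dir):
    """Return one sftp rename command per file, moving it into new_dir."""
    lines = []
    for name in filenames:
        old = posixpath.join(old_dir, name)
        new = posixpath.join(new_dir, name)
        # Quote the paths so names containing spaces survive.
        lines.append(f'rename "{old}" "{new}"')
    return "\n".join(lines) + "\n"


# Placeholder names, for illustration only.
print(rename_batch(["jan.tar", "feb.tar"], "old_project", "new_project"), end="")
```

The generated lines can be pasted into an interactive SFTP session. OpenSSH's sftp client can also read them from a file with its -b option, but whether batch mode works with this server's login method is not something this FAQ states.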

File Archival Tricks.

Q:How can I keep the date/time the same in the File Archival system as where the file came from?
A:Since the File Archival system looks like a large SFTP server to your system, files SFTP'd to it are considered "new" and take on the current date/time for the time created or modified. In order to keep the "real" date/time of your file, you will have to create an archive, using something like tar or winzip, on YOUR system. After you create the archive, FTP the archive to the File Archival system. The archive will have the current date/time, but the files inside will maintain the proper date/time, as well as other properties such as group and owner.
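To illustrate that an archive keeps the original timestamps of the files inside it, here is a small self-contained check using Python's standard tarfile module. The file names and the chosen timestamp are made up for the demonstration.

```python
import os
import tarfile
import tempfile


def archived_mtimes(archive_path):
    """Map each member name in a tar archive to its stored modification time."""
    with tarfile.open(archive_path, "r") as tar:
        return {m.name: m.mtime for m in tar.getmembers()}


with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "results.dat")
    with open(data, "w") as fh:
        fh.write("sample\n")
    # Give the file an old timestamp: 2000-01-01 00:00:00 UTC.
    os.utime(data, (946684800, 946684800))

    archive = os.path.join(tmp, "project.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(data, arcname="results.dat")

    stored = archived_mtimes(archive)
    print("timestamp preserved:", stored["results.dat"] == 946684800)
```

The archive file itself would get a new date when transferred, but the member's stored time is unchanged and is restored when the archive is unpacked.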

Q:Is there any way to extract a single file from a tar archive without having to get the entire archive on my system first? What about getting a listing of files in a tar archive before I FTP it to my machine?

A:YES! There are many interesting things you can do if your system supports unix style pipes.

To list the files in a tar archive named t.tar that is in your top-level File Archival directory (use -xf in place of -tf to extract the files instead of listing them), issue

'get t.tar "|tar -tf - " '

while in the SFTP session.
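The same idea, reading a tar archive sequentially from a stream instead of from a file on disk, can be sketched in Python. This only illustrates pipe-style processing. It is not a replacement for the command above, and the in-memory archive is built purely for the demonstration.

```python
import io
import tarfile


def list_tar_stream(stream):
    """Return member names from a tar archive read sequentially from a stream."""
    # Mode "r|" reads forward only, the way tar does when fed by a pipe.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        return [member.name for member in tar]


# Build a small archive in memory to stand in for data arriving on a pipe.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name, payload in (("a.txt", b"first"), ("b.txt", b"second")):
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

names = list_tar_stream(buffer)
print(names)
```

In a real pipeline the stream would typically be sys.stdin.buffer, with the script placed on the receiving end of the pipe.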

Status Web Page.

Lots of useful information about the current status of the data management systems, hsm.uky.edu (the FTP server) and backups.uky.edu (the tape storage server) can be found on the web page located at:

http://www.uky.edu/ComputingCenter/DataStorage/TSM/status.html

Here you can find things like:

which can help you determine if the system is busy (tape mount status), if there is a problem (no reply, or invalid reply to various queries), how busy the server is (hsm.uky.edu load average), and important information regarding downtime, or problems.

Performance.

Determining whether you are getting the best performance from a system can be a poor guess at best. There are no absolutes on performance when it comes to shared resources, other than the absolute best you could ever see (and probably will never see!).

A few things to consider are:

The above are, unfortunately, out of the user's control. However, there are a few things that users can do to help themselves and decrease the time it takes to accomplish some tasks. These are:

Reporting Problems.

In the rare event that you feel you have a problem and would like assistance, we ask that you first check the TSM status page. You may be able to determine that there IS a problem, or you may see a message explaining that we are working on the problem, or some other related issue.

If you still require help, we ask that you provide us with the following information in email to help@hsm.uky.edu.


Send comments to help@hsm.uky.edu. This page was last updated on Jul 14, 2011, by help@hsm.uky.edu.