Welcome to the File Archive Frequently Asked Questions (FAQ) page. Questions and comments about this FAQ should be mailed to help@hsm.uky.edu.
More Advanced Topics
What is the File Archival system? It is a file manager, currently Tivoli Space Manager. It can move files between disk and tape, as well as other media, automatically per defined rules regarding how old a file is, how much free disk space is required to be available, how long since the last time new files were moved off of disk, etc. It works in conjunction with TSM (Tivoli Storage Manager) to store data (sftp'd or scp'd to the server) on tape.
The basics of how File Archival works are as follows. There are three parts of the process. One is you. You either put a file in the system, or request to get one back from the system. Next is the place that you either put a file, or the system places the file for you to retrieve. This is called the disk cache. Think of it as a large temporary working area. All files must pass through the disk cache on their way to tape, or on their way back to the user from tape. The disk cache is the limiting factor in how big a file can be. Files can be written over many tapes, but the whole file must fit on the disk cache. If this area is full, no new files can be created, and no old ones can be brought back from tape. The last component is the permanent storage, which in our case is tape located in robotic tape libraries. There may be various levels (fast tape for primary, slower tape for offsite, very high density for archives, etc), and different types of storage (tape for primary, optical for extremely long term, etc).
The File Archive system makes this all work as follows. First, a user puts new files into the disk cache, generally by SFTP. Periodically the software writes these files to tape and, if there is enough free space on the disk cache, leaves the files on disk as well. As the disk cache starts to fill up and passes a preset level, the tape archive software removes older, larger files from the disk cache that have already been written to tape. For files that you have requested to be retrieved, the system finds the tape, requests that it be mounted, and then reads your file back to the disk cache so you can have access to it. During this process, if the disk cache gets too full, older, larger files are once again removed to make room for your file being brought back. Once your file is back on disk you can access it.
Files in the File Archival system can be accessed in various ways. The main way is via SFTP.
If you have basic knowledge of FTP commands in order to log on, get or put a file, change directories, and log off, you can use the File Archival system. There are performance considerations that can make your use of the system easier and more efficient, but the basic commands will get you started.
One thing we ask is that you do not place a lot of small files (less than 200 KBytes in size) in the File Archival System. It wastes tape space and processing time, and can become an organizational nightmare for you one day. If you have lots of small files that constitute a project, you should first create an archive with a utility such as tar or winzip on YOUR system. Then FTP that archive to the File Archival System.
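The bundling step can be sketched in Python as follows. This is only an illustration using the standard tarfile module; the function name bundle_project and its arguments are hypothetical and not part of the File Archival system. It assumes the archive is written outside the project directory, so that it does not try to include itself.

```python
import tarfile
from pathlib import Path


def bundle_project(project_dir: str, archive_path: str) -> int:
    """Pack every file under project_dir into one gzip-compressed tar archive.

    Returns the number of regular files stored in the archive.
    """
    root = Path(project_dir)
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store each file under the project's own directory name so
                # the archive unpacks into a single top-level directory.
                arcname = Path(root.name) / path.relative_to(root)
                tar.add(path, arcname=str(arcname))
                count += 1
    return count
```

The resulting single archive can then be uploaded in one transfer instead of one transfer per small file.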
WARNING: Be very careful deleting files from the File Archival System. Files are stored on tape, and provisions are taken to provide recovery in case of system failure, or disaster, but recovering a file deleted by a user is a somewhat different case. In general, user deleted files can be recovered for seven (7) days. After that period they ARE GONE.
To access files in the File Archive system you need a userid. This userid can be requested from the Information Technology Customer Service Center in McVey Hall.
The userid will be the same as your userid on other IT maintained systems, such as a campus email account. Your password is the same default password as for these other systems. You must change your password the first time you log on. This account is NOT your linkblue account; you have to maintain your password separately.
To grant yourself initial access to the tape archives, and to do other things such as change your password, you need to ssh to the File Archival server (hsm.uky.edu). Log in with your userid and default password, and follow the directions. For a new userid you MUST set an email address so we can reach you. After that you should change your password. If you need to change your email address or you want to change your password again, ssh back to the archive and follow the instructions.
Note that email addresses will be verified from time to time, and if yours does not exist/work, your userid will be suspended until you update your email address.
There are no storage quotas, or other limitations on your File Archival userid at this time.
To display files in the FTP archive you use the same commands you can use elsewhere: ls or dir.
The information you see will be very similar to FTP elsewhere.
Most unix type commands work the same with the File Archival SFTP server; for example chmod. Note however that most users do not have permission to change the owner, nor do they have alternate groups to change to. For either of these situations you should send mail to request ownership changes, or a new group for users to share data with.
If you would like to set up permission defaults for your current session you can use the umask command, just as you would for an interactive Unix session. Remember that umask takes away the permissions you set with it.
The mask is built by adding together values from the following list. Each value removes the permission named beside it:
0400 ( a=rwx,u-r) Read by owner
0200 ( a=rwx,u-w) Write by owner
0100 ( a=rwx,u-x) Execute (search in directory) by owner
0040 ( a=rwx,g-r) Read by group
0020 ( a=rwx,g-w) Write by group
0010 ( a=rwx,g-x) Execute/search by group
0004 ( a=rwx,o-r) Read by others
0002 ( a=rwx,o-w) Write by others
0001 ( a=rwx,o-x) Execute/search by others
So, to give the owner all rights, and no one else any would be umask 077. To give everyone all access is 000. And of course to give no access to anyone is 777.
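The arithmetic behind these masks can be checked with a few lines of Python. This is a sketch only: the function name permissions_after_umask is made up for illustration, and it assumes the program creating the file asks for full permissions (0777); many programs request less for ordinary files.

```python
import stat


def permissions_after_umask(requested: int, umask: int) -> str:
    """Return the symbolic permissions left after the umask clears its bits."""
    effective = requested & ~umask & 0o777
    # stat.filemode expects a full mode; S_IFREG marks a regular file.
    # The leading file-type character is dropped from the result.
    return stat.filemode(stat.S_IFREG | effective)[1:]


for mask in (0o077, 0o000, 0o777, 0o027):
    print(f"umask {mask:03o} -> {permissions_after_umask(0o777, mask)}")
```

For the masks above this prints rwx------ for 077, rwxrwxrwx for 000, --------- for 777, and rwxr-x--- for 027 (owner keeps everything, group loses write, others lose all access).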
Here are the help listings as they existed on Oct 4, 2006. These are subject to change from revision to revision!
sftp> help
Available commands:
cd path                       Change remote directory to 'path'
lcd path                      Change local directory to 'path'
chgrp grp path                Change group of file 'path' to 'grp'
chmod mode path               Change permissions of file 'path' to 'mode'
chown own path                Change owner of file 'path' to 'own'
help                          Display this help text
get remote-path [local-path]  Download file
lls [ls-options [path]]       Display local directory listing
ln oldpath newpath            Symlink remote file
lmkdir path                   Create local directory
lpwd                          Print local working directory
ls [path]                     Display remote directory listing
lumask umask                  Set local umask to 'umask'
mkdir path                    Create remote directory
progress                      Toggle display of progress meter
put local-path [remote-path]  Upload file
pwd                           Display remote working directory
exit                          Quit sftp
quit                          Quit sftp
rename oldpath newpath        Rename remote file
rmdir path                    Remove remote directory
rm path                       Delete remote file
symlink oldpath newpath       Symlink remote file
version                       Show SFTP version
!command                      Execute 'command' in local shell
!                             Escape to local shell
?
The basic FTP commands, get (to get a file from the archive) and put (to put or store a file in the archive), together with many other simple ftp commands are available in sftp, but the more advanced commands such as mget and mput are not available. If you want to send or retrieve multiple files in one command, consider using scp.
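As a rough illustration of the scp approach, the Python sketch below builds a single scp command that uploads every local file matching a pattern. The function name build_scp_upload, the userid, and the remote directory are hypothetical; only the host name hsm.uky.edu comes from this FAQ. The command is returned as a list so it can be inspected before being run.

```python
import glob
from typing import List


def build_scp_upload(pattern: str, userid: str, remote_dir: str,
                     host: str = "hsm.uky.edu") -> List[str]:
    """Build an scp command that uploads every local file matching pattern."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # scp accepts several source files followed by one destination.
    return ["scp", *files, f"{userid}@{host}:{remote_dir}"]
```

The returned list could be passed to subprocess.run(cmd, check=True) to perform the transfer; scp will then prompt for your password as usual.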
Many users run large (long) jobs on other systems that require cycling through many data files. These files, either individually or in combination, may require more disk space than is available on the system running the job.
One easy way to avoid this problem is, where possible, to get only the files needed for the current execution of the program, run the job, remove those files, get the next set, and repeat.
Example:
The job MYJOB needs three input files to process a month's worth of data. Each of these files is 1 GB (1,000,000,000 bytes) in size, and you want to run the past 5 years' worth. That is 3 files x 12 months x 5 years = 180 files, or 180 GB of data. You're likely to upset a system administrator somewhere if you try to do that all at once.
Instead, wrap your executable in a script that first gets the 3 files you need for the first month's processing, executes the job, removes those 3 files, and gets the next three. This can continue until you have done all 5 years.
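One possible shape for such a wrapper is sketched below in Python. Everything here is an assumption made for illustration: the function names are invented, and the fetch, run_job, and remove steps are passed in as callables so that you can plug in whatever transfer command and job launcher your own system uses.

```python
from typing import Callable, Iterable, List, Sequence


def monthly_batches(files: Sequence[str], per_month: int = 3) -> List[List[str]]:
    """Split an ordered list of input files into one batch per month."""
    return [list(files[i:i + per_month]) for i in range(0, len(files), per_month)]


def run_in_batches(
    batches: Iterable[Sequence[str]],
    fetch: Callable[[str], None],
    run_job: Callable[[Sequence[str]], None],
    remove: Callable[[str], None],
) -> None:
    """Fetch one batch, run the job on it, then delete it before the next."""
    for batch in batches:
        for name in batch:
            fetch(name)          # e.g. download one file from the archive
        try:
            run_job(batch)       # e.g. launch MYJOB on these files
        finally:
            for name in batch:
                remove(name)     # free local disk even if the job fails
```

With 180 input files this yields 60 batches, and at most three files (about 3 GB) are on the local disk at any one time instead of 180 GB.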
As with most computer systems, problems can be grouped into one of the following types:
We will try to address some of the more common items we have seen over the past decade of file management operations.
Q:I get an "I/O" error msg...what is that?
A:You can get this type of error when the File Archival software is
unable to fulfill a request to bring a file back from tape. This does not mean
your file is damaged, only that the system cannot access it at this time.
This could be a system problem, or the system could simply be very busy. You
should check the server's status (see below) and report the error to help@hsm.uky.edu
if you see that the system is fairly quiet.
Q:I asked the system to get a file, but it did not, and I saw no error msg, why?
A1:More than likely the system is very busy and you have not waited long
enough for your request to be fulfilled.
A2:If you have waited a long time, say overnight, you may have waited too long! If your file was brought back from tape and the system then had a lot of work to do and needed to clean out the disk cache to make room for new files, your file may have been purged from the disk cache again.
A3:Another possibility is that a system problem occurred which required restarting Tivoli Storage Manager or the system. In this case all outstanding requests for files to be staged back from tape are lost and must be requested again. This condition, as well as a busy system that does not respond quickly, can be seen on the status web page (see below).
Q:Why are the permissions not the same for a file I downloaded from the File Archival system?
A:Remember, the File Archival system is just a large SFTP server as far as your local system is
concerned. It works just like any other SFTP process, and uses the permission
settings of the system from which you issued the SFTP get.
Q:How do I bring data to the File Archival site? How do I send data elsewhere?
A:The simplest way to move data to or from our site would be using FTP.
However, for very large data files this may be impractical. Contacting the
department on campus that you will be working with to find out what type of
media they support (DAT, CDROM, 8mm, etc.) is the only way to be assured
that what you bring can be read. Then the data can be SFTP'd to the File Archival system. The same
is true for sending data offsite. Find out what the other site can read, and
what you can write, then SFTP the data to a system with that type of media and
create your tape or other media.
Q:Why can't I just take the tape that has my data on it?
A:The system doesn't store a single person's data on a tape by itself. Files
from all types of sources and various people may be on a single tape. If
your data is large, it may be spread over multiple tapes, with another
person's data on the first and last tapes as well. In addition, the files are
stored on the tape using an internal id rather than your filename, so the
HSM database is required to identify which file is wanted. Finally, tapes
are known to the database and must be accounted for.
Q:Can I make my own tape in the tape library and then take it with me?
A:No. Users do not have that type of access. However, an administrator
may be able to assist you with this request. Contact us for more information.
Q:Should I compress my data or not?
A:Compressing your data is not absolutely required. However, with today's
fast computers compressing data can be a great benefit, as you won't have to FTP
as much data to the server, which speeds up the transfer process. In
general, don't compress small files - but it is strongly recommended that you
create tar or zip archives of many small files rather than store files smaller
than 10 MB on the archive.
Q:How can I move all my files to another directory?
A:There is no way to move an entire directory of files to another
location with SFTP. You can, however, move them one at a time using the
rename command (see above).
Q:How can I keep the date/time the same in the File Archival system as where the file came
from?
A:Since the File Archival system looks like a large SFTP server to your system, files SFTP'd
to it are considered "new" and take on the current date/time for the time
created or modified. In order to keep the "real" date/time of your file,
you will have to create an archive, using something like tar or
winzip, on YOUR system. After you create the archive, FTP the archive to the File Archival system. The
archive will have the current date/time, but the files inside will maintain
the proper date/time, as well as other properties such as group and owner.
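The effect can be seen in a small Python experiment using the standard tarfile module. This is a sketch under assumptions: the file name results.dat and the timestamp are made up, and the whole thing runs in a temporary directory rather than against the File Archival system.

```python
import os
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as work:
    src = Path(work) / "results.dat"       # hypothetical data file
    src.write_text("example data")
    old_time = 1_000_000_000               # 9 September 2001 (UTC)
    os.utime(src, (old_time, old_time))    # give the file an old timestamp

    archive = Path(work) / "results.tar"
    with tarfile.open(archive, "w") as tar:
        tar.add(src, arcname=src.name)

    # The archive itself is a new file carrying the current time, but the
    # member stored inside it records the original modification time.
    with tarfile.open(archive) as tar:
        member_mtime = tar.getmember("results.dat").mtime
    archive_mtime = archive.stat().st_mtime

print("member keeps original time:", member_mtime == old_time)
print("archive is newer than member:", archive_mtime > member_mtime)
```

When the archive is later unpacked with tar, the extracted file is given the recorded modification time rather than the time of the transfer.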
Q:Is there any way to extract a single file from a tar archive without having to get the entire archive on my system first? What about getting a listing of files in a tar archive before I FTP it to my machine?
A:YES! There are many interesting things you can do if your system supports unix style pipes.
This will allow you to list files in a tar archive, named t.tar, that is in your top level File Archival directory:
'get t.tar "|tar -tf - " '
while in the SFTP session. To extract the files instead of listing them, use -xf in place of -tf:
'get t.tar "|tar -xf - " '
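The same idea, reading a tar archive as a stream instead of from a saved file, can be sketched in Python with the standard tarfile module. The function names list_members and read_member are invented for this example, and how the stream reaches your program (for instance through a pipe) is left as an assumption.

```python
import io
import tarfile
from typing import BinaryIO, List, Optional


def list_members(stream: BinaryIO) -> List[str]:
    """Return member names from a tar stream, reading it once, front to back."""
    # Mode "r|*" reads a non-seekable stream and detects compression itself.
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        return [member.name for member in tar]


def read_member(stream: BinaryIO, wanted: str) -> Optional[bytes]:
    """Return the contents of one member from a tar stream, or None if absent."""
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        for member in tar:
            if member.name == wanted and member.isfile():
                # In stream mode a member must be read while it is current.
                return tar.extractfile(member).read()
    return None
```

A script on the receiving end of a pipe could call list_members(sys.stdin.buffer) to print the contents without ever writing the archive to local disk.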
Lots of useful information about the current status of the data management systems, hsm.uky.edu (the FTP server) and backups.uky.edu (the tape storage server) can be found on the web page located at:
http://www.uky.edu/ComputingCenter/DataStorage/TSM/status.html
Here you can find information which can help you determine if the system is busy (tape mount status), if there is a problem (no reply, or invalid reply to various queries), and how busy the server is (hsm.uky.edu load average), along with important information regarding downtime or problems.
Determining whether you are getting the best performance from a system can be a poor guess at best. There are no absolutes on performance when it comes to shared resources, other than the absolute best you could ever see (and probably will never see!).
A few things to consider are:
The above are, unfortunately, out of the user's control. However, there are a few things that users can do to help themselves, and decrease the time it takes to accomplish some tasks. These are:
In the rare event that you feel you have a problem and would like assistance, we ask that you first check the TSM status page. You may be able to determine that there IS a problem, or you may see a message explaining that we are working on the problem, or some other related issue.
If you still require help, we ask that you provide us with the following information in email to help@hsm.uky.edu.