
"Power Users"

To conduct bioinformatic analyses on more than a one-gene-at-a-time basis, one must begin to automate and choreograph them.  UNIX and GCG are just two examples of the tools that one might wish to control.  Once you begin to choreograph analyses in this way, you are ready to be christened a "power user".

  1. Redirecting input and output
  2. Controlling execution
    1. Multiple files
    2. "At" commands
  3. Databases and SQL
  4. Workflows


Redirecting input and output

UNIX programs have three read/write channels:

  • Standard input (can be a file, input typed at the terminal, or output from a previous process)
  • Standard output (can go to the terminal, a printer, a file, or another process; when the standard output of one program becomes the standard input of another, this is often called "piping")
  • Standard error (error messages, which can also go to a terminal, file, or process.  Not relevant to our error-free world.)

TASK                      EXAMPLE                               ACTION
Redirect standard input   sort <myfile.txt                      sort will use myfile.txt as input and sort the file to the terminal (standard output)
Redirect standard output  sort <myfile.txt > mysortedfile.txt   modifies standard output to a file
Append information        sort <myfile.txt >> appendfile.txt    appends the output of this sort to a file; does not overwrite
Pipe processes            sort <myfile.txt | more               pipes the output of the sort to the program more for page-by-page viewing
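
The redirections in the table above can be tried safely in a scratch directory. The following is a minimal sketch, not part of the original exercise: the file contents are made up for illustration, head stands in for more so that the example does not wait for a keypress, and the 2> form for standard error is an addition that the table does not cover.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small unsorted file (hypothetical contents, for illustration only).
printf 'gamma\nalpha\nbeta\n' > myfile.txt

# Redirect standard input and standard output: sorted lines go to a file.
sort < myfile.txt > mysortedfile.txt

# Append: run the same sort twice; the second run adds to the file
# instead of overwriting it.
sort < myfile.txt >> appendfile.txt
sort < myfile.txt >> appendfile.txt

# Pipe: the output of sort becomes the input of head (first line only).
first=$(sort < myfile.txt | head -n 1)
echo "first sorted line: $first"

# Standard error is redirected separately, with 2>.
# The file named here does not exist, so sort writes an error message.
sort no_such_file.txt > /dev/null 2> errors.txt || true
```

After this runs, appendfile.txt holds six lines (two sorted copies), while mysortedfile.txt holds three.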

GCG programs offer input and output "flags" (those little -plo=mygraphic.gif "thingies") that also redirect input and output.  We will use some of these directions in an example.


Controlling execution

One can work on multiple files in at least two ways.  Probably the simplest manner is to use a wildcard character.

sort *.txt will sort the lines of all files in the current directory that are "anything".txt, merged together into one list, to the terminal.  UNIX uses a set of other wildcards, to be provided later.
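
To see what the wildcard actually does, here is a small sketch with two made-up files (the file names and contents are assumptions for illustration). It contrasts sorting everything together with sorting each file on its own in a loop.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small hypothetical files.
printf 'pear\napple\n' > a.txt
printf 'fig\nbanana\n' > b.txt

# The shell expands *.txt to "a.txt b.txt" before sort runs, so sort
# reads both files and writes one merged, sorted list.
# The output name does not end in .txt, so it cannot match the wildcard.
sort *.txt > all_sorted.out

# To sort each file separately, loop over the matches instead.
# ${f%.txt} strips the .txt ending from the file name.
for f in *.txt; do
    sort "$f" > "${f%.txt}.sorted"
done
```

The loop is the form to reach for when each file should produce its own output.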

The GCG-specific method is to use listfiles, which are lists of sequence IDs.   This is a powerful method of both creating lists of files and of manipulating them.   Listfiles are generated in two ways:  GCG programs may generate lists and YOU can use a text editor to make a listfile.

GCG programs that generate listfiles are:

  • Assemble
  • BLAST
  • Corrupt
  • FastA, TFastA
  • FindPatterns
  • FrameSearch
  • FromEMBL, FromGenbank, FromIG, FromPIR
  • LineUp
  • LookUp
  • Motifs
  • Names
  • Pretty
  • ProfileSearch
  • Reformat
  • Sample
  • Simplify
  • Translate
  • WordSearch

To create a list file by hand, use your favorite text editor (or a word processor that saves plain text) to make a file like the following:

Type in a description of the file.  This is arbitrary.   You can have single sequences of your own, database sequences, and other lists in your list.
..
/staben/myseqs/seq1.seq   
(a specific sequence in my directory)
gb_hum:*                    
(the set of all human Genbank sequences; note that we don't have the current version of Genbank)
!gb_ov:chkhsp                
(this set of sequences is commented out, and won't be used here)

Be sure to save the file and upload it.

You can then use this in GCG programs with a command like:

map @mylist.seq

The @ character tells GCG this is a list file.
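
If you have many sequence files, typing the list by hand gets tedious. As a rough sketch, the shell can write a file in the same layout as the example above: a description, the ".." line, then one entry per line. The sequence file names and the list file name mylist.list are hypothetical, and the GCG command in the final comment is not run here.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Empty placeholder files standing in for your own sequences (hypothetical).
touch seq1.seq seq2.seq seq3.seq

# Write the description, the ".." separator, then one full path per line.
# The list is named .list so that the *.seq wildcard does not pick it up.
{
    echo "My sequences, listed automatically."
    echo ".."
    for f in *.seq; do
        echo "$workdir/$f"
    done
} > mylist.list

cat mylist.list

# With GCG available, the list would then be used as:  map @mylist.list
```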

The other way of controlling execution is to use the "at" command of UNIX to execute commands at a later, specified time (the related "batch" command runs them when the system load permits).  This can be very valuable, for example for running long analyses overnight.  Closely related is a "batch" file, which is simply a file containing a set of commands to be run together.  We will write a batch file in class.
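
As a preview, a batch file can be as small as the sketch below. The file names sortjob.sh and finished.txt and the input data are made up for illustration. The at line is left as a comment because it needs the at service on your system; 02:00 is just an example time.

```shell
# Work in a scratch directory with a small hypothetical input file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'gamma\nalpha\nbeta\n' > myfile.txt

# Write a small batch file: just a list of commands, one per line.
cat > sortjob.sh <<'EOF'
#!/bin/sh
# Sort the input, then record when the job finished.
sort < myfile.txt > mysortedfile.txt
date > finished.txt
EOF

# Run the batch file right away...
sh ./sortjob.sh

# ...or hand it to at to run later (not executed in this sketch):
#   at -f ./sortjob.sh 02:00
```

Because the commands live in a file, the same job can be rerun or rescheduled without retyping anything.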


 

 

University of Kentucky, Morgan School of Biological Sciences, NSF-CCD Support
Chuck Staben, copyright reserved || 12/04/98