  About LIMS

Introduction

The University of Kentucky Advanced Genetics Technologies Center (UK-AGTC) is a core-type facility that uses robotics and high-throughput tools to support cutting-edge research as cost-effectively as possible. Laboratory database management suites make it possible to customize these processes and to store, organize, and disseminate the large volume of data (460,800 bases of DNA per day) to their owners via web-based applications.
Examples of current UK-AGTC sequencing projects include:

  • Genomes of forage and turf grass symbionts
  • Genomes of insect viruses
  • Gene expression profiles in horses
  • Genomes of plant-pathogenic fungi
  • Gene expression during plant development and reproduction
To meet the wide variety of UK faculty's DNA sequencing demands, AGTC has purchased six Beckman Coulter CEQ8000 Capillary Electrophoresis DNA sequencers. The CEQ's eight-capillary array electrophoreses each column of a 96-well plate independently, allowing for both a variety of run conditions on one plate and relatively swift throughput. The staff of AGTC have designed and implemented a custom LIMS for sequencing project sample and data handling. Samples and data are tracked through the analytical process by modeling the way the sample is handled in the lab from preparation to final results.

Every sample sequenced at AGTC is given a name that uniquely identifies it. These identifiers follow a naming convention designed by AGTC staff. To make data easy to store and retrieve, every sample is categorized by the name of its project and its lab. Each lab is identified by the initials of its PI. For example, CLS is the unique identifier for Dr. Schardl's lab. Projects are further classified as Standard or Non-Standard projects.

Projects with a large number of samples to be assembled are categorized as Standard projects. Parameters such as the primers can be defined for Standard projects. Whole genome, EST, and BAC projects are examples of Standard projects. The naming convention for samples in Standard projects is shown in the figure below.

Standard Projects Naming Convention

For example, if a chromatogram file is named CLS_Cp26A07_3a5_1_a04f_A02.ab1, the details of the sample are as follows:

  • Lab: CLS (Dr. Schardl's lab)
  • Project: Cp26A07
  • Reaction plate/Sequencing plate: CLS_Cp26A07_3a5, Well: A02
  • DNA plate/Library plate: CLS_Cp26A07_3, Well: a04
  • Primer used for sequencing: f
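
The breakdown above can be sketched in a few lines of Python. This is only an illustration, not the LIMS code itself. It assumes that a Standard name always has six underscore-separated fields, that the DNA plate number is the leading digits of the third field ("3" in "3a5"), and that the fifth field is a three-character well followed by the primer. The names `StandardSample` and `parse_standard_name` are hypothetical.

```python
import re
from typing import NamedTuple


class StandardSample(NamedTuple):
    lab: str
    project: str
    reaction_plate: str
    reaction_well: str
    dna_plate: str
    dna_well: str
    primer: str


def parse_standard_name(filename: str) -> StandardSample:
    """Split a Standard-project chromatogram name into its parts."""
    stem = filename.rsplit(".", 1)[0]  # drop the .ab1 extension
    # Assumption: exactly six fields, none of which contains "_".
    lab, project, runset, _rerun, sample, reaction_well = stem.split("_")
    # Assumption: the runset starts with the DNA plate number ("3" in "3a5").
    dna_number = re.match(r"\d+", runset).group()
    # Assumption: a three-character DNA-plate well, then the primer name.
    dna_well, primer = sample[:3], sample[3:]
    return StandardSample(
        lab=lab,
        project=project,
        reaction_plate=f"{lab}_{project}_{runset}",
        reaction_well=reaction_well,
        dna_plate=f"{lab}_{project}_{dna_number}",
        dna_well=dna_well,
        primer=primer,
    )


sample = parse_standard_name("CLS_Cp26A07_3a5_1_a04f_A02.ab1")
print(sample.reaction_plate, sample.dna_plate, sample.dna_well, sample.primer)
```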

Projects with fewer samples, for which you can define details such as the DNA plate, reaction plate, and source, are categorized as Non-Standard projects. Each sample in a Non-Standard project has four attributes: Source, Template, Primer, and User Discretionary Attributes.

  • Source: Where the sequencing template is derived from, e.g. a DNA source such as a cosmid that was used as the template for PCR or was ligated to a vector to obtain a clone.
  • Template: Template used for sequencing (e.g. plasmid, PCR product)
  • Primer: Primer used in sequencing
  • User Discretionary Attributes: Any other details that the user would like to include for this sample
You can assemble a subset of sequences on the basis of these attributes. The naming convention for these samples is shown in the figure below.

Non Standard Projects Naming Convention

If a chromatogram file is named CLS_167F1x53L23_1x1_1_167f_p53L23CE_M13R_x_JB2_B147.ab1, the details of the sample are as follows:

  • Lab: CLS (Dr. Schardl's lab)
  • Project: 167F1x53L23
  • Reaction Plate/Sequencing plate: CLS_167F1x53L23_1x1_1, Well/tube number: JB2
  • DNA/Library Plate: CLS_167F1x53L23_1
  • Source: 167f
  • Template: p53L23CE
  • Primer: M13R
  • User Discretionary Attribute: x
  • Community Plate name: B147
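
A Non-Standard name can be taken apart the same way. The sketch below assumes ten underscore-separated fields in the order shown in the example, with no underscores inside any field. `NonStandardSample` and `parse_nonstandard_name` are hypothetical names, not part of the AGTC software.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NonStandardSample:
    lab: str
    project: str
    reaction_plate: str
    well_or_tube: str
    dna_plate: str
    source: str
    template: str
    primer: str
    user_disc: str
    community_plate: str


def parse_nonstandard_name(filename: str) -> NonStandardSample:
    """Split a Non-Standard chromatogram name into its parts."""
    stem = filename.rsplit(".", 1)[0]  # drop the .ab1 extension
    fields = stem.split("_")
    if len(fields) != 10:  # assumption: always ten fields
        raise ValueError(f"expected 10 fields, got {len(fields)}: {filename}")
    (lab, project, runset, rerun, source,
     template, primer, user_disc, well, community) = fields
    # Assumption: the runset starts with the DNA plate number ("1" in "1x1").
    dna_number = re.match(r"\d+", runset).group()
    return NonStandardSample(
        lab=lab,
        project=project,
        reaction_plate=f"{lab}_{project}_{runset}_{rerun}",
        well_or_tube=well,
        dna_plate=f"{lab}_{project}_{dna_number}",
        source=source,
        template=template,
        primer=primer,
        user_disc=user_disc,
        community_plate=community,
    )


names = ["CLS_167F1x53L23_1x1_1_167f_p53L23CE_M13R_x_JB2_B147.ab1"]
samples = [parse_nonstandard_name(n) for n in names]
# Selecting a subset by attribute, e.g. everything sequenced with M13R:
m13r_reads = [s for s in samples if s.primer == "M13R"]
```

Keeping the attributes as separate fields is what makes it possible to pick out a subset of sequences by Source, Template, Primer, or User Discretionary Attribute before assembly.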
In this poster, we focus on novel aspects of the Applied Biosystems (ABI) operating software that facilitate its integration into such a custom LIMS.

AGTC Sequence Data Pipeline


Step 1:
Before physically submitting a 96-well sample plate to the AGTC facility, the user registers the plate online using the Plate Submission form. Users can create an Excel template file to help them fill in the online plate submission form. Below is an example template file (for a Non-Standard project) and a description of each field.

PI  Project  Runsetname  RerunID  Source       Template  Primer    User Disc  Wellno./Tubeno.
SR  CD68     1x1         1        pGEMControl                                 B01
SR  CD68     1x1         1        wt           3         IVSf      x          JMC1
SR  CD68     1x1         1        wt           8         IVSf      x          JMC2
SR  CD68     1x1         1        d1x49        10        IVSf      x          JMC3
SR  CD68     1x1         1        d1x49        30        IVSf      x          JMC4
SR  CD68     1x1         1        wt           3         MiCF      x          JMC5
SR  CD68     1x1         1        wt           8         MiCF      x          JMC6
SR  CD68     1x1         1        d1x49        10        Micd149f  x          JMC7
SR  CD68     1x1         1        d1x49        30        Micd149f  x          JMC8

PI: Initials of the PI of your lab.
Project: Whatever you want to call the project. You will be able to call up sequences by project in the database.
Runsetname (1x1 in the example): An identifier needed for the database. The x is our code indicating Applied Biosystems sequencing chemistry with the sequencing reactions done by the client (i.e. external to AGTC). If you submit sequences prepared with the Beckman-Coulter DTCS Kit, this field should have the form 1y1, where the y indicates client-prepared DTCS chemistry. Sequences prepared by AGTC staff will have different letters to indicate Applied Biosystems (a) or Beckman-Coulter (b) chemistry.
RerunID (1 in the example): This is always 1.
Source, Template, Primer, UserDisc: These are the attributes defined for the sample as described above.
Well Number/Tube Number: You will have to submit your samples to AGTC in a 96-well plate. A plate is more stable than individual tubes and provides adequate space for you to write essential information such as the PI's initials, the project name, and the date. In this field you provide the number of the well in your submitted plate that contains the sample (A01, B01, etc.). However, if you submit the samples in tubes, provide the name or number that you printed on the tube.
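
The chemistry letter in the Runsetname field can be decoded with a small lookup table. The sketch below covers only the four letters described above. It assumes the field always has the form digits, one lowercase letter, digits; the meaning of the digits is not interpreted. `CHEMISTRY_CODES` and `decode_runset` are hypothetical names.

```python
import re

# Letter -> (sequencing chemistry, who prepared the reactions),
# as described in the field list above.
CHEMISTRY_CODES = {
    "x": ("Applied Biosystems", "client"),
    "y": ("Beckman-Coulter DTCS", "client"),
    "a": ("Applied Biosystems", "AGTC staff"),
    "b": ("Beckman-Coulter", "AGTC staff"),
}


def decode_runset(runset: str) -> tuple[str, str]:
    """Return (chemistry, prepared_by) for a runset name such as '1x1'."""
    # Assumption: digits, one lowercase letter, digits (e.g. "1x1", "3a5").
    match = re.fullmatch(r"\d+([a-z])\d+", runset)
    if match is None or match.group(1) not in CHEMISTRY_CODES:
        raise ValueError(f"unrecognized runset name: {runset!r}")
    return CHEMISTRY_CODES[match.group(1)]


print(decode_runset("1x1"))
print(decode_runset("1y1"))
```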

Each sample is thus identified by all the above fields. For example, the chromatogram file corresponding to the sample in tube number JMC1 of the above template will be named SR_CD68_1x1_1_wt_3_IVSf_x_JMC1_B122.ab1.
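
Going the other way, a file name can be assembled from one template row. In this sketch the dictionary keys mirror the template columns, and the final field (B122 in the example) is passed in separately because it does not appear in the template; we assume it is assigned by the LIMS. `chromatogram_name` is a hypothetical helper.

```python
# Template columns, in the order they appear in the file name.
TEMPLATE_FIELDS = [
    "PI", "Project", "Runsetname", "RerunID",
    "Source", "Template", "Primer", "UserDisc", "WellOrTube",
]


def chromatogram_name(row: dict[str, str], community_plate: str) -> str:
    """Join one template row into a chromatogram file name."""
    parts = [row[field] for field in TEMPLATE_FIELDS] + [community_plate]
    for part in parts:
        # An underscore inside a field would make the name ambiguous.
        if not part or "_" in part:
            raise ValueError(f"invalid field value: {part!r}")
    return "_".join(parts) + ".ab1"


row = {
    "PI": "SR", "Project": "CD68", "Runsetname": "1x1", "RerunID": "1",
    "Source": "wt", "Template": "3", "Primer": "IVSf", "UserDisc": "x",
    "WellOrTube": "JMC1",
}
print(chromatogram_name(row, "B122"))
# prints SR_CD68_1x1_1_wt_3_IVSf_x_JMC1_B122.ab1
```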

On registering the plate through the online submission form, the user receives an identifier that follows the naming convention described above. This identifier is used to track the plate throughout the pipeline. The Plate Submission Form also lets users choose options to be set on the ABI and CEQ sequencers during their samples' sequencing runs. For example, the 'Method' attribute under "Run-level options" specifies a set of parameters for the electrophoresis of one column of samples through the CEQ 8000. These parameters are adjusted to compensate for differences in template type, sequencing reaction efficiency, sequencing product purity, etc. The most commonly adjusted parameters are the Injection Voltage and Duration, Separation Voltage and Duration, Denaturation Temperature, and Capillary Temperature.

These options can in turn be fed into the CEQ and ABI sequencers effortlessly through a tab-delimited file, called the CEQ or ABI input file.
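
Writing a tab-delimited file of per-sample options is straightforward. The sketch below is only meant to show the idea: the column names (`Well`, `SampleName`, `Method`) and the method name are placeholders invented for this example, and they are not the actual column layout that the CEQ or ABI software expects.

```python
import csv
import io

# Placeholder columns for illustration only; NOT the real CEQ/ABI layout.
COLUMNS = ["Well", "SampleName", "Method"]


def write_input_file(rows: list[dict[str, str]], out) -> None:
    """Write one tab-delimited line per sample, preceded by a header line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow([row[column] for column in COLUMNS])


rows = [
    {"Well": "A01", "SampleName": "SR_CD68_1x1_1_wt_3_IVSf_x_JMC1",
     "Method": "example-method"},
    {"Well": "B01", "SampleName": "SR_CD68_1x1_1_wt_8_IVSf_x_JMC2",
     "Method": "example-method"},
]
buffer = io.StringIO()
write_input_file(rows, buffer)
print(buffer.getvalue())
```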

Step 2:
After the plate is entered into the LIMS, the user receives a unique identifier for the plate (e.g. CLS_167F1x53L23_1x1_1), prints it onto the physical plate, and submits the plate for a sequencing run. The AGTC staff download the ABI or CEQ input file for that plate from the LIMS server using a simple form. The input file is then imported onto the sequencer using the data collection software, so that all the options the user specified when submitting the plate are used during the sequencing run.
 
Step 3:
After the sequencing run, the data is exported to the location specified in the ABI input file (e.g. D:\PlatesToFtp\CLS_167F1x53L23_1x1_1). A scheduled Perl script then compresses the data into a zip file (CLS_167F1x53L23_1x1_1.zip) and transfers it to the LIMS server by FTP. The name of the zip file preserves the identity of the plate.
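
The production script is written in Perl and is not reproduced here. As a rough illustration of the same two operations, here is a sketch in Python using the standard `zipfile` and `ftplib` modules. The host name, user name, and password passed to `upload_zip` are placeholders, and the function names are hypothetical.

```python
import zipfile
from ftplib import FTP
from pathlib import Path


def zip_plate_folder(plate_dir: Path) -> Path:
    """Compress a plate folder into <plate name>.zip next to the folder."""
    zip_path = plate_dir.with_name(plate_dir.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(plate_dir.rglob("*")):
            if path.is_file():
                # Keep the plate folder name as the top-level directory.
                arcname = path.relative_to(plate_dir.parent).as_posix()
                archive.write(path, arcname)
    return zip_path


def upload_zip(zip_path: Path, host: str, user: str, password: str) -> None:
    """Send the zip file to the server over FTP (connection details assumed)."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        with open(zip_path, "rb") as handle:
            ftp.storbinary(f"STOR {zip_path.name}", handle)


# Example call, with placeholder connection details:
# archive = zip_plate_folder(Path(r"D:\PlatesToFtp\CLS_167F1x53L23_1x1_1"))
# upload_zip(archive, "lims.example.org", "agtc", "secret")
```

Because the archive takes its name from the folder, the plate identifier travels with the data without any extra bookkeeping.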

Step 4:
The data is then processed at the LIMS server through analysis scripts. The analysis scripts organize the data into appropriate folders which facilitate assembly, downloading and other analyses, and generate a summary report of the PHRED quality values of the samples. The summary report is automatically emailed to the plate owner and the project PI. A typical summary report is shown below.
 
Control Report:
SeqStandard A01 = OK  #Bases_called: 1252  #HQBases: 989
CONTROL = OK  #Bases_called: 1253  #HQBases: 946
--------------------------------------------------------------------------------------------------------
 
WHOLE SET STATISTICS (does not include SeqStandard and Control sequences)
#Bases (Av)  #HQBases (Av)  #HQReads (successful reads)  #HQRegions (Av) (contiguous)  L.HQRegLen. (Av) (contiguous)  #HQRegions (Av) (20bp window)  %ecoli  #Samples  %HQ  L.HQRegLen. (Av) (20bp window)  Seq. Machine
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1208  854  947547130%  6  100%  834  Bogie
 
SEQUENCE-BY-SEQUENCE STATISTICS
Name                                  #Bases  #HQBases  #HQRegions, L.HQRegLen. (contiguous),  is_ecoli  Date
                                                        #HQRegions, L.HQRegLen. (20bp window)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
MLF_721linear_12a2_1_SeqStandard_A01  1252    989       537474977                              N         Mar 30, 2005
MLF_721linear_12a2_1_pGEMControl_B01  1253    946       546554915                              N         Mar 30, 2005
MLF_721linear_12a2_1_a02T7_A02        1239    869       102854871                              N         Mar 30, 2005
MLF_721linear_12a2_1_a03T7_A03        1265    570       1193410175                             N         Mar 30, 2005
MLF_721linear_12a2_1_a04T7_A04        1280    945       624294951                              N         Mar 30, 2005
MLF_721linear_12a2_1_a05T7_A05        1273    860       1181724631                             N         Mar 30, 2005
MLF_721linear_12a2_1_a06T7_A06        1259    932       772312980                              N         Mar 30, 2005
MLF_721linear_12a2_1_a07T7_A07        932     958       203761583                              N         Mar 30, 2005
 
These reports have two parameters.
Quality Threshold (default value 20): If the phred score of a base exceeds this threshold, the base is considered a high quality base.
High Quality Bases Threshold (default value 100): If the number of high quality bases in a sequence exceeds this threshold, the sequence is considered a high quality sequence.
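
Both thresholds are simple comparisons. A minimal sketch, assuming the phred scores of a read are already available as a list of integers (the function names are hypothetical):

```python
QUALITY_THRESHOLD = 20      # default from the report description
HQ_BASES_THRESHOLD = 100    # default from the report description


def count_hq_bases(quals: list[int], quality_threshold: int = QUALITY_THRESHOLD) -> int:
    """Count bases whose phred score exceeds the quality threshold."""
    return sum(1 for q in quals if q > quality_threshold)


def is_hq_sequence(quals: list[int],
                   quality_threshold: int = QUALITY_THRESHOLD,
                   hq_bases_threshold: int = HQ_BASES_THRESHOLD) -> bool:
    """A read is high quality if its HQ base count exceeds the threshold."""
    return count_hq_bases(quals, quality_threshold) > hq_bases_threshold


print(count_hq_bases([20, 21, 30]))   # prints 2: a score of exactly 20 does not exceed 20
print(is_hq_sequence([30] * 101))     # prints True
print(is_hq_sequence([30] * 100))     # prints False: 100 does not exceed 100
```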
 
SEQUENCE-BY-SEQUENCE STATISTICS:
Name: Name of the sequence
#Bases_called: Number of bases in the sequence
#HQBases: Number of high quality bases
 
--A high quality region (HQReg) in a sequence is a contiguous set of bases in which every base has a phred score greater than the 'Quality Threshold' (default value 20).
 
#HQRegions (contiguous): Number of high quality regions.
Longest HQRegLength (contiguous): Length of the longest high quality region.
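
Counting contiguous regions amounts to finding runs of bases above the threshold. One way to sketch this, again assuming a list of integer phred scores and a hypothetical function name:

```python
from itertools import groupby


def contiguous_hq_regions(quals: list[int], quality_threshold: int = 20) -> tuple[int, int]:
    """Return (#HQRegions, longest HQ region length) for one read."""
    lengths = [
        sum(1 for _ in run)
        for is_hq, run in groupby(quals, key=lambda q: q > quality_threshold)
        if is_hq
    ]
    return len(lengths), max(lengths, default=0)


# Runs above 20: [30, 30], [30, 30, 30], [25]
print(contiguous_hq_regions([30, 30, 10, 30, 30, 30, 20, 25]))  # prints (3, 3)
```

Note that a single low-scoring base splits a region in two, which is why the contiguous counts can be high even for a good read.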
 
--The #HQRegions and LongestHQRegLength (20bp window) values are obtained in the same way, but using a 20-base window. A 20-base window is slid along the length of the sequence. If the average phred score of the 20 bases in the window is above the 'Quality Threshold', these 20 bases are considered high quality bases. Hence, in an HQRegion, the average value of the phred scores of any 20 consecutive bases will be greater than the 'Quality Threshold'. LongestHQRegLength (20bp window) is the length of the longest such high quality region.
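
The windowed variant can be sketched as follows. This is one reading of the description above, not the analysis script itself: every window whose average exceeds the threshold marks its 20 bases as high quality, and regions are then runs of marked bases. Reads shorter than the window are assumed to have no windowed regions. The function name is hypothetical.

```python
from itertools import groupby


def windowed_hq_regions(quals: list[int], quality_threshold: int = 20,
                        window: int = 20) -> tuple[int, int]:
    """Return (#HQRegions, longest HQ region length) using a sliding window."""
    n = len(quals)
    marked = [False] * n
    if n >= window:
        total = sum(quals[:window])
        for start in range(n - window + 1):
            if start > 0:
                # Slide the window by one base: add the new base, drop the old.
                total += quals[start + window - 1] - quals[start - 1]
            # Same test as "average > threshold", without floating point.
            if total > quality_threshold * window:
                marked[start:start + window] = [True] * window
    lengths = [sum(1 for _ in run) for flag, run in groupby(marked) if flag]
    return len(lengths), max(lengths, default=0)


# 25 good bases, 30 poor bases, 20 good bases
quals = [30] * 25 + [5] * 30 + [30] * 20
print(windowed_hq_regions(quals))  # prints (2, 32)
```

In this example the first region is 32 bases long even though only 25 bases score above 20, because windows that overlap the start of the poor stretch still average above the threshold. This smoothing is why the windowed statistics report fewer and longer regions than the contiguous ones.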
 
is_ecoli: 'Y' if the sequence matches an E. coli sequence significantly; otherwise 'N'.
 
WHOLE SET STATISTICS: (Average Values)
 
--These values are the averages of the sequence-by-sequence statistics over the high quality (successful) sequences. A sequence is considered high quality if the number of high quality bases in it exceeds the 'High Quality Bases Threshold' (default 100).
 
#HQReads: Number of high quality sequences.
%HQ: Percentage of high quality sequences.
 
#Bases_called (Av): Total number of bases called in all the high quality sequences divided by the number of high quality sequences.
#HQBases (Av): Total number of high quality bases in all the high quality sequences divided by the number of high quality sequences.
#HQRegions (contiguous and 20bp window): Total number of high quality regions in all the high quality sequences divided by the number of high quality sequences.
LongestHQRegion Length (contiguous and 20bp window): Average of the LongestHQRegLength values of all the high quality sequences.
%E.coli: Percentage of the high quality sequences that matched E. coli significantly.
Seq. Machine: Name of the sequencing machine the samples were sequenced on.
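
The whole-set values are plain averages over the reads that pass the High Quality Bases Threshold. A sketch, assuming each read has already been summarized as a small dictionary; the keys `bases`, `hq_bases`, and `is_ecoli` and the three example reads are made up for illustration:

```python
def whole_set_statistics(sequences: list[dict], hq_bases_threshold: int = 100) -> dict:
    """Average the per-read statistics over the high quality reads only."""
    hq = [s for s in sequences if s["hq_bases"] > hq_bases_threshold]
    stats = {
        "samples": len(sequences),
        "hq_reads": len(hq),
        "pct_hq": 100 * len(hq) / len(sequences) if sequences else 0.0,
    }
    if hq:
        stats["bases_av"] = sum(s["bases"] for s in hq) / len(hq)
        stats["hq_bases_av"] = sum(s["hq_bases"] for s in hq) / len(hq)
        stats["pct_ecoli"] = 100 * sum(1 for s in hq if s["is_ecoli"]) / len(hq)
    return stats


# Made-up reads: the second one fails the threshold and is excluded
# from the averages.
reads = [
    {"bases": 1000, "hq_bases": 800, "is_ecoli": False},
    {"bases": 900, "hq_bases": 50, "is_ecoli": False},
    {"bases": 1200, "hq_bases": 1000, "is_ecoli": True},
]
print(whole_set_statistics(reads))
```

The region counts and longest-region lengths would be averaged in the same way, over the same subset of reads.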

Finally, PIs and users can access and download the plate data from the "CEQdata index" webpage. The data is organized with PI folders at the top of the hierarchy, project folders at the next level, and plate data at the bottom. Each PI folder is password protected so that only the actual plate owners have access to the plate data.

Conclusion:

The CEQ8000 and ABI software are well suited to a custom LIMS project. They allow automated low-level control of machine function and automated control of data flow and data types, and they are backed by excellent customer service and communication.
 


Last updated: 07 June, 2006
   
 