Introduction
The University of Kentucky Advanced Genetics Technologies Center (UK-AGTC) is a core facility that uses robotics and high-throughput tools to provide cost-effective support for cutting-edge research. Laboratory database management suites make it possible to customize these processes and to store, organize, and disseminate the large volume of data (460,800 bases of DNA per day) to its owners via web-based applications.
Examples of current UK-AGTC sequencing projects include:
- Genomes of forage and turf grass symbionts
- Genomes of insect viruses
- Gene expression profiles in horses
- Genomes of plant-pathogenic fungi
- Gene expression during plant development and reproduction
To meet the wide variety of UK faculty's DNA sequencing demands, AGTC has purchased six Beckman Coulter CEQ8000 Capillary Electrophoresis DNA sequencers. The CEQ's eight-capillary array electrophoreses each column of a 96-well plate independently, allowing for both a variety of run conditions on one plate and relatively swift throughput. The staff of AGTC have designed and implemented a custom LIMS for sequencing project sample and data handling. Samples and data are tracked through the analytical process by modeling the way the sample is handled in the lab from preparation to final results.
Every sample sequenced at AGTC is given a name that uniquely identifies it. These identifiers follow a naming convention designed by AGTC staff. To make data easy to store and retrieve, all samples are categorized by the name of the project and the lab. Each lab is identified by the initials of the PI. For example, CLS is the unique identifier for Dr. Schardl's lab. Projects are further classified as Standard or Non-Standard. Projects with a large number of samples to be assembled are categorized as Standard projects. Parameters such as the primers can be defined for Standard projects. Whole-genome, EST, and BAC projects are examples of Standard projects.
The naming convention for samples in Standard projects is shown in the figure below.
For example, if a chromatogram file is named CLS_Cp26A07_3a5_1_a04f_A02.ab1, the details of the sample are as follows:
- Lab: CLS (Dr. Schardl's lab)
- Project: Cp26A07
- Reaction plate/Sequencing plate: CLS_Cp26A07_3a5, Well: A02
- DNA plate/Library plate: CLS_Cp26A07_3, Well: a04
- Primer used for sequencing: f
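The breakdown above can be sketched as a small parser. This is only an illustration inferred from the single example name: the field layout, the regular expression, the function name `parse_standard_name`, and the label `rerun` for the fourth field (which the example does not explain) are all assumptions, not the AGTC implementation.

```python
import re
from typing import Dict

# Assumed layout, inferred from one example name:
#   <lab>_<project>_<runset>_<rerun>_<dna well><primer>_<sequencing well>.ab1
# The runset (e.g. "3a5") is assumed to start with the DNA plate number.
STANDARD_NAME = re.compile(
    r"^(?P<lab>[^_]+)_(?P<project>[^_]+)"
    r"_(?P<runset>(?P<dna_plate_no>\d+)[a-z]\d+)"
    r"_(?P<rerun>\d+)"
    r"_(?P<dna_well>[a-h]\d{2})(?P<primer>[^_]+)"
    r"_(?P<seq_well>[A-H]\d{2})\.ab1$"
)


def parse_standard_name(filename: str) -> Dict[str, str]:
    """Split a Standard-project chromatogram name into its parts."""
    match = STANDARD_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a Standard-project name: {filename!r}")
    f = match.groupdict()
    return {
        "lab": f["lab"],
        "project": f["project"],
        "reaction_plate": f"{f['lab']}_{f['project']}_{f['runset']}",
        "reaction_well": f["seq_well"],
        "dna_plate": f"{f['lab']}_{f['project']}_{f['dna_plate_no']}",
        "dna_well": f["dna_well"],
        "primer": f["primer"],
    }


if __name__ == "__main__":
    for key, value in parse_standard_name("CLS_Cp26A07_3a5_1_a04f_A02.ab1").items():
        print(f"{key}: {value}")
```

The sketch assumes that neither the lab initials nor the project name contains an underscore, since the underscore is the field separator.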
Projects with fewer samples, for which you can define details such as the DNA plate, reaction plate, and source, are categorized as Non-Standard projects. Each sample in a Non-Standard project has four attributes: Source, Template, Primer, and User Discretionary Attributes.
- Source: Where the sequencing template is derived from, e.g. the DNA (such as a cosmid) that was used as a template for PCR or ligated into a vector to obtain a clone.
- Template: Template used for sequencing (e.g. plasmid, PCR product)
- Primer: Primer used in sequencing
- User Discretionary Attributes: Any other details that the user would like to include for this sample
You can assemble a subset of sequences on the basis of these attributes. The naming convention for these samples is shown in the figure below.
If a chromatogram file is named CLS_167F1x53L23_1x1_1_167f_p53L23CE_M13R_x_JB2_B147.ab1, the details of the sample are as follows:
- Lab: CLS (Dr. Schardl's lab)
- Project: 167F1x53L23
- Reaction Plate/Sequencing plate: CLS_167F1x53L23_1x1_1, Well/tube number: JB2
- DNA/Library Plate: CLS_167F1x53L23_1
- Source: 167f
- Template: p53L23CE
- Primer: M13R
- User Discretionary Attribute: x
- Community Plate name: B147
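As a rough sketch, a Non-Standard name can be split on underscores into the ten fields listed above. The field order is taken from the example; the function name, the dictionary keys, and the rule that the DNA plate number is the leading digits of the run set name are assumptions made for illustration.

```python
import re
from typing import Dict

# Field order as it appears in the example file name (assumed to be fixed).
FIELDS = [
    "lab", "project", "runset", "rerun_id", "source",
    "template", "primer", "user_disc", "well_or_tube", "community_plate",
]


def parse_nonstandard_name(filename: str) -> Dict[str, str]:
    """Split a Non-Standard chromatogram name into its attributes."""
    stem = filename[:-4] if filename.endswith(".ab1") else filename
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    # Reaction plate = first four fields joined back together.
    record["reaction_plate"] = "_".join(parts[:4])
    # DNA plate number is assumed to be the leading digits of the run set.
    plate_no = re.match(r"\d+", record["runset"])
    if plate_no is None:
        raise ValueError(f"run set has no plate number: {record['runset']!r}")
    record["dna_plate"] = f"{record['lab']}_{record['project']}_{plate_no.group()}"
    return record


if __name__ == "__main__":
    name = "CLS_167F1x53L23_1x1_1_167f_p53L23CE_M13R_x_JB2_B147.ab1"
    for key, value in parse_nonstandard_name(name).items():
        print(f"{key}: {value}")
```

This only works if no attribute value itself contains an underscore; the text does not say whether the submission form enforces that.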
In this poster, we focus on novel aspects of the Applied Biosystems (ABI) operating software that facilitate its integration into such a custom LIMS.
Step 1:
Before a user physically submits a 96-well sample plate to the AGTC facility, the user registers the plate online using the Plate Submission form. Users can create an Excel template file to help them fill in the online plate submission form. Below is an example template file (for a Non-Standard project) and a description of each of the fields.
| PI | Project | Runsetname | RerunID | Source | Template | Primer | User Disc | Wellno./Tubeno. |
| SR | CD68 | 1x1 | 1 | pGEMControl | | | | B01 |
| SR | CD68 | 1x1 | 1 | wt | 3 | IVSf | x | JMC1 |
| SR | CD68 | 1x1 | 1 | wt | 8 | IVSf | x | JMC2 |
| SR | CD68 | 1x1 | 1 | d1x49 | 10 | IVSf | x | JMC3 |
| SR | CD68 | 1x1 | 1 | d1x49 | 30 | IVSf | x | JMC4 |
| SR | CD68 | 1x1 | 1 | wt | 3 | MiCF | x | JMC5 |
| SR | CD68 | 1x1 | 1 | wt | 8 | MiCF | x | JMC6 |
| SR | CD68 | 1x1 | 1 | d1x49 | 10 | Micd149f | x | JMC7 |
| SR | CD68 | 1x1 | 1 | d1x49 | 30 | Micd149f | x | JMC8 |
PI: PI of your lab
Project: Whatever you want to call the project. You will be able to call up sequences by project on the database.
Runsetname (e.g. 1x1): An identifier needed for the database. The x is our code indicating Applied Biosystems sequencing chemistry with the sequencing reactions done by the client (i.e. external to AGTC). If you submit sequences you have prepared with the Beckman Coulter DTCS Kit, this field should have the form 1y1, where the y indicates client-prepared DTCS chemistry. Sequences prepared by AGTC staff will have different letters to indicate Applied Biosystems (a) or Beckman Coulter (b) chemistry.
RerunID (1): This is always 1.
Source, Template, Primer, UserDisc: These are the attributes defined for the sample as described above.
Well Number/Tube Number: You should submit your samples to AGTC in a 96-well plate. The plate is more stable than individual tubes and provides adequate space for you to write essential information such as the PI's initials, project name, and date. In this field, provide the number of the well containing the sample (A01, B01, etc.) in your submitted plate. If you submit the samples in tubes instead, provide the name or number that you wrote on the tube.
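The chemistry letter in the run set name can be decoded with a small lookup table. The four letters and their meanings come from the field description above; the function name, the returned keys, and the assumption that a run set always has the form digits, one letter, digits are illustrative only. The meaning of the two numbers is not spelled out in the text, so the sketch returns them unchanged.

```python
import re
from typing import Dict

# Letter codes as described for the Runsetname field.
CHEMISTRY_CODES = {
    "x": ("Applied Biosystems", "client"),
    "y": ("Beckman Coulter DTCS", "client"),
    "a": ("Applied Biosystems", "AGTC staff"),
    "b": ("Beckman Coulter", "AGTC staff"),
}


def decode_runset(runset: str) -> Dict[str, str]:
    """Decode a run set name such as '1x1' or '12a2'."""
    match = re.fullmatch(r"(\d+)([a-z])(\d+)", runset)
    if match is None:
        raise ValueError(f"unexpected run set format: {runset!r}")
    first, letter, second = match.groups()
    if letter not in CHEMISTRY_CODES:
        raise ValueError(f"unknown chemistry code: {letter!r}")
    chemistry, prepared_by = CHEMISTRY_CODES[letter]
    return {
        "first_number": first,
        "chemistry": chemistry,
        "prepared_by": prepared_by,
        "second_number": second,
    }


if __name__ == "__main__":
    print(decode_runset("1x1"))
    print(decode_runset("1y1"))
```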
Each sample is thus identified by all of the above fields. For example, the chromatogram file corresponding to the sample in tube JMC1 of the above template will be named SR_CD68_1x1_1_wt_3_IVSf_x_JMC1_B122.ab1.
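A minimal sketch of how such a file name could be assembled from one row of the template is given below. The dictionary keys mirror the template column headers; the function name is hypothetical, and the community plate name (B122 in the example) is assumed to be supplied separately because it is not a template column. The text does not say how blank fields, as in the pGEMControl row, are handled, so this sketch simply rejects them.

```python
from typing import Dict

# Template columns in the order they appear in the file name.
COLUMN_ORDER = [
    "PI", "Project", "Runsetname", "RerunID",
    "Source", "Template", "Primer", "User Disc", "Wellno./Tubeno.",
]


def chromatogram_name(row: Dict[str, str], community_plate: str) -> str:
    """Join the template fields and the community plate into a file name."""
    values = [str(row.get(column, "")).strip() for column in COLUMN_ORDER]
    if any(value == "" for value in values):
        raise ValueError("every template field must be filled in")
    return "_".join(values + [community_plate]) + ".ab1"


if __name__ == "__main__":
    row = {
        "PI": "SR", "Project": "CD68", "Runsetname": "1x1", "RerunID": "1",
        "Source": "wt", "Template": "3", "Primer": "IVSf",
        "User Disc": "x", "Wellno./Tubeno.": "JMC1",
    }
    print(chromatogram_name(row, "B122"))
    # SR_CD68_1x1_1_wt_3_IVSf_x_JMC1_B122.ab1
```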
On registering the plate through the online submission form, the user receives an identifier that follows the naming convention described above. This identifier is used to keep track of the plate throughout the pipeline. The Plate Submission Form also lets users choose options that can be set on the ABI and CEQ sequencers for their samples' sequencing runs. For example, the 'Method' attribute under "Run-level options" specifies a set of parameters for the electrophoresis of one column of samples through the CEQ 8000. These parameters are adjusted to compensate for differences in template type, sequencing reaction efficiency, sequencing product purity, etc. The most commonly adjusted parameters are the Injection Voltage and Duration, Separation Voltage and Duration, Denaturation Temperature, and Capillary Temperature.
These options can in turn be fed into the CEQ and ABI sequencers through a tab-delimited file, called the CEQ or ABI input file.
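To illustrate the idea of a tab-delimited input file, the sketch below writes one line per sample. The actual column layout expected by the CEQ and ABI data collection software is not given here, so the column names (`Well`, `SampleName`, `Method`), the method name `method_1`, and the function name are placeholders rather than the real file format.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

# Placeholder header; the real sequencer input format is not described in the text.
HEADER = ["Well", "SampleName", "Method"]


def write_input_file(samples: Iterable[Tuple[str, str, str]], out: TextIO) -> None:
    """Write (well, sample name, method) rows as tab-delimited text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for well, sample_name, method in samples:
        writer.writerow([well, sample_name, method])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_input_file(
        [
            ("A01", "SR_CD68_1x1_1_wt_3_IVSf_x", "method_1"),
            ("B01", "SR_CD68_1x1_1_wt_8_IVSf_x", "method_1"),
        ],
        buffer,
    )
    print(buffer.getvalue(), end="")
```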
After entering the plate into the LIMS, the user receives a unique identifier for the plate (e.g. CLS_167F1x53L23_1x1_1), writes it on the physical plate, and submits the plate for a sequencing run. The AGTC staff download the ABI or CEQ input file for that plate from the LIMS server using a simple form. The input file is then imported onto the sequencer using the data collection software, so that all the options the user specified while submitting the plate are used during the sequencing run.
After the sequencing run, the data is exported to the location specified in the ABI input file (e.g. D:\PlatesToFtp\CLS_167F1x53L23_1x1_1). A scheduled Perl script then compresses the data into a zip file (CLS_167F1x53L23_1x1_1.zip) and transfers it to the LIMS server by FTP. The name of the zip file preserves the identity of the plate.
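The transfer step can be sketched as follows. The production script is written in Perl; this Python version only illustrates the same two actions (zip the plate folder, then send the archive by FTP) using the standard `zipfile` and `ftplib` modules. The function names, host name, and credentials are hypothetical.

```python
import zipfile
from ftplib import FTP
from pathlib import Path


def zip_plate_dir(plate_dir: Path) -> Path:
    """Compress a plate folder into <plate name>.zip next to the folder."""
    plate_dir = Path(plate_dir)
    archive = plate_dir.with_name(plate_dir.name + ".zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(plate_dir.rglob("*")):
            if path.is_file():
                # Keep the plate folder name inside the archive.
                zf.write(path, path.relative_to(plate_dir.parent).as_posix())
    return archive


def upload_archive(archive: Path, host: str, user: str, password: str) -> None:
    """Send the archive to the server under its own file name."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        with open(archive, "rb") as handle:
            ftp.storbinary(f"STOR {archive.name}", handle)


if __name__ == "__main__":
    plate = Path(r"D:\PlatesToFtp\CLS_167F1x53L23_1x1_1")
    if plate.is_dir():
        zipped = zip_plate_dir(plate)
        # Hypothetical server and account; replace before use.
        upload_archive(zipped, "lims.example.org", "user", "password")
```

Because the archive is named after the plate folder, the plate identifier survives the transfer, as described above.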
Step 4:
The data is then processed on the LIMS server by analysis scripts. These scripts organize the data into appropriate folders, which facilitates assembly, downloading, and other analyses, and generate a summary report of the PHRED quality values of the samples. The summary report is automatically emailed to the plate owner and the project PI. A typical summary report is shown below.
Control Report:
| SeqStandard A01 = OK | #Bases_called: 1252 | #HQBases: 989 |
| CONTROL = OK | #Bases_called: 1253 | #HQBases: 946 |
| -------------------------------------------------------------------------------------------------------- |
WHOLE SET STATISTICS (does not include SeqStandard and Control Sequences)
| #Bases (Av) (successful reads) | #HQBases (Av) | #HQReads | #HQRegions (Av) (contiguous) | L.HQRegLen. (Av) (contiguous) | #HQRegions (Av) (20bp window) | %ecoli | #Samples | %HQ | L.HQRegLen. (Av) (20bp window) | Seq. Machine |
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1208 | 854 | 94 | 75 | 471 | 3 | 0% | 6 | 100% | 834 | Bogie |
SEQUENCE-BY-SEQUENCE STATISTICS
| Name | #Bases | #HQBases | #HQRegions (contiguous) | L.HQRegLen. (contiguous) | #HQRegions (20bp window) | L.HQRegLen. (20bp window) | is_ecoli | Date |
| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| MLF_721linear_12a2_1_SeqStandard_A01 | 1252 | 989 | 53 | 747 | 4 | 977 | N | Mar 30, 2005 |
| MLF_721linear_12a2_1_pGEMControl_B01 | 1253 | 946 | 54 | 655 | 4 | 915 | N | Mar 30, 2005 |
| MLF_721linear_12a2_1_a02T7_A02 | 1239 | 869 | 102 | 85 | 4 | 871 | N | Mar 30, 2005 |
| MLF_721linear_12a2_1_a03T7_A03 | 1265 | 570 | 119 | 34 | 10 | 175 | N | Mar 30, 2005 |
| MLF_721linear_12a2_1_a04T7_A04 | 1280 | 945 | 62 | 429 | 4 | 951 | N | Mar 30, 2005 |
| MLF_721linear_12a2_1_a05T7_A05 | 1273 | 860 | 118 | 172 | 4 | 631 | N | Mar 30, 2005 |
| MLF_721linear_12a2_1_a06T7_A06 | 1259 | 932 | 77 | 231 | 2 | 980 | N | Mar 30, 2005 |
| MLF_721linear_12a2_1_a07T7_A07 | 932 | 958 | 20 | 376 | 1 | 583 | N | Mar 30, 2005 |
There are two parameters for these reports.
Quality Threshold (default value 20): If the phred score of a base exceeds this threshold, the base is considered a high quality base.
High Quality Bases Threshold (default value 100): If the number of high quality bases in a sequence exceeds this threshold, the sequence is considered a high quality sequence.
SEQUENCE-BY-SEQUENCE STATISTICS:
Name: Name of the sequence
#Bases_called: Number of bases in the sequence
#HQBases: Number of high quality bases
--A high quality region (HQReg) in a sequence is a contiguous set of bases in which all the bases have a phred score greater than the 'Quality Threshold' (default value 20).
#HQRegions (contiguous): Number of high quality regions.
LongestHQRegLength (contiguous): Length of the longest high quality region.
--The #HQRegions and LongestHQRegLength (20bp window) values are obtained in the same way, but using a 20-base window. A 20-base window is slid along the length of the sequence. If the average phred score of the 20 bases in the window is above the 'Quality Threshold', these 20 bases are considered high quality bases. Hence, in an HQRegion, the average phred score of any 20 consecutive bases is greater than the 'Quality Threshold'. LongestHQRegLength (20bp window) is the length of the longest such high quality region.
is_ecoli: 'Y' if the sequence matches significantly with E. coli sequence, otherwise 'N'.
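The per-sequence statistics defined above can be sketched in a few functions. This is an illustration of the definitions as written, not the AGTC analysis script: the function names and dictionary keys are made up, and the input is assumed to be a plain list of phred scores, one per called base.

```python
from typing import Dict, List


def region_lengths(mask: List[bool]) -> List[int]:
    """Lengths of each run of consecutive True values."""
    lengths, run = [], 0
    for flag in mask:
        if flag:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def window_mask(scores: List[int], threshold: int, window: int) -> List[bool]:
    """Mark every base covered by a window whose mean score exceeds the threshold."""
    mask = [False] * len(scores)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window > threshold:
            for i in range(start, start + window):
                mask[i] = True
    return mask


def sequence_stats(scores: List[int], threshold: int = 20, window: int = 20) -> Dict[str, int]:
    """Compute the sequence-by-sequence values for one read."""
    base_mask = [score > threshold for score in scores]
    contiguous = region_lengths(base_mask)
    windowed = region_lengths(window_mask(scores, threshold, window))
    return {
        "bases": len(scores),
        "hq_bases": sum(base_mask),
        "hq_regions_contiguous": len(contiguous),
        "longest_contiguous": max(contiguous, default=0),
        "hq_regions_window": len(windowed),
        "longest_window": max(windowed, default=0),
    }


if __name__ == "__main__":
    # 25 good bases, a 5-base dip, then 10 good bases.
    example = [30] * 25 + [10] * 5 + [30] * 10
    print(sequence_stats(example))
```

In the example, the contiguous method reports two regions (25 and 10 bases) because the dip splits the read, while the 20-base window averages over the dip and reports a single 40-base region. This is consistent with the sample report, where the 20bp-window columns show fewer but longer regions than the contiguous columns.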
WHOLE SET STATISTICS:
(Average Values)
--These values are the averages of the sequence-by-sequence statistics over the high quality (successful) sequences. A sequence is considered high quality if the number of high quality bases in it exceeds the 'High Quality Bases Threshold' (default 100).
#HQReads: Number of high quality sequences
%HQ: Percentage of high quality sequences
#Bases_called (Av): Total number of bases called in all the high quality sequences divided by the number of high quality sequences.
#HQBases (Av): Total number of high quality bases in all the high quality sequences divided by the number of high quality sequences.
#HQRegions (contiguous and 20bp window): Total number of high quality regions in all the high quality sequences divided by the number of high quality sequences.
LongestHQRegLength (contiguous and 20bp window): Average of the LongestHQRegLength values of all the high quality sequences.
%E.coli: Percentage of the high quality sequences that matched significantly with E. coli.
Seq. Machine: Name of the sequencing machine on which the samples were sequenced.
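The whole-set values can be sketched as a filter followed by averages. The record keys (`bases`, `hq_bases`, `is_ecoli`) and the function name are assumptions for illustration; each record is taken to hold the sequence-by-sequence values for one non-control sample.

```python
from typing import Dict, List


def whole_set_stats(records: List[Dict], hq_bases_threshold: int = 100) -> Dict[str, float]:
    """Average per-sequence values over the high quality sequences only."""
    hq = [r for r in records if r["hq_bases"] > hq_bases_threshold]

    def average(key: str) -> float:
        return sum(r[key] for r in hq) / len(hq) if hq else 0.0

    return {
        "samples": len(records),
        "hq_reads": len(hq),
        "pct_hq": 100.0 * len(hq) / len(records) if records else 0.0,
        "bases_av": average("bases"),
        "hq_bases_av": average("hq_bases"),
        "pct_ecoli": 100.0 * sum(1 for r in hq if r["is_ecoli"]) / len(hq) if hq else 0.0,
    }


if __name__ == "__main__":
    # Made-up values, not taken from the sample report.
    reads = [
        {"bases": 1000, "hq_bases": 900, "is_ecoli": False},
        {"bases": 800, "hq_bases": 50, "is_ecoli": False},   # below threshold
        {"bases": 1200, "hq_bases": 700, "is_ecoli": True},
    ]
    print(whole_set_stats(reads))
```

The region counts and longest-region lengths would be averaged in the same way by adding the corresponding keys.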
Finally, PIs and users can access and download the plate data from the "CEQdata index" webpage. The data is organized with PI folders at the top level, project folders at the next level, and plate data at the bottom. Each PI folder is password protected so that only the actual plate owners have access to the plate data.
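Since the plate identifier begins with the PI initials and the project name, the folder for a plate can be derived from the identifier alone. The sketch below assumes a root folder called `CEQdata` and a folder per plate named after the plate identifier; both are guesses, as is the function name.

```python
from pathlib import PurePosixPath


def plate_folder(plate_id: str, root: str = "CEQdata") -> PurePosixPath:
    """Return <root>/<PI>/<project>/<plate id> for a plate identifier."""
    parts = plate_id.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected plate identifier: {plate_id!r}")
    pi, project = parts[0], parts[1]
    return PurePosixPath(root) / pi / project / plate_id


if __name__ == "__main__":
    print(plate_folder("CLS_167F1x53L23_1x1_1"))
    # CEQdata/CLS/167F1x53L23/CLS_167F1x53L23_1x1_1
```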
Conclusion:
The CEQ8000 and ABI software are well suited to a custom LIMS project. They allow automated low-level control of machine function, automated control of data flow and data types, and excellent customer service and communication.
Last updated: 07 June, 2006