
Batch Import of BioMaterial Items

Introduction

Batch import allows BioMaterial items such as samples, extracts, and labeled extracts to be imported from a tab-delimited input text file. Only new BioMaterial items can be imported; updating existing items is not supported.

How to Import BioMaterial Data using Batch Import

  1. Upload the input files with tab-delimited data to Proteios.
  2. Select the checkboxes next to the input files and click 'Extensions'->'Batch import from tab-delimited input data'.
  3. The contents of the input files are checked, and a page with information on the parsed data is displayed. A summary message box at the end reports whether any errors were found. If so, check the error information for the individual files, go back, correct the files, and try again.
  4. If no errors are found, check that the parsed result is correct. In particular, check that the BioMaterial types, parent item references, and annotations are the ones you intended.
  5. If everything is OK, click the toolbar button 'Create import job[s]' to create the import jobs.

Batch Import Input File Specification

The input file may contain data on BioMaterial item attributes, annotations, and parent item specifications.

Sample Import Only

This section describes the case where only samples are imported, i.e. no extracts or labeled extracts are specified:

  1. Empty lines are skipped.
  2. The first non-empty line should be a header line in tab-delimited format with column names.
  3. Column names in a header line are case-sensitive.
  4. Columns will be mapped to BioMaterial variables using the column names in the header line.
  5. A column with name equal to a BioMaterial variable key string is assumed to contain data for this variable.
  6. Only columns for required BioMaterial variables are needed for samples (other variables will be given default values when an item is imported). Currently only "Name" is a required variable.
  7. A column with any other name is assumed to contain an annotation, with the column name as the annotation type.
  8. If the specified annotation type does not exist for BioMaterial items, it will be created.
  9. Each following non-empty input data line in the input file specifies one BioMaterial item.
  10. The number of columns in an item data line must be at least the number of columns in the header line.
  11. Extra columns in an item data line will be discarded (a warning will be displayed).
  12. The input file is assumed to be small enough for the parsed result to be displayed as a single table on a web page.
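The sample-only rules above can be sketched in Python as follows. This is a minimal illustration, not the Proteios implementation: the names `parse_sample_file` and `ParsedItem`, and treating whitespace-only lines as empty, are assumptions made for this example. The set of variable key strings is copied from the key-string table on this page.

```python
from dataclasses import dataclass, field

# Variable key strings, copied from the key-string table on this page.
# Any other column name is treated as an annotation type.
VARIABLE_KEYS = {
    "Name", "Description", "ExternalId", "StorageLocation",
    "ConcentrationInGramsPerLiter", "OriginalQuantityInMicroLiters", "Label",
}
REQUIRED_KEYS = {"Name"}


@dataclass
class ParsedItem:
    """Hypothetical container for one parsed data line."""
    variables: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)


def parse_sample_file(text):
    """Parse sample-only input; return (items, warnings) or raise ValueError."""
    # Keep file line numbers for messages; skip empty (here: whitespace-only) lines.
    numbered = [(no, line) for no, line in enumerate(text.splitlines(), start=1)
                if line.strip()]
    if not numbered:
        raise ValueError("no header line found")
    header = numbered[0][1].split("\t")  # column names are case-sensitive
    missing = REQUIRED_KEYS - set(header)
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")

    items, warnings = [], []
    for no, line in numbered[1:]:
        cells = line.split("\t")
        if len(cells) < len(header):
            raise ValueError(f"line {no}: fewer columns than the header line")
        if len(cells) > len(header):
            warnings.append(f"line {no}: extra columns discarded")
        item = ParsedItem()
        # zip() stops at the shorter sequence, so extra cells are dropped here.
        for name, value in zip(header, cells):
            target = item.variables if name in VARIABLE_KEYS else item.annotations
            target[name] = value
        items.append(item)
    return items, warnings
```

Applied to the input of Example 1, this sketch yields two items whose variables include `Name` and `ExternalId` and whose annotations are `group` and `color`.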

Sample, Extract, and Labeled Extract Import

For extracts and labeled extracts one or two extra columns are needed to specify the parent item. For labeled extracts a column is also needed for the BioMaterial variable "Label".

Specification of Parent Item

  1. Identifier column - A column with unique values for each item, used to identify the item. This may be an extra column added just for this purpose, or an existing column for a BioMaterial variable, provided that it contains unique values for each item. In the latter case the column is used both for BioMaterial variable values and item identification.
  2. Parent specification column - An extra column used to specify a parent item by the identifier value of the latter.
  3. The name of the parent specification column consists of the prefix "Parent" followed by the name of the identifier column, e.g. if the identifier column is named "Row", the parent specification column should be named "ParentRow". A column whose name starts with "Parent" indicates that the input file contains parent item data, and defines both the identifier and the parent specification column.
  4. Annotations will not be created for the identifier and parent specification columns.
  5. The parent specification value for an item should be equal to the identifier value of the parent item.
  6. A sample is indicated by being its own parent, i.e. its identifier and parent specification values are the same.
  7. The parent item to an extract or labeled extract must be an item defined in the same input file.
  8. The data for the parent of an extract or labeled extract must appear on a line before the definition of the daughter item.
  9. The user is responsible for ensuring that quantities of extractions would not lead to a parent BioMaterial item getting a negative remaining quantity.
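The parent specification rules can be sketched in Python as shown here. The function names `find_parent_columns` and `resolve_parents` are hypothetical, and taking the first column whose name starts with "Parent" as the parent specification column is an assumption. The sketch turns identifier and parent specification values into 1-based (index, parent index) pairs like the "Index" and "Parent Index" values described under Data Presentation.

```python
PARENT_PREFIX = "Parent"


def find_parent_columns(header):
    """Return (identifier column, parent column), or None for a sample-only file."""
    for name in header:
        if name.startswith(PARENT_PREFIX):
            identifier = name[len(PARENT_PREFIX):]
            if identifier not in header:
                raise ValueError(f"identifier column {identifier!r} not found")
            return identifier, name
    return None


def resolve_parents(identifiers, parent_refs):
    """Map identifier/parent values to 1-based (index, parent index) pairs."""
    index_of = {}
    pairs = []
    for index, (ident, parent) in enumerate(zip(identifiers, parent_refs), start=1):
        if ident in index_of:
            raise ValueError(f"identifier {ident!r} is not unique")
        index_of[ident] = index
        if parent not in index_of:
            # The parent must be defined on an earlier line of the same file.
            raise ValueError(f"item {ident!r}: parent {parent!r} not defined earlier")
        # An item that is its own parent (a sample) resolves to its own index.
        pairs.append((index, index_of[parent]))
    return pairs
```

With the "Row" and "ParentRow" values of Example 2, `resolve_parents` returns (1, 1), (2, 2), (3, 1), (4, 1), (5, 3), (6, 3), (7, 5), (8, 6), matching the "Index" and "Parent Index" columns of that example's parsed result.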

Extra Requirements for Labeled Extracts

  1. Import of labeled extracts requires a column for the BioMaterial variable "Label".
  2. A label is given as the name of a label item in the database. Currently only the labels "cy2", "cy3", and "cy5" are available.
  3. A non-blank label value indicates a labeled extract, while a blank value indicates an extract or a sample, depending on whether a separate parent is specified.
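The type rules for labeled extracts, extracts, and samples can be expressed as a small Python function. The name `classify` is hypothetical, and the label set mirrors the labels listed above as currently available (the import itself checks against label items in the database). Rejecting a labeled extract that is its own parent is an assumption, based on the requirement that extracts and labeled extracts have a parent item in the same file.

```python
# Labels listed on this page as currently available; the import itself
# looks labels up as label items in the database.
AVAILABLE_LABELS = {"cy2", "cy3", "cy5"}


def classify(index, parent_index, label):
    """Return the BioMaterial type for a parsed item."""
    label = label.strip()
    has_separate_parent = parent_index != index
    if label:
        if label not in AVAILABLE_LABELS:
            raise ValueError(f"unknown label {label!r}")
        if not has_separate_parent:
            # Assumption: a labeled extract must have a separate parent item.
            raise ValueError("a labeled extract needs a separate parent item")
        return "Labeled extract"
    return "Extract" if has_separate_parent else "Sample"
```

Applied to the index, parent index, and label values of Example 2, this gives the same "Type" column as the parsed result of that example.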

Key Strings for BioMaterial Variables to use as Column Names in Input Files

BioMaterial attribute | Key string | Required
Name | Name | yes
Description | Description | no
External ID | ExternalId | no
Storage Location | StorageLocation | no
Concentration (g protein/l) | ConcentrationInGramsPerLiter | no
Original Quantity (µl) | OriginalQuantityInMicroLiters | no
Label | Label | no
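The key-string table can be used to build the "BioMaterial fixed columns" row shown in the parsed results. In this Python sketch the function name `fixed_column_row` is hypothetical, blank values stand in for the defaults that the import assigns to missing optional variables, and a blank "Name" is treated as an error (an assumption). The two quantity columns are converted to floating-point numbers, which is consistent with an input value of 10 being displayed as 10.0 in the examples.

```python
# (key string, display name) pairs, in the order of the key-string table.
FIXED_COLUMNS = [
    ("Name", "Name"),
    ("Description", "Description"),
    ("ExternalId", "External ID"),
    ("StorageLocation", "Storage Location"),
    ("ConcentrationInGramsPerLiter", "Concentration (g protein/l)"),
    ("OriginalQuantityInMicroLiters", "Original Quantity (µl)"),
    ("Label", "Label"),
]
NUMERIC_KEYS = {"ConcentrationInGramsPerLiter", "OriginalQuantityInMicroLiters"}
REQUIRED_KEYS = {"Name"}


def fixed_column_row(variables):
    """Map key-string values to a display-name row (dicts keep insertion order)."""
    for key in REQUIRED_KEYS:
        if not variables.get(key, "").strip():
            raise ValueError(f"required variable {key!r} is missing or blank")
    row = {}
    for key, display_name in FIXED_COLUMNS:
        raw = variables.get(key, "").strip()
        # Numeric columns become floats; blank optional values stay blank
        # here (assumption: the real import assigns its own defaults).
        row[display_name] = float(raw) if key in NUMERIC_KEYS and raw else raw
    return row
```

For the first item of Example 1, the input value "10" for OriginalQuantityInMicroLiters becomes 10.0 under "Original Quantity (µl)", and "Label" is left blank.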

Examples

Data Presentation

  1. First, the contents of the tab-delimited input file are presented as a table for clarity. The input file is a plain text file in which column entries are separated by a <TAB> character. The first non-empty line is the header line with column names. Space characters are allowed in item entries where the corresponding variable allows them, e.g. for "Name", "Description", "StorageLocation", and annotations.
  2. The result of parsing the input file is presented in three tables in a row:
    a. A table "BioMaterial Info" with three columns, "Index", "Parent Index", and "Type", showing the result of the parent specification parsing. The "Index" and "Parent Index" values do not correspond directly to the identifier and parent specification columns in the input file (not even if the latter have the same names); they are internal values produced by the parent item parsing. The "Index" values are simply item numbers starting with 1, and a "Parent Index" value identifies the parent item by its "Index" value. The "Type" column contains the type resulting from parsing: "Sample", "Extract", or "Labeled extract".
    b. A table "BioMaterial fixed columns" showing values of BioMaterial variables.
    c. A table "Annotations" with values of annotations, where the column names correspond to the annotation types. This table may be empty if no annotations are specified.

Example 1. Sample Import Only

Input file:

ExternalId | Name | Description | StorageLocation | ConcentrationInGramsPerLiter | OriginalQuantityInMicroLiters | group | color
sample1 | S1 | test | drawer | 20.0 | 10 | A | blue
sample2 | S2 | test | shelf | 20.0 | 20 | B | red

Parsed result:

BioMaterial info:
Index | Parent Index | Type
1 | 1 | Sample
2 | 2 | Sample

BioMaterial fixed columns:
Name | Description | External ID | Storage Location | Concentration (g protein/l) | Original Quantity (µl) | Label
S1 | test | sample1 | drawer | 20.0 | 10.0 |
S2 | test | sample2 | shelf | 20.0 | 20.0 |

Annotations:
group | color
A | blue
B | red

Example 2. Sample, Extract, Labeled Extract Import with Extra Identifier Column

The input file contains an extra identifier column "Row" and a parent specification column "ParentRow". A "Label" column is also included in order to specify labeled extracts.

The Name and ExternalId values (identical here) indicate each item's parentage by using the name of the parent item as a prefix.
The Description value indicates the intended type of the item. A secondary extract or labeled extract is an item whose parent is of the same type as the item itself, e.g. an extract made from another extract.

Input file:

Row | ParentRow | ExternalId | Name | Description | StorageLocation | ConcentrationInGramsPerLiter | OriginalQuantityInMicroLiters | Label | group | color
1 | 1 | S1 | S1 | Sample | drawer | 20.0 | 10 |  | A | blue
2 | 2 | S2 | S2 | Sample | shelf | 20.0 | 20 |  | B | red
3 | 1 | S1E1 | S1E1 | Extract | drawer | 20.0 | 4.0 |  | A | blue
4 | 1 | S1E2 | S1E2 | Extract | drawer | 20.0 | 1.0 |  | A | blue
5 | 3 | S1E1E1 | S1E1E1 | Secondary_Extract | drawer | 20.0 | 1.5 |  | A | blue
6 | 3 | S1E1L1 | S1E1L1 | Labeled_Extract | drawer | 20.0 | 1.5 | cy3 | A | blue
7 | 5 | S1E1E1L1 | S1E1E1L1 | Labeled_Extract | drawer | 20.0 | 0.5 | cy2 | A | blue
8 | 6 | S1E1L1L1 | S1E1L1L1 | Secondary_Labeled_Extract | drawer | 20.0 | 0.5 | cy3 | A | blue

Parsed result:

BioMaterial info:
Index | Parent Index | Type
1 | 1 | Sample
2 | 2 | Sample
3 | 1 | Extract
4 | 1 | Extract
5 | 3 | Extract
6 | 3 | Labeled extract
7 | 5 | Labeled extract
8 | 6 | Labeled extract

BioMaterial fixed columns:
Name | Description | External ID | Storage Location | Concentration (g protein/l) | Original Quantity (µl) | Label
S1 | Sample | S1 | drawer | 20.0 | 10.0 |
S2 | Sample | S2 | shelf | 20.0 | 20.0 |
S1E1 | Extract | S1E1 | drawer | 20.0 | 4.0 |
S1E2 | Extract | S1E2 | drawer | 20.0 | 1.0 |
S1E1E1 | Secondary_Extract | S1E1E1 | drawer | 20.0 | 1.5 |
S1E1L1 | Labeled_Extract | S1E1L1 | drawer | 20.0 | 1.5 | cy3
S1E1E1L1 | Labeled_Extract | S1E1E1L1 | drawer | 20.0 | 0.5 | cy2
S1E1L1L1 | Secondary_Labeled_Extract | S1E1L1L1 | drawer | 20.0 | 0.5 | cy3

Annotations:
group | color
A | blue
B | red
A | blue
A | blue
A | blue
A | blue
A | blue
A | blue
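The user is responsible for making sure that no parent item ends up with a negative remaining quantity, and this can be checked before creating import jobs. This Python sketch assumes that each daughter item draws its own OriginalQuantityInMicroLiters from its parent; the function name `remaining_quantities` is hypothetical.

```python
def remaining_quantities(items):
    """items: (index, parent index, original quantity in µl) per item, in file order.

    Assumes each daughter consumes its own original quantity from its parent.
    Returns {index: remaining quantity}.
    """
    remaining = {index: quantity for index, _, quantity in items}
    for index, parent, quantity in items:
        if parent != index:  # samples are their own parents and draw from nothing
            remaining[parent] -= quantity
    return remaining


# Index, Parent Index, and Original Quantity (µl) from the parsed result of Example 2.
example2 = [(1, 1, 10.0), (2, 2, 20.0), (3, 1, 4.0), (4, 1, 1.0),
            (5, 3, 1.5), (6, 3, 1.5), (7, 5, 0.5), (8, 6, 0.5)]
remaining = remaining_quantities(example2)
negative = [index for index, quantity in remaining.items() if quantity < 0]
print(remaining)
print("negative:", negative)
```

Under this assumption, item 1 (S1) keeps 5.0 µl and item 3 (S1E1) keeps 1.0 µl, and the list of items with a negative remaining quantity is empty for Example 2.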

Example 3. Sample, Extract, Labeled Extract Import with Internal Identifier Column

Same input data as in Example 2, except that the existing "ExternalId" column, which holds a BioMaterial variable, is used as the identifier column, since its values are unique for each item. The parent specification column is therefore named "ParentExternalId". The parsed result is identical to that in Example 2.

Input file:

ParentExternalId | ExternalId | Name | Description | StorageLocation | ConcentrationInGramsPerLiter | OriginalQuantityInMicroLiters | Label | group | color
S1 | S1 | S1 | Sample | drawer | 20.0 | 10 |  | A | blue
S2 | S2 | S2 | Sample | shelf | 20.0 | 20 |  | B | red
S1 | S1E1 | S1E1 | Extract | drawer | 20.0 | 4.0 |  | A | blue
S1 | S1E2 | S1E2 | Extract | drawer | 20.0 | 1.0 |  | A | blue
S1E1 | S1E1E1 | S1E1E1 | Secondary_Extract | drawer | 20.0 | 1.5 |  | A | blue
S1E1 | S1E1L1 | S1E1L1 | Labeled_Extract | drawer | 20.0 | 1.5 | cy3 | A | blue
S1E1E1 | S1E1E1L1 | S1E1E1L1 | Labeled_Extract | drawer | 20.0 | 0.5 | cy2 | A | blue
S1E1L1 | S1E1L1L1 | S1E1L1L1 | Secondary_Labeled_Extract | drawer | 20.0 | 0.5 | cy3 | A | blue

Parsed result:

BioMaterial info:
Index | Parent Index | Type
1 | 1 | Sample
2 | 2 | Sample
3 | 1 | Extract
4 | 1 | Extract
5 | 3 | Extract
6 | 3 | Labeled extract
7 | 5 | Labeled extract
8 | 6 | Labeled extract

BioMaterial fixed columns:
Name | Description | External ID | Storage Location | Concentration (g protein/l) | Original Quantity (µl) | Label
S1 | Sample | S1 | drawer | 20.0 | 10.0 |
S2 | Sample | S2 | shelf | 20.0 | 20.0 |
S1E1 | Extract | S1E1 | drawer | 20.0 | 4.0 |
S1E2 | Extract | S1E2 | drawer | 20.0 | 1.0 |
S1E1E1 | Secondary_Extract | S1E1E1 | drawer | 20.0 | 1.5 |
S1E1L1 | Labeled_Extract | S1E1L1 | drawer | 20.0 | 1.5 | cy3
S1E1E1L1 | Labeled_Extract | S1E1E1L1 | drawer | 20.0 | 0.5 | cy2
S1E1L1L1 | Secondary_Labeled_Extract | S1E1L1L1 | drawer | 20.0 | 0.5 | cy3

Annotations:
group | color
A | blue
B | red
A | blue
A | blue
A | blue
A | blue
A | blue
A | blue
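The statement that Examples 2 and 3 parse identically can be checked for the parent indices with a short Python sketch. The helper `parent_indices` is hypothetical: it assigns each item its 1-based position among the data lines and looks up the parent's position, so it raises KeyError when a parent is not defined on an earlier line.

```python
def parent_indices(identifiers, parent_refs):
    """Return 1-based (index, parent index) pairs from identifier values."""
    index_of = {}
    pairs = []
    for index, (ident, parent) in enumerate(zip(identifiers, parent_refs), start=1):
        index_of[ident] = index  # registered first, so a sample resolves to itself
        pairs.append((index, index_of[parent]))  # KeyError if parent not seen yet
    return pairs


# Example 2: extra identifier column "Row" with parent column "ParentRow".
rows = ["1", "2", "3", "4", "5", "6", "7", "8"]
parent_rows = ["1", "2", "1", "1", "3", "3", "5", "6"]

# Example 3: "ExternalId" doubles as the identifier, with "ParentExternalId".
external_ids = ["S1", "S2", "S1E1", "S1E2", "S1E1E1", "S1E1L1", "S1E1E1L1", "S1E1L1L1"]
parent_external_ids = ["S1", "S2", "S1", "S1", "S1E1", "S1E1", "S1E1E1", "S1E1L1"]

example2 = parent_indices(rows, parent_rows)
example3 = parent_indices(external_ids, parent_external_ids)
print(example2 == example3)
```

Both identifier choices give the same (index, parent index) pairs, so the "BioMaterial info" tables of the two examples agree.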
Last modified on Oct 1, 2009, 9:17:57 AM
