Annotation Association File Format
Collaborating databases and projects provide the POC project with a tab-delimited file, known informally as an "association file". This file carries links between database objects and PO terms. The database object may be a gene, transcript, protein, protein_structure, complex, germplasm (stock/cultivar), mutant, QTL, etc. The columns in the file are described below. Here is a sample file containing associations from the Gramene database.
File Name
po_aspect_objecttype_organism_organization.assoc
For example:
po_anatomy_gene_arabidopsis_tair.assoc
po_growth_gene_arabidopsis_tair.assoc
po_anatomy_gene_oryza_gramene.assoc
po_growth_gene_oryza_gramene.assoc
aspect: growth/anatomy/development.
objecttype: gene/mutant/germplasm etc.
organism: always the genus, e.g. arabidopsis/oryza/zea.
organization: the institute/project that is contributing the association files.
The file name should be in lowercase, with white space replaced by underscores. Ideally the association files for growth and anatomy should be merged into a single file. However, for the moment we are keeping them separate to make sure things are working fine. As and when we merge the associations, the "aspect" will be removed from the file names.
For example: po_objecttype_organism_organization.assoc
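The naming convention above can be sketched in a few lines of Python. This is only an illustration: the function name `assoc_file_name` and its parameters are hypothetical and are not part of any POC tool. The aspect is optional so that the same helper covers the merged-file naming.

```python
def assoc_file_name(objecttype, organism, organization, aspect=None):
    """Build an association file name following the convention above.

    Hypothetical helper for illustration; not an official POC utility.
    """
    parts = ["po"]
    if aspect is not None:
        parts.append(aspect)
    parts += [objecttype, organism, organization]
    # Lowercase each part and replace runs of white space with underscores.
    cleaned = ["_".join(part.strip().lower().split()) for part in parts]
    return "_".join(cleaned) + ".assoc"


print(assoc_file_name("gene", "Oryza", "Gramene", aspect="anatomy"))
print(assoc_file_name("gene", "Arabidopsis", "TAIR"))
```

The first call reproduces one of the file names listed above; the second shows the form without the aspect.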
File Format
The flat file format comprises 15 tab-delimited fields (see also a further example containing several annotations). Make sure the column order is strictly followed.
* denotes required fields:
Column  Content                         Example
1.  *   DB                              GR
2.  *   DB_Object_ID                    GR:0060905
3.  *   DB_Object_Symbol                lrd10
4.      Qualifier
5.  *   PO ID                           PO:0007014
6.  *   DB:Reference(|DB:Reference)     GR_ref:5655|PMID:2676709
7.  *   Evidence                        IMP
8.      With (or) From
9.  *   Aspect                          G
10.     DB_Object_Name                  lesion resembling disease-10
11.     DB_Object_Synonym(|Synonym)     spl4|bl5|spotted leaf-4
12. *   DB_Object_Type                  gene
13. *   taxon(|taxon)                   taxon:4527
14. *   Date                            20050303
15. *   Assigned_by                     GR
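As a rough sketch of how one line of such a file might be read, the Python below splits a line on tabs and pairs each value with a column name. The names `COLUMNS` and `parse_assoc_line` are assumptions made for this example, and the dictionary keys are simplified versions of the column headings. The sample row is assembled from the example values in the table.

```python
# Column keys in file order; simplified from the column headings above.
COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "PO_ID",
    "DB_Reference", "Evidence", "With_From", "Aspect", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "Taxon", "Date", "Assigned_by",
]


def parse_assoc_line(line):
    """Split one tab-delimited line into a dict keyed by column name."""
    # Strip only the line ending so that empty trailing fields survive.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


# Sample row built from the example values in the table.
sample = "\t".join([
    "GR", "GR:0060905", "lrd10", "", "PO:0007014",
    "GR_ref:5655|PMID:2676709", "IMP", "", "G",
    "lesion resembling disease-10", "spl4|bl5|spotted leaf-4",
    "gene", "taxon:4527", "20050303", "GR",
])
record = parse_assoc_line(sample)
print(record["PO_ID"])
print(record["DB_Object_Synonym"].split("|"))
```

Optional columns that are left empty (here Qualifier and With) still occupy their position, which is why the field count is checked before the values are paired with column names.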
Description of the content:
DB
The database contributing the association file.
One of the values in the table of database abbreviations. [Database abbreviations explanation]
This field is mandatory, cardinality 1.
DB_Object_ID
A unique identifier in DB for the item being annotated.
This field is mandatory, cardinality 1.
DB_Object_Symbol
A (unique and valid) symbol to which DB_Object_ID is matched.
Can use ORF name for otherwise unnamed gene or protein.
If gene products are annotated, can use gene product symbol if available, or many gene product annotation entries can share a gene symbol.
This field is mandatory, cardinality 1.
Qualifier
Flags that modify the interpretation of an annotation.
One (or more) of NOT, contributes_to, colocalizes_with.
This field is not mandatory; cardinality 0, 1, >1; for cardinality >1 use a pipe to separate entries (e.g. NOT|contributes_to).
POid
The PO identifier for the term attributed to the DB_Object_ID.
This field is mandatory, cardinality 1.
DB:Reference
The unique identifier appropriate to DB for the authority for the attribution of the POid to the DB_Object_ID. This may be a literature reference or a database record. The syntax is DB:accession_number. Note that only one reference can be cited on a single line. If a reference has identifiers in more than one database, multiple identifiers can be included on a single line. For example, if the reference is a published paper that has a PubMed ID, we strongly recommend that the PubMed ID be included, as well as an identifier within a model organism database.
This field is mandatory, cardinality 1, >1; for cardinality >1 use a pipe to separate entries (e.g. GR:8789|PMID:2676709).
Evidence
One of IMP, IGI, IPI, IAGP, ISS, IDA, IEP, IEA, TAS, NAS, ND, IC, RCA.
This field is mandatory, cardinality 1.
With (or) From
One of:
DB:gene_symbol
DB:gene_symbol[allele_symbol]
DB:gene_id
DB:protein_name
DB:sequence_id
GO:GO_id
This field is not mandatory (except in the case of IC evidence code), cardinality 0, 1, >1.
Note: This field is used to hold an additional identifier for annotations using certain evidence codes (IC, IEA, IGI, IPI, ISS). Cardinality = 0 is not recommended, but is permitted because cases can be found in literature where no database identifier can be found (e.g. physical interaction or sequence similarity to a protein, but no ID provided). Annotations where evidence is IGI, IPI, or ISS and 'with' cardinality = 0 should link to an explanation of why there is no entry in 'with'.
Cardinality may be >1 for any of the evidence codes that use 'with'; for IPI and IGI cardinality >1 has a special meaning (see evidence documentation for more information). For cardinality >1 use a pipe to separate entries (e.g. TAIR:Atg111111|TAIR:Atg222222).
Note that a gene/locus ID may be used in the 'with' column for an IPI annotation, or for an ISS annotation based on amino acid sequence or protein structure similarity, if the database does not have identifiers for individual gene products.
'PO:PO_id' is used only when the evidence code is 'IC', and refers to the PO term(s) used as the basis of a curator inference. In these cases the entry in the 'DB:Reference' column will be that used to assign the PO term(s) from which the inference is made. This field is mandatory for evidence code IC.
The ID is usually an identifier for an individual entry in a database (such as a sequence ID, gene ID, PO ID, etc.). Identifiers from the Center for Biological Sequence Analysis (CBS), however, represent tools used to find homology or sequence similarity; these identifiers can be used in the 'with' column for ISS annotations.
Aspect
Either A (plant structure) or G (growth stage and development stage).
This field is mandatory, cardinality 1.
DB_Object_Name
Name of the object, e.g. gene or gene product.
This field is not mandatory, cardinality 0, 1 [white space allowed].
Synonym
Any aliases, e.g. Gene_symbol [or other text].
This field is not mandatory, cardinality 0, 1, >1 [white space allowed].
DB_Object_Type
What kind of thing is being annotated. One of gene, transcript, protein, protein_structure, complex, germplasm (stock/cultivar), mutant, QTL, etc.
This field is mandatory, cardinality 1.
Taxon
Taxonomic identifier(s).
For cardinality 1, the ID of the species representing the Object.
For cardinality 2, the first ID is that of the species encoding the gene product; the second ID is that of the species using the gene product.
This field is mandatory, cardinality 1, 2.
Date
Date on which the annotation was made; format is YYYYMMDD.
This field is mandatory, cardinality 1.
Assigned_by
The database which made the annotation.
One of the values in the table of database abbreviations. [Database abbreviations explanation]
Used for tracking the source of an individual annotation.
Default value is the value entered in column 1 (DB).
Value will differ from column 1 for any annotation that is made by one database and incorporated into another.
This field is mandatory, cardinality 1.
Note that several fields contain database cross-references (dbxrefs) in the format dbname:dbaccession. The fields are: POid (where dbname is always PO), DB:Reference, With, Taxon (where dbname is always taxon). For PO ids, do not repeat the 'PO:' prefix (i.e. always use PO:0000000, not PO:PO:00000000).
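Several of the rules in the field descriptions (mandatory columns, allowed codes, the IC requirement on 'with', the taxon cardinality, and the date format) lend themselves to a mechanical check. The Python below is a minimal sketch under stated assumptions, not an official validator: the function name `check_annotation` and the message wording are hypothetical, and the PO ID check only requires a single 'PO:' prefix followed by digits, since the descriptions do not fix the number of digits.

```python
from datetime import datetime

EVIDENCE_CODES = {"IMP", "IGI", "IPI", "IAGP", "ISS", "IDA", "IEP",
                  "IEA", "TAS", "NAS", "ND", "IC", "RCA"}
QUALIFIERS = {"NOT", "contributes_to", "colocalizes_with"}
# 0-based positions of the mandatory columns and the names used in messages.
MANDATORY = {0: "DB", 1: "DB_Object_ID", 2: "DB_Object_Symbol", 4: "POid",
             5: "DB:Reference", 6: "Evidence", 8: "Aspect",
             11: "DB_Object_Type", 12: "Taxon", 13: "Date", 14: "Assigned_by"}


def _valid_date(text):
    """True if text is a real calendar date written as YYYYMMDD."""
    if len(text) != 8 or not text.isdigit():
        return False
    try:
        datetime.strptime(text, "%Y%m%d")
    except ValueError:
        return False
    return True


def check_annotation(fields):
    """Return a list of problems found in one 15-field annotation row."""
    if len(fields) != 15:
        return [f"expected 15 fields, got {len(fields)}"]
    problems = [f"{name} is mandatory"
                for i, name in MANDATORY.items() if not fields[i].strip()]
    for qualifier in filter(None, fields[3].split("|")):
        if qualifier not in QUALIFIERS:
            problems.append(f"unknown qualifier: {qualifier}")
    po_id = fields[4]
    # Assumption: a valid ID is 'PO:' plus digits; this rejects 'PO:PO:...'.
    if po_id and not (po_id.startswith("PO:") and po_id[3:].isdigit()):
        problems.append(f"malformed PO id: {po_id}")
    if fields[6] and fields[6] not in EVIDENCE_CODES:
        problems.append(f"unknown evidence code: {fields[6]}")
    if fields[6] == "IC" and not fields[7].strip():
        problems.append("With (or) From is mandatory for evidence code IC")
    if fields[8] and fields[8] not in {"A", "G"}:
        problems.append(f"aspect must be A or G: {fields[8]}")
    taxa = fields[12].split("|") if fields[12] else []
    if len(taxa) > 2 or any(not t.startswith("taxon:") for t in taxa):
        problems.append(f"bad taxon field: {fields[12]}")
    if fields[13] and not _valid_date(fields[13]):
        problems.append(f"date is not YYYYMMDD: {fields[13]}")
    return problems


good = ["GR", "GR:0060905", "lrd10", "", "PO:0007014",
        "GR_ref:5655|PMID:2676709", "IMP", "", "G",
        "lesion resembling disease-10", "spl4|bl5|spotted leaf-4",
        "gene", "taxon:4527", "20050303", "GR"]
bad = list(good)
bad[4], bad[6], bad[13] = "PO:PO:0007014", "IC", "2005-03-03"
print(check_annotation(good))
for problem in check_annotation(bad):
    print(problem)
```

The check does not verify that DB and Assigned_by appear in the table of database abbreviations, because that table is not reproduced here.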
Last modified: Wed Aug 31 13:41:29 2005
Copyright © 2003 Plant Ontology Consortium