API Keys

Available API query keys for INDYdb

Here you can find a list of all 34 available query keys for the INDYdb SQL database.

Text keys use regex to find values in the database, matching complete or partial words (ex. kinase will also return kinase-like, kinases, etc.; a search using a protID of "1165" will return all genes that contain that combination in their seq ID).
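The partial-match behaviour described above can be illustrated with a small local sketch. This does not call INDYdb; it only mimics an unanchored regex search with Python's re module. The function name text_key_matches and the sample values are hypothetical, and whether INDYdb matching is case-sensitive is not stated here, so the sketch simply uses Python's default (case-sensitive) behaviour.

```python
import re

def text_key_matches(pattern: str, value: str) -> bool:
    """Return True if the regex pattern occurs anywhere in value (partial match)."""
    return re.search(pattern, value) is not None

# Made-up annotation values, for illustration only.
annotations = ["kinase", "kinase-like protein", "protein kinases", "phosphatase"]
hits = [a for a in annotations if text_key_matches("kinase", a)]

# A made-up ID string: "1165" matches because it appears inside the ID.
id_hit = text_key_matches("1165", "prot_011650")
```

Here hits keeps the three kinase-related values and drops "phosphatase", which is why a short search term can return more rows than expected.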

Numeric keys take a comparison operator (greater than, less than, or equal to) followed by a value as input. Percentage fields (%) range from 0 to 100, and integer (Int) fields range from 0 to infinity.
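As a rough illustration of how an operator-plus-value input behaves, the sketch below parses such an expression and applies it to a list of numbers. The symbols ">", "<" and "=" are an assumption (the exact operator syntax accepted by INDYdb is not specified here), and parse_numeric_filter and the sample values are hypothetical.

```python
import operator
import re

# Assumed operator symbols; the real INDYdb syntax may differ.
OPERATORS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

def parse_numeric_filter(expr: str):
    """Turn an expression such as '>95' into a predicate on numbers."""
    match = re.fullmatch(r"\s*([<>=])\s*(\d+(?:\.\d+)?)\s*", expr)
    if match is None:
        raise ValueError(f"expected an operator followed by a value, got {expr!r}")
    symbol, number = match.groups()
    compare = OPERATORS[symbol]
    threshold = float(number)
    return lambda value: compare(value, threshold)

# Made-up sequence identity percentages (the 'id' key ranges from 0 to 100).
id_values = [99.8, 95.0, 87.5, 100.0]
high_identity = parse_numeric_filter(">95")
kept = [v for v in id_values if high_identity(v)]
```

With these sample values, kept contains only 99.8 and 100.0, since 95.0 is not strictly greater than 95.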

Boolean keys take True or False statements, which work similarly to Text fields but require a complete match ("True" or "False") to return a result.
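The difference from Text keys is that a partial value is not enough. A minimal local sketch of that rule follows, assuming Boolean values are compared as the exact strings "True" and "False"; the function name boolean_key_matches and the sample records are hypothetical and not part of INDYdb.

```python
def boolean_key_matches(query: str, stored: str) -> bool:
    """Match only when the query is exactly 'True' or 'False' and equals the stored value."""
    if query not in ("True", "False"):
        # Partial input such as 'Tru' is rejected instead of being matched loosely.
        return False
    return stored == query

# Made-up records using the validORF key.
records = [
    {"gene": "geneA", "validORF": "True"},
    {"gene": "geneB", "validORF": "False"},
    {"gene": "geneC", "validORF": "True"},
]
valid = [r["gene"] for r in records if boolean_key_matches("True", r["validORF"])]
partial = [r["gene"] for r in records if boolean_key_matches("Tru", r["validORF"])]
```

In this sketch valid lists geneA and geneC, while partial is empty because "Tru" is not a complete match.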

If you are unsure how to fill in each key-value pair, we suggest starting with a simple combination (ex. preferredName [key] and SOD1 [value]) and evaluating the query results to find the best way to frame your specific question.

Text keys:

strain (Strain of origin, ex. S288C, SA1, AKU4011, etc.)
assembly (Accession code for the strain assembly)
level (Level of the assembly: contig, scaffold, chromosome, etc.)
application (Known applications for the strain, ex. bioethanol production, laboratory strain, etc.)
location (Specific location of the collection site as disclosed in the assembly)
macro_region (Macro geographical region of the collection site)
source (Publication or reference code for the source that described the genome assembly)
geneID (Internal gene ID for the target gene in INDYdb)
rnaID (Internal RNA ID for the target gene in INDYdb)
protID (Internal protein ID for the target gene in INDYdb)
ref_locus (Reference locus in S288C strain)
gene (Reference gene name in S288C strain)
preferredName (Preferred gene name in S. cerevisiae according to STRING database)
annotation (Brief gene description)
interactors (Top 10 known protein-protein interactors of that gene, only for protein_coding genes)
homologues (Known homologues of that gene in other model species)
gene_coord (Gene coordinates on the target genome)
type (Gene type according to the reference annotation, ex. protein_coding, tRNA, rRNA, etc.)
status (Status for the de novo functional annotation: Verified, Mismatch or Unannotated)
genePred (Official gene name for the gene predicted by the de novo annotation)
geneSeq (Gene sequence for the target gene)
transcriptSeq (Transcript sequence for the target gene)
proteinSeq (Protein sequence for the target gene)

Numerical keys:

id (Percentage of sequence identity from the target that matches the reference gene)
cov_exon (Percentage of coverage from the target exons in comparison to reference)
nVars (Number of SNP sites discovered in the target gene)
copyNumber (Number of copies of the target gene in each strain)
copyID (Unique ID number for each gene copy: 1, 2, 3, etc.)
nORFs (Number of ORFs found for that gene)

Boolean (True/False) keys:

match_ref_CDS (Is the target gene CDS a perfect match to the reference strain CDS?)
validORF (Does the gene contain a valid ORF in its sequence?)
missing_startCodon (Does the ORF have a missing start codon?)
missing_stopCodon (Does the ORF have a missing stop codon?)
outOfFrame_stopCodon (Does the ORF have an out-of-frame (OOF) stop codon?)