Seq2Fun Databases

  • For most non-model organisms, biological interpretation of study outcomes is limited to protein-coding genes with functional annotations such as KEGG pathways, Gene Ontology (GO), or the PANTHER classification system. Focusing the Seq2Fun database on functionally annotated genes, such as protein-coding genes, GOs, and KOs, therefore largely meets the needs of most scientists studying non-model organisms.

    We provide about 30 pre-built databases that can be downloaded here.
    Note: the ortholog count includes all genes from each group of organisms, including genes that are not orthologous to any other gene.
    The groups of organisms can be downloaded from here.

    A core ortholog is defined as an ortholog whose frequency in the group is >= 0.90. This is consistent with BUSCO.
    Note: * frequency >= 0.85; ** frequency >= 0.80;
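
The thresholds above can be illustrated with a minimal Python sketch. It assumes that "frequency" means the fraction of species in a group in which an ortholog is present, which the text does not state explicitly. The function name, its parameters, and the example ortholog names are hypothetical and not part of Seq2Fun.

```python
def is_core_ortholog(n_species_with_ortholog, n_species_in_group, threshold=0.90):
    """Check one ortholog against a core-ortholog frequency threshold.

    Assumption: frequency = species carrying the ortholog / species in the group.
    The default of 0.90 follows the definition in the text; 0.85 and 0.80 are
    the relaxed thresholds mentioned in the note.
    """
    if n_species_in_group <= 0:
        raise ValueError("the group must contain at least one species")
    frequency = n_species_with_ortholog / n_species_in_group
    return frequency >= threshold


# Hypothetical presence counts for a group of 20 species.
presence = {"ortholog_A": 19, "ortholog_B": 17}
for name, n_present in presence.items():
    print(name, is_core_ortholog(n_present, 20))
```

Passing `threshold=0.85` or `threshold=0.80` applies the relaxed cut-offs from the note.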

    Group            Species  Proteins  Orthologs  Core orthologs  Filename                Date
    Algae            14       155495    38334      1521            algae.tar.gz            07-11-2022
    alveolates       21       207674    51205      1132            alveolates.tar.gz       07-11-2022
    amoebozoa        7        81844     22114      1165            amoebozoa.tar.gz        07-11-2022
    amphibians       3        75261     17186      13925           amphibians.tar.gz       07-11-2022
    animals          370      7150735   270089     1512            animals.tar.gz          07-11-2022
    apicomplexans    18       93576     14632      1091            apicomplexans.tar.gz    07-11-2022
    arthropods       119      1727651   113673     5106            arthropods.tar.gz       07-11-2022
    ascomycetes      100      904642    98151      2799            ascomycetes.tar.gz      07-11-2022
    basidiomycetes   33       363997    56935      2453            basidiomycetes.tar.gz   07-11-2022
    birds            31       482205    22397      11761           birds.tar.gz            07-11-2022
    cnidarians       9        203000    24003      5547            cnidarians.tar.gz       07-11-2022
    crustaceans      7        154960    37407      5216            crustaceans.tar.gz      07-11-2022
    dothideomycetes  10       123200    28898      5567            dothideomycetes.tar.gz  07-11-2022
    eudicots         93       3180221   102679     8230            eudicots.tar.gz         07-11-2022
    euglenozoa       9        86483     12363      4638            euglenozoa.tar.gz       07-11-2022
    eurotiomycetes   20       196228    25710      4723            eurotiomycetes.tar.gz   07-11-2022
    fishes           64       1736572   43690      13248           fishes.tar.gz           07-11-2022
    flatworms        4        58181     17784      4237            flatworms.tar.gz        07-11-2022
    fungi            138      1278312   148080     2138            fungi.tar.gz            07-11-2022
    insects          101      1376824   70170      5971            insects.tar.gz          07-11-2022
    leotiomycetes    5        67865     21669      5707            leotiomycetes.tar.gz    07-11-2022
    mammals          94       1910363   47144      14776           mammals.tar.gz          07-11-2022
    mollusks         9        206905    35775      6726            mollusks.tar.gz         07-11-2022
    monocots         17       560027    43452      9611            monocots.tar.gz         07-11-2022
    nematodes        6        134093    35865      3280            nematodes.tar.gz        07-11-2022
    plants           127      3968027   162990     3485            plants.tar.gz           07-11-2022
    protists         52       660237    134452     602             protists.tar.gz         07-11-2022
    reptiles         20       384584    21725      12715           reptiles.tar.gz         07-11-2022
    saccharomycetes  36       195913    14873      3079            saccharomycetes.tar.gz  07-11-2022
    stramenopiles    8        119746    31582      567             stramenopiles.tar.gz    07-11-2022
    vertebrates      212      4588985   83704      8222            vertebrates.tar.gz      07-11-2022
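
Each database in the table is distributed as a gzip-compressed tar archive. As a rough sketch, a downloaded archive could be unpacked with Python's standard `tarfile` module as shown below. The function name `unpack_database` and the destination directory are hypothetical, and the layout of the files inside the archive is not described here, so the sketch only extracts the archive and lists the top-level entries.

```python
import tarfile
from pathlib import Path


def unpack_database(archive_path, dest_dir):
    """Extract a downloaded .tar.gz database archive into dest_dir.

    Returns the sorted names of the top-level entries found in dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters can reject unsafe member paths.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return sorted(entry.name for entry in dest.iterdir())


# Example call (assumes birds.tar.gz has already been downloaded):
# print(unpack_database("birds.tar.gz", "database"))
```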

    If you want to download the databases for Seq2Fun version 1, please click here.
  • We fully support custom-built databases. See MANUAL 13. Custom built database.

  • The following 8 databases were used to assess Seq2Fun version 1 with mouse, chicken, zebrafish and roundworm datasets.
    The RNA-seq data can be downloaded from here.

    Group                   Proteins  KOs    Species  Filename
    Mammals_no_mouse        356,672   5,622  64       mammals_no_mouse.tar.gz
    Mouse                   8,438     5,437  1        mouse.tar.gz
    Birds_no_chicken        81,576    4,176  23       birds_no_chicken.tar.gz
    Chicken                 4,930     3,921  1        chicken.tar.gz
    Fishes_no_zebrafish     267,954   4,235  38       fishes_no_zebrafish.tar.gz
    Zebrafish               6,047     3,963  1        zebrafish.tar.gz
    Nematodes_no_roundworm  13,939    2,950  5        nematodes_no_worm.tar.gz
    Roundworm               3,081     2,391  1        worm.tar.gz