!gpi-version: 1.2 ! ! This file contains information about the proteome of SARS-CoV-2, in GPI format ! ! The file is based on information provided by UniProt, but includes additional curated information ! ! The following sources were used: ! ! - UniProtKB GPI file (ftp://ftp.ebi.ac.uk/pub/contrib/goa/uniprot_sars-cov-2.gpi) ! - SciBite CORD-19 Vocabulary (https://github.com/SciBiteLabs/CORD19) ! - The Protein Ontology (https://proconsortium.org/download/development/pro_sars2.gpi) ! - Additional knowledge about protein nomenclature acquired as part of National Lab funded SARS-CoV-2 efforts (NVBL and LDRD) ! ! Some key properties: ! - we include an entry for each functional protein, including non-structural proteins cleaved from polypeptides ! - we use chain IDs of the form UniProtKB:P0DTC1-PRO_0000449645 for NSPs ! - we do *not* include all chain IDs; in particular: ! - we only use when necessary, for NSPs ! - NSP1-10 have duplicate entries in UniProt, there are two polyprotein entries with identical sequence prior to frameshift ! - we use the longer pp parent as the canonical/reference entry. This decision is synced with what IntAct uses ! ! It is intended to be used for different purpose: ! ! - The canonical set of annotatable function-capable entities used by the Gene Ontology project ! - Use in Knowledge Graphs, such as https://github.com/Knowledge-Graph-Hub/kg-covid-19/ ! - Vocabulary for Natural Language Processing / Concept Recognition ! ! The file is maintained in GitHub in this repo: https://github.com/Knowledge-Graph-Hub/kg-covid-19/ ! ! More information on the original provenance of this file can be found at: https://github.com/geneontology/go-site/issues/1431 ! !Columns: ! ! name required? cardinality GAF column # Example content ! DB required 1 1 UniProtKB ! DB_Object_ID required 1 2/17 P0DTC1, P0DTC1-PRO_0000449645 ! DB_Object_Symbol required 1 3 nsp11 ! DB_Object_Name optional 0 or greater 10 Non-structural protein 11 ! DB_Object_Synonym(s) optional 0 or greater 11 PL-PRO ! DB_Object_Type required 1 12 protein ! Taxon required 1 13 taxon:2697049 ! Parent_Object_ID optional 0 or 1 - UniProtKB:Q4VCS5 ! DB_Xref(s) optional 0 or greater - PR:000050270 ! Properties optional 0 or greater - not used yet ! ! UniProtKB P0DTC1 pp1a Replicase polyprotein 1a ORF1a|1a|ORF1a/ClvPrd (SARS2)|ORF1a proteolytic cleavage product protein taxon:2697049 PR:P0DTC1-1 UniProtKB P0DTC1-PRO_0000449645 nsp11 Non-structural protein 11 P0DTC1(4393-4405)|rep/Clv:nsp11 (SARS2)|UniProtKB:P0DTC1, 4393-4405|nsp11 (SARS2)|Non-structural protein 11|nsp-11|ns11|ns-11|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 11|4393-4405 protein taxon:2697049 UniProtKB:P0DTC1 PR:000050280|PRO_0000449645 UniProtKB P0DTC2 S protein Spike glycoprotein S|S glycoprotein (SARS2)|surface glycoprotein (SARS2)|E2 (SARS2)|S (SARS2)|peplomer protein (SARS2)|Spike|spike gene|Spike (S) glycoprotein|Spike (S)|S glycoprotein|spike surface glycoprotein (S)|E2|E2 protein|E2|S-glycoprotein protein taxon:2697049 PR:P0DTC2 UniProtKB P0DTC2-PRO_0000449647 Spike protein S1 Spike protein S1 P0DTC2(13-685)|UniProtKB:P0DTC2, 13-685|S1 (SARS2)|Spike (S1) glycoprotein|S1 protein|spike (S1) protein|S1 protein taxon:2697049 UniProtKB:P0DTC2 PR:000050267|PRO_0000449647 UniProtKB P0DTC2-PRO_0000449648 Spike protein S2 Spike protein S2 P0DTC2(686-1273)|UniProtKB:P0DTC2, 686-1273|S2 (SARS2)|Spike (S2) glycoprotein|S2 protein|spike (S2) protein|S2 protein taxon:2697049 UniProtKB:P0DTC2 PR:000050268|PRO_0000449648 UniProtKB P0DTC2-PRO_0000449649 Spike protein S2' Spike protein S2' P0DTC2(816-1273)|UniProtKB:P0DTC2, 816-1273|S2' (SARS2)|Spike (S2') glycoprotein|S2' protein|spike (S2') protein|S2' protein taxon:2697049 UniProtKB:P0DTC2 PR:000050269|PRO_0000449649 UniProtKB P0DTC3 ORF3a ORF3a protein 3a|protein U274 (SARS2)|ORF3a (SARS2)|protein X1 (SARS2)|accessory protein 3a (SARS2)|P0DTC3|Protein 3a|Accessory protein 3a|Protein U274|Protein X1|3a|3a protein|3a accessory gene|3a accessory protein protein taxon:2697049 PR:P0DTC3|PRO_0000449650 UniProtKB P0DTC4 E protein Envelope small membrane protein sM protein (SARS2)|envelope protein (SARS2)|E protein (SARS2)|E (SARS2)|Envelope small membrane protein|small envelope protein (E)|small envelope protein|E protein taxon:2697049 PR:P0DTC4|PRO_0000449651 UniProtKB P0DTC5 M protein Membrane protein E1 glycoprotein|Matrix glycoprotein|Matrix protein (M)|Membrane glycoprotein|Matrix protein|Matrix (M) protein protein taxon:2697049 PR:P0DTC5 UniProtKB P0DTC6 ORF6 ORF6 ns6|X3|accessory protein 6 (SARS2)|ns6 (SARS2)|protein X3 (SARS2)|ORF6 (SARS2)|Non-structural protein 6|ns-6|Protein X3|6 accessory gene|6 accessory protein|6 protein taxon:2697049 PR:P0DTC6|PRO_0000449653 UniProtKB P0DTC7 ORF7a ORF7a protein 7a|7a/SigPep- (SARS2)|protein 7a, signal peptide removed form|UniProtKB:P0DTC7, 16-121|protein X4 (SARS2)|accessory protein 7a (SARS2)|protein U122 (SARS2)|ORF7a (SARS2)|7a|7a protein|protein 7a|7a accessory gene|7a accessory protein|P0DTC7 protein taxon:2697049 PR:P0DTC7|PRO_0000449654 UniProtKB P0DTC8 ORF8 ORF8 UniProtKB:P0DTC8, 16-121|ORF8/SigPep- (SARS2)|ORF8 protein, signal peptide removed form|accessory protein 8|ORF8 (SARS2)|ns8 (SARS2)|Non-structural protein 8|Protein non-structural 8|ns8|ns-8|nsp 8|nsp8|P0DTC8|ORF8 protein|P0DTC8 protein taxon:2697049 PR:P0DTC8|PRO_0000449655 UniProtKB P0DTC9 N protein Nucleoprotein NC|Protein N|P0DTC9(1-419)|N|NC (SARS2)|N (SARS2)|nucleocapsid protein (SARS2)|protein N (SARS2)|Nucleoprotein (N)|Nucleocapsid protein|nucleocapsid gene|nucleocapsid protein (N) protein taxon:2697049 PR:P0DTC9|PRO_0000449656 UniProtKB P0DTD1 pp1ab Replicase polyprotein 1ab rep|1a-1b|ORF1ab|ORF1ab/ClvPrd (SARS2)|ORF1ab proteolytic cleavage product|Replicase polyprotein 1ab protein taxon:2697049 PR:P0DTD1-1 UniProtKB P0DTD1-PRO_0000449619 nsp1 Host translation inhibitor nsp1|P0DTD1(1-180)|rep/Clv:nsp1 (SARS2)|PRO_0000449619|nsp1 (SARS2)|UniProtKB:P0DTD1, 1-180|leader protein (SARS2)|UniProtKB:P0DTC1, 1-180|non-structural protein 1 (SARS2)|nsp-1|ns1|ns-1|host translation inhibitor nsp1|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 1 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050270|UniProtKB:P0DTD1-PRO_0000449635|PRO_0000449635 UniProtKB P0DTD1-PRO_0000449620 nsp2 Non-structural protein 2 P0DTD1(181-818)|UniProtKB:P0DTC1, 181-818|UniProtKB:P0DTD1, 181-818|p65 homolog (SARS2)|PRO_0000449636|nsp2 (SARS2)|rep/Clv:nsp2 (SARS2)|nsp-2|ns2|ns-2|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 2 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050271|UniProtKB:P0DTD1-PRO_0000449636|PRO_0000449620 UniProtKB P0DTD1-PRO_0000449621 nsp3 Non-structural protein 3 PL-PRO|P0DTD1(819-2763)|UniProtKB:P0DTD1, 819-2763|PLpro (SARS2)|nsp3 (SARS2)|rep/Clv:nsp3 (SARS2)|main proteinase (SARS2)|papain-like proteinase (SARS2)|UniProtKB:P0DTC1, 819-2763|PRO_0000449637|nsp-3|ns3|ns-3|Papain-like proteinase|Papain-like protease|Papain-like protease 2|Papain-like proteinase 2|PL-PRO|PL2-PRO|PL2pro|PLpro|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 3|819-2763|SARS-CoV-2-PLpro|2764-3263|ADRP protein taxon:2697049 UniProtKB:P0DTD1 PR:000050272|UniProtKB:P0DTD1-PRO_0000449637|PRO_0000449621 UniProtKB P0DTD1-PRO_0000449622 nsp4 Non-structural protein 4 P0DTD1(2764-3263)|RO_0000449638|rep/Clv:nsp4 (SARS2)|UniProtKB:P0DTC1, 2764-3263|nsp4 (SARS2)|UniProtKB:P0DTD1, 2764-3263|nsp-4|ns4|ns-4|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 4 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050273|UniProtKB:P0DTD1-PRO_0000449638|PRO_0000449622 UniProtKB P0DTD1-PRO_0000449623 nsp5 3C-like proteinase 3CL-PRO|3CLp|Mpro|P0DTD1(3264-3569)|UniProtKB:P0DTD1, 3264-3569|3CLpro (SARS2)|UniProtKB:P0DTC1, 3264-3569|nsp5 (SARS2)|3CL-PRO (SARS2)|picornavirus 3C-like protease (SARS2)|rep/Clv:nsp5 (SARS2)|3CLp (SARS2)|3C-like protease|3CL pro proteinase|3CL-PRO|3CLp|3CLpro|SARS-3CL pro|SARS-3CL pro proteinase|SARS-3CLpro|proteinase 3CL pro|3-chymotrypsin-like protease (3CLpro)|3-chymotrypsin-like protease|chymotrypsin-like protease|chymotrypsin-like cysteine protease|Non-structural protein 5|nsp-5|ns5|ns-5|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 5|M pro|main protease|main proteinase|Mpro|SARS coronavirus main proteinase|SARS-CoV-3CLpro|SARS-CoV-2-Mpro|SARS-CoV-3CLp protein taxon:2697049 UniProtKB:P0DTD1 PR:000050274|UniProtKB:P0DTD1-PRO_0000449639|PRO_0000449639|PRO_0000449623 UniProtKB P0DTD1-PRO_0000449624 nsp6 Non-structural protein 6 P0DTD1(3570-3859)|nsp6 (SARS2)|UniProtKB:P0DTC1, 3570-3859|UniProtKB:P0DTD1, 3570-3859|rep/Clv:nsp6 (SARS2)|nsp-6|ns6|ns-6|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 6 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050275|UniProtKB:P0DTD1-PRO_0000449640|PRO_0000449640|PRO_0000449624 UniProtKB P0DTD1-PRO_0000449625 nsp7 Non-structural protein 7 P0DTD1(3860-3942)|rep/Clv:nsp7 (SARS2)|nsp7 (SARS2)|UniProtKB:P0DTD1, 3860-3942|UniProtKB:P0DTC1, 3860-3942|nsp-7|ns7|ns-7|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 7 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050276|UniProtKB:P0DTD1-PRO_0000449641|PRO_0000449625|PRO_0000449641 UniProtKB P0DTD1-PRO_0000449626 nsp8 Non-structural protein 8 P0DTD1(3943-4140)|nsp8 (SARS2)|rep/Clv:nsp8 (SARS2)|UniProtKB:P0DTD1, 3943-4140|UniProtKB:P0DTC1, 3943-4140|nsp-8|ns8|ns-8|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 8|ORF8 protein|Protein non-structural 8|ORF8/SigPep- (SARS2)|UniProtKB:P0DTC8|accessory protein 8|ORF8 (SARS2)|ORF8 protein|ns8 (SARS2)|P0DTC8|nsp 8|ORF8|PRO_0000449655 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050277|UniProtKB:P0DTD1-PRO_0000449642|PRO_0000449655|PRO_0000449642|PRO_0000449626 UniProtKB P0DTD1-PRO_0000449627 nsp9 Non-structural protein 9 P0DTD1(4141-4253)|UniProtKB:P0DTC1, 4141-4253|nsp9 (SARS2)|rep/Clv:nsp9 (SARS2)|UniProtKB:P0DTD1, 4141-4253|nsp-9|ns9|ns-9|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 9 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050278|UniProtKB:P0DTD1-PRO_0000449643|PRO_0000449627|PRO_0000449643 UniProtKB P0DTD1-PRO_0000449628 nsp10 Non-structural protein 10 GFL|P0DTD1(4254-4392)|rep/Clv:nsp10 (SARS2)|UniProtKB:P0DTD1, 4254-4392|UniProtKB:P0DTC1, 4254-4392|growth factor-like peptide (SARS2)|nsp10 (SARS2)|GFL (SARS2)|nsp-10|ns10|ns-10|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 10 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050279|UniProtKB:P0DTD1-PRO_0000449644|PRO_0000449628|PRO_0000449644 UniProtKB P0DTD1-PRO_0000449629 nsp12 RNA-directed RNA polymerase Pol|RdRp|P0DTD1(4393-5324)|Pol (SARS2)|RdRp (SARS2)|nsp12 (SARS2)|UniProtKB:P0DTD1, 4393-5324|ORF1ab/Clv:nsp12 (SARS2)|Non-structural protein 12|nsp-12|ns12|ns-12|RNA-dependent RNA polymerase|RdRp|Pol|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 12|holo-RdRp protein taxon:2697049 UniProtKB:P0DTD1 PR:000050284|PRO_0000449629 UniProtKB P0DTD1-PRO_0000449630 nsp13 Helicase Hel|P0DTD1(5325-5925)|UniProtKB:P0DTD1, 5325-5925|ORF1ab/Clv:nsp13 (SARS2)|nsp13 (SARS2)|Hel (SARS2)|Non-structural protein 13|nsp-13|ns13|ns-13|helicase|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 13|NTPase/helicase protein taxon:2697049 UniProtKB:P0DTD1 PR:000050285|PRO_0000449630 UniProtKB P0DTD1-PRO_0000449631 nsp14 Proofreading exoribonuclease ExoN|P0DTD1(5926-6452)|UniProtKB:P0DTD1, 5926-6452|ORF1ab/Clv:nsp14 (SARS2)|nsp14 (SARS2)|ExoN (SARS2)|bifunctional exonuclease/methyltransferase|Non-structural protein 14|nsp-14|ns14|ns-14|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 14 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050286|PRO_0000449631 UniProtKB P0DTD1-PRO_0000449632 nsp15 Uridylate-specific endoribonuclease P0DTD1(6453-6798)|nsp15 (SARS2)|NendoU (SARS2)|ORF1ab/Clv:nsp15 (SARS2)|UniProtKB:P0DTD1, 6453-6798|endoRNAse (SARS2)|Non-structural protein 15|nsp-15|ns15|ns-15|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 15 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050287|PRO_0000449632 UniProtKB P0DTD1-PRO_0000449633 nsp16 2'-O-methyltransferase P0DTD1(6799-7096)|nsp16 (SARS2)|2'-O-ribose methyltransferase (SARS2)|UniProtKB:P0DTD1, 6799-7096|ORF1ab/Clv:nsp16 (SARS2)|Non-structural protein 16|nsp-16|ns16|ns-16|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 16 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050288|PRO_0000449633 UniProtKB P0DTD2 ORF9b accessory protein 9b ORF-9b|accessory protein 9b (SARS2)|ORF-9b (SARS2)|ORF9b (SARS2)|Protein 9b|Accessory protein 9b|ORF-9b|9b protein|9b accessory gene|9b accessory protein|9b protein taxon:2697049 PR:P0DTD2|PRO_0000449657 UniProtKB P0DTD3 ORF14 Uncharacterized protein 14 ORF14 (SARS2)|ORF-14|9c|ORF9c|ORF-9c protein taxon:2697049 PR:P0DTD3|PRO_0000449658 UniProtKB P0DTD8 ORF7b Accessory protein 7b ns7b|ns7b (SARS2)|accessory protein 7b (SARS2)|ORF7b (SARS2)|ORF 7b|protein 7b|ns7b|ns-7b|7b protein taxon:2697049 PR:P0DTD8|PRO_0000449799