{"id":3032,"identifier":"datarepositorium/WPHMJL","persistentUrl":"https://doi.org/10.34622/datarepositorium/WPHMJL","protocol":"doi","authority":"10.34622","publisher":"Repositório de Dados da Universidade do Minho","publicationDate":"2024-02-01","storageIdentifier":"file://10.34622/datarepositorium/WPHMJL","datasetVersion":{"id":534,"datasetId":3032,"datasetPersistentId":"doi:10.34622/datarepositorium/WPHMJL","storageIdentifier":"file://10.34622/datarepositorium/WPHMJL","versionNumber":2,"versionMinorNumber":1,"versionState":"RELEASED","lastUpdateTime":"2024-04-04T09:21:29Z","releaseTime":"2024-04-04T09:21:29Z","createTime":"2024-02-22T16:38:16Z","license":"NONE","termsOfUse":"<a rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc/4.0/\"><img alt=\"Creative Commons Licence\" style=\"border-width:0\" src=\"https://i.creativecommons.org/l/by-nc/4.0/88x31.png\" /></a><br />This work is licensed under a <a rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc/4.0/\">Creative Commons Attribution-NonCommercial 4.0 International License</a>","citationRequirements":"Citation of the correspondent paper should be included, if the dataset is used","contactForAccess":"ricardofilipeduarte@bio.uminho.pt","fileAccessRequest":true,"metadataBlocks":{"citation":{"displayName":"Citation Metadata","fields":[{"typeName":"title","multiple":false,"typeClass":"primitive","value":"Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome"},{"typeName":"alternativeTitle","multiple":false,"typeClass":"primitive","value":"Fermentome and Flavorome of non-Saccharomyces yeasts"},{"typeName":"author","multiple":true,"typeClass":"compound","value":[{"authorName":{"typeName":"authorName","multiple":false,"typeClass":"primitive","value":"Franco-Duarte, Ricardo"},"authorAffiliation":{"typeName":"authorAffiliation","multiple":false,"typeClass":"primitive","value":"CBMA, 
UMinho"},"authorIdentifierScheme":{"typeName":"authorIdentifierScheme","multiple":false,"typeClass":"controlledVocabulary","value":"ORCID"},"authorIdentifier":{"typeName":"authorIdentifier","multiple":false,"typeClass":"primitive","value":"0000-0002-2333-6127"}},{"authorName":{"typeName":"authorName","multiple":false,"typeClass":"primitive","value":"Fernandes, Ticiana"},"authorAffiliation":{"typeName":"authorAffiliation","multiple":false,"typeClass":"primitive","value":"CBMA, UMinho"},"authorIdentifierScheme":{"typeName":"authorIdentifierScheme","multiple":false,"typeClass":"controlledVocabulary","value":"ORCID"},"authorIdentifier":{"typeName":"authorIdentifier","multiple":false,"typeClass":"primitive","value":"0000-0003-3736-3967"}},{"authorName":{"typeName":"authorName","multiple":false,"typeClass":"primitive","value":"Sousa, Maria João"},"authorAffiliation":{"typeName":"authorAffiliation","multiple":false,"typeClass":"primitive","value":"CBMA, UMinho"},"authorIdentifierScheme":{"typeName":"authorIdentifierScheme","multiple":false,"typeClass":"controlledVocabulary","value":"ORCID"},"authorIdentifier":{"typeName":"authorIdentifier","multiple":false,"typeClass":"primitive","value":"0000-0001-9424-4150"}},{"authorName":{"typeName":"authorName","multiple":false,"typeClass":"primitive","value":"Sampaio, Paula"},"authorAffiliation":{"typeName":"authorAffiliation","multiple":false,"typeClass":"primitive","value":"CBMA, UMinho"},"authorIdentifierScheme":{"typeName":"authorIdentifierScheme","multiple":false,"typeClass":"controlledVocabulary","value":"ORCID"},"authorIdentifier":{"typeName":"authorIdentifier","multiple":false,"typeClass":"primitive","value":"0000-0002-1415-4428"}},{"authorName":{"typeName":"authorName","multiple":false,"typeClass":"primitive","value":"Rito, Teresa"},"authorAffiliation":{"typeName":"authorAffiliation","multiple":false,"typeClass":"primitive","value":"CBMA, 
UMinho"},"authorIdentifierScheme":{"typeName":"authorIdentifierScheme","multiple":false,"typeClass":"controlledVocabulary","value":"ORCID"},"authorIdentifier":{"typeName":"authorIdentifier","multiple":false,"typeClass":"primitive","value":"0000-0002-8374-6347"}},{"authorName":{"typeName":"authorName","multiple":false,"typeClass":"primitive","value":"Soares, Pedro"},"authorAffiliation":{"typeName":"authorAffiliation","multiple":false,"typeClass":"primitive","value":"CBMA, UMinho"},"authorIdentifierScheme":{"typeName":"authorIdentifierScheme","multiple":false,"typeClass":"controlledVocabulary","value":"ORCID"},"authorIdentifier":{"typeName":"authorIdentifier","multiple":false,"typeClass":"primitive","value":"0000-0002-2807-690X"}}]},{"typeName":"datasetContact","multiple":true,"typeClass":"compound","value":[{"datasetContactName":{"typeName":"datasetContactName","multiple":false,"typeClass":"primitive","value":"Franco-Duarte, Ricardo"},"datasetContactAffiliation":{"typeName":"datasetContactAffiliation","multiple":false,"typeClass":"primitive","value":"CBMA, UMinho"}}]},{"typeName":"dsDescription","multiple":true,"typeClass":"compound","value":[{"dsDescriptionValue":{"typeName":"dsDescriptionValue","multiple":false,"typeClass":"primitive","value":"Dataset to support manuscript with the same title.\r\n\r\nThe submitted documents were obtained with the following procedures:\r\nFor the genomes downloaded from the SRA database (raw data; without assembled version available), assembly was performed using SPAdes Genome Assembler software v.3.15.4 (Bankevich et al. 2012), using default parameters.\r\nNext, the 661 assembled genomes were annotated using AUGUSTUS software v.3.4.0 (Stanke and Morgenstern 2005), considering 16 different pre-trained models, chosen as belonging to the Ascomycota phylum (11) or the Basidiomycota phylum (5): Ascomycota – S. cerevisiae S288c, C. albicans, Meyerozyma (Candida) guilliermondii, C. 
tropicalis, Debaryomyces hansenii, Eremothecium gossypii, Kluyveromyces lactis, Lodderomyces elongisporus, Scheffersomyces (Pichia) stipitis, Schizosaccharomyces pombe, and Yarrowia lipolytica; Basidiomycota – Cryptococcus neoformans, Coprinus, Laccaria bicolor, Phanerochaete chrysosporium and Ustilago maydis. Results were manually reviewed to select the most robust annotation in terms of predicted coding genes. The potential coding regions reported by AUGUSTUS were extracted from the complete genomes to FASTA files."},"dsDescriptionDate":{"typeName":"dsDescriptionDate","multiple":false,"typeClass":"primitive","value":"2024-01-31"}}]},{"typeName":"subject","multiple":true,"typeClass":"controlledVocabulary","value":["Computer and Information Science","Earth and Environmental Sciences"]},{"typeName":"keyword","multiple":true,"typeClass":"compound","value":[{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"non-conventional yeasts"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"phylogeny"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"genomics"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"fermentation"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"fungi"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"bioinformatics"}}]},{"typeName":"publication","multiple":true,"typeClass":"compound","value":[{"publicationCitation":{"typeName":"publicationCitation","multiple":false,"typeClass":"primitive","value":"Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome. 
Manuscript submitted for publication."}}]},{"typeName":"depositor","multiple":false,"typeClass":"primitive","value":"Duarte, Ricardo"},{"typeName":"dateOfDeposit","multiple":false,"typeClass":"primitive","value":"2024-01-30"},{"typeName":"software","multiple":true,"typeClass":"compound","value":[{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"SPAdes Genome Assembler"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"3.15.4"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"Augustus"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"3.4.0"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"BUSCO"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"5.0"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"QUAST"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"5.0.2"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"FasParser"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"2.13.0"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"FigTree"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"1.4.4"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"iTOL"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"eggNOG-mapper"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"5.0"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"kofamKOALA"},"softwa
reVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"2022.04.03"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"KAAS-KEGG"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"dbCAN2"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"MINITAB"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"19.2020"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"Orange Data Mining"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"3.36.2"}}]},{"typeName":"dataSources","multiple":true,"typeClass":"primitive","value":["National Library of Medicine: National Center for Biotechnology Information. Genome. https://www.ncbi.nlm.nih.gov/genome/","National Library of Medicine: National Center for Biotechnology Information. Sequence Read Archive (SRA) data. https://www.ncbi.nlm.nih.gov/sra/"]}]}},"files":[{"label":"00_Readme.txt","restricted":false,"version":1,"datasetVersionId":534,"dataFile":{"id":3624,"persistentId":"","pidURL":"","filename":"00_Readme.txt","contentType":"text/plain","filesize":3425,"storageIdentifier":"file://18d6388b86c-94c21299c474","rootDataFileId":-1,"md5":"9bcd61f1318ea91f1dae36a2463b5c4c","checksum":{"type":"MD5","value":"9bcd61f1318ea91f1dae36a2463b5c4c"},"creationDate":"2024-02-01"}},{"description":"For the genomes downloaded from the SRA database (raw data; without assembled version available), assembly was performed using SPAdes Genome Assembler software v.3.15.4 (Bankevich et al. 
2012), using default parameters.","label":"01_De-novo_assemblies_from_SRA.rar","restricted":true,"version":1,"datasetVersionId":534,"dataFile":{"id":3621,"persistentId":"","pidURL":"","filename":"01_De-novo_assemblies_from_SRA.rar","contentType":"application/x-rar-compressed","filesize":313629415,"description":"For the genomes downloaded from the SRA database (raw data; without assembled version available), assembly was performed using SPAdes Genome Assembler software v.3.15.4 (Bankevich et al. 2012), using default parameters.","storageIdentifier":"file://18d5abdec32-2ced891448d7","rootDataFileId":-1,"md5":"ae7edef5e9e9fd788d95cbcd7cb30374","checksum":{"type":"MD5","value":"ae7edef5e9e9fd788d95cbcd7cb30374"},"creationDate":"2024-01-30"}},{"description":"The 661 assembled genomes were annotated using AUGUSTUS software v.3.4.0 (Stanke and Morgenstern 2005), considering 16 different pre-trained models, chosen as belonging to the Ascomycota phylum (11) or the Basidiomycota phylum (5): Ascomycota – S. cerevisiae S288c, C. albicans, Meyerozyma (Candida) guilliermondii, C. tropicalis, Debaryomyces hansenii, Eremothecium gossypii, Kluyveromyces lactis, Lodderomyces elongisporus, Scheffersomyces (Pichia) stipitis, Schizosaccharomyces pombe, and Yarrowia lipolytica; Basidiomycota – Cryptococcus neoformans, Coprinus, Laccaria bicolor, Phanerochaete chrysosporium and Ustilago maydis. Results were manually reviewed to select the most robust annotation in terms of predicted coding genes. 
The potential coding regions reported by AUGUSTUS were extracted from the complete genomes to FASTA files.","label":"02_Genome_Annotation_Files.rar","restricted":true,"version":1,"datasetVersionId":534,"dataFile":{"id":3623,"persistentId":"","pidURL":"","filename":"02_Genome_Annotation_Files.rar","contentType":"application/x-rar-compressed","filesize":1256706244,"description":"The 661 assembled genomes were annotated using AUGUSTUS software v.3.4.0 (Stanke and Morgenstern 2005), considering 16 different pre-trained models, chosen as belonging to the Ascomycota phylum (11) or the Basidiomycota phylum (5): Ascomycota – S. cerevisiae S288c, C. albicans, Meyerozyma (Candida) guilliermondii, C. tropicalis, Debaryomyces hansenii, Eremothecium gossypii, Kluyveromyces lactis, Lodderomyces elongisporus, Scheffersomyces (Pichia) stipitis, Schizosaccharomyces pombe, and Yarrowia lipolytica; Basidiomycota – Cryptococcus neoformans, Coprinus, Laccaria bicolor, Phanerochaete chrysosporium and Ustilago maydis. Results were manually reviewed to select the most robust annotation in terms of predicted coding genes. The potential coding regions reported by AUGUSTUS were extracted from the complete genomes to FASTA files.","storageIdentifier":"file://18d5ad81ce6-bd5fc718f9ca","rootDataFileId":-1,"md5":"7cd4ab27f7ed6e9c56b22b118fa8cb37","checksum":{"type":"MD5","value":"7cd4ab27f7ed6e9c56b22b118fa8cb37"},"creationDate":"2024-01-30"}},{"description":"A consensus proteome database was prepared from the 530 complete genomes (from 134 species) that passed quality control.","label":"03_Proteomes.rar","restricted":true,"version":1,"datasetVersionId":534,"dataFile":{"id":3620,"persistentId":"","pidURL":"","filename":"03_Proteomes.rar","contentType":"application/x-rar-compressed","filesize":932639677,"description":"A consensus proteome database was prepared from the 530 complete genomes (from 134 species) that passed quality control.
","storageIdentifier":"file://18d5a9fdaa2-0fc50381e545","rootDataFileId":-1,"md5":"90a5a5170220edf89555cc8702e0ee97","checksum":{"type":"MD5","value":"90a5a5170220edf89555cc8702e0ee97"},"creationDate":"2024-01-30"}},{"description":"Functional genomic annotation was performed using three tools for increased robustness, chosen as the three most widely used tools available for functional annotation: i) eggNOG-mapper v.5.0 (Jensen et al. 2008); ii) kofamKOALA v. 2022-04-03 (Aramaki et al. 2020); iii) KAAS-KEGG Automatic Annotation Server (Moriya et al. 2007). From each tool, a list of annotated KOs (KEGG Orthology), representing the functional orthologs, was obtained, and an in-house Python script was built to obtain the list of consensus KOs for each genome. The script determined the consensus annotation as follows: (i) if only one platform provided a KO, that KO was used; (ii) if two platforms provided the same KO and the third a different one, the KO shared by the two matching results was selected as consensus; (iii) if three different KOs were obtained (or two different KOs and a third with no result), the annotation was considered undefined.","label":"04_Consensus_KOs.rar","restricted":true,"version":1,"datasetVersionId":534,"dataFile":{"id":3622,"persistentId":"","pidURL":"","filename":"04_Consensus_KOs.rar","contentType":"application/x-rar-compressed","filesize":12761738,"description":"Functional genomic annotation was performed using three tools for increased robustness, chosen as the three most widely used tools available for functional annotation: i) eggNOG-mapper v.5.0 (Jensen et al. 2008); ii) kofamKOALA v. 2022-04-03 (Aramaki et al. 2020); iii) KAAS-KEGG Automatic Annotation Server (Moriya et al. 2007). From each tool, a list of annotated KOs (KEGG Orthology), representing the functional orthologs, was obtained, and an in-house Python script was built to obtain the list of consensus KOs for each genome. 
The script determined the consensus annotation as follows: (i) if only one platform provided a KO, that KO was used; (ii) if two platforms provided the same KO and the third a different one, the KO shared by the two matching results was selected as consensus; (iii) if three different KOs were obtained (or two different KOs and a third with no result), the annotation was considered undefined.","storageIdentifier":"file://18d5acc8d26-273c91646e0b","rootDataFileId":-1,"md5":"89d3865b483ca694fbb6efece0f3c6f3","checksum":{"type":"MD5","value":"89d3865b483ca694fbb6efece0f3c6f3"},"creationDate":"2024-01-30"}},{"description":"Gene function predictions were also accomplished by assessing the Carbohydrate-Active EnZymes (CAZymes) database (Cantarel et al. 2009), using dbCAN2 software (Zhang et al. 2018), testing three different annotation tools to increase robustness, HMMER, eCAMI and DIAMOND, and compiling results to obtain the consensus number of CAZymes.","label":"05_CAZYmes.rar","restricted":true,"version":1,"datasetVersionId":534,"dataFile":{"id":3625,"persistentId":"","pidURL":"","filename":"05_CAZYmes.rar","contentType":"application/x-rar-compressed","filesize":1640754,"description":"Gene function predictions were also accomplished by assessing the Carbohydrate-Active EnZymes (CAZymes) database (Cantarel et al. 2009), using dbCAN2 software (Zhang et al. 
2018), testing three different annotation tools to increase robustness, HMMER, eCAMI and DIAMOND, and compiling results to obtain the consensus number of CAZymes.","storageIdentifier":"file://18d6388b6cc-ab3b43e289bf","rootDataFileId":-1,"md5":"b45a83c6a10236349ca433765fff1c80","checksum":{"type":"MD5","value":"b45a83c6a10236349ca433765fff1c80"},"creationDate":"2024-02-01"}},{"description":"List of 1131 proteins established as core, present in all the non-Saccharomyces genomes analysed","label":"06_ListOfConsensusProteins.rar","restricted":true,"version":1,"datasetVersionId":534,"dataFile":{"id":3747,"persistentId":"","pidURL":"","filename":"06_ListOfConsensusProteins.rar","contentType":"application/x-rar-compressed","filesize":2571,"description":"List of 1131 proteins established as core, present in all the non-Saccharomyces genomes analysed","storageIdentifier":"file://18dc6f9912e-4ad720d4486c","rootDataFileId":-1,"md5":"62b356b6a2fe8012385ee8436b75158b","checksum":{"type":"MD5","value":"62b356b6a2fe8012385ee8436b75158b"},"creationDate":"2024-02-20"}}],"citation":"Franco-Duarte, Ricardo; Fernandes, Ticiana; Sousa, Maria João; Sampaio, Paula; Rito, Teresa; Soares, Pedro, 2024, \"Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome\", https://doi.org/10.34622/datarepositorium/WPHMJL, Repositório de Dados da Universidade do Minho, V2"}}