<?xml version='1.0' encoding='UTF-8'?><codeBook xmlns="ddi:codebook:2_5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ddi:codebook:2_5 https://ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" version="2.5"><docDscr><citation><titlStmt><titl>Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome</titl><IDNo agency="DOI">doi:10.34622/datarepositorium/WPHMJL</IDNo></titlStmt><distStmt><distrbtr source="archive">Repositório de Dados da Universidade do Minho</distrbtr><distDate>2024-02-01</distDate></distStmt><verStmt source="DVN"><version date="2024-04-04" type="RELEASED">2</version></verStmt><biblCit>Franco-Duarte, Ricardo; Fernandes, Ticiana; Sousa, Maria João; Sampaio, Paula; Rito, Teresa; Soares, Pedro, 2024, "Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome", https://doi.org/10.34622/datarepositorium/WPHMJL, Repositório de Dados da Universidade do Minho, V2</biblCit></citation></docDscr><stdyDscr><citation><titlStmt><titl>Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome</titl><altTitl>Fermentome and Flavorome of non-Saccharomyces yeasts</altTitl><IDNo agency="DOI">doi:10.34622/datarepositorium/WPHMJL</IDNo></titlStmt><rspStmt><AuthEnty affiliation="CBMA, UMinho">Franco-Duarte, Ricardo</AuthEnty><AuthEnty affiliation="CBMA, UMinho">Fernandes, Ticiana</AuthEnty><AuthEnty affiliation="CBMA, UMinho">Sousa, Maria João</AuthEnty><AuthEnty affiliation="CBMA, UMinho">Sampaio, Paula</AuthEnty><AuthEnty affiliation="CBMA, UMinho">Rito, Teresa</AuthEnty><AuthEnty affiliation="CBMA, UMinho">Soares, Pedro</AuthEnty></rspStmt><prodStmt><software version="3.15.4">SPAdes Genome Assembler</software><software version="3.4.0">Augustus</software><software 
version="5.0">BUSCO</software><software version="5.0.2">QUAST</software><software version="2.13.0">FasParser</software><software version="1.4.4">FigTree</software><software version="1.4.4">iTOL</software><software version="5.0">eggNOG-mapper</software><software version="2022.04.03">kofamKOALA</software><software version="2022.04.03">KAAS-KEGG</software><software version="2022.04.03">dbCAN2</software><software version="19.2020">MINITAB</software><software version="3.36.2">Orange Data Mining</software></prodStmt><distStmt><distrbtr source="archive">Repositório de Dados da Universidade do Minho</distrbtr><contact affiliation="CBMA, UMinho">Franco-Duarte, Ricardo</contact><depositr>Duarte, Ricardo</depositr><depDate>2024-01-30</depDate></distStmt></citation><stdyInfo><subject><keyword>Computer and Information Science</keyword><keyword>Earth and Environmental Sciences</keyword><keyword>non-conventional yeasts</keyword><keyword>phylogeny</keyword><keyword>genomics</keyword><keyword>fermentation</keyword><keyword>fungi</keyword><keyword>bioinformatics</keyword></subject><abstract date="2024-01-31">Dataset supporting the manuscript of the same title.&#xd;
&#xd;
The submitted documents were obtained with the following procedures:&#xd;
For genomes downloaded from the SRA database as raw data (no assembled version available), assembly was performed with SPAdes Genome Assembler v.3.15.4 (Bankevich et al. 2012), using default parameters.&#xd;
Next, the 661 assembled genomes were annotated using AUGUSTUS v.3.4.0 (Stanke and Morgenstern 2005), considering 16 pre-trained models belonging to the phylum Ascomycota (11) or the phylum Basidiomycota (5). Ascomycota: S. cerevisiae S288c, C. albicans, Meyerozyma (Candida) guilliermondii, C. tropicalis, Debaryomyces hansenii, Eremothecium gossypii, Kluyveromyces lactis, Lodderomyces elongisporus, Scheffersomyces (Pichia) stipitis, Schizosaccharomyces pombe, and Yarrowia lipolytica; Basidiomycota: Cryptococcus neoformans, Coprinus, Laccaria bicolor, Phanerochaete chrysosporium and Ustilago maydis. Results were manually reviewed to select the most robust annotation in terms of predicted coding genes. The potential coding regions reported by AUGUSTUS were extracted from the complete genomes into FASTA files.</abstract><sumDscr/></stdyInfo><method><dataColl><sources><dataSrc>National Library of Medicine: National Center for Biotechnology Information. Genome. https://www.ncbi.nlm.nih.gov/genome/</dataSrc><dataSrc>National Library of Medicine: National Center for Biotechnology Information. Sequence Read Archive (SRA) data. 
https://www.ncbi.nlm.nih.gov/sra/</dataSrc></sources></dataColl><anlyInfo/></method><dataAccs><notes type="DVN:TOU" level="dv">&lt;a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">&lt;img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" />&lt;/a>&lt;br />This work is licensed under a &lt;a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License&lt;/a></notes><setAvail/><useStmt><contact>ricardofilipeduarte@bio.uminho.pt</contact><citReq>If the dataset is used, the corresponding paper should be cited.</citReq></useStmt></dataAccs><othrStdyMat><relPubl><citation><biblCit>Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome. Manuscript submitted for publication.</biblCit></citation></relPubl></othrStdyMat></stdyDscr><otherMat ID="f3624" URI="https://datarepositorium.uminho.pt/api/access/datafile/3624" level="datafile"><labl>00_Readme.txt</labl><notes level="file" type="DATAVERSE:CONTENTTYPE" subject="Content/MIME Type">text/plain</notes></otherMat><otherMat ID="f3621" URI="https://datarepositorium.uminho.pt/api/access/datafile/3621" level="datafile"><labl>01_De-novo_assemblies_from_SRA.rar</labl><txt>For genomes downloaded from the SRA database as raw data (no assembled version available), assembly was performed with SPAdes Genome Assembler v.3.15.4 (Bankevich et al. 
2012), using default parameters.</txt><notes level="file" type="DATAVERSE:CONTENTTYPE" subject="Content/MIME Type">application/x-rar-compressed</notes></otherMat><otherMat ID="f3623" URI="https://datarepositorium.uminho.pt/api/access/datafile/3623" level="datafile"><labl>02_Genome_Annotation_Files.rar</labl><txt>The 661 assembled genomes were annotated using AUGUSTUS v.3.4.0 (Stanke and Morgenstern 2005), considering 16 pre-trained models belonging to the phylum Ascomycota (11) or the phylum Basidiomycota (5). Ascomycota: S. cerevisiae S288c, C. albicans, Meyerozyma (Candida) guilliermondii, C. tropicalis, Debaryomyces hansenii, Eremothecium gossypii, Kluyveromyces lactis, Lodderomyces elongisporus, Scheffersomyces (Pichia) stipitis, Schizosaccharomyces pombe, and Yarrowia lipolytica; Basidiomycota: Cryptococcus neoformans, Coprinus, Laccaria bicolor, Phanerochaete chrysosporium and Ustilago maydis. Results were manually reviewed to select the most robust annotation in terms of predicted coding genes. The potential coding regions reported by AUGUSTUS were extracted from the complete genomes into FASTA files.</txt><notes level="file" type="DATAVERSE:CONTENTTYPE" subject="Content/MIME Type">application/x-rar-compressed</notes></otherMat><otherMat ID="f3620" URI="https://datarepositorium.uminho.pt/api/access/datafile/3620" level="datafile"><labl>03_Proteomes.rar</labl><txt>A consensus proteome database was prepared from the 530 complete genomes (from 134 species) that passed quality control. 
</txt><notes level="file" type="DATAVERSE:CONTENTTYPE" subject="Content/MIME Type">application/x-rar-compressed</notes></otherMat><otherMat ID="f3622" URI="https://datarepositorium.uminho.pt/api/access/datafile/3622" level="datafile"><labl>04_Consensus_KOs.rar</labl><txt>For increased robustness, functional genomic annotation was performed with three tools, chosen as the three most widely used tools available for functional annotation: (i) eggNOG-mapper v.5.0 (Jensen et al. 2008); (ii) kofamKOALA v.2022-04-03 (Aramaki et al. 2020); (iii) KAAS, the KEGG Automatic Annotation Server (Moriya et al. 2007). Each tool produced a list of annotated KOs (KEGG Orthology identifiers), representing the functional orthologs, and an in-house Python script was built to derive the list of consensus KOs for each genome. The script assigned the consensus as follows: (i) if only one tool provided a KO, that KO was accepted; (ii) if two tools provided the same KO and the third a different one, the KO shared by the two matching results was selected as consensus; (iii) if three different KOs were obtained (or two different KOs and no result from the third tool), the annotation was considered undefined. </txt><notes level="file" type="DATAVERSE:CONTENTTYPE" subject="Content/MIME Type">application/x-rar-compressed</notes></otherMat><otherMat ID="f3625" URI="https://datarepositorium.uminho.pt/api/access/datafile/3625" level="datafile"><labl>05_CAZYmes.rar</labl><txt>Gene function was also predicted against the Carbohydrate-Active EnZymes (CAZymes) database (Cantarel et al. 2009), using dbCAN2 software (Zhang et al. 
2018), running three annotation tools (HMMER, eCAMI and DIAMOND) to increase robustness and compiling their results to obtain the consensus number of CAZymes.</txt><notes level="file" type="DATAVERSE:CONTENTTYPE" subject="Content/MIME Type">application/x-rar-compressed</notes></otherMat><otherMat ID="f3747" URI="https://datarepositorium.uminho.pt/api/access/datafile/3747" level="datafile"><labl>06_ListOfConsensusProteins.rar</labl><txt>List of the 1131 proteins established as core, i.e. present in all the non-Saccharomyces genomes analysed.</txt><notes level="file" type="DATAVERSE:CONTENTTYPE" subject="Content/MIME Type">application/x-rar-compressed</notes></otherMat></codeBook>