Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome (ICPSR doi:10.34622/datarepositorium/WPHMJL)
(Fermentome and Flavorome of non-Saccharomyces yeasts)

Document Description

Citation

Title:

Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome

Identification Number:

doi:10.34622/datarepositorium/WPHMJL

Distributor:

Repositório de Dados da Universidade do Minho

Date of Distribution:

2024-02-01

Version:

2

Bibliographic Citation:

Franco-Duarte, Ricardo; Fernandes, Ticiana; Sousa, Maria João; Sampaio, Paula; Rito, Teresa; Soares, Pedro, 2024, "Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome", https://doi.org/10.34622/datarepositorium/WPHMJL, Repositório de Dados da Universidade do Minho, V2

Study Description

Citation

Title:

Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome

Alternative Title:

Fermentome and Flavorome of non-Saccharomyces yeasts

Identification Number:

doi:10.34622/datarepositorium/WPHMJL

Authoring Entity:

Franco-Duarte, Ricardo (CBMA, UMinho)

Fernandes, Ticiana (CBMA, UMinho)

Sousa, Maria João (CBMA, UMinho)

Sampaio, Paula (CBMA, UMinho)

Rito, Teresa (CBMA, UMinho)

Soares, Pedro (CBMA, UMinho)

Software used in Production:

SPAdes Genome Assembler

Software used in Production:

Augustus

Software used in Production:

BUSCO

Software used in Production:

QUAST

Software used in Production:

FasParser

Software used in Production:

FigTree

Software used in Production:

iTOL

Software used in Production:

eggNOG-mapper

Software used in Production:

kofamKOALA

Software used in Production:

KAAS-KEGG

Software used in Production:

dbCAN2

Software used in Production:

MINITAB

Software used in Production:

Orange Data Mining

Distributor:

Repositório de Dados da Universidade do Minho

Access Authority:

Franco-Duarte, Ricardo

Depositor:

Duarte, Ricardo

Date of Deposit:

2024-01-30

Study Scope

Keywords:

Computer and Information Science, Earth and Environmental Sciences, non-conventional yeasts, phylogeny, genomics, fermentation, fungi, bioinformatics

Abstract:

Dataset supporting the manuscript of the same title. The deposited files were obtained as follows. For genomes downloaded from the SRA database (raw data with no assembled version available), assembly was performed with SPAdes Genome Assembler v.3.15.4 (Bankevich et al. 2012) using default parameters. Next, the 661 assembled genomes were annotated with AUGUSTUS v.3.4.0 (Stanke and Morgenstern 2005), considering 16 pre-trained models chosen from the phylum Ascomycota (11) or the phylum Basidiomycota (5): Ascomycota – S. cerevisiae S288c, C. albicans, Meyerozyma (Candida) guilliermondii, C. tropicalis, Debaryomyces hansenii, Eremothecium gossypii, Kluyveromyces lactis, Lodderomyces elongisporus, Scheffersomyces (Pichia) stipitis, Schizosaccharomyces pombe, and Yarrowia lipolytica; Basidiomycota – Cryptococcus neoformans, Coprinus, Laccaria bicolor, Phanerochaete chrysosporium and Ustilago maydis. Results were manually reviewed to select the most robust annotation in terms of predicted coding genes. The potential coding regions reported by AUGUSTUS were extracted from the complete genomes to FASTA files.

Methodology and Processing

Sources Statement

Data Sources:

National Library of Medicine: National Center for Biotechnology Information. Genome. https://www.ncbi.nlm.nih.gov/genome/

National Library of Medicine: National Center for Biotechnology Information. Sequence Read Archive (SRA) data. https://www.ncbi.nlm.nih.gov/sra/

Data Access

Access Authority:

ricardofilipeduarte@bio.uminho.pt

Citation Requirement:

The corresponding paper should be cited if the dataset is used.

Notes:

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/).

Other Study Description Materials

Related Publications

Citation

Bibliographic Citation:

Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome. Manuscript submitted for publication.

Other Study-Related Materials

Label:

00_Readme.txt

Notes:

text/plain

Other Study-Related Materials

Label:

01_De-novo_assemblies_from_SRA.rar

Text:

For genomes downloaded from the SRA database (raw data with no assembled version available), assembly was performed with SPAdes Genome Assembler v.3.15.4 (Bankevich et al. 2012) using default parameters.

Notes:

application/x-rar-compressed

Other Study-Related Materials

Label:

02_Genome_Annotation_Files.rar

Text:

The 661 assembled genomes were annotated with AUGUSTUS v.3.4.0 (Stanke and Morgenstern 2005), considering 16 pre-trained models chosen from the phylum Ascomycota (11) or the phylum Basidiomycota (5): Ascomycota – S. cerevisiae S288c, C. albicans, Meyerozyma (Candida) guilliermondii, C. tropicalis, Debaryomyces hansenii, Eremothecium gossypii, Kluyveromyces lactis, Lodderomyces elongisporus, Scheffersomyces (Pichia) stipitis, Schizosaccharomyces pombe, and Yarrowia lipolytica; Basidiomycota – Cryptococcus neoformans, Coprinus, Laccaria bicolor, Phanerochaete chrysosporium and Ustilago maydis. Results were manually reviewed to select the most robust annotation in terms of predicted coding genes. The potential coding regions reported by AUGUSTUS were extracted from the complete genomes to FASTA files.

Notes:

application/x-rar-compressed
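
The comparison of the 16 AUGUSTUS models described for 02_Genome_Annotation_Files.rar could be supported by a small tabulation step. The sketch below is illustrative only and is not the authors' procedure (the record states that results were reviewed manually). It assumes each model's output is available as GFF text, in which the third tab-separated column of a data line names the feature type; the function names `count_genes` and `rank_models` and the model labels in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def count_genes(gff_lines: Iterable[str]) -> int:
    """Count 'gene' features in GFF text, skipping comments and blank lines."""
    n = 0
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) holds the feature type in GFF.
        if len(fields) >= 3 and fields[2] == "gene":
            n += 1
    return n


def rank_models(outputs: Dict[str, Iterable[str]]) -> List[Tuple[str, int]]:
    """Return (model, gene_count) pairs, highest count first.

    Ranking by raw gene count is an assumption made for illustration;
    it only provides a table to inspect during manual review.
    """
    counts = [(model, count_genes(lines)) for model, lines in outputs.items()]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


# Hypothetical example with two tiny GFF fragments.
example = {
    "model_a": [
        "# comment line\n",
        "contig1\tAUGUSTUS\tgene\t1\t900\t0.9\t+\t.\tg1\n",
        "contig1\tAUGUSTUS\tCDS\t1\t900\t0.9\t+\t0\tg1.t1\n",
    ],
    "model_b": [
        "contig1\tAUGUSTUS\tgene\t1\t900\t0.9\t+\t.\tg1\n",
        "contig1\tAUGUSTUS\tgene\t1200\t2100\t0.8\t-\t.\tg2\n",
    ],
}
ranking = rank_models(example)
```

The resulting table is a starting point for review, not a selection rule.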

Other Study-Related Materials

Label:

03_Proteomes.rar

Text:

A consensus proteome database was prepared by considering the 530 complete genomes (from 134 species) that passed the quality control.

Notes:

application/x-rar-compressed

Other Study-Related Materials

Label:

04_Consensus_KOs.rar

Text:

Functional genomic annotation was performed with three tools, chosen as the three most widely used for functional annotation, to increase robustness: i) eggNOG-mapper v.5.0 (Jensen et al. 2008); ii) kofamKOALA v. 2022-04-03 (Aramaki et al. 2020); iii) KAAS-KEGG Automatic Annotation Server (Moriya et al. 2007). From each tool, a list of annotated KOs (KEGG Orthology identifiers), representing the functional orthologs, was obtained, and an in-house Python script was written to derive the list of consensus KOs for each genome. The script assigned the consensus annotation as follows: (i) if only one platform provided a KO, that KO was used; (ii) if two platforms provided the same KO and the third a different one, the KO shared by the two matching results was selected as the consensus; (iii) if three different KOs were obtained (or two different KOs and a third with no result), the annotation was considered undefined.

Notes:

application/x-rar-compressed
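
The consensus rule described for 04_Consensus_KOs.rar can be sketched as follows. This is a minimal illustration, not the authors' in-house script, which is not reproduced in this record. The function name `consensus_ko`, the `UNDEFINED` marker and the use of `None` for "no result" are hypothetical. Two cases are not spelled out in the description and are handled here by assumption: when all three tools agree, or when two agree and the third gives no result, the shared KO is returned.

```python
from collections import Counter
from typing import Optional

UNDEFINED = "undefined"  # hypothetical marker for conflicting annotations


def consensus_ko(
    eggnog: Optional[str], kofam: Optional[str], kaas: Optional[str]
) -> Optional[str]:
    """Combine the KO calls of three tools for one gene.

    Returns the consensus KO, UNDEFINED when the tools conflict without
    a majority, or None when no tool reported a KO.
    """
    calls = [ko for ko in (eggnog, kofam, kaas) if ko]
    if not calls:
        return None  # nothing to annotate
    counts = Counter(calls)
    top_ko, top_n = counts.most_common(1)[0]
    if len(counts) == 1:
        # Rule (i): a single tool reported a KO.
        # Assumption: also covers two or three tools agreeing.
        return top_ko
    if top_n >= 2:
        # Rule (ii): two tools match, the third differs.
        return top_ko
    # Rule (iii): three different KOs, or two different KOs and one gap.
    return UNDEFINED
```

For example, `consensus_ko("K00001", None, None)` returns the single reported KO, while `consensus_ko("K00001", "K00002", None)` returns the undefined marker.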

Other Study-Related Materials

Label:

05_CAZYmes.rar

Text:

Gene function was also predicted against the Carbohydrate-Active EnZymes (CAZymes) database (Cantarel et al. 2009) using dbCAN2 (Zhang et al. 2018). Three annotation tools (HMMER, eCAMI and DIAMOND) were run to increase robustness, and their results were compiled to obtain the consensus number of CAZymes.

Notes:

application/x-rar-compressed
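
The compilation step described for 05_CAZYmes.rar might look like the sketch below. The record does not state which agreement threshold was used, so the default of two supporting tools (`min_tools=2`) is an assumption made for illustration, as are the function name `consensus_cazymes` and the gene identifiers in the example. The inputs are plain sets of gene IDs per tool rather than any specific dbCAN2 output file.

```python
from collections import Counter
from typing import Iterable, Set


def consensus_cazymes(
    hmmer: Iterable[str],
    ecami: Iterable[str],
    diamond: Iterable[str],
    min_tools: int = 2,  # assumed threshold, not stated in the record
) -> Set[str]:
    """Return gene IDs predicted as CAZymes by at least `min_tools` tools."""
    support = Counter()
    for hits in (hmmer, ecami, diamond):
        # set() ensures each tool contributes one vote per gene.
        support.update(set(hits))
    return {gene for gene, n in support.items() if n >= min_tools}


# Hypothetical gene IDs for one genome.
hits_hmmer = {"g1", "g2"}
hits_ecami = {"g2", "g3"}
hits_diamond = {"g1", "g2", "g4"}

consensus = consensus_cazymes(hits_hmmer, hits_ecami, hits_diamond)
consensus_count = len(consensus)  # the per-genome consensus number
```

Raising `min_tools` to 3 would keep only genes supported by all three tools.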

Other Study-Related Materials

Label:

06_ListOfConsensusProteins.rar

Text:

List of 1131 proteins established as core, present in all the non-Saccharomyces genomes analysed

Notes:

application/x-rar-compressed