|
Part 1: Document Description
|
|
Citation |
|
|---|---|
|
Title: |
Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome |
|
Identification Number: |
doi:10.34622/datarepositorium/WPHMJL |
|
Distributor: |
Repositório de Dados da Universidade do Minho |
|
Date of Distribution: |
2024-02-01 |
|
Version: |
2 |
|
Bibliographic Citation: |
Franco-Duarte, Ricardo; Fernandes, Ticiana; Sousa, Maria João; Sampaio, Paula; Rito, Teresa; Soares, Pedro, 2024, "Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome", https://doi.org/10.34622/datarepositorium/WPHMJL, Repositório de Dados da Universidade do Minho, V2 |
|
Citation |
|
|
Title: |
Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome |
|
Alternative Title: |
Fermentome and Flavorome of non-Saccharomyces yeasts |
|
Identification Number: |
doi:10.34622/datarepositorium/WPHMJL |
|
Authoring Entity: |
Franco-Duarte, Ricardo (CBMA, UMinho) |
|
Fernandes, Ticiana (CBMA, UMinho) |
|
|
Sousa, Maria João (CBMA, UMinho) |
|
|
Sampaio, Paula (CBMA, UMinho) |
|
|
Rito, Teresa (CBMA, UMinho) |
|
|
Soares, Pedro (CBMA, UMinho) |
|
|
Software used in Production: |
SPAdes Genome Assembler |
|
Software used in Production: |
Augustus |
|
Software used in Production: |
BUSCO |
|
Software used in Production: |
QUAST |
|
Software used in Production: |
FasParser |
|
Software used in Production: |
FigTree |
|
Software used in Production: |
iTOL |
|
Software used in Production: |
eggNOG-mapper |
|
Software used in Production: |
kofamKOALA |
|
Software used in Production: |
KAAS-KEGG |
|
Software used in Production: |
dbCAN2 |
|
Software used in Production: |
MINITAB |
|
Software used in Production: |
Orange Data Mining |
|
Distributor: |
Repositório de Dados da Universidade do Minho |
|
Access Authority: |
Franco-Duarte, Ricardo |
|
Depositor: |
Duarte, Ricardo |
|
Date of Deposit: |
2024-01-30 |
|
Study Scope |
|
|
Keywords: |
Computer and Information Science, Earth and Environmental Sciences, non-conventional yeasts, phylogeny, genomics, fermentation, fungi, bioinformatics |
|
Abstract: |
Dataset supporting the manuscript of the same title. The submitted documents were obtained with the following procedures: For the genomes downloaded from the SRA database (raw data; no assembled version available), assembly was performed using SPAdes Genome Assembler software v.3.15.4 (Bankevich et al. 2012) with default parameters. Next, the 661 assembled genomes were annotated using AUGUSTUS software v.3.4.0 (Stanke and Morgenstern 2005), considering 16 different pre-trained models belonging to the phylum Ascomycota (11) or the phylum Basidiomycota (5). Ascomycota: S. cerevisiae S288c, C. albicans, Meyerozyma (Candida) guilliermondii, C. tropicalis, Debaryomyces hansenii, Eremothecium gossypii, Kluyveromyces lactis, Lodderomyces elongisporus, Scheffersomyces (Pichia) stipitis, Schizosaccharomyces pombe, and Yarrowia lipolytica. Basidiomycota: Cryptococcus neoformans, Coprinus, Laccaria bicolor, Phanerochaete chrysosporium and Ustilago maydis. Results were manually reviewed to select the most robust annotation in terms of predicted coding genes. The potential coding regions reported by AUGUSTUS were extracted from the complete genomes to FASTA files. |
|
Methodology and Processing |
|
|
Sources Statement |
|
|
Data Sources: |
National Library of Medicine: National Center for Biotechnology Information. Genome. https://www.ncbi.nlm.nih.gov/genome/ |
|
National Library of Medicine: National Center for Biotechnology Information. Sequence Read Archive (SRA) data. https://www.ncbi.nlm.nih.gov/sra/ |
|
|
Data Access |
|
|
Access Authority: |
ricardofilipeduarte@bio.uminho.pt |
|
Citation Requirement: |
The corresponding paper should be cited if the dataset is used. |
|
Notes: |
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/). |
|
Other Study Description Materials |
|
|
Related Publications |
|
|
Citation |
|
|
Bibliographic Citation: |
Phylogenomics and functional annotation of 530 non-Saccharomyces yeasts from winemaking environments reveals their Fermentome and Flavorome. Manuscript submitted for publication. |
|
Label: |
00_Readme.txt |
|
Notes: |
text/plain |
|
Label: |
01_De-novo_assemblies_from_SRA.rar |
|
Text: |
For the genomes downloaded from the SRA database (raw data; no assembled version available), assembly was performed using SPAdes Genome Assembler software v.3.15.4 (Bankevich et al. 2012) with default parameters. |
|
Notes: |
application/x-rar-compressed |
|
Label: |
02_Genome_Annotation_Files.rar |
|
Text: |
The 661 assembled genomes were annotated using AUGUSTUS software v.3.4.0 (Stanke and Morgenstern 2005), considering 16 different pre-trained models belonging to the phylum Ascomycota (11) or the phylum Basidiomycota (5). Ascomycota: S. cerevisiae S288c, C. albicans, Meyerozyma (Candida) guilliermondii, C. tropicalis, Debaryomyces hansenii, Eremothecium gossypii, Kluyveromyces lactis, Lodderomyces elongisporus, Scheffersomyces (Pichia) stipitis, Schizosaccharomyces pombe, and Yarrowia lipolytica. Basidiomycota: Cryptococcus neoformans, Coprinus, Laccaria bicolor, Phanerochaete chrysosporium and Ustilago maydis. Results were manually reviewed to select the most robust annotation in terms of predicted coding genes. The potential coding regions reported by AUGUSTUS were extracted from the complete genomes to FASTA files. |
|
Notes: |
application/x-rar-compressed |
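The description of 02_Genome_Annotation_Files.rar says the annotations from the 16 models were reviewed manually, with the number of predicted coding genes as the criterion. The following is a minimal sketch of how such a review could be pre-sorted, not the authors' procedure: it assumes each model's output is available as GFF text in which AUGUSTUS marks one `gene` feature per predicted gene, and it assumes that "most robust" can be approximated by the highest gene count. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict


def count_genes(gff_text: str) -> int:
    """Count lines whose third tab-separated column is 'gene'."""
    total = 0
    for line in gff_text.splitlines():
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.split("\t")
        if len(columns) >= 3 and columns[2] == "gene":
            total += 1
    return total


def rank_models(gff_by_model: Dict[str, str]) -> list:
    """Return (model, gene_count) pairs, highest gene count first.

    Assumption: gene count is only a proxy for the manual review
    described in the record.
    """
    counts = {model: count_genes(text) for model, text in gff_by_model.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

A ranking like this would only shortlist candidates; the final choice in the dataset was made by manual review.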
|
Label: |
03_Proteomes.rar |
|
Text: |
A consensus proteome database was prepared by considering the 530 complete genomes (from 134 species) that passed the quality control. |
|
Notes: |
application/x-rar-compressed |
|
Label: |
04_Consensus_KOs.rar |
|
Text: |
Functional genomic annotation was performed with three tools, chosen for robustness as the three most widely used tools for functional annotation: (i) eggNOG-mapper v.5.0 (Jensen et al. 2008); (ii) kofamKOALA v. 2022-04-03 (Aramaki et al. 2020); (iii) KAAS-KEGG Automatic Annotation Server (Moriya et al. 2007). From each tool, a list of annotated KOs (KEGG Orthology identifiers), representing the functional orthologs, was obtained, and an in-house Python script was written to derive the list of consensus KOs for each genome. The script assigned a consensus annotation as follows: (i) if only one platform provided a KO, that KO was kept; (ii) if two platforms provided the same KO and the third a different one, the KO shared by the two matching results was selected as consensus; (iii) if three different KOs were obtained (or two different KOs and a third with no result), the annotation was considered undefined. |
|
Notes: |
application/x-rar-compressed |
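The consensus rule described for 04_Consensus_KOs.rar can be illustrated with a short sketch. This is not the authors' in-house script, which is not part of this record. The function name, the `UNDEFINED` marker and the use of `None` for "no result" are hypothetical. Two cases are not spelled out in the description and are assumed here: when all reporting tools agree (three identical KOs, or two identical KOs with no result from the third), the shared KO is kept.

```python
from collections import Counter
from typing import Optional

UNDEFINED = "undefined"  # hypothetical marker for rule (iii)


def consensus_ko(eggnog: Optional[str],
                 kofam: Optional[str],
                 kaas: Optional[str]) -> Optional[str]:
    """Combine the KO calls of three tools for one protein."""
    # Keep only the tools that returned a KO.
    calls = [ko for ko in (eggnog, kofam, kaas) if ko]
    if not calls:
        return None  # no tool annotated this protein
    counts = Counter(calls)
    top_ko, top_count = counts.most_common(1)[0]
    if len(counts) == 1:
        # Rule (i), or (assumed) all reporting tools agree.
        return top_ko
    if top_count >= 2:
        # Rule (ii): two tools match, the third differs.
        return top_ko
    # Rule (iii): three different KOs, or two different KOs
    # with no result from the third tool.
    return UNDEFINED
```

For example, `consensus_ko("K00001", "K00001", "K00002")` follows rule (ii) and keeps `K00001`, while `consensus_ko("K00001", "K00002", None)` falls under rule (iii).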
|
Label: |
05_CAZYmes.rar |
|
Text: |
Gene functions were also predicted against the Carbohydrate-Active EnZymes (CAZymes) database (Cantarel et al. 2009) using dbCAN2 software (Zhang et al. 2018). Three annotation tools (HMMER, eCAMI and DIAMOND) were run to increase robustness, and their results were compiled to obtain the consensus number of CAZymes. |
|
Notes: |
application/x-rar-compressed |
|
Label: |
06_ListOfConsensusProteins.rar |
|
Text: |
List of the 1131 proteins established as core, i.e. present in all the non-Saccharomyces genomes analysed. |
|
Notes: |
application/x-rar-compressed |
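The notion of a core protein used for 06_ListOfConsensusProteins.rar (present in every genome analysed) corresponds to a set intersection. The sketch below only illustrates that definition under stated assumptions: the mapping from genome name to a set of protein identifiers is a hypothetical input format, and how proteins were matched across genomes in the study is not described in this record.

```python
from typing import Dict, Set


def core_proteins(proteins_by_genome: Dict[str, Set[str]]) -> Set[str]:
    """Return the identifiers found in every genome's protein set."""
    genome_sets = list(proteins_by_genome.values())
    if not genome_sets:
        return set()
    # Start from a copy of the first genome and intersect with the rest.
    return set(genome_sets[0]).intersection(*genome_sets[1:])
```

With two hypothetical genomes, `{"g1": {"a", "b"}, "g2": {"b", "c"}}`, only `b` is shared by both and would be reported as core.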