The Carbohydrate-Active enZYme (CAZyme) classification system is a database that holds knowledge about the enzymes involved in the formation and breakdown of complex carbohydrates and glycoconjugates. The database classifies each enzyme based on its role.
What are Carbohydrate-Active Enzymes?
CAZymes act in many biological processes, and they must perform their functions with very high specificity. This creates an extremely high number of substrates and products that need to be studied, which makes characterizing them very challenging.
As mentioned above, the CAZyme classification system classifies enzymes into groups based on their role. There are roughly 300 CAZyme families, classified into the following categories:
- Glycoside hydrolases
- Glycosyltransferases
- Polysaccharide lyases
- Carbohydrate esterases
- Carbohydrate-binding modules
Glycoside hydrolases (GHs) are responsible for the hydrolysis and/or trans-glycosylation of glycosidic bonds, and they make up 131 CAZyme families. Glycosyltransferases (GTs) are responsible for the biosynthesis of glycosidic bonds from phospho-activated sugar donors and make up 94 of the CAZyme families. Polysaccharide lyases (PLs) cleave the glycosidic bonds of uronic acid-containing polysaccharides and make up 22 families of CAZyme.
Carbohydrate esterases (CEs) remove the ester-based modifications present in monosaccharides, oligosaccharides, and polysaccharides. By doing this, they facilitate the action of GHs. CEs constitute 16 families of CAZyme.
Carbohydrate-binding modules (CBMs) are protein modules that have no enzymatic activity of their own but help with the activity of other CAZymes. They are autonomously folding and functioning protein fragments. Roughly 52 families of CAZyme contain at least one CBM module.
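To make the relative sizes of the classes easier to compare, the family counts quoted above can be tallied in a few lines of Python. This is only an illustrative sketch: the abbreviations and numbers are taken from the text, treating the figure of 52 as the number of CBM families is an assumption, and the names `FAMILY_COUNTS` and `class_shares` are hypothetical.

```python
# Family counts per class as quoted in the text. The real database grows
# over time, so these figures are a snapshot, not current values.
FAMILY_COUNTS = {
    "GH": 131,   # glycoside hydrolases
    "GT": 94,    # glycosyltransferases
    "PL": 22,    # polysaccharide lyases
    "CE": 16,    # carbohydrate esterases
    "CBM": 52,   # carbohydrate-binding modules (assumed to be a family count)
}


def class_shares(counts):
    """Return each class's share of all families, as a percentage."""
    total = sum(counts.values())
    return {abbr: round(100 * n / total, 1) for abbr, n in counts.items()}


total_families = sum(FAMILY_COUNTS.values())
shares = class_shares(FAMILY_COUNTS)

print(f"Total families: {total_families}")
for abbr, pct in shares.items():
    print(f"{abbr:>3}: {FAMILY_COUNTS[abbr]:>3} families ({pct}%)")
```

Under these assumptions the total comes to 315, which is consistent with the "roughly 300" figure given earlier.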
Image Credit: Shishir P. S. Chundawat
Contents of the Database
The CAZyme database is constantly updated with information from sequence annotation, family classification, and known functional information. This allows for the study of a CAZyme, all CAZymes in an organism, or a CAZyme family. The data that is constantly added to the database comes from newly released literature, 3D models, and analysis of genomes.
An important feature of the database is that when new information on certain CAZyme families is added, or when new families are created, previously released genomes and sequences are reanalyzed to take the new information into account. Only assignments that are based on experimental data are added to the database, so inferred data is not included.
Analysis of Carbohydrate-Active Enzymes
Manual Functional Analysis
Characterization of new proteins helps with the formation of new protein families. These characterizations are used to estimate general functions and to form descriptions that indicate which proteins are related to new sequences. Functional predictions of CAZymes are achieved by examination of related sequences.
The families of CAZymes can be further broken down into subfamilies to group proteins by specificity, which gives more insight into the possible functions of the enzymes. New classifications also provide new information about active sites and active site specificity when comparing CAZymes. Most discovered subfamilies of CAZyme are monospecific to a certain substrate, and research into subfamilies opens the possibility of further enzyme characterizations.
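The class, family, and subfamily hierarchy lends itself to a simple structured label. The sketch below assumes labels of the form `GH13` for a family and `GH13_5` for a subfamily; that label format is not stated in this article and is treated here as an assumption, and `FamilyId` and `parse_family` are hypothetical names. Only the five classes listed above are recognized.

```python
import re
from typing import NamedTuple, Optional


class FamilyId(NamedTuple):
    cazy_class: str           # e.g. "GH"
    family: int               # e.g. 13
    subfamily: Optional[int]  # e.g. 5, or None when no subfamily is given


# Class abbreviation, family number, optional "_subfamily" suffix (assumed format).
_LABEL = re.compile(r"^(GH|GT|PL|CE|CBM)(\d+)(?:_(\d+))?$")


def parse_family(label: str) -> FamilyId:
    """Split a label such as 'GH13_5' into class, family, and subfamily."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized family label: {label!r}")
    cazy_class, family, subfamily = match.groups()
    return FamilyId(
        cazy_class,
        int(family),
        int(subfamily) if subfamily is not None else None,
    )


print(parse_family("GH13_5"))
print(parse_family("CBM20"))
```

Keeping the subfamily optional means the same structure can describe proteins that have only been assigned at the family level.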
Large-scale Analysis
Analysis of larger numbers of sequences is performed using internal CAZyme tools such as semi-automatic modular assignment. A genome analysis begins with the assignment of protein models to one or several CAZyme families depending on their functions. Afterward, predictions of functional classes are made by manual examination of the sequences. Identification of active site residues is the main priority of this step. This categorizes the genome by family and functional class. The genome is then analyzed using gene content analysis to give information on how closely different species are related to each other. Differences in gene content give information regarding the diversity and complexity of the processes that use CAZymes and the biology of the organism.
Implications of the Carbohydrate-Active Enzyme Classification System
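One way to picture the gene content comparison described above is to count how many genes each genome has in each family and then measure how different those counts are. The sketch below uses Bray-Curtis dissimilarity for that comparison; this is a stand-in chosen for illustration, not the method the database curators are stated to use. The species names and family assignments are invented example data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical family assignments per genome (one entry per annotated gene).
genomes = {
    "species_A": ["GH13", "GH13", "GT2", "CE4", "CBM20"],
    "species_B": ["GH13", "GT2", "GT2", "PL1"],
    "species_C": ["GH5", "GT2", "CE4", "CE4", "CBM20"],
}


def family_profile(assignments):
    """Count how many genes fall into each family."""
    return Counter(assignments)


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = nothing shared."""
    families = set(p) | set(q)
    shared = sum(min(p[f], q[f]) for f in families)  # Counter gives 0 if absent
    total = sum(p.values()) + sum(q.values())
    return 1 - 2 * shared / total if total else 0.0


profiles = {name: family_profile(fams) for name, fams in genomes.items()}

for a, b in combinations(profiles, 2):
    print(f"{a} vs {b}: {bray_curtis(profiles[a], profiles[b]):.3f}")
```

In this toy data set, species_A and species_C share the most family content and therefore get the lowest dissimilarity. A real analysis would work from curated assignments and would typically feed such pairwise values into clustering or tree building.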
To conclude, the CAZyme classification system is crucial for identifying similarities and differences between the enzymes. Characterization by function allows distinct boundaries to be set for each group. With the database, a single enzyme, an enzyme family, or a whole organism can be studied, which has vast applications in cellular research.
Further research into the CAZyme classification system is expected to uncover more families and subfamilies of CAZyme, and more information on the organisms that contain them. It should also yield better methods of analyzing CAZymes, providing more detailed and accurate information.