CCDC130

YJU2B
Identifiers
Aliases: YJU2B, coiled-coil domain containing 130, CCDC130, YJU2 splicing factor homolog B
External IDs: MGI: 1914986; HomoloGene: 12183; GeneCards: YJU2B; OMA: YJU2B - orthologs
Orthologs
Species: Human; Mouse
RefSeq (mRNA): NM_001294281 (human); NM_026350 (mouse)
RefSeq (protein): NP_001281210 (human); NP_080626 (mouse)
Location (UCSC): Chr 19: 13.73 – 13.76 Mb (human); Chr 8: 84.98 – 85 Mb (mouse)
PubMed search: [3] (human); [4] (mouse)

Coiled-coil domain containing 130 is a protein that in humans is encoded by the CCDC130 gene (also known as YJU2B). It is a component of the U5 portion of the U4/U5/U6 tri-snRNP, which joins other proteins to form complex B of the spliceosome. The mature protein is approximately 45 kilodaltons (kDa) and is extremely hydrophilic because of its unusually high number of charged and polar amino acids.[5] CCDC130 is highly conserved: orthologous genes in some yeasts and plants were found using the nucleotide and protein versions of the Basic Local Alignment Search Tool (BLAST) from the National Center for Biotechnology Information.[6] GEO profiles show that CCDC130 is ubiquitously expressed, with the highest levels of expression in T-lymphocytes.[6]

Function

The specific function of CCDC130 is still unknown, but several studies have identified it as a component of the U5 portion of the U4/U5/U6 tri-snRNP, which joins Complex A to form Complex B of the human spliceosome. Complex B then undergoes further modifications and conformational changes before becoming a mature spliceosome. One study, which examined the conservation of spliceosomal components by comparing the human spliceosome with that of yeast, categorized CCDC130 as a known splicing factor whose yeast homolog is Yju2.[7] Yju2 is a splicing factor that helps form the complete, active spliceosome and promotes the first step of splicing, which involves cleavage at the 5' splice site.[7] CCDC130 therefore likely plays a similar role in the human spliceosome, although, given the greater complexity of the human spliceosome, it may perform additional functions or a different function altogether. Because of its high number of predicted phosphorylation sites, the protein may be activated and recruited to the spliceosomal complex through phosphorylation or dephosphorylation (see Post-translational modifications). The gene is ubiquitously expressed, at a level 2.9 times that of the average gene, which is consistent with an integral role in the proper function of the spliceosome.[6]

Gene

Aliases

Coiled-coil domain containing 130 has several aliases, including CCDC130, SB115, LOC81576, and MGC10471.

CCDC130 locus and neighbors

Locus

CCDC130 is located on the short arm of chromosome 19 in humans, at locus 19p13.2. The gene spans positions 13858753 to 13874106 on the + strand of chromosome 19.[6] Upstream, CCDC130 is bordered by CACNA1A on the - strand and by glatobu, smagly, and socho on the + strand; downstream, it is bordered by MGC3207, C19orf53, and ZSWIM4 on the + strand and by joypaw, smeygly, floytobu, smawgly, and wycho on the - strand.[6] Glatobu, smagly, socho, joypaw, smeygly, floytobu, smawgly, and wycho are supported only by cDNA sequences in GenBank, and no information is available about their function. Several small genes also lie within the CCDC130 sequence: snugly, glytobu, stygly, and glartobu on the + strand, and chacho, zoycho, spogly, glotobu, glutobu, and sneygly on the - strand.[6] All of these small genes have extremely low levels of expression (under 3% of the expression of the average gene), with stygly the highest at 2.8% of the average.[6]

Promoter

Several promoters were predicted for CCDC130 using ElDorado from Genomatix. The promoter that corresponds most closely to the protein sequence is 760 bases long and spans positions 13858094 to 13858853 on chromosome 19.[8]
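
The coordinates quoted in the Locus and Promoter sections can be checked with simple interval arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and refer to the same chromosome 19 assembly, which the cited sources do not state explicitly; the function names are illustrative only.

```python
def interval_length(start, end):
    """Length in bases of a 1-based, inclusive genomic interval."""
    return end - start + 1


def overlap_length(a, b):
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


# Coordinates as quoted in the text (assumed 1-based, inclusive).
GENE = (13858753, 13874106)
PROMOTER = (13858094, 13858853)

gene_bp = interval_length(*GENE)
promoter_bp = interval_length(*PROMOTER)
shared_bp = overlap_length(GENE, PROMOTER)

print(f"gene span: {gene_bp} bp")
print(f"promoter: {promoter_bp} bp")
print(f"promoter/gene overlap: {shared_bp} bp")
```

Under that assumption, the promoter length matches the 760 bases quoted above, and the predicted promoter extends roughly 100 bases into the annotated gene.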

Homology and evolution

Paralogs

CCDC130 has one identified paralog, CCDC94, the only other known human member of the CWC16 family of proteins. The two share about 27% identity, most of it located in the COG5134 domain and at the C-terminus. In the multiple sequence alignment, three predicted serine phosphorylation sites in CCDC94 (positions 213, 220, and 306) align with serines in CCDC130, and a predicted threonine phosphorylation site aligns with a phosphorylated serine in CCDC130.[5][9]

An unrooted phylogenetic tree of the human CCDC130, close orthologs, and several distant homologs.

Orthologs and homologs

CCDC130 is a highly conserved protein, with true orthologs present in primates, other mammals, amphibians, reptiles, fish, and invertebrates such as insects and marine invertebrates. No bird orthologs have been found in nucleotide or protein BLAST searches.[6] Homologous genes have been documented in yeasts and other fungi, as well as in plants. It is unclear when the most distant homolog of CCDC130 arose, but it was well before the divergence of autotrophs and heterotrophs.

| Sequence | Genus | Species | Common name | Date of divergence (mya) | Accession # | Sequence length (aa) | % Identity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Homo | sapiens | Human | N/A | NP_110445 | 396 | 100 |
| 2 | Saimiri | boliviensis | Squirrel monkey | 42.6 | XP_003941759 | 396 | 94 |
| 3 | Ailuropoda | melanoleuca | Giant panda | 94.2 | XP_002921062 | 392 | 88 |
| 4 | Canis | lupus familiaris | Dog | 94.2 | XP_542031 | 397 | 87 |
| 5 | Bos | taurus | Cow | 94.2 | NP_001069812 | 400 | 86 |
| 6 | Sus | scrofa | Wild boar | 94.2 | XP_003123393 | 398 | 86 |
| 7 | Cricetulus | griseus | Chinese hamster | 92.3 | XP_003501975 | 383 | 79 |
| 8 | Mus | musculus | Mouse | 92.3 | NP_080626 | 385 | 78 |
| 9 | Sarcophilus | harrisii | Tasmanian devil | 162.6 | XP_003760711 | 367 | 77 |
| 10 | Anolis | carolinensis | Anole lizard | 296 | XP_003216443 | 373 | 62 |
| 11 | Xenopus | laevis | African clawed frog | 371.2 | NP_001086365 | 384 | 61 |
| 12 | Danio | rerio | Zebrafish | 400.1 | NP_991158 | 390 | 56 |
| 13 | Takifugu | rubripes | Pufferfish | 400.1 | XP_003972319 | 379 | 65 |
| 14 | Amphimedon | queenslandica | Sponge | 716.5 | XP_003388671 | 299 | 46 |
| 15 | Culex | quinquefasciatus | Mosquito | 782.7 | XP_001846118 | 329 | 53 |
| 16 | Bombus | impatiens | Bumblebee | 782.7 | XP_003485202 | 314 | 55 |
| 17 | Caenorhabditis | remanei | Nematode | 937.5 | XP_003094402 | 365 | 44 |
| 18 | Schizosaccharomyces | pombe | Yeast | 1215.8 | NP_595734.2 | 294 | 27 |
| 19 | Cucumis | sativus | Cucumber | 1369 | XP_004135117 | 313 | 47 |
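
The relationship between divergence time and sequence identity in the ortholog table can be summarized with a correlation coefficient. The following is a minimal sketch, not part of the cited analyses: the values are copied from the table (the human reference row is omitted because it has no divergence date), and the choice of a plain Pearson correlation is an assumption made for illustration.

```python
from math import sqrt

# common name: (divergence from human in million years, % identity to human CCDC130)
# Values copied from the ortholog table; the human reference row is omitted.
ORTHOLOGS = {
    "Squirrel monkey": (42.6, 94),
    "Giant panda": (94.2, 88),
    "Dog": (94.2, 87),
    "Cow": (94.2, 86),
    "Wild boar": (94.2, 86),
    "Chinese hamster": (92.3, 79),
    "Mouse": (92.3, 78),
    "Tasmanian devil": (162.6, 77),
    "Anole lizard": (296, 62),
    "African clawed frog": (371.2, 61),
    "Zebrafish": (400.1, 56),
    "Pufferfish": (400.1, 65),
    "Sponge": (716.5, 46),
    "Mosquito": (782.7, 53),
    "Bumblebee": (782.7, 55),
    "Nematode": (937.5, 44),
    "Yeast": (1215.8, 27),
    "Cucumber": (1369, 47),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


times = [t for t, _ in ORTHOLOGS.values()]
identities = [i for _, i in ORTHOLOGS.values()]
r = pearson(times, identities)
print(f"Pearson r = {r:.2f} across {len(ORTHOLOGS)} orthologs and homologs")
```

The coefficient is strongly negative for these data, in line with identity falling as divergence time grows. The trend is not strict: cucumber (47% at 1369 mya) is more similar to the human protein than yeast (27% at 1215.8 mya).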

Conserved regions

CCDC130 has two conserved domains and a coiled-coil region. The first is the COG5134 domain, which is conserved in species as distant as cucumber and likely contributes to the function of the protein, because it is consistently the most highly conserved region in multiple sequence alignments.[5] It spans approximately the first 170 amino acids of the protein. The second is the DUF572 domain, a eukaryotic domain of unknown function that is shared by all of the orthologs and a majority of the more distant homologs. This domain has no agreed range: different sources report different lengths, and some describe it as covering the entire protein. The coiled-coil region spans residues 182-214 in the human protein and is rich in charged amino acids. The modified residues are also very well conserved.

Protein

The most abundant variant of CCDC130 is encoded by the second-longest open reading frame (ORF), corresponding to a 396-amino-acid protein with a molecular weight of 44.8 kDa and an isoelectric point of 8.252.[5] The protein is rich in charged amino acids and deficient in uncharged, non-polar amino acids.[5] Mobyle @ Pasteur predicted CCDC130 to be extremely hydrophilic because of its large number of charged and polar amino acids, with no site scoring above zero on the hydrophobicity plot and some sites scoring as low as -6 (F180). Within the coiled-coil region (182-214) there is a stretch in which 14 of 18 amino acids are charged. SAPS analysis predicted that the protein would be unstable.[5] Given its high hydrophilicity, the protein is not expected to contain transmembrane segments.
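
A figure such as "14 of 18 amino acids are charged" can be obtained by sliding a fixed-width window along a protein sequence and counting charged residues. The sketch below shows one way to do that. It treats Asp, Glu, Lys, and Arg as charged and leaves histidine out, which is an assumption, and the sequence used is a made-up placeholder, not the CCDC130 sequence.

```python
# Residues counted as charged here: Asp, Glu, Lys, Arg (histidine excluded by assumption).
CHARGED = set("DEKR")


def charged_fraction(seq):
    """Fraction of residues in seq that are charged."""
    seq = seq.upper()
    return sum(aa in CHARGED for aa in seq) / len(seq)


def most_charged_window(seq, width):
    """Return (start, count) for the first window of the given width with the
    most charged residues; start is a 1-based residue position."""
    seq = seq.upper()
    if width > len(seq):
        raise ValueError("window is longer than the sequence")
    best_start, best_count = 1, -1
    for i in range(len(seq) - width + 1):
        count = sum(aa in CHARGED for aa in seq[i:i + width])
        if count > best_count:
            best_start, best_count = i + 1, count
    return best_start, best_count


# Hypothetical placeholder: an uncharged flank, a charge-rich 18-residue block,
# and another uncharged flank. This is NOT the real CCDC130 sequence.
TOY = "MSAGLLPVA" + "EEKRRKEEELKRKEDERK" + "AGSTLPV"

start, count = most_charged_window(TOY, 18)
print(f"densest 18-residue window starts at residue {start}: {count} of 18 charged")
print(f"overall charged fraction: {charged_fraction(TOY):.2f}")
```

Running the same functions on the real protein sequence, for example one retrieved under accession NP_110445, would allow the 14-of-18 figure to be checked directly.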

Protein sequence for the major form of CCDC130. This peptide is 396 amino acids long, contains a coiled-coil region from 182-214, a domain of unknown function (DUF572), and the COG5134 domain in the N-terminal half
Alternative splicing variants of the CCDC130 protein as seen on AceView from NCBI.

Variation

The CCDC130 gene produces 17 different mRNAs: 13 arise from alternative splicing, and the other four are unspliced.[6] Four alternative promoters, five alternative polyadenylation sites, and four alternative last exons have been described,[6] along with two instances of intron retention. Fourteen different proteins have been identified from the CCDC130 gene; all of them contain the DUF572 domain, but only five appear to include the coiled-coil stretch. The remaining three mRNAs were of very low quality and were not translated. The gene also has the potential to encode several non-overlapping proteins. Forty-five SNPs have been documented for CCDC130 at NCBI: 29 missense mutations and 16 synonymous mutations, which do not change the amino acid.[6]

Conceptual translation of amino acids 269-396 and 3' UTR of CCDC130. Blue markings indicate phosphorylation sites and pink letters indicate charged amino acids.

Post-translational modifications

CCDC130 is predicted to be heavily phosphorylated: NetPhos predicted 31 different phosphorylation sites, 26 of them in the C-terminal half of the protein.[9] Of the predicted sites, 17 of 22 serines, 4 of 6 threonines, and 2 of 3 tyrosines had probability scores over 0.800, indicating a high likelihood that they are true phosphorylation sites.[9] Six sumoylation sites were predicted, but only one of them (K177) had a probability score higher than 0.500, at 0.640.[10] The physiological function of sumoylation is not yet fully understood, but this modification can add a substantial amount of molecular weight to a protein (11 kDa). Thirteen glycation sites with probability scores over 0.500 were predicted, and 10 of the 13 glycated lysines occur in the N-terminal half of the protein.[11] NetOGlyc predicted 11 possible O-glycosylation sites with probability scores over 0.500, all within a 64-amino-acid span running from T313 to T376.[12] Several of these sites were predicted as both phosphorylation sites and O-glycosylation sites. CCDC130 was not predicted to be sulfated,[13] acetylated,[14] myristoylated,[15] N-glycosylated,[16] or C-mannosylated,[17] or to undergo any GPI modification.[18]
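
The phosphorylation counts quoted above can be cross-checked with a short tally. The sketch below uses only the numbers given in the text; the variable names are illustrative, and the 0.800 cutoff is the one stated in the text, not a NetPhos default.

```python
# residue type: (sites predicted, sites with probability score over 0.800),
# as quoted in the text from the NetPhos prediction.
NETPHOS_COUNTS = {
    "Ser": (22, 17),
    "Thr": (6, 4),
    "Tyr": (3, 2),
}
C_TERMINAL_SITES = 26  # predicted sites located in the C-terminal half

total_sites = sum(predicted for predicted, _ in NETPHOS_COUNTS.values())
high_confidence = sum(high for _, high in NETPHOS_COUNTS.values())

print(f"total predicted sites: {total_sites}")
print(f"sites scoring over 0.800: {high_confidence} "
      f"({high_confidence / total_sites:.0%} of all predicted sites)")
print(f"sites in the C-terminal half: {C_TERMINAL_SITES / total_sites:.0%}")
```

The per-residue counts add up to the 31 sites reported, and 23 of those sites pass the 0.800 cutoff.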

Secondary structure

YASPIN predicted a long alpha helix in CCDC130 spanning R121 to A211. Other secondary structure prediction programs, such as PELE, CHOFAS, and SABLE, also predicted alpha helices of varying lengths in this region.[5][19] There were no consistent predictions of beta sheets in CCDC130.

Interaction information

Several proteins are listed as interacting with CCDC130, including EEF1A1, NINL, TRAF2, ZBTB16, ZNF165, and ZNF24. EEF1A1 is a eukaryotic elongation factor involved in the binding of aminoacyl-tRNA to the A-site of ribosomes during translation.[20] NINL is a ninein-like protein that is involved in microtubule organization and has calcium ion binding activity.[20] TRAF2, tumor necrosis factor (TNF) receptor-associated factor 2, is part of some E3 ubiquitin ligase complexes and is involved in ubiquitinating proteins so that they can be degraded by the proteasome.[20] ZBTB16, zinc finger and BTB domain-containing protein 16, is also part of an E3 ubiquitin ligase complex and is most likely involved in substrate recognition. ZNF165 and ZNF24 are both zinc finger proteins, which bind DNA and other proteins to regulate transcription. GeneCards has assembled a table of the proteins that interact with CCDC130.[21] An alternate form of CCDC130 in which only 803 bases are transcribed instead of 1433 is also listed, but no additional information is provided.[21]

According to STRING, the interactions of CCDC130 with NINL, ZNF24, TRAF2, JUP, and GATA5 have been verified by a two-hybrid screen, which provides experimental support for them. JUP is a plaque protein. GATA5 is a transcription factor that helps activate the promoter for lactase-phlorizin hydrolase.[21] Interactions with CDA, DERA, CDC40, NAA25, DGCR14, NAA20, and PRPF19 have not been verified experimentally, but STRING documents interactions between their homologs in other species, so these interactions could potentially occur. According to MINT, the interactions with ZBTB16, EEF1A1, and ZNF165 have each been verified by at least one two-hybrid screen. NAT9 is described as a known interactant on I2D.

A study at the University of the District of Columbia that set out to characterize CCDC130 found that it is induced through insulin signaling, is targeted by three different kinases (GSK3, CK1, and CK2), and is a mitochondrial protein.[5] The study also suggests that CCDC130 could potentially be used as a biomarker for certain types of cancer because of its differential expression in cancer cells. It specifically reports that CCDC130 is downregulated in some types of colon cancer, which allowed more cancer cells to escape the apoptosis pathway.

Expression

CCDC130 is ubiquitously expressed, with some level of expression in every tissue and cell sample analyzed, although the level varies greatly between tissues. The AceView profile for CCDC130 shows expression 2.9 times higher than that of the average gene.[6] According to NCBI GEO profiles and BioGPS data, the fetal thyroid, adrenal cortex, uterus, prostate, testes, seminiferous tubule, heart, PB-CD4+ T cells, PB-CD8+ T cells, lymph node, lung, thymus, thyroid, leukemia chronic myelogenous K562, and leukemia lymphoblastic molt4 samples all had expression levels above the 75th percentile for gene expression in at least one of two samples. Gene expression was lower than the 25th percentile in at least one of two samples for cerebellum peduncles, occipital lobe, pons, trigeminal ganglion, subthalamic nucleus, superior cervical ganglion (whose two samples differed drastically), dorsal root ganglion, fetal liver, uterus corpus, atrioventricular node, appendix, skeletal muscle, cardiac myocytes, tongue, and salivary gland. PB-CD8+ T cells had the highest relative CCDC130 expression, and the tongue had the lowest. For more information about CCDC130 expression, see the mouse brain expression data or human brain microarray data from the Allen Brain Atlas, or the differential expression data in GEO profiles from NCBI.[6]

Medical information

Microarray studies of cancer cells have shown CCDC130 to be differentially expressed in several cancers, including breast, colon, and pancreatic cancer.[22] It was shown to be down-regulated in colon cancers, suggesting that it could serve as a cancer biomarker. Research to confirm this use is still ongoing. Several online sources also state that CCDC130 is involved in the cell's response to viral infection, but none give specific details.

References

  1. GRCh38: Ensembl release 89: ENSG00000104957. Ensembl, May 2017.
  2. GRCm38: Ensembl release 89: ENSMUSG00000004994. Ensembl, May 2017.
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "CCDC130 Analysis". Biology Workbench. San Diego Supercomputing Center, University of California San Diego. Retrieved 7 May 2013.
  6. "NCBI". National Library of Medicine. Retrieved 2 April 2013.
  7. Fabrizio P, Dannenberg J, Dube P, Kastner B, Stark H, Urlaub H, Lührmann R (November 2009). "The evolutionarily conserved core design of the catalytic activation step of the yeast spliceosome". Molecular Cell. 36 (4): 593–608. doi:10.1016/j.molcel.2009.09.040. hdl:11858/00-001M-0000-0010-9378-C. PMID 19941820.
  8. "El Dorado". Genomatix Software GmbH. Archived from the original on 2 December 2021. Retrieved 11 April 2013.
  9. Blom N, Gammeltoft S, Brunak S (1999). "Sequence and structure-based prediction of eukaryotic protein phosphorylation sites". Journal of Molecular Biology. 294 (5): 1351–62. doi:10.1006/jmbi.1999.3310. PMID 10600390.
  10. "SUMOplot Analysis Program". Abgent, a WuXi AppTec Company. Retrieved 14 May 2013.
  11. Johansen MB, Kiemer L, Brunak S (2006). "Analysis and prediction of mammalian protein glycation". Glycobiology. 16 (9): 844–53. CiteSeerX 10.1.1.128.831. doi:10.1093/glycob/cwl009. PMID 16762979.
  12. Julenius K, Mølgaard A, Gupta R, Brunak S (2005). "Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites". Glycobiology. 15 (2): 153–64. doi:10.1093/glycob/cwh151. PMID 15385431.
  13. Monigatti F, Gasteiger E, Bairoch A, Jung E (2002). "The Sulfinator: predicting tyrosine sulfation sites in protein sequences". Bioinformatics. 18 (5): 769–70. doi:10.1093/bioinformatics/18.5.769. PMID 12050077.
  14. Kiemer L, Bendtsen JD, Blom N (2005). "NetAcet: prediction of N-terminal acetylation sites". Bioinformatics. 21 (7): 1269–70. doi:10.1093/bioinformatics/bti130. PMID 15539450.
  15. Bologna G, Yvon C, Duvaud S, Veuthey AL (2004). "N-Terminal myristoylation predictions by ensembles of neural networks". Proteomics. 4 (6): 1626–32. doi:10.1002/pmic.200300783. PMID 15174132. S2CID 20289352.
  16. Gupta R, Jung E, Brunak S (2004). "Prediction of N-glycosylation sites in human proteins". NetNGlyc1.0. Center for Biological Sequence Analysis, University of Denmark. Retrieved 13 May 2013.
  17. Julenius K (2007). "NetCGlyc 1.0: prediction of mammalian C-mannosylation sites". Glycobiology. 17 (8): 868–76. doi:10.1093/glycob/cwm050. PMID 17494086.
  18. "big-PI Predictor". GPI Lipid Anchor Project, I.M.P. Bioinformatics. Archived from the original on 21 July 2020. Retrieved 14 May 2013.
  19. "SABLE Secondary Structure Prediction". Cincinnati Children's Hospital Medical Center. Retrieved 14 May 2013.
  20. "NextProt". CCDC130 interacting proteins. Swiss Institute of Bioinformatics. Retrieved 14 May 2013.
  21. "GeneCards". Weizmann Institute of Science. Retrieved 14 May 2013.
  22. Wang Y, Sun G, Ji Z, Xing C, Liang Y (20 January 2012). "Weighted change-point method for detecting differential gene expression in breast cancer microarray data". PLOS ONE. 7 (1): e29860. Bibcode:2012PLoSO...729860W. doi:10.1371/journal.pone.0029860. PMC 3262809. PMID 22276133.
