Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Code and data for CFreeEnS |
Identification Number: |
doi:10.21979/N9/4YDZED |
Distributor: |
DR-NTU (Data) |
Date of Distribution: |
2019-04-03 |
Version: |
1 |
Bibliographic Citation: |
Zhou, Xinrui, 2019, "Code and data for CFreeEnS", https://doi.org/10.21979/N9/4YDZED, DR-NTU (Data), V1 |
Citation |
|
Title: |
Code and data for CFreeEnS |
Identification Number: |
doi:10.21979/N9/4YDZED |
Authoring Entity: |
Zhou, Xinrui (Nanyang Technological University) |
Software used in Production: |
python |
Distributor: |
DR-NTU (Data) |
Access Authority: |
Zhou Xinrui |
Depositor: |
Zhou Xinrui |
Date of Deposit: |
2019-04-03 |
Holdings Information: |
https://doi.org/10.21979/N9/4YDZED |
Study Scope |
|
Keywords: |
Computer and Information Science, Medicine, Health and Life Sciences, encoding scheme, protein classification, antigenicity prediction |
Abstract: |
A method called the Context-Free Encoding Scheme (CFreeEnS) was proposed to encode protein sequence pairs into a numeric matrix. CFreeEnS takes advantage of rich information about the physicochemical and structural properties of amino acids. The encoding scheme retains information about conserved properties of amino acids, which makes it possible for learning methods (e.g. random forest) to capture the cross-subtype antigenic pattern of influenza viruses. In addition, because CFreeEnS does not depend on carefully designed features, it should be applicable to other bioinformatics tasks that measure phenotype similarity from sequences. We have tested the method on four more datasets: the iAMP-2L dataset, which distinguishes antimicrobial from non-antimicrobial peptides [5]; the tumor homing peptides dataset (TumorHPD); the HemoPI dataset, which includes hemolytic, non-hemolytic and semi-hemolytic peptides; and the phage virion proteins dataset. The prediction accuracy under 10-fold cross-validation is compared with that of two reported methods. The results show that CFreeEnS outperforms, or is at least competitive with, the traditional method using handcrafted features and a state-of-the-art method named m-NGSG. |
Kind of Data: |
.zip |
Methodology and Processing |
|
Sources Statement |
|
Data Access |
|
Other Study Description Materials |
|
Related Publications |
|
Citation |
|
Identification Number: |
10.1109/ACCESS.2018.2890096 |
Bibliographic Citation: |
Zhou, X., Yin, R., Zheng, J., & Kwoh, C. K. (2019). An Encoding Scheme Capturing Generic Priors and Properties of Amino Acids Improves Protein Classification. IEEE Access, 7, 7348-7356. |
Citation |
|
Identification Number: |
10356/105937 |
Bibliographic Citation: |
Zhou, X., Yin, R., Zheng, J., & Kwoh, C.-K. (2019). An encoding scheme capturing generic priors and properties of amino acids improves protein classification. IEEE Access, 7, 7348-7356. |
Label: |
code-data.zip |
Text: |
Code for CFreeEnS and datasets for testing. |
Notes: |
application/zip |