Code and data for CFreeEnS (doi:10.21979/N9/4YDZED)


Document Description

Citation

Title:

Code and data for CFreeEnS

Identification Number:

doi:10.21979/N9/4YDZED

Distributor:

DR-NTU (Data)

Date of Distribution:

2019-04-03

Version:

1

Bibliographic Citation:

Zhou, Xinrui, 2019, "Code and data for CFreeEnS", https://doi.org/10.21979/N9/4YDZED, DR-NTU (Data), V1

Study Description

Citation

Title:

Code and data for CFreeEnS

Identification Number:

doi:10.21979/N9/4YDZED

Authoring Entity:

Zhou, Xinrui (Nanyang Technological University)

Software used in Production:

python

Distributor:

DR-NTU (Data)

Access Authority:

Zhou Xinrui

Depositor:

Zhou Xinrui

Date of Deposit:

2019-04-03

Holdings Information:

https://doi.org/10.21979/N9/4YDZED

Study Scope

Keywords:

Computer and Information Science, Medicine, Health and Life Sciences, encoding scheme, protein classification, antigenicity prediction

Abstract:

The Context-Free Encoding Scheme (CFreeEnS) encodes protein sequence pairs into a numeric matrix. CFreeEnS draws on rich information about the physicochemical and structural properties of amino acids. Because the encoding retains information about conserved amino acid properties, learning methods (e.g. random forest) can capture the cross-subtype antigenic pattern of influenza viruses. In addition, CFreeEnS does not depend on carefully designed features, so it should be applicable to other bioinformatics tasks that measure phenotype similarity from sequences. We have tested the method on four more datasets: the iAMP-2L dataset, which distinguishes antimicrobial from non-antimicrobial peptides [5]; the tumor homing peptides dataset (TumorHPD); the HemoPI dataset, which includes hemolytic, non-hemolytic and semi-hemolytic peptides; and the phage virion proteins dataset. The prediction accuracy under 10-fold cross-validation is compared with that of two reported methods. Results show that CFreeEnS outperforms, or is at least competitive with, the traditional method using handcrafted features and a state-of-the-art method named m-NGSG.
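As a rough illustration of what encoding a sequence pair into a numeric matrix can look like, the sketch below maps two equal-length sequences to one row per position, where each column is the absolute difference in one amino acid property. This is not the CFreeEnS implementation shipped in code-data.zip: the function name `encode_pair`, the choice of two properties (Kyte-Doolittle hydropathy and a simplified side-chain charge), the absolute-difference rule and the equal-length requirement are all assumptions made for the example.

```python
# Illustrative sketch only; NOT the CFreeEnS code from code-data.zip.
# Property tables and the absolute-difference rule are assumptions.

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH; unlisted residues count as 0.
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}


def encode_pair(seq_a, seq_b):
    """Encode two equal-length sequences as a list of per-position rows.

    Each row holds [hydropathy difference, charge difference] for the
    residues found at that position in the two sequences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length (e.g. pre-aligned)")
    matrix = []
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        row = [
            abs(KYTE_DOOLITTLE[a] - KYTE_DOOLITTLE[b]),
            abs(CHARGE.get(a, 0.0) - CHARGE.get(b, 0.0)),
        ]
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Two hypothetical four-residue fragments differing at the last position.
    for row in encode_pair("ACDK", "ACDE"):
        print(row)
```

Positions where the two sequences share a residue produce an all-zero row, so a matrix of this kind highlights where, and in which property, a pair differs. A flattened matrix could then be passed to a classifier such as a random forest, as the abstract describes.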

Kind of Data:

.zip

Methodology and Processing

Sources Statement

Data Access

Other Study Description Materials

Related Publications

Citation

Identification Number:

10.1109/ACCESS.2018.2890096

Bibliographic Citation:

Zhou, X., Yin, R., Zheng, J., & Kwoh, C. K. (2019). An Encoding Scheme Capturing Generic Priors and Properties of Amino Acids Improves Protein Classification. IEEE Access, 7, 7348-7356.

Citation

Identification Number:

10356/105937

Bibliographic Citation:

Zhou, X., Yin, R., Zheng, J., & Kwoh, C.-K. (2019). An encoding scheme capturing generic priors and properties of amino acids improves protein classification. IEEE Access, 7, 7348-7356.

Other Study-Related Materials

Label:

code-data.zip

Text:

Code for CFreeEnS and datasets for testing.

Notes:

application/zip