We use experimentally validated human Khib peptide data [1,2]. To prepare high-confidence benchmark data sets for training and testing, we established the following procedure: (1) Khib sites identified with high confidence were collected as positive sites, and the remaining lysine residues on the Khib-containing proteins were treated as negative sites. (2) To avoid overestimation caused by similar protein sequences, Khib-containing proteins with sequence identity > 40% were clustered using CD-HIT. (3) For each site in the dataset, we extracted a 7-residue peptide with the lysine site at the center. If a negative peptide was identical to a Khib-containing (positive) peptide, the negative peptide was removed. (4) To construct the corresponding negative data set, negative peptides equal in number to the positive peptides were randomly selected. (5) The representative proteins in the dataset were randomly divided into two groups: 4/5 (15,156) for cross-validation and the remaining 1/5 (3,790) for an independent test.

[1] Huang, H., et al., Landscape of the regulatory elements for lysine 2-hydroxyisobutyrylation pathway. Cell Res, 2018. 28(1): p. 111-125.
[2] Wu, Q., et al., Global Analysis of Lysine 2-Hydroxyisobutyrylome upon SAHA Treatment and Its Relationship with Acetylation and Crotonylation. J Proteome Res, 2018. 17(9): p. 3176-3183.
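
Steps (3) to (5) of the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the data set: the function names are hypothetical, positions are taken to be 0-based, sites near a protein terminus are padded with "X" (the text does not say how such sites are handled), and the split is shown over a generic list of items. The CD-HIT clustering of step (2) is not shown.

```python
import random

WINDOW = 7          # peptide length stated in the text
HALF = WINDOW // 2  # residues on each side of the central lysine
PAD = "X"           # assumption: padding symbol for sites near a terminus


def extract_peptides(sequence, positive_positions):
    """Return (positives, negatives) peptide lists for one protein.

    positive_positions: 0-based indices of validated Khib lysines (assumed).
    """
    padded = PAD * HALF + sequence + PAD * HALF
    positives, negatives = [], []
    for i, residue in enumerate(sequence):
        if residue != "K":
            continue
        # In the padded string, this slice is centered on sequence[i].
        peptide = padded[i:i + WINDOW]
        if i in positive_positions:
            positives.append(peptide)
        else:
            negatives.append(peptide)
    return positives, negatives


def sample_negatives(positives, negatives, seed=0):
    """Drop negatives identical to a positive, then sample as many
    negatives as there are positives (or all of them if fewer remain)."""
    positive_set = set(positives)
    filtered = [p for p in negatives if p not in positive_set]
    n = min(len(positives), len(filtered))
    return random.Random(seed).sample(filtered, n)


def split_train_test(items, seed=0):
    """Randomly split items into 4/5 (cross-validation) and 1/5 (test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 5
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed keeps the random selection and the split reproducible between runs.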

To compare the similarities and differences among species, we also collected data from other species and built corresponding 10-fold cross-validation and independent test sets.
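
For readers unfamiliar with the evaluation scheme, the following is a minimal sketch of how a 10-fold cross-validation split can be generated. The function name `k_fold_indices` is hypothetical, and the sketch assumes a plain random partition of sample indices; it does not reproduce the exact folds used for this data set.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

Each sample appears in exactly one validation fold, so every peptide is used for validation once and for training in the other folds.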

Who is using it?