We used experimentally verified papaya Kcr peptide data derived from papaya proteomic assays [1]. To prepare high-confidence benchmark data sets for training and testing, we applied the following procedure: (1) High-confidence Kcr sites were collected as positive sites, and the remaining lysine residues on the Kcr-containing proteins were treated as negative sites. (2) To avoid overestimation caused by similar protein sequences, Kcr-containing proteins sharing more than 30% sequence identity were clustered with the CD-HIT tool, and one representative protein was retained from each cluster. (3) For each site in the dataset, we extracted a 7-residue peptide with the lysine at its center. If a Kcr-containing peptide (i.e., a positive peptide) was identical to a peptide centered on a negative site (i.e., a negative peptide), the negative peptide was removed; repeated positive peptides were also removed. (4) The representative proteins in the dataset were randomly divided into two groups: 4/5 (1,188 proteins) for cross-validation and the remaining 1/5 (297 proteins) for an independent test.
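Steps (1) and (3) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that proteins are held in a dict mapping a protein ID to its sequence, that Kcr sites are given as 1-based positions, and that windows running past a terminus are padded with a hypothetical gap symbol `-`. Only proteins with at least one Kcr site contribute peptides, and the two filters match the text: negatives identical to a positive are dropped, and repeated positives are collapsed.

```python
WINDOW = 7  # peptide length stated in the text; the lysine sits at the centre
PAD = "-"   # hypothetical padding symbol for sites near a terminus (assumption)


def extract_peptide(seq: str, pos: int, window: int = WINDOW) -> str:
    """Return the window-length peptide centred on 1-based position `pos`."""
    half = window // 2
    padded = PAD * half + seq + PAD * half
    i = pos - 1 + half  # index of the centre residue in the padded sequence
    return padded[i - half : i + half + 1]


def build_peptides(proteins, kcr_sites):
    """Label every lysine on Kcr-containing proteins and remove redundancy.

    proteins:  {protein_id: sequence}
    kcr_sites: {protein_id: set of 1-based Kcr positions}
    Returns (positive_peptides, negative_peptides).
    """
    positives, negatives = [], []
    for pid, seq in proteins.items():
        sites = kcr_sites.get(pid, set())
        if not sites:
            continue  # only Kcr-containing proteins supply negative sites
        for idx, aa in enumerate(seq, start=1):
            if aa != "K":
                continue
            pep = extract_peptide(seq, idx)
            (positives if idx in sites else negatives).append(pep)

    # Collapse repeated positive peptides, preserving first-seen order.
    unique_pos = list(dict.fromkeys(positives))
    pos_set = set(unique_pos)
    # Drop negative peptides identical to any positive peptide.
    filtered_neg = [p for p in negatives if p not in pos_set]
    return unique_pos, filtered_neg
```

For example, with `proteins = {"P1": "KAAAKCCCK"}` and `kcr_sites = {"P1": {1}}`, the positive peptide is `"---KAAA"` and the negatives are `"AAAKCCC"` and `"CCCK---"`. Whether the original study padded terminal windows or discarded them is not stated; the padding here is only one common choice.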
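Step (4) splits at the protein level, so all peptides from one representative protein land in the same group. A minimal sketch of such a split is shown below; the function name, the fixed seed, and the sorting for reproducibility are illustrative assumptions rather than details from the study.

```python
import random


def split_proteins(protein_ids, folds=5, seed=0):
    """Randomly hold out 1/folds of the proteins as an independent test set.

    Returns (cross_validation_ids, independent_test_ids).
    """
    ids = sorted(protein_ids)  # sort first so a given seed yields the same split
    rng = random.Random(seed)  # local RNG; does not touch global random state
    rng.shuffle(ids)
    n_test = len(ids) // folds  # integer 1/5 share when folds=5
    return ids[n_test:], ids[:n_test]
```

With the 1,188 + 297 = 1,485 representative proteins reported above, `split_proteins` returns groups of 1,188 and 297 IDs, matching the stated 4/5 and 1/5 proportions.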

