Because no complete non-histone crotonylation dataset is currently available, we used experimentally verified human Kcrot peptide data derived from two proteomic assays [1, 2] and collected additional datasets from papaya [3], rice [4] and tobacco [5]. To prepare high-confidence benchmark datasets for training and testing, we established the following procedure: (1) Kcrot sites identified with high confidence were collected as positive sites, and the remaining lysine residues on the Kcrot-containing proteins were considered negative sites. (2) To avoid the over-estimation of performance caused by similar protein sequences, the Kcrot-containing proteins with sequence identities > 30% were clustered using the CD-HIT tool. (3) For each site in the dataset, we extracted a 7-residue peptide with the lysine site at the center. If a Kcrot-containing peptide (i.e., a positive peptide) was identical to a peptide with a negative site (i.e., a negative peptide), both peptides were removed. (4) The representative proteins in the dataset were randomly divided into two groups: 4/5 (1651) for cross-validation and the remaining 1/5 (413) for an independent test. The Kcrot sites on histone proteins were taken from Qiu's study [6]. Both datasets can be downloaded here.
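Steps (3) and (4) can be illustrated with a minimal Python sketch. It is not the code used to build the datasets. The function names, the 0-based site indexing, the fixed random seed and the use of "X" to pad sites near the protein termini are all assumptions made for illustration, since the text does not say how such sites were handled. The CD-HIT clustering of step (2) is not reproduced here.

```python
import random

WINDOW = 7          # peptide length stated in the text
HALF = WINDOW // 2  # residues on each side of the central lysine
PAD = "X"           # assumption: padding symbol for sites near the termini


def extract_peptides(sequence, positive_positions):
    """Cut a WINDOW-residue peptide around every lysine in one protein.

    positive_positions holds the 0-based indices of the Kcrot sites
    (an indexing convention assumed for this sketch). All other lysines
    are treated as negative sites.
    """
    padded = PAD * HALF + sequence + PAD * HALF
    positives, negatives = [], []
    for i, residue in enumerate(sequence):
        if residue != "K":
            continue
        # Residue i sits at padded index i + HALF, so this slice is centred on it.
        peptide = padded[i:i + WINDOW]
        if i in positive_positions:
            positives.append(peptide)
        else:
            negatives.append(peptide)
    return positives, negatives


def remove_conflicts(positives, negatives):
    """Drop every peptide that occurs as both a positive and a negative."""
    shared = set(positives) & set(negatives)
    kept_pos = [p for p in positives if p not in shared]
    kept_neg = [n for n in negatives if n not in shared]
    return kept_pos, kept_neg


def split_proteins(protein_ids, test_fraction=0.2, seed=0):
    """Randomly split proteins into a cross-validation set and a test set."""
    ids = sorted(protein_ids)
    random.Random(seed).shuffle(ids)  # seeded only to make the sketch repeatable
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


if __name__ == "__main__":
    # Toy protein with lysines at indices 2 and 7; index 2 is marked as Kcrot.
    pos, neg = extract_peptides("MAKQTARKSTG", {2})
    pos, neg = remove_conflicts(pos, neg)
    print(pos, neg)

    # Hypothetical protein identifiers, one per representative protein.
    train, test = split_proteins([f"P{n}" for n in range(2064)])
    print(len(train), len(test))
```

The split is made at the protein level rather than the peptide level, so that all sites from one protein fall on the same side of the cross-validation and independent-test boundary, as step (4) describes.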

