We used experimentally verified cysteine S-sulphenylation (CSO) peptide data derived from proteomic assays [1-5]. To prepare high-confidence benchmark data sets for training and testing, we established the following procedure: (1) The CSO sites identified with high confidence were collected as positive sites, and the remaining cysteine residues on the CSO-containing proteins were considered negative sites. (2) To avoid the over-estimation of performance caused by similar protein sequences, the CSO-containing proteins with sequence identities > 40% were clustered. If CSO-containing peptides (i.e., positive peptides) were identical to peptides with negative sites (i.e., negative peptides), the negative peptides were removed. If a CSO-containing peptide was repeated, the repeated peptides were removed. (3) The representative proteins in the data set were randomly divided into two groups: 10/11 for cross-validation and the remaining 1/11 for independent testing.
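As an illustration only, the peptide-level part of this procedure might be sketched in Python as follows. The sketch assumes that clustering at 40% identity has already been carried out with an external tool, so that `proteins` holds only the representative sequences. The function names, the flank length of 10 residues, the `X` padding character, the random seed, and the choice to keep one copy of each repeated positive peptide are all assumptions made for this example and are not taken from the procedure above.

```python
import random


def extract_peptides(proteins, cso_sites, flank=10, pad="X"):
    """Collect cysteine-centred peptides from CSO-containing proteins.

    proteins:  dict mapping protein id -> sequence (representatives only)
    cso_sites: dict mapping protein id -> set of 1-based CSO positions
    flank:     residues kept on each side (hypothetical value)
    """
    positives, negatives = [], []
    for pid, seq in proteins.items():
        sites = cso_sites.get(pid, set())
        if not sites:
            continue  # negatives come only from CSO-containing proteins
        padded = pad * flank + seq + pad * flank
        for i, residue in enumerate(seq):
            if residue != "C":
                continue
            # Window centred on residue i of the unpadded sequence
            peptide = padded[i:i + 2 * flank + 1]
            record = (pid, i + 1, peptide)
            (positives if (i + 1) in sites else negatives).append(record)
    return positives, negatives


def filter_peptides(positives, negatives):
    """Drop repeated positive peptides, then negatives identical to a positive."""
    seen = set()
    unique_positives = []
    for record in positives:
        if record[2] not in seen:  # assumption: the first copy is kept
            seen.add(record[2])
            unique_positives.append(record)
    kept_negatives = [r for r in negatives if r[2] not in seen]
    return unique_positives, kept_negatives


def split_proteins(protein_ids, test_fraction=1 / 11, seed=0):
    """Randomly split proteins into cross-validation and independent test sets."""
    ids = sorted(protein_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]
```

Splitting at the protein level rather than the peptide level keeps all sites of a given protein in the same group, which matches the description of dividing the representative proteins.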
[1] Akter, S., et al., Chemical proteomics reveals new targets of cysteine sulfinic acid reductase. Nat Chem Biol, 2018. 14(11): p. 995-1004.
[2] Yang, J., et al., Site-specific mapping and quantification of protein S-sulphenylation in cells. Nat Commun, 2014. 5: p. 4776.
[3] Gupta, V., et al., Diverse Redoxome Reactivity Profiles of Carbon Nucleophiles. J Am Chem Soc, 2017. 139(15): p. 5588-5595.
[4] Li, R., et al., Quantitative Protein Sulfenic Acid Analysis Identifies Platelet Releasate-Induced Activation of Integrin beta2 on Monocytes via NADPH Oxidase. J Proteome Res, 2016. 15(12): p. 4221-4233.
[5] Huang, J., et al., Mining for protein S-sulfenylation in Arabidopsis uncovers redox-sensitive sites. Proc Natl Acad Sci U S A, 2019. 116(42): p. 21256-21261.