High-Throughput Screening Assay Datasets from the PubChem Database

The availability of high-throughput screening (HTS) data in the public domain offers great potential to foster the development of ligand-based computer-aided drug discovery (LB-CADD) methods, which are crucial for drug discovery efforts in academia and industry. LB-CADD method development depends on high-quality HTS assay data, i.e., datasets that contain both active and inactive compounds. The active compounds are hits from primary screens that have been tested in concentration-response experiments and whose target specificity has been validated through suitable secondary screening experiments. Publicly available HTS repositories such as PubChem often provide such data in a convoluted way: compounds classified as inactive need to be extracted from the primary screening record. However, compounds classified as active in the primary screening record are not suitable as a set of active compounds for LB-CADD experiments because of the high false-positive rate. A suitable set of actives can be derived by carefully analysing the results of the assays used to confirm and classify the activity of compounds, often five or more. These assays, in part, build on each other. However, often not all hit compounds from the previous screen have been tested. Sometimes a compound is classified as ‘active’ in an assay although it is effectively ‘inactive’ on the target of interest, because it is ‘active’ on a different target protein. Here, a curation process for hierarchically related confirmatory screens is illustrated using two specifically chosen protein use-cases. The subsequent re-upload procedure into PubChem is described for the findings of those two scenarios. Further, we provide nine publicly accessible, high-quality datasets for future LB-CADD method development, giving the scientific community a common baseline for comparing future methods. We also provide a protocol that researchers can follow to upload additional datasets for benchmarking.
Keywords: HTS; PubChem; Datasets; LB-CADD


Mariusz Butkiewicz, Yanli Wang, Stephen H Bryant, Edward W Lowe Jr, David Weaver C and Jens Meiler
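
The curation logic outlined in the abstract can be illustrated with a small sketch. It is not the authors' protocol: the function name `curate`, the outcome labels, the compound IDs, and the rule that a hit must be tested and active in every confirmatory screen are all assumptions made for illustration. Inactives are taken from the primary screen only; actives are primary hits that survive each confirmatory screen and are not active in a counter screen against a different target.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: compound ID -> "active" or "inactive".
Outcomes = Dict[int, str]


def curate(
    primary: Outcomes,
    confirmatory: Iterable[Outcomes],
    counter_screens: Iterable[Outcomes] = (),
) -> Tuple[Set[int], Set[int]]:
    """Return (actives, inactives) under the assumptions stated above."""
    # Inactive compounds are extracted from the primary screening record.
    inactives = {cid for cid, outcome in primary.items() if outcome == "inactive"}

    # Primary hits are only candidates because of the high false-positive rate.
    candidates = {cid for cid, outcome in primary.items() if outcome == "active"}

    # Assumption: a hit that was not tested in a confirmatory screen
    # cannot be confirmed, so it is dropped rather than kept.
    for assay in confirmatory:
        candidates = {cid for cid in candidates if assay.get(cid) == "active"}

    # "Active" in a counter screen means activity on a different target,
    # so such compounds are removed from the set of actives.
    for assay in counter_screens:
        candidates = {cid for cid in candidates if assay.get(cid) != "active"}

    return candidates, inactives


# Made-up example data; the compound IDs are not real PubChem identifiers.
primary = {1: "active", 2: "active", 3: "active", 4: "inactive", 5: "active"}
confirm_1 = {1: "active", 2: "active", 3: "inactive"}  # compound 5 not tested
confirm_2 = {1: "active", 2: "active"}
counter = {2: "active"}  # compound 2 acts on a different target

actives, inactives = curate(primary, [confirm_1, confirm_2], [counter])
print(sorted(actives), sorted(inactives))
```

In this sketch, compounds that were hits in the primary screen but were never followed up end up in neither set, which reflects the point that untested hits cannot be treated as confirmed actives or as inactives.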

