Candidate Molecule Selection Based on In Silico Predicted ADMET Properties of 12 Indenoindole Derivatives

Abstract
Before considering future in vivo assays, it is necessary to investigate the pharmacokinetic and toxicity profiles of new chemical entities in order to select the best candidate(s) for further evaluation. Physicochemical parameters and ADMET (Absorption, Distribution, Metabolism, Elimination and Toxicity) properties of 12 indenoindole derivatives, identified as potent inhibitors of the ABCG2 protein, were predicted in silico with the Molinspiration and ACD/Percepta software. Mutagenicity and carcinogenicity were evaluated with the QSAR Toolbox software. Based on this exercise, i) two phenolic derivatives should not be metabolically activated by CYP enzymes according to the QSAR Toolbox software, leading to a lower mutagenic risk, ii) compounds 2b and 2c could be excluded from further studies because of clastogenic risks, and compound 2c also because of a relatively low oral bioavailability, iii) one compound could be excluded for its blood toxicity and five for their pulmonary toxicity. Finally, six of the 12 derivatives (1a, 1b, 2a, 2d, 2e and 2g) were predicted, in terms of ADMET properties, to be good candidates for further in vivo


Background
A major problem of anticancer chemotherapy is multidrug resistance, often related to overexpression of multidrug ATP-binding cassette (ABC) transporters, which reduce the intracellular concentration of drugs below their cytotoxic threshold. Three of the 48 human ABC transporters are recognized to be associated with a poor prognosis in cancer patients treated by chemotherapy: ABCB1/Pgp (P-glycoprotein) [1], ABCC1/MRP1 (multidrug resistance protein 1) [2], and the more recently identified ABCG2/BCRP (breast cancer resistance protein) [3][4][5]. The latter is a "half-transporter" of 655 amino acids, containing one cytosolic nucleotide-binding domain and one transmembrane domain with six helical spans, which homodimerizes to be functional. It was found to be abundant in many types of cancer [6].
One of the strategies aimed at eliminating chemoresistant cancer cells is to use inhibitors of the multidrug ABC transporters in order to sensitize tumor cells to anticancer drug cytotoxicity. The first specific ABCG2 inhibitor, of natural origin, was fumitremorgin C (FTC), which unfortunately displayed serious neurotoxicity [7]. Synthetic derivatives were developed, resulting in a highly potent inhibitor, Ko143, which however still retained some residual toxicity [8]. Screening of different classes of flavonoids identified interesting inhibitors such as hydrophobic flavones and acridones [9,10], chromones [11], and different types of chalcones [12], some of the inhibitors being active in vivo in mouse models [13,14].
Novel ABCG2-selective inhibitors were recently developed as a series of ketonic indenoindoles after appropriate substitutions such as the replacement of isopropyl by phenethyl at the N5 position of the C-ring and the addition of hydrophobic substituents on the D-ring [15]. The phenolic derivatives were found to inhibit ABCG2 even more potently and selectively, and they markedly stimulated ATPase activity, in contrast to the ketonic derivatives, Ko143 and chromones, which strongly inhibited ATPase activity [16]. In view of future use in animal models, the ADMET parameters of the most potent inhibitors were analyzed in order to select those having the best profile.
Since 2007, in order to minimize animal testing, the REACH regulation (Registration, Evaluation, Authorisation and restriction of CHemicals) has promoted the "reduction, refinement or replacement" of animal use (3R) through alternative methods (annexes VII to XI) for evaluating the toxicity of chemicals [17]. The principle of the "Three Rs" was adopted early (1986) in the EU, and since 1991 the ECVAM (European Centre for the Validation of Alternative Methods) has been involved in the scientific validation of new methods and in promoting their development and dissemination. The European Medicines Agency (EMA), in charge since 1995 of the scientific evaluation, supervision and safety monitoring of medicines in the EU, promotes the regulatory acceptance of 3R by the replacement of animal studies with in vitro models. EU Directive 2010/63/EU, which took effect on January 1, 2013, was followed in July 2013 by the creation of NETVAL (European Union Network of Laboratories for the Validation of Alternative Methods) to support the EURL ECVAM actions.
Alternative methods include in silico studies [17], which have the advantage of being less expensive and less time-consuming. These methods offer a high throughput, can be optimized, and require fewer compounds to be synthesized. Poor pharmacokinetic profiles (ADME) can therefore be detected early, avoiding costly late-stage failures in drug development.
The ECHA (European Chemicals Agency) Guidance on information requirements and chemical safety assessment identified three main approaches for obtaining in silico non-testing data: (1) grouping approaches, including read-across and chemical category formation; (2) structure-activity relationships (SAR) and quantitative SAR (QSAR); and (3) expert systems [18].
In our study, three in silico prediction tools were used to determine physicochemical and ADMET properties of a set of twelve indenoindoles, six ketonic and six phenolic derivatives (Figure 1) in order to select the best one(s) for future in vivo tests.

Physicochemical properties
Altogether, 12 physicochemical parameters, such as cLogP, molecular weight, TPSA (topological polar surface area) and the numbers of hydrogen-bond acceptors, hydrogen-bond donors and rotatable bonds, were calculated with the Molinspiration software (v2013.09) [19]. The cLogP parameter is computed as the sum of fragment-based contributions and correction factors. The TPSA calculation is based on the summation of tabulated surface contributions of polar fragments [20]. These fragment contributions were determined by least-squares fitting to the single-conformer 3D PSA of 34,810 drugs from the World Drug Index (correlation coefficient ≈ 0.99) [20].

ADME parameters
The ADME parameters (n=12) were calculated with the ACD/Percepta software (v.14.0.0) [21]. The model used for the bioavailability module is based on differential equations describing solubility in the gastrointestinal tract, passive absorption in the jejunum and elimination. The first-pass effect is not considered in the simulation. The database contains 790 compounds compiled from reference pharmacokinetic tabulations and various articles. The predictions for (i) the protein binding module and (ii) the P450 substrate and regioselectivity module (metabolism) are based on the GALAS (Global, Adjusted Locally According to Similarity) modeling methodology [22], which consists of a global Partial Least Squares (PLS) QSAR model corrected by the predicted values of the compounds in the training set (70% of the whole data set of the software) that are most similar to the tested compound 1 or 2 (Figure 1). These results are complemented by a reliability index (RI) ranging from 0 to 1, where 0 denotes an unreliable prediction and 1 a fully reliable one. The RI value tends to zero in two cases: i) when the overall similarity between the tested compound 1 or 2 and the most similar compounds used for the correction of the global model is weak, because the predicted values of each of these compounds (obtained by bootstrapping the training set 100 times) weakly correlate with those of 1 or 2; ii) when an inconsistent variability between the predicted and the experimental values is observed among the compounds most similar to the tested compound 1 or 2. The validation sets were used to evaluate the accuracy of the results by computing the Root Mean Square Error (RMSE) between the experimental value of the studied parameter and its final predicted value. Results with a reliability index under 0.3 (cut-off value) should be discarded regardless of the predicted value.

Chemical informatics ISSN 2470-6973
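The reliability filter and the accuracy measure described above can be illustrated with a short Python sketch. The function names, the record layout and the numeric values are hypothetical and serve only as an illustration; they are not taken from ACD/Percepta, whose internal implementation is not described here. Only the 0.3 cut-off and the RMSE definition come from the text.

```python
import math

RI_CUTOFF = 0.3  # cut-off stated in the text; predictions below it are discarded


def rmse(experimental, predicted):
    """Root mean square error between paired experimental and predicted values."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - p) ** 2 for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def keep_reliable(predictions, cutoff=RI_CUTOFF):
    """Keep only predictions whose reliability index reaches the cut-off."""
    return [p for p in predictions if p["ri"] >= cutoff]


# Hypothetical records, for illustration only (not values from the study).
example = [
    {"compound": "A", "value": 0.92, "ri": 0.41},
    {"compound": "B", "value": 0.15, "ri": 0.22},
]
print([p["compound"] for p in keep_reliable(example)])
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

In this sketch a prediction with an RI exactly equal to 0.3 is kept, since the text only discards values under the cut-off.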
Predictive parameters for blood-brain barrier permeability: The predictive models of the Log PS and Log BB constants were built by non-linear least-squares regression and validated with an internal validation set and two other experimental external validation sets [23,24]. Physicochemical properties such as lipophilicity (Log P), the numbers of hydrogen-bond acceptors and donors, the ion form fraction at pH 6.5, and the McGowan volume were calculated with the ACD/Labs Algorithm Builder 1.8 development platform and used for the modeling. The combination of the brain/plasma equilibration rate (Log (PS * fraction unbound, brain)) with the partitioning of compounds at equilibrium (Log BB) classifies compounds as either active or inactive on the central nervous system; the model was validated using experimental data for more than 1500 compounds having CNS activity [25].

Toxicity Parameters
The toxicity parameters (n=12) were calculated with the ACD/Percepta [21] and QSAR Toolbox [26] software.

ACD/Percepta software (v.14.0.0)
This software uses a human expert rules system [27][28][29] to predict mutagenicity, clastogenicity, carcinogenicity and reproductive toxicity. The acute toxicity module for the prediction of intravenous and oral LD50 uses the GALAS modeling methodology [22]. The model predicting "health effects on particular organ or organ system" is based on the RTECS and ESIS databases, covering more than 100,000 compounds. The skin irritation model was constructed with a binomial PLS method, using a set of pre-selected fragments and physicochemical parameters as descriptors; some specific variables are used to discriminate ionizable compounds and their salts. The experimental data were collected from the RTECS and ESIS databases and from diverse publications.

Physicochemical properties
Calculations showed that all twelve compounds displayed a topological polar surface area (TPSA) lower than 60 Å² and fewer than ten rotatable bonds, indicating that they could be good candidates for oral administration (Table 1) [30]. All selected molecules were also predicted to exhibit a moderate brain uptake on the basis of TPSA values under 60 Å² [31], and might therefore be active in the central nervous system.
According to Lipinski's rule of five, good absorption or permeation is more likely when molecules have a calculated LogP (cLogP) below 5, a molecular weight lower than 500, fewer than 10 H-bond acceptors and fewer than 5 H-bond donors [32]. All evaluated compounds followed that rule (Table 1), except for the cLogP parameter, where the values were higher than 5 for one ketonic compound, 1c, and for the six phenolic compounds 2a-e and 2g. In particular, compounds 1c and 2c presented a much higher cLogP value (6.18 and 6.90, respectively), showing a less favorable permeation profile. Nevertheless, compounds having cLogP values up to 5.6 might still have drug-like properties [33].
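The rule-of-five screen discussed above can be sketched as a small Python check. This is an illustration under stated assumptions, not the procedure used by Molinspiration: the class and function names are hypothetical, values exactly at a threshold are treated as compliant (a common convention for the rule), and only the cLogP value of 2c (6.90) comes from the text, the other descriptor values being placeholders.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    clogp: float
    mol_weight: float
    h_acceptors: int
    h_donors: int


def lipinski_violations(d):
    """Return the names of the rule-of-five criteria that a compound violates."""
    violations = []
    if d.clogp > 5:
        violations.append("cLogP")
    if d.mol_weight > 500:
        violations.append("molecular weight")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors")
    if d.h_donors > 5:
        violations.append("H-bond donors")
    return violations


# cLogP = 6.90 is the value reported for compound 2c; the three other
# descriptors are hypothetical placeholders, not values from Table 1.
compound_2c = Descriptors(clogp=6.90, mol_weight=400.0, h_acceptors=2, h_donors=1)
print(lipinski_violations(compound_2c))  # ['cLogP']
```

Returning the list of violated criteria, rather than a single pass/fail flag, keeps visible which parameter is responsible, which is what the discussion of cLogP relies on.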

ADME parameters
Absorption: The results of oral bioavailability (Table 2) were in the range of 39-78%, which could be correlated to a good absorption rate after passive transcellular transport [34], except for compound 2c (with a predicted low value of 10%).  [35].

Metabolism:
The six phenolic derivatives 2a-2e and 2g and the two ketonic ones 1a and 1b had a high probability (>0.90) of being metabolized by the CYP3A4 enzyme, with a reliability above the 0.3 cut-off (RI from 0.31 to 0.41) (Table 2). Such a modification could mainly consist of a specific hydroxylation at the para position of the phenethyl group for compounds 2b-2d and 2g.
Elimination: Ketonic and phenolic compounds had a similar elimination-rate constant (ke), from 0.00082 to 0.0011; in contrast, compound 2c displayed a 2-fold lower ke value, indicating a longer retention time in the body (Table 2).
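The link between a lower ke and a longer retention time can be made explicit with a minimal Python sketch. It assumes simple first-order elimination, for which the half-life is ln(2)/ke; this kinetic model and the function name are assumptions made for illustration, and the time unit is whatever unit 1/ke carries, which the text does not specify.

```python
import math


def half_life(ke):
    """Elimination half-life for first-order kinetics: t1/2 = ln(2) / ke."""
    if ke <= 0:
        raise ValueError("ke must be positive")
    return math.log(2) / ke


# ke bounds of the reported range; the resulting half-lives are expressed
# in the (unspecified) time unit of 1/ke.
for ke in (0.00082, 0.0011):
    print(ke, half_life(ke))

# A 2-fold lower ke gives a 2-fold longer half-life.
ratio = half_life(0.00082 / 2) / half_life(0.00082)
print(ratio)
```

Under this assumption, the 2-fold lower ke predicted for 2c corresponds to a half-life about twice as long as that of a compound at the corresponding point of the range.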

Toxicity parameters
Mutagenicity: The knowledge-based expert system highlighted the presence of a planar polycyclic system [29] in all compounds (ketonic and phenolic derivatives), predicting that they could all be DNA-intercalating agents; moreover, some of the unsubstituted aromatic regions may become DNA-reactive after epoxidation by cytochrome P450 enzymes, as indicated for ACD/Percepta by "positive" in Table 3.
According to the QSAR Toolbox software, the mutagenicity risks (Ames test, without metabolic activation) were negative for all derivatives (Table 3). The software can also detect, with a SAR approach, structural alerts that contain electrophilic centers or that can be metabolically activated, attributed to one of six mechanistic domains related to mutagenicity or carcinogenicity [36]. A risk of a specific metabolism by cytochrome P450 enzymes was highlighted for the ketonic derivatives, leading to reactive iminium species and implicating the formation of DNA adducts via an SN1 mechanism [37], but this metabolic pathway was not confirmed by the ACD/Percepta software. On the other hand, an epoxidation mediated by cytochrome P450 enzymes on the benzene ring of the phenethyl group was predicted for the three ketonic compounds 1a-1c and the four phenolic ones 2a-2c and 2g. The compounds could then be converted to reactive quinones, and a nucleophilic attack by a nitrogen of DNA could occur through a Michael addition mechanism, leading to DNA adducts [38]. Compounds 1c and 2c, which carry an additional phenyl group at position C7, could have a higher probability of being metabolized. However, derivatives with a methoxy substituent on the phenethyl group (compounds 2d, 2e) were not predicted to be metabolized, maybe due to steric hindrance on specific sites, excluding them from any potential mutagenic risk.

Table 3
Toxicity parameters of the 12 indenoindoles predicted by the ACD/Percepta and QSAR Toolbox software.
[...] a ligand displaces half of specifically bound 17β-estradiol from ER-alpha. f,g,h Probability of compounds having chronic, sub-chronic or acute toxic effects on liver, lung and blood, respectively.
The DNA-intercalating risk highlighted by the software for our compounds is a feature of heteropolycyclic aromatic hydrocarbons, whose planar polycyclic system can form stacking interactions with nucleotides, leading to mutagenicity. The crystallographic structure of phenolic indenoindole derivatives showed that they are nearly planar, with a mean out-of-plane deviation of 0.027 Å, determined from the deviation of the C12 and C13 atoms from the plane defined by the heterotetracyclic system [16], in contrast to the ketonic derivatives, with values of 0.1935 Å and 0.4429 Å, respectively [15]. In the literature, some indenoindole analogues of the phenolic sub-series were found to be DNA-intercalating agents [39]. Nevertheless, an aromatic polycyclic structure alone could be insufficient to confer that property [40], and this capacity seems to be mostly dependent on the type of substitutions. For some compounds, the presence of a cationic amino-alkylated structural alert might enhance the affinity of compounds for the negatively charged phosphodiester backbone of DNA through electrostatic interactions, and thereby facilitate intercalation [41]. In triphenylethylene derivatives, the absence of an N-dialkyl group is associated with a lack of DNA intercalation [42], and the incorporation of an extended alkylamine group is sometimes used to functionalize anti-cancer therapeutic intercalating agents (e.g., mitoxantrone) or inhibitors of transcription factors [43].
In our case, we noticed that the twelve compounds have a pyrrole moiety and a pKa value, predicted by the ACD/Percepta software, of around 9 (data not shown). This indicates that the molecules could be positively charged at physiological pH. However, the conjugated system of the pyrrole group makes its protonation difficult, which implies that the compounds may have lower intercalating properties.
Clastogenicity: No conclusions could be drawn for in vivo clastogenicity risks with the ACD/Percepta software. For all compounds, the QSAR Toolbox software detected a structural alert representing two atoms connected to hydrogen-bond acceptors. Therefore, the derivatives could interact with DNA via non-covalent binding, acting as DNA groove binders [44] with the formation of a drug/DNA complex, leading to higher risks of replicative errors.
Another characteristic leading to clastogenicity risks is the capacity of molecules to interact with proteins such as topoisomerase II during DNA replication. This may occur when compounds are able to stabilize the complex between DNA and topoisomerase II, leading to DNA strand breaks without religation, and thus act as topoisomerase II poisons. Many intercalating drugs such as anthracyclines (e.g., doxorubicin) [45], or non-intercalating compounds such as the epipodophyllotoxin etoposide [46], can form a covalent topo II-DNA complex conferring clastogenic properties.
QSAR Toolbox highlighted a risk for the two compounds 2b and 2c to interact with topoisomerase II (Table 3). This process may involve a perhydroxylation reaction, resulting from the attack of molecular oxygen at the ortho- or para-phenolic position and the formation of a free peroxy radical, stabilized by the methyl (2b) or phenyl (2c) substituent and then converted into the corresponding perhydroxylated derivative. This step is followed by the formation of the ortho- and para-benzoquinones as a result of an intramolecular dehydration. Sulfur nucleophiles and other nucleophilic sites in cellular proteins could then attack the quinoid structures via a Michael-type addition reaction, resulting in protein alkylation and possible clastogenic effects.

Carcinogenicity:
No conclusion was possible for carcinogenicity risks on mice and other rodents with the ACD/Percepta software, while QSAR Toolbox detected no structural alerts for that risk (Table 3).

Other toxicity parameters:
The twelve compounds were predicted to have moderate-to-weak binding to ER-alpha (estrogen receptor) [47], with log RBA values between -3 and 0, suggesting a low reproductive toxicity (Table 3). The interaction between indenoindole derivatives and the estrogen receptor could be related to the presence of (i) the phenolic ring and (ii) additional ring structures [48], which could explain the relative toxicity.
No significant toxicity was found on the liver or the blood system (except for 1c). Potential lung damage was predicted for compounds 1c-f and 2c, the probability being lower than or equal to 0.50 for 1a, 1b, 2a, 2b, 2d, 2e and 2g. The twelve compounds have LD50 values from 480 to 1100 mg/kg, classifying them in category 4 of acute toxicity and indicating a slight toxicity for all of them. No skin toxicity was predicted for any compound (Table 3).
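The category assignment can be reproduced with a short Python sketch. The text does not name the classification scheme or the administration route; the sketch assumes the GHS acute oral toxicity cut-offs (5, 50, 300, 2000 and 5000 mg/kg), and the function name is hypothetical.

```python
def ghs_oral_category(ld50_mg_per_kg):
    """Map an oral LD50 (mg/kg) to a GHS acute toxicity category (1 to 5).

    Returns None above 5000 mg/kg (not classified). The cut-offs are the
    GHS oral values, assumed here because the text does not name a scheme.
    """
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    upper_bounds = [(5, 1), (50, 2), (300, 3), (2000, 4), (5000, 5)]
    for bound, category in upper_bounds:
        if ld50_mg_per_kg <= bound:
            return category
    return None


# Bounds of the LD50 range reported for the twelve compounds.
print(ghs_oral_category(480), ghs_oral_category(1100))  # 4 4
```

Both ends of the reported range fall between 300 and 2000 mg/kg, which is consistent with the category 4 assignment under this assumption.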

Conclusion
For the ACD/Percepta software, the developers defined three threshold RI values (0.3, 0.5 and 0.75) which subdivide the reliability of predictions into four categories: unreliable (lower than 0.3), borderline (from 0.3), moderate (from 0.5) and reliable (from 0.75). For the QSAR LD50 model, with RI values higher than 0.5 or 0.75, the root mean square error (RMSE) fell below 0.5 log units, which is considered to be within the inter-laboratory measurement error. According to the developers, predictions falling in this range could give a good estimation of the acute toxicity of compounds and remain close to in vitro testing in the early stages of drug development.
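The four reliability categories can be written as a small Python helper. This is a sketch: the function name is hypothetical, and the choice to assign a value exactly equal to a threshold to the higher category is an assumption, since the text only states that values lower than 0.3 are unreliable.

```python
def ri_category(ri):
    """Classify a reliability index (0 to 1) using the 0.3, 0.5 and 0.75 thresholds."""
    if not 0.0 <= ri <= 1.0:
        raise ValueError("RI must lie between 0 and 1")
    if ri < 0.3:
        return "unreliable"
    if ri < 0.5:
        return "borderline"
    if ri < 0.75:
        return "moderate"
    return "reliable"


# The RI range reported for the CYP3A4 predictions (0.31 to 0.41).
print(ri_category(0.31), ri_category(0.41))  # borderline borderline
```

Applied to the CYP3A4 predictions (RI from 0.31 to 0.41), this places them in the borderline category, just above the cut-off.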
With our set of compounds, no prediction had a reliability index over 0.75, and only some of those related to LD50 values were over 0.5. A low RI value indicates either a lack of similar compounds in the training set or an inconsistency between the baseline QSAR model predictions and the experimental data of the five analogues structurally closest to the test compound. This highlights the limitations of QSAR models based on a structurally predefined training set, where a change in the property or mechanism of action of analogues might sometimes be linked to slight structural changes. To address such a problem, the QSAR models of the ACD/Percepta software can be trained by including new experimental data for similar compounds to expand the chemical space and the applicability domain [49], without any statistical reparameterization of the model.
Expert system results need to be analyzed carefully and, as seen above for DNA intercalation risks, it is necessary to have several structural alerts and to consider the results as a weight of evidence before concluding that an effect is significant.
From the present study, we could exclude compound 2c for its relatively low oral bioavailability, 1c for blood toxicity, and 1c-f and 2c for pulmonary toxicity. The two phenolic derivatives 2d and 2e should not be metabolically activated by CYP enzymes according to the QSAR Toolbox software, leading to a lower mutagenic risk.
Compounds 2b and 2c might be clastogenic on the basis of a hydrogen-bond acceptor structural alert with a possible interaction with topoisomerase II, but no conclusion could be drawn about carcinogenicity risks.
In conclusion, compounds 1a, 1b, 2a, 2d, 2e and 2g may have an acceptable pharmacokinetic and toxicity profile; nevertheless, some preclinical in vitro experiments could help us to confirm mutagenicity and clastogenicity risks (Ames test with metabolic activation, chromosome aberration tests) and to evaluate organ toxicity with specific in vitro cytotoxicity tests.
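The selection logic of this conclusion amounts to removing every compound that meets at least one exclusion criterion, which can be sketched in Python as follows. The variable names are hypothetical, and the labels 1d, 1e and 1f are inferred from the range "1c-f" used in the text.

```python
# The twelve derivatives: six ketonic (series 1) and six phenolic (series 2).
ketonic = ["1a", "1b", "1c", "1d", "1e", "1f"]
phenolic = ["2a", "2b", "2c", "2d", "2e", "2g"]

# Exclusion criteria and the compounds they concern, as stated in the text.
exclusions = {
    "low oral bioavailability": {"2c"},
    "blood toxicity": {"1c"},
    "pulmonary toxicity": {"1c", "1d", "1e", "1f", "2c"},
    "clastogenic risk": {"2b", "2c"},
}

# A compound is excluded as soon as it appears under any criterion.
excluded = set().union(*exclusions.values())
retained = [c for c in ketonic + phenolic if c not in excluded]
print(retained)  # ['1a', '1b', '2a', '2d', '2e', '2g']
```

Keeping the criteria in a dictionary makes it easy to see which compounds are excluded on several grounds (1c and 2c here) and to revise the selection if in vitro results contradict a prediction.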