How an expert review could be used to resolve out-of-domain results

One of the more challenging outcomes from a (Q)SAR model is the out-of-domain (OOD) result. This result arises because (Q)SAR models are, in many situations (such as under the ICH M7 guideline), required to perform an applicability domain analysis to satisfy the OECD validation principles1. Although a (Q)SAR model may still generate a prediction, an OOD result generally means that the test chemical lies outside the chemical space in which the model makes predictions with a given reliability. An expert review helps to understand and reassess the reliability of an OOD result.

The procedures discussed in the blog entry titled “The use of chemical analogs in expert reviews” could be utilized in such an expert review; however, understanding why the prediction is OOD in the first place could cue the assessor to what the focus of the review should be. This is especially important because different model types define their applicability domains differently. A prediction is considered within the applicability domain of Leadscope’s statistical models when two criteria are met: at least one structural feature of the model is used in the prediction, and the training set contains a sufficiently similar analog. Meeting these criteria indicates that the model ‘knows’ something about your test chemical and has a basis for making a prediction. If one of these criteria is not met, the result is considered OOD; for example, structural features may be used in the prediction while no analog in the training set is sufficiently similar to the test chemical. In some of these cases, the lack of a close neighbor is due to a sub-structure in the test chemical that is not familiar to the model. In addition to assessing potentially reactive features and the relevancy of the model features, the prediction for the core sub-structure that is within the applicability domain of the model could serve as part of an expert review. Assessing whether this sub-structure is potentially reactive is a good starting point, i.e., are there sufficient negative examples in the database to not consider the sub-structure a concern? If so, this review would support a reassessment of the prediction’s reliability.
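The two-criteria domain check described above can be sketched roughly as follows. This is only an illustration, not Leadscope's implementation: the feature sets, the Tanimoto similarity measure, the `min_similarity` value of 0.3, and all function names are assumptions made for the example. The point is that the check can report why a result is OOD, which is what guides the expert review.

```python
from dataclasses import dataclass


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class DomainResult:
    in_domain: bool
    reason: str


def check_domain(test_features, model_features, training_set, min_similarity=0.3):
    """Apply the two criteria: a model feature is present AND a similar analog exists.

    min_similarity is a hypothetical threshold chosen for illustration only.
    """
    test = frozenset(test_features)
    has_model_feature = bool(test & frozenset(model_features))
    best = max((tanimoto(test, frozenset(t)) for t in training_set), default=0.0)
    has_analog = best >= min_similarity

    if has_model_feature and has_analog:
        return DomainResult(True, "in domain")
    if has_model_feature:
        return DomainResult(False, "no sufficiently similar analog")
    if has_analog:
        return DomainResult(False, "no model features")
    return DomainResult(False, "no model features and no sufficiently similar analog")
```

A result with the reason "no sufficiently similar analog" would point the review toward the unfamiliar sub-structure, while "no model features" would point it toward the analogs themselves.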

Readily available parameters, such as the prediction probability, are also helpful in such cases. A previous analysis by Amberg et al. (2019) showed that the risk of missing a mutagenic impurity, given an OOD statistical result with a probability <0.2 and a negative expert rule-based result, is approximately the same as when both methodologies predict negative2.
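As a rough sketch, the condition above can be written as a simple screening check. The 0.2 probability cut-off comes from the analysis cited above; the function name, argument names, and the string "negative" are assumptions for illustration. The check only flags a case as a candidate for reassessment and does not replace the expert review.

```python
def flag_for_negative_reassessment(stat_in_domain: bool,
                                   stat_probability: float,
                                   expert_rule_call: str,
                                   threshold: float = 0.2) -> bool:
    """Return True when an OOD statistical result has a probability below the
    threshold and the expert rule-based result is negative."""
    return (
        not stat_in_domain
        and stat_probability < threshold
        and expert_rule_call == "negative"
    )
```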

In another instance, there may be sufficiently similar analogs in the training set, but the statistical model’s prediction is out of domain due to an absence of model features. Here the advantage of using complementary statistical and expert rule-based approaches becomes apparent: the expert rule-based prediction will likely be within the applicability domain because its applicability is defined not by model features but by the sufficiency of analogs. The content of the blog entry on the use of chemical analogs in expert reviews is helpful in these situations. An expert review is used to resolve OOD assessments. Even so, there remains a computational aspect to the analysis, as the model/software provides information that facilitates the review.

Please contact me if you would like to discuss this further:

  1. OECD (2014), Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models, OECD Series on Testing and Assessment, No. 69, OECD Publishing, Paris.
  2. Amberg, Alexander et al. 2019. “Principles and Procedures for Handling Out-of-Domain and Indeterminate Results as Part of ICH M7 Recommended (Q)SAR Analyses.” Regulatory Toxicology and Pharmacology.

The use of chemical analogs in expert reviews

Computational tools offer a rapid, cost-saving advantage to toxicologists assessing the hazard of chemicals. The predictivity of a model for a group of structures is one aspect to be considered in a computational assessment. However, given the universe of chemicals, there are structural classes which a model will predict with a higher level of reliability than others. There may also be chemicals which are not confidently reflected in the model’s chemical space and these chemicals are typically not in the model domain (more on this topic in a later blog). It is important to assess the reliability of the model’s prediction as part of a computational assessment. The expert review is the method that allows the toxicologist to gain confidence in an assessment. I like to think of this process as being analogous to the review of experimental data through controls, statistical analyses, and the determination of false negative or positive results.

The degree to which a user can interact with the computational platform and understand how an assessment was made determines the level of review that can practically be performed. Access to the descriptors used in the prediction, and the ability to support their use through analysis of the underlying data, are important. Parameters such as the diversity of the training set examples, the extent to which any structural descriptors can be linked to a mechanism, and whether other structural moieties explain the activity of the training set examples are used to evaluate the assessment.

In addition to the above, here are a couple of items that I like to evaluate as part of a computational assessment.

  • Are there any potentially reactive features that are not considered by the model? This provides added information to support negative predictions in cases where the structural features of a statistical model do not consider the entire structure. In Figure 1, the entire structure is considered as a feature. This feature maps to two examples which are assessed as negative.
Figure 1. An evaluation of LS-167087 for potentially reactive features

Figure 2 shows features not considered in the analysis of LS-181651. An analysis of potentially reactive features considers the 1,3,5-triazine,2-phenyl- feature. This feature mapped to 4 examples, which are all negative for sensitization hazard. One of these examples (LS-181621) is a close chemical analog. Such reviews support a negative prediction.

Figure 2. An evaluation of LS-181651 for potentially reactive features
  • Is an alerting fragment represented in a known negative example structure and how does the chemical environment of the alerting fragment compare to the target structure? Figure 3 shows an aromatic nitro indeterminate alert which matched the target structure. A search for analogs showed that the indeterminate alert is also present in LS-188180, a known negative.  The system performs a comparison of the target and analog structures and indicates that the alerting sub-structure is within the same chemical environment in both structures. Such a review supports a negative assessment for the target structure, which is also assessed as negative by a statistical model.
Figure 3. Assessment of an analog to support an expert-rule based prediction

Expert reviews give added reliability to an assessment. Transparent models and platforms facilitate such reviews and mitigate any black-box concerns around in silico tool use.

Please send me a note if you would like to discuss in more detail.

Computational toxicology at the SOT 2021 Meeting

We are happy to be presenting at this year’s virtual SOT meeting1 on a number of topics throughout the course of the event.

On Monday 15th March at 3pm (EST), we are covering some new areas at the session “Hot Topics in Computational Toxicology: New Developments that Support Regulatory Submissions.” This includes a discussion on new developments and publications in the areas of predicting bioactivation to support the assessment of drug-drug interactions as well as the prediction of acute toxicity and skin sensitization. In addition, we will review how N-Nitrosamine structure-activity relationships support carcinogenic potency categories.

On Tuesday, March 16th at 08:30 EST during the SOT-CTSS virtual reception, we will be accepting an award for the Society of Toxicology (SOT) Computational Toxicology Specialty Section (CTSS) scientific paper of the year. The title of this paper is “A cross-industry collaboration to assess if acute oral toxicity (Q)SAR models are fit-for-purpose for GHS classification and labelling2.” A poster on this work will also be presented on Tuesday, March 23rd at 1:00pm EST at the Computational Toxicology II poster session.

And throughout the event many of our scientists will be at Instem’s In silico tox solutions booth. We look forward to meeting you there.

Please send me, Glenn Myatt, a note if you’d like to discuss any of these topics in more detail.


  1. Summary of Instem’s events at the SOT 2021 Meeting
  2. Bercu, J., Masuda‐Herrera, M.J., Trejo-Martin, A., Hasselgren, C., Lord, J., Graham, J., Schmitz, M., Milchak, L., Owens, C., Lal, S.H., Robinson, R.M., Whalley, S., Bellion, Vuorinen, A., Gromek, K., Hawkins, W.A., van de Gevel, I., Vriens, K., Kemper, R., Naven, R., Ferrer, P., Myatt, G.J., 2021. A cross-industry collaboration to assess if acute oral toxicity (Q)SAR models are fit-for-purpose for GHS classification and labelling, Regulatory Toxicology and Pharmacology, Volume 120, March 2021, 104843 

Target Safety Assessment – Guest Author Dr Frances Hall

This week, we are delighted to welcome Dr. Frances Hall, Instem’s Director Scientific Solutions, as a guest contributor to the blog.

Hello! I’m excited to join you on In Silico Insider. My name is Frances Hall, PhD, and I have worked for Instem for several years. I started in the GeneTox Team, enabling CROs, Pharma Companies, Universities and Research Institutes to gain value from the software solutions Instem provides for the genetic toxicology market, such as Comet Assay IV1 and Cyto Study Manager2.

Then, around five years ago, I moved into the KnowledgeScan Target Safety Assessment Team and I have loved watching the industry grow and seeing our KnowledgeScan service bring real value to R&D organizations around the globe.

For those of you who are unfamiliar with KnowledgeScan, it is a technology-enabled platform that can quickly and systematically review and distil millions of data records from a variety of published sources. Instem’s expert life scientists then carefully curate, review and interpret the data and present it to our clients in a comprehensive, consistent, easy-to-understand report format, allowing them to make evidence-based decisions.

One application of KnowledgeScan is our Target Safety Assessment (TSA) service. TSAs help organizations to identify and assess unintended adverse consequences of target modulation. They enable researchers to mitigate target-related toxicities, or to prioritize targets with lower safety risks across the early drug discovery portfolio. Typically, TSAs are completed early in the drug discovery pathway and are continually updated as knowledge is uncovered. KnowledgeScan gives clients detailed insight into the potential toxicological risks and challenges associated with modulating their drug targets, enabling them to make faster, better-informed decisions.

If you’d like to learn more on this topic, I am delighted to be giving the following presentation at the upcoming virtual Society of Toxicology (SOT) meeting. The presentation, titled “Revolutionizing Target Safety Assessment: Technology Advancements and COVID-19 Target Case Study” will discuss how the KnowledgeScan platform has gathered, distilled, and presented the vast amount of data associated with the biological target: ACE2.   I will discuss the challenges of this data proliferation and how cutting-edge analytics drew universal themes and conclusions from the literature.

Can’t make it to SOT? Email me and I will gladly get a copy of the presentation to you.

Best wishes,


1. Comet Assay IV – live video measurement system for the Comet Assay

2. Cyto Study Manager – data acquisition and reporting for genetic toxicology assays

Endocrine activity in silico protocol

The in silico toxicology protocol consortium has been working on developing in silico protocols across various endpoints. While there are existing frameworks for identifying endocrine disrupting chemicals, there remains a lack of guidance on the usefulness and limitations of predicting relevant effects and mechanisms using in silico methodologies, and on how to combine relevant information, including that generated by computational models. The endocrine activity working group has been 1) actively addressing which toxicological mechanisms or effects could be realistically predicted using in silico tools given data availability and the state of science around the endpoint, 2) identifying which in silico methodologies to use and 3) determining how to combine various lines of evidence from both experimental and computational approaches to derive an endpoint assessment.

A challenging area for the use of in silico methods in predicting the endocrine disruption hazard of a chemical is the requirement to identify a mode of action linking the mechanistic activity of the substance with in vivo adverse effects. While in vivo experimental data allows this evaluation, it is challenging to develop in silico tools to predict reproductive and developmental effects with the granularity that is required to establish this causal link with high confidence. However, given adequate scientific understanding of the systems involved at the mechanistic level, the advance of experimental systems and building upon work carried out by the ToxCast and Tox21 initiatives, the prediction of endocrine activity (mechanistic effects) using in silico tools is an achievable goal. Therefore, in silico methods play a role in aiding the interpretation of in vivo effects observed in experimental studies. In addition to discussing data integration, the group will lay out case studies on how in silico tools could be used to facilitate regulatory decision making and chemical prioritization. Our monthly meetings have become something to look forward to.

If you would like to participate in the endocrine activity working group, please get in touch.

N-nitrosamine SAR Working group

Leadscope, Inc. (an Instem company), in collaboration with Lhasa Limited, is leading a working group of pharmaceutical toxicologists and consultants investigating the carcinogenic potency and structure-activity relationships of N-nitrosamines. This is in response to the recent discovery of N-nitrosamines in marketed pharmaceuticals and the regulatory changes that have resulted. The working group is run by Leadscope’s VP of Product Engineering Dr. Kevin Cross and Lhasa Limited Senior Scientist Dr. David Ponting.

Key points in the current regulatory environment, from the point of view of Leadscope customers and Lhasa Limited members, have been driven by the European Medicines Agency (EMA)1 and U.S. FDA2. In short, N-nitrosamine compounds, being potent mutagens and carcinogens, must be controlled to an exposure limit of 18 ng/day unless there is a close structural analogue with reliable carcinogenicity data. The working group has several topics under consideration as a result:

  • What mechanism(s), in addition to the main alpha-carbon hydroxylation mechanism that leads to potent carcinogenic response, may occur?
  • Which compounds have reliable carcinogenicity data that can be read-across to, and what is the best potency estimate we can derive for those compounds by meta-analysis of all available data?
  • How should the best read-across analog for a novel N-nitrosamine be selected?
  • If there is no single read-across analog with sufficient data, can a combination of analogs give sufficient information?
  • Are there any compounds or substructures where clear toxicity and mechanistic information indicates that the cohort of concern limits may not be applicable?
  • What improvements can we make to the structure-activity relationships for N-nitrosamines?

Ultimately, the working group is writing a series of papers on these topics, so that the expertise developed and shared within the group becomes the foundation for consistent analysis of N-nitrosamines both within the workgroup membership and more broadly. If you have any unpublished carcinogenicity or mutagenicity data for N-nitrosamines that could be useful for the working group, please get in touch with Kevin Cross.




New acute toxicity (Q)SAR manuscript

Late last year we reviewed a collaboration to assess whether acute (Q)SAR models are fit-for-purpose1 to support classification and labeling, since the use of an alternative approach would support the 3Rs.

As part of this exercise, a series of primarily proprietary chemicals with acute toxicity data were run through the different acute (Q)SAR methodologies and the results, both experimental and predicted, were shared with us. This information was then combined across all the companies and performance statistics were generated.

The project also took into consideration how an expert review of the information would factor into such assessments and what elements might be considered as part of such a review.

Based on the results from this project, a workflow was proposed that incorporates (Q)SAR assessment.

We are pleased that the open access publication describing this work is now online and can be viewed at:

The paper concludes that such a workflow that includes (Q)SAR models as well as an expert review “… provides a scientifically rational, reasonable and conservative approach to hazard identification”.

We are now working hard on the next generation of these models to support classification and labelling.

If you’d like to discuss this work in more detail, please contact me.



In silico toxicology consortia: impact and future direction

In a previous blog entry, Dr. Glenn Myatt discussed the impetus for developing a consortium to define best in silico practices around toxicological endpoints such as genetic toxicity, skin sensitization, carcinogenicity, neurotoxicity and acute toxicity, to name a few. The aim of these protocols is to reduce the burden on industry and regulators to justify their use, as well as to ensure in silico assessments are performed in a consistent and reproducible manner to support good in silico practices. The consortia’s activities support a number of emerging or existing regulatory guidelines such as the ICH M7: DNA reactive (mutagenic) impurities in pharmaceuticals.

Group activities result in protocols (which define implementable rules and principles), position papers (which describe the current state of science and the extent to which in silico tool use is feasible), case studies, structure-activity relationships, or fit-for-purpose evaluations. To date, the reliability scoring paradigm1, which serves as a useful extension of the Klimisch scoring of experimental data, has been cited by the World Health Organization in EHC240: Principles and Methods for the Risk Assessment of Chemicals in Food, subchapter 4.5. Genotoxicity2. Further, elements of the ‘In silico toxicology protocols’1 and ‘Principles and procedures for handling out-of-domain and indeterminate results as part of ICH M7 recommended (Q)SAR analyses’3 have been cited by the European Medicines Agency (EMA)’s reflection paper on the qualification of non-genotoxic impurities4 and the ICH guideline M7 on assessment and control of DNA reactive (mutagenic) impurities in pharmaceuticals to limit potential carcinogenic risk – questions & answers Step 2b5.

Several working groups are in progress, and new working group activities are being formed, including an expansion of the carcinogenicity position paper6 to develop case studies in support of new approaches and protocols that use information from target safety assessments7, in silico approaches and in vitro/in vivo data. Additional working groups, including the assessment of biomolecule reactivity and drug/drug interaction, are on the horizon. If you are interested in any of these topics, or have a comment, question, or problem that you are facing, please get in touch.


  1. Myatt, G.J., Ahlberg, E., Akahori, Y., et al. (2018), In Silico Toxicology Protocols. Regul. Toxicol. Pharmacol. 98, 1-17. doi:10.1016/j.yrtph.2018.04.014. Open  access:
  2. World Health Organization & Food and Agriculture Organization of the United Nations (2020), Principles and methods for the risk assessment of chemicals in food. Subchapter 4.5 Genotoxicity. Environmental health criteria 240
  3. Amberg, A., Andaya, R.V., Anger, L.T., et al. (2019) Principles and procedures for handling out-of-domain and indeterminate results as part of ICH M7 recommended (Q)SAR analyses. Regul. Toxicol. Pharmacol. 102, 53–64. 10.1016/j.yrtph.2018.12.007
  4. European Medicines Agency (2018) Reflection paper on the qualification of non-genotoxic impurities
  5. European Medicines Agency (2020) ICH guideline M7 on assessment and control of DNA reactive (mutagenic) impurities in pharmaceuticals to limit potential carcinogenic risk – questions & answers Step 2b
  6. Tice et al., In Silico Approaches In Carcinogenicity Hazard Assessment: Current Status and Future Needs, submitted to Regulatory Toxicology and Pharmacology

What customer support question do we hear the most?

One of the most common questions we are asked is how an overall toxicity assessment for a given chemical is arrived at, especially when there are conflicting study results available.

To answer this question, it is often helpful to review the process of producing the content in Leadscope’s databases, which is overseen by Leadscope’s Manager of Database Content, Dave Bower.

The starting point for this production process is the original data sources. For the genetic toxicity database, these sources include CCRIS, CDER, CFSAN, CPDB, DSSTox, EPA-Genetox, NTP, publications, donated company information, and many more.

Converting this information into an integrated database initially involves two parallel processes: (1) the chemical structure processing workflow and (2) the content (toxicity study) building process.

To ensure all studies for the same chemical are linked together, each chemical (test article) is compared against our existing database. It is either registered as a new chemical (and given a new Leadscope ID) or linked to a previously registered chemical. This process can be difficult when only a chemical name has been reported, particularly when a chemical has historically been referred to by different names. When the chemical structure is displayed within the source material, issues related to the depiction of its stereochemistry, as well as aromaticity and tautomerism, may need to be taken into consideration. Mixtures and salt forms are often linked to SAR-forms of the chemical to ensure the studies are readily accessible and to support computational modelling efforts.

The content building process is also challenging since the underlying information may or may not be in an electronic form that is suitable for processing automatically. In certain situations, it is necessary to enter the information by hand. In others, it is possible to develop computational tools to read the content directly into the electronic database. An essential step here is to map the data elements described in the source material onto standardized terms. For example, a species and strain in one study may be reported as “S. typhimurium 100” and in another as “Sal. TA100”, yet both need to be mapped onto standardized species and strain terms (“Salmonella typhimurium” and “TA100”). In generating the content, multiple QA steps are included to ensure the integrity of the information.
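The term-mapping step might look something like the sketch below, which handles only the two reported forms from the example above. The synonym table, the regular expression, the function name, and the rule that a bare trailing number denotes a "TA" strain are all assumptions for illustration; a production curation pipeline would use a much larger controlled vocabulary and route unrecognized entries to manual review.

```python
import re

# Hypothetical synonym table: reported species text (lower-cased) -> standard term.
SPECIES_SYNONYMS = {
    "s. typhimurium": "Salmonella typhimurium",
    "sal.": "Salmonella typhimurium",
    "salmonella typhimurium": "Salmonella typhimurium",
}

# Trailing strain designation, with or without the "TA" prefix (assumed convention).
STRAIN_RE = re.compile(r"(?:TA)?(\d+)$", re.IGNORECASE)


def standardize_species_strain(raw: str):
    """Map a reported species/strain string onto (species, strain) standard terms.

    Returns None when the string is not recognized, so it can be curated by hand.
    """
    text = raw.strip()
    match = STRAIN_RE.search(text)
    if match is None:
        return None
    species = SPECIES_SYNONYMS.get(text[:match.start()].strip().lower())
    if species is None:
        return None
    return species, f"TA{match.group(1)}"
```

Returning `None` rather than guessing keeps the automated step conservative, leaving ambiguous records for the hand-entry and QA steps described above.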

Once the chemical structure processing is complete and harmonized study records are linked to these chemicals, a process of grading the chemicals can then take place. For example, an overall call for bacterial mutagenicity can be derived from the multiple data sources. This process involves an examination of the overall study calls and the underlying individual test results. Some factors that are taken into consideration include whether the data source is trusted or authoritative and whether or not the study is compliant with accepted test protocols. Since the overall study calls for different studies may be conflicting, the weight of the evidence needs to be considered in generating an overall grade for an individual chemical. However, the individual studies are always reported alongside any overall calls to support an expert review of the individual calls.
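To make the idea of weighing conflicting study calls concrete, here is a deliberately simplified sketch. It is not Leadscope's grading procedure: the `Study` record, the additive weights, and the "inconclusive" outcome are all hypothetical, chosen only to show how source trust and protocol compliance could shift an overall call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    source: str
    call: str                 # "positive" or "negative"
    trusted_source: bool      # data source considered trusted or authoritative
    protocol_compliant: bool  # study follows an accepted test protocol


def study_weight(study: Study) -> float:
    """Hypothetical weighting: base weight 1, plus 1 for each quality factor met."""
    return 1.0 + float(study.trusted_source) + float(study.protocol_compliant)


def overall_call(studies) -> str:
    """Derive an overall call by comparing summed weights of conflicting calls."""
    studies = list(studies)
    if not studies:
        return "no data"
    positive = sum(study_weight(s) for s in studies if s.call == "positive")
    negative = sum(study_weight(s) for s in studies if s.call == "negative")
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "inconclusive"
```

In keeping with the process described above, the individual `Study` records would be kept and reported alongside the overall call so that an expert can review them.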

Dave recently put together a slide deck explaining the process in detail, including a series of case studies illustrating the process. Please get in touch with me if you would be interested in learning more about this process or would like a copy of Dave’s slide deck.

Are (Q)SAR models fit-for-purpose for classification and labelling?

We were recently involved in a cross-industry project to determine whether (Q)SAR models were fit-for-purpose for classification and labelling. To test this hypothesis, a series of companies across different industrial sectors each compiled a data set of chemicals with experimental acute rat oral data. These chemicals were run against the first version of the Leadscope acute rule-based and statistical-based models. The experimental results along with the predictions generated by the (Q)SAR models were then shared (no information on the individual chemical structures was shared) and the performance of the models based on this blind data set was then quantified.

We calculated a number of statistics to determine whether these models were fit-for-purpose. This included an assessment of whether the (Q)SAR models predicted either the correct category or a more conservative or potent category.

The absolute percentage of correct or more conservative predictions was approximately 95%.
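The "correct or more conservative" statistic can be sketched as below. It assumes that categories are encoded as integers in which a lower number means a more potent (more severe) category, as with GHS acute toxicity categories; the function name and input format are hypothetical, and how an unclassified chemical is encoded (for example, as one number above the highest category) is a further assumption left to the user.

```python
def fraction_correct_or_conservative(pairs) -> float:
    """Fraction of chemicals predicted in the correct or a more potent category.

    pairs: iterable of (experimental_category, predicted_category) integers,
    where a lower number is assumed to mean a more potent category.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (experimental, predicted) pair is required")
    hits = sum(1 for experimental, predicted in pairs if predicted <= experimental)
    return hits / len(pairs)
```

For example, a chemical with an experimental category of 4 that is predicted as category 3 counts as a conservative prediction, whereas an experimental category of 2 predicted as category 3 counts as a miss.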

These results are part of a manuscript that was just accepted for publication. The paper also covers the performance of the different (Q)SAR methodologies, the performance over different industrial sectors as well as the impact of an expert review on the results.

Please get in touch with me if you would like a copy of a recent poster from the ACT meeting on this topic or would like to talk in more detail.