One of the most common questions we are asked is how an overall toxicity assessment for a given chemical is derived, especially when conflicting study results are available.

To answer this question, it is often helpful to review the process of producing the content in Leadscope’s databases, which is overseen by Leadscope’s Manager of Database Content, Dave Bower.

The starting point for this production process is the original data sources. For the genetic toxicity database, these sources include CCRIS, CDER, CFSAN, CPDB, DSSTox, EPA-Genetox, NTP, publications, donated company information, and many more.

Converting this information into an integrated database initially involves two parallel processes: (1) the chemical structure processing workflow and (2) the content (toxicity study) building process.

To ensure all studies for the same chemical are linked together, each chemical (test article) is compared against our existing database. It is either registered as a new chemical (and given a new Leadscope ID) or linked to a previously registered chemical. This process can be difficult when only a chemical name has been reported, particularly when a chemical has historically been referred to by different names. Where the chemical structure is displayed within the source material, issues related to the depiction of its stereochemistry, aromaticity, and tautomerism may need to be taken into consideration. Mixtures and salt forms are often linked to the SAR-form of the chemical to ensure the studies are readily accessible and to support computational modelling efforts.
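The registration step above can be sketched in a few lines. This is a minimal illustration, not Leadscope's actual system: it assumes each incoming test article has already been normalized to some canonical structure key (for example, a standard InChIKey), and the `LS-` identifier format and class name are invented for the example.

```python
class ChemicalRegistry:
    """Toy registry: one record per unique structure, with salt forms
    and mixtures linked to the SAR-form of the chemical."""

    def __init__(self):
        self._by_key = {}   # canonical structure key -> Leadscope-style ID
        self._links = {}    # salt/mixture ID -> parent (SAR-form) ID
        self._next = 1

    def register(self, canonical_key, sar_form_key=None):
        """Return the ID for a chemical, creating a new record if unseen.

        If `sar_form_key` is given (e.g. for a salt form or mixture), the
        record is also linked to the registered SAR-form so that all
        studies remain accessible from either record.
        """
        if canonical_key not in self._by_key:
            self._by_key[canonical_key] = f"LS-{self._next:06d}"  # hypothetical ID scheme
            self._next += 1
        chem_id = self._by_key[canonical_key]
        if sar_form_key is not None:
            self._links[chem_id] = self.register(sar_form_key)
        return chem_id
```

Registering the same canonical key twice returns the existing ID rather than creating a duplicate record, which is what keeps studies for the same chemical linked together.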

The content building process is also challenging since the underlying information may or may not be in an electronic form suitable for automatic processing. In certain situations, it is necessary to enter the information by hand; in others, it is possible to develop computational tools to read the content directly into the electronic database. An essential step here is to map the data elements described in the source material onto standardized terms. For example, a species and strain may be reported in one study as “S. typhimurium 100” and in another as “Sal. TA100”, yet both need to be mapped onto standardized species and strain terms (“Salmonella typhimurium” and “TA100”). In generating the content, multiple QA steps are included to ensure the integrity of the information.
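The mapping step can be illustrated with the article's own Ames-test example. The synonym table and the rule of prefixing bare strain numbers with “TA” are assumptions made for this sketch; a production vocabulary would be far larger and curated.

```python
import re

# Illustrative synonym table; entries and coverage are hypothetical.
SPECIES_SYNONYMS = {
    "s. typhimurium": "Salmonella typhimurium",
    "sal.": "Salmonella typhimurium",
    "salmonella typhimurium": "Salmonella typhimurium",
}

def standardize(reported):
    """Map a free-text entry like 'Sal. TA100' onto standard terms."""
    text = reported.strip()
    # Pull out the trailing strain number, with or without the 'TA' prefix.
    m = re.search(r"(?:TA\s*)?(\d{2,4})\s*$", text, re.IGNORECASE)
    strain = f"TA{m.group(1)}" if m else None
    species_text = text[: m.start()].strip() if m else text
    species = SPECIES_SYNONYMS.get(species_text.lower(), species_text)
    return species, strain
```

Both reported forms from the example then resolve to the same standardized pair, which is what allows studies from different sources to be compared.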

Once the chemical structure processing is complete and harmonized study records are linked to these chemicals, the chemicals can be graded. For example, an overall call for bacterial mutagenicity can be derived from the multiple data sources. This process involves an examination of the overall study calls and the underlying individual test results. Factors taken into consideration include whether the data source is trusted or authoritative and whether the study complies with accepted test protocols. Since the overall calls from different studies may conflict, the weight of the evidence needs to be considered when generating an overall grade for an individual chemical. However, the individual studies are always reported alongside any overall call to support an expert review of the individual calls.
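One way such a weight-of-evidence combination might be sketched is shown below. The weights, field names, and thresholds are invented for illustration and are not Leadscope's actual grading scheme; the point is that authoritative, protocol-compliant studies count for more, and that every individual study is carried through alongside the overall call.

```python
def overall_call(studies):
    """Combine individual study calls into an overall grade.

    Each study is a dict with a 'call' ('positive' or 'negative') and two
    hypothetical quality flags: 'authoritative' and 'guideline_compliant'.
    """
    score = 0.0
    for s in studies:
        weight = 1.0                      # base weight for any study
        if s["authoritative"]:
            weight += 1.0                 # trusted/authoritative source
        if s["guideline_compliant"]:
            weight += 1.0                 # follows accepted test protocols
        score += weight if s["call"] == "positive" else -weight
    if score > 0:
        grade = "positive"
    elif score < 0:
        grade = "negative"
    else:
        grade = "inconclusive"
    # The individual studies are always reported alongside the overall call.
    return {"overall": grade, "studies": studies}
```

With this toy scheme, a single non-compliant positive result is outweighed by an authoritative, guideline-compliant negative study, while evenly balanced evidence comes back as inconclusive for expert review.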

Dave recently put together a slide deck explaining the process in detail, including a series of case studies illustrating the process. Please get in touch with me (gmyatt@leadscope.com) if you would be interested in learning more about this process or would like a copy of Dave’s slide deck.

Published by Glenn Myatt

Glenn J. Myatt is the co-founder and current head of Leadscope (an Instem company), with over 25 years’ experience in computational chemistry/toxicology. He holds a Bachelor of Science degree in Computing from Oxford Brookes University, a Master of Science degree in Artificial Intelligence from Heriot-Watt University and a Ph.D. in Chemoinformatics from the University of Leeds. He has published 27 papers, six book chapters and three books.