Alexander Kos* and Hans-Jürgen Himmler
AKos Consulting and Solutions Deutschland GmbH (AKos GmbH), Austr. 26, D-79585 Steinen, Germany
Received date: July 27, 2015; Accepted date: August 26, 2015; Published date: September 02, 2015
Citation: Kos A, Himmler HJ. Efficient Internet Searches for Chemists. Chem Inform. 2015, 1:2.
iScienceSearch is a free Internet application that allows the user to search by structure, synonyms, CAS Registry Numbers and free text over 100 databases on the Internet. Google is one of these databases. For questions related to chemical structures, iScienceSearch is a better choice than the Google front-end. Depending on the question, sometimes a search started in a database like PubChem or SciFinder is more suitable, and sometimes searching the Internet with iScienceSearch gives better results. Besides searching the Internet, iScienceSearch offers tools such as a direct link to predict biological activities and toxicities. The application can be started using the URL http://isciencesearch.com/iss
Internet search engine; Meta-search engine; Rich internet application; RIA; iScienceSearch; Chemical structure search.
Most people go to Google if they want to know more about a subject [1,2]. Most chemists use PubChem or SciFinder if they want to know more about a compound. Both of these are databases and not Internet search engines. Is there no Internet search engine for chemists? There is: iScienceSearch [3].
Why would you use iScienceSearch and not Google?
With iScienceSearch, you can search the Internet by chemical structure. Sometimes, if you search for a specific chemical name in Google, you get no relevant answer at all. iScienceSearch extends your query and searches not only by the specific chemical name. If you start with a name, iScienceSearch will find the CAS Registry Number [4] (provided it is in the public domain), the structure, and more names. Sometimes iScienceSearch runs more than 100 different searches in the background. For instance, search for toxicity by structure and you will get a link to a database that can only be searched by CAS numbers. Search in Google for plants that contain maslinic acid and you will never find the Wikipedia page [5] for clove, because it mentions only crategolic acid, a synonym for maslinic acid.
Get only relevant answers. You can restrict the search to profiles. Search in “Supplier” if you want to buy a compound. Search in “Open access” journals if you want to make sure that you get more than an abstract. In Google you look at the first page, and maybe at the second. This means that you sometimes miss the most relevant answer. One of the largest collections of screening compounds is AKos Samples [6]. If you search “buy research screening compounds” you will need to go to page 3 in Google to find the link for AKos Samples.
The result page in iScienceSearch gives a different view. iScienceSearch groups results according to sources. For instance, in a search for “Origins of life”, PubMed [7] obviously provides a scientific and not a philosophical text. In Wikipedia, you can expect both.
iScienceSearch gives you the most current view of the Internet. There is always a time delay between publication and when the data is recorded in a database. A structure published in PubChem [8] will appear in Google after about 14 days, a structure published in AKosSamples will appear about 4 weeks later in CHEMCATS [9], and these are the short delays.
Google is a database [10], and as such a source in iScienceSearch. If you know how to transform a chemical structure drawing into an InChI string or key [11], you could also search Google by structure. iScienceSearch does this automatically for you. However, you definitely cannot do a substructure search in Google. How often do chemists miss a structure because they start with the enol form while the publication or database contains only the keto form?
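As a rough illustration of searching Google by structure, the following Python sketch builds a Google query URL for an exact InChIKey match. It assumes the InChIKey has already been generated from the drawing by some other tool; the helper name `google_query_url` is hypothetical and is not part of iScienceSearch.

```python
from urllib.parse import urlencode


def google_query_url(inchikey: str) -> str:
    """Build a Google search URL that looks for an exact InChIKey string."""
    # Wrapping the key in double quotes asks for an exact phrase match.
    return "https://www.google.com/search?" + urlencode({"q": f'"{inchikey}"'})


# Standard InChIKey of aspirin (acetylsalicylic acid).
print(google_query_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

Such a query can only find pages that literally contain the key, which is why it is no substitute for a substructure search.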
Google cannot index an Oracle [12] or MySQL [13] database, etc. If the data are not in an HTML file, or are in server-side generated pages (ASP, PHP, etc.), the data will not appear in the Google index [14,15]. For instance, AKosSamples is a MySQL database and you need a special interface to search it. This problem does not exist in a federated search if access to the database is provided, as it is for AKosSamples in iScienceSearch.
The heading to this paragraph was “Why would you use iScienceSearch and not Google”. For chemical questions it indeed makes more sense to use iScienceSearch instead of Google. In the following we compare iScienceSearch to databases. Here it depends on your question whether you start with a database, with iScienceSearch, or use both. For some searches a database is the better choice: you can use Boolean logic in your searches and restrict them to certain fields in the database.
Why would you use iScienceSearch and PubChem?
PubChem is a database and there are time delays, see below. No system can be comprehensive. Building a database with all suppliers is just too expensive. For instance PubChem has 155 suppliers, CHEMCATS has ca. 880 [9], eMolecules [16] ca. 140, ChemSpider [17] has in total 493 sources; ChemExper [18] lists more than 1500 suppliers. Experience has shown that iScienceSearch is the system of choice if you are searching for suppliers of research chemicals, because with the exception of CHEMCATS all these and 26 more directories of suppliers can be searched in iScienceSearch in one go.
Why would you use iScienceSearch and SciFinder?
The foremost reason to use iScienceSearch is cost. iScienceSearch is free. With the exception of CHEMCATS, the Chemical Abstracts database is based on journals, patents, dissertations and other high-quality sources [19], but not on other databases such as ChEMBL [20], which also collect high-quality data. It should be obvious that not everything is in SciFinder. A few examples are at the end of the paper. There is one more reason why a chemist should also search in iScienceSearch: the “Extended Search”.
Extended search
We chemists have solved the issue of similarity by using substructure and similarity searches with chemical structures. It is extremely limiting that many databases on the Internet cannot be searched by structure. In iScienceSearch we implemented the extended search. This means that when you draw a structure or type a chemical name, iScienceSearch searches databases in the background and finds concordances of structure, identification numbers (i.e., CAS Registry Number or AKos Number), and names. For Aspirin you will find about 200 different names, and it would be too time-consuming to do 200 extra searches in the background. iScienceSearch therefore limits the names to about the 20 most important ones. In the background, iScienceSearch searches, for instance, by different InChIs, CAS Registry Number and names.
The result is that you start with a structure and get answers from a database that can only be searched by CAS Registry Number (see the list of databases in Table 2 for examples), or you start with a name like maslinic acid and get perfectly correct results where only the synonym crategolic acid appears.
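The idea of the extended search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the concordance table is a tiny hand-made stand-in for the background databases, and the names `CompoundRecord`, `CONCORDANCE` and `expand_query` are hypothetical. The cap of 20 names follows the limit described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompoundRecord:
    """One concordance entry: the names and identifiers of a single compound."""
    names: List[str]
    cas: Optional[str] = None
    inchikey: Optional[str] = None


# Tiny hand-made concordance table (an assumption for illustration only).
CONCORDANCE = [
    CompoundRecord(names=["maslinic acid", "crategolic acid"]),
    CompoundRecord(
        names=["aspirin", "acetylsalicylic acid"],
        cas="50-78-2",
        inchikey="BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    ),
]


def expand_query(term: str, max_names: int = 20) -> List[str]:
    """Return the list of search terms to submit for one user query."""
    key = term.strip().lower()
    for record in CONCORDANCE:
        identifiers = [i for i in (record.cas, record.inchikey) if i]
        known = [n.lower() for n in record.names] + [i.lower() for i in identifiers]
        if key in known:
            # Cap the number of names, then append the identifiers.
            return record.names[:max_names] + identifiers
    # Unknown compound: fall back to the original term only.
    return [term]


print(expand_query("maslinic acid"))
```

Each returned term would then be sent to the sources that accept that kind of identifier, so a source that only knows “crategolic acid” is still reached from a query for “maslinic acid”.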
Feature/Tool | Purpose | Explanation |
---|---|---|
Name to structure | You do not have to draw a structure! | You can generate a structure by giving a name (IUPAC, synonym), CAS #, AKosNumber, InChI, etc. |
Compare structures | What is the right structure? | With a structure on the screen or a name, CAS #, etc. in the text box you will get a grid comparing the structures as they look in different databases. This is very useful to check your structures before you publish, i.e., “Tracleer” and “Bosentan”, see below. |
Compare activities | What is the major activity of a compound? | With a structure on the screen or a name, CAS #, etc. in the text box you will get a grid comparing the activities as reported in different databases. |
Predict chemical properties | What is the correct melting point? | With a structure on the screen or a name, CAS #, etc. in the text box you will get a grid with GUSAR [21] calculated physical properties and the links to calculated properties by ACD Labs [22] and ChemAxon[23]. |
Predict biological activities | Which are the possible biological effects of a compound? | With a structure on the screen you will get a reliable prediction of effects, like toxicities, biological activities, etc.[24]. |
Chemicalize | What is the IUPAC name or the logP, etc. | With a structure on the screen you will get a lot of calculated data[25]. |
Table 1: Special features and tools in iScienceSearch.
Profiles
A profile is a selection of databases that are relevant for specific searches. If you want to buy a compound, you can choose to search only over databases that provide supplier information. In a federated search over the Internet it is not yet possible to use a logical “and”. If the original source can interpret a query like “pyridine and carcinogenic”, you will get only answers where pyridine is connected with carcinogenic. However, you cannot draw a structure, type “carcinogenic”, and expect to find only structures that are carcinogenic. This would require the system to collect all answers from the Internet, build a local cache (database), and filter the results. A profile helps to overcome this limitation. If you want to find LD50 values, you search in the profile “Toxicity” and thus only over databases that are likely to offer an LD50. However, an LD50 can also be mentioned in a journal article, so you should then extend your search to the profile “Literature”. Another strategy is to begin searching over “All Sources” and use the sort, group, and filter methods in the result page.
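A profile can be pictured as a named set of sources, and the sort, group, and filter step as a local pass over the hits that came back. The Python sketch below makes both ideas concrete. The assignment of sources to profiles and the hit format (a dictionary with a `snippet` field) are assumptions made for illustration and do not reflect the actual iScienceSearch configuration.

```python
from typing import Dict, List, Set

# Illustrative profile membership (assumed, not the real configuration).
PROFILES: Dict[str, Set[str]] = {
    "Supplier": {"eMolecules", "ChemExper", "Molport"},
    "Toxicity": {"TOXLINE", "HSDB", "GENE-TOX"},
    "Literature": {"PubMed", "Google Scholar", "DOAJ"},
}


def sources_for(*profile_names: str) -> List[str]:
    """Return the sorted union of all sources in the given profiles."""
    selected: Set[str] = set()
    for name in profile_names:
        selected |= PROFILES[name]
    return sorted(selected)


def filter_hits(hits: List[dict], keyword: str) -> List[dict]:
    """Keep only hits whose snippet mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [hit for hit in hits if needle in hit["snippet"].lower()]


# Start in "Toxicity", then widen the search to "Literature".
print(sources_for("Toxicity"))
print(sources_for("Toxicity", "Literature"))
```

Restricting the sources up front stands in for the logical “and” that a federated search cannot send to every source.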
Additional features
Some of the iScienceSearch tools fall in the category of predicting data. iScienceSearch shows links to experimental data where possible. Some features are conveniences, like generating a structure from text. Other tools are there to compare the results of the different databases and to discover errors and discrepancies.
Example: Search for toxicity: Suppose we want to learn more about adverse effects of the structure shown in Figure 1. For comparison, we run a search in SciFinder and in iScienceSearch. Neither SciFinder nor iScienceSearch finds anything under toxicity (or adverse effects). In SciFinder, we look for biological studies and find 23 references. In iScienceSearch, we use the profile “Drug Info” and get an overview as to which database contains information about this compound (Figure 2).
PubChem, ChEMBL [21-25], DrugBank [26], etc. have very detailed data, and very often a good overview of the results. Nobody questions the usefulness of SciFinder as a literature search tool, but you do not get an overview as to which database will provide detailed information. In PubChem, you get a nice overview of articles, and widgets display the results in iScienceSearch, see the BioActivity window in Figures 3 and 4. You can first select those references where the compound is found to be active. In ChEMBL you get pie charts that help you get a fast overview of the activities of a compound.
Comparison of iScienceSearch with other databases
No database is as up-to-date as a snapshot of the current status of the Internet. This means that certain compounds will be missing from any database. Try to find the following structures in SciFinder. We ran the search on August 6, 2013, and July 23, 2015.
Go to https://www.ncbi.nlm.nih.gov/pcsubstance?cmd=search&term=all%5Bfilt, look for the latest compounds recorded in PubChem, and you will not find links to them in Google. Even Google takes time to update its index.
There are currently (July 24, 2015) 68’417’108 compounds in the PubChem (Compounds) Database. One can get this count using the URL https://www.ncbi.nlm.nih.gov/pccompound?term=all%5Bfilt%5D. PubChem is one of the depositors to the ChemSpider database. According to https://www.chemspider.com/DataSources.aspx there are currently 10’882’600 reference links to PubChem compounds in the ChemSpider Database. This means that, as of today, only 16% of all current PubChem compounds are referenced in the ChemSpider Database.
The current number of structures contained in the ChemSpider database, as mentioned on the ChemSpider homepage (https://www.chemspider.com/), is 34 million. ChemSpider is a depositor to the PubChem database. According to https://pubchem.ncbi.nlm.nih.gov/sources/sources.cgi the number of references to compounds in the ChemSpider database is 14’642’781. This means that, as of today, one can only find links to 43% of the ChemSpider compounds.
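The two coverage figures follow directly from the counts quoted above. The short Python calculation below reproduces them; the variable and function names are illustrative only.

```python
# Counts as quoted in the text (July 2015).
pubchem_compounds = 68_417_108
chemspider_links_to_pubchem = 10_882_600
chemspider_structures = 34_000_000  # "34 million" from the ChemSpider homepage
pubchem_links_to_chemspider = 14_642_781


def coverage_percent(linked: int, total: int) -> float:
    """Share of `total` records that are cross-referenced, in percent."""
    return 100.0 * linked / total


# PubChem compounds referenced in ChemSpider: about 16 %.
print(round(coverage_percent(chemspider_links_to_pubchem, pubchem_compounds), 1))
# ChemSpider compounds referenced in PubChem: about 43 %.
print(round(coverage_percent(pubchem_links_to_chemspider, chemspider_structures), 1))
```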
Executing an ‘Identical structure’ search in PubChem with the structure in Figure 5, one finds a hit only for the keto form [27]. Using the same query structure to search the DrugBank database, you find a hit that references the enol form [28] in PubChem. This is one more reason to use iScienceSearch, where you find all the links. iScienceSearch only includes free databases. For the ETH (Eidgenössische Technische Hochschule, Zurich) we have built a “hop-in” button for the licensed REAXYS system [29] in order to include such databases as well. This means that if you have a structure on display you can search in REAXYS without redrawing or copying the structure.
Biologists do not use SciFinder. They do not have a comparable database that collects all abstracts, and they are used to searching in different databases. iScienceSearch makes it possible to find answers in one search across many databases that are interesting for biologists, see the list of databases in Table 2. Sequence searches are a different story, and you do not do them in iScienceSearch. SciFinder and REAXYS are good if you can start with a chemical structure. They are weak if you start with synonyms. For instance, you will not find the record in REAXYS starting with “Tracleer”, but only when you use the less common synonym “Bosentan”. Also, you do not get exactly the same structure that is in PubChem.
No | Database or Organisation | Search options | URL | ||||
---|---|---|---|---|---|---|---|
Text | Full Structure | SSS | CAS # | Other identifiers | |||
1 | Abblis | x | www.abblis.com/ | ||||
2 | ACS Publication | x | pubs.acs.org/ | ||||
3 | Advanced Technology & Industrial Co., Ltd. | x | x | https://www.advtechind.com/ | |||
4 | AKosSample | x | www.akosgmbh.de/AKosSamples | ||||
5 | Alfa Aesar | x | https://www.alfa.com | ||||
6 | AmadisChemicalis | x | www.amadischem.com/ | ||||
7 | Angene Chemical | x | www.angenechemical.com/aboutus.html | ||||
8 | Apexmol | www.apexmol.com/ | |||||
9 | Aurum Chemicals | x | www.aurumchemicals.pl/ | ||||
10 | BASE | x | x | x | www.base-search.net | ||
11 | Biological Magnetic Resonance Data Bank (BMRB) | x | x | x | www.bmrb.wisc.edu/search/ | ||
12 | Binding Database | x | x | www.bindingdb.org/bind/index.jsp | |||
13 | BioMed Central | x | www.biomedcentral.com | ||||
14 | BroadPharm | x | www.broadpharm.com/ | ||||
15 | BuyersGuideChem | x | www.buyersguidechem.de | ||||
16 | Capot Chemical | x | www.capotchem.com/index_en.htm | ||||
17 | CDC | x | www.cdc.gov/ | ||||
18 | ChemAxonChem search | x | x | https://www.chemicalize.org/ | |||
19 | Chemical Entities of Biological Interest (ChEBI) | x | x | x | x | x | www.ebi.ac.uk/chebi |
20 | CHEMBANK | x | chembank.broadinstitute.org | ||||
21 | ChEMBL | x | https://www.ebi.ac.uk/chembldb | ||||
22 | ChemBridge | x | www.chembridge.com/index.php | ||||
23 | ChemExper Chemical Directory | x | x | x | www.chemexper.com | ||
24 | Chemical Book | x | x | www.chemicalbook.com | |||
25 | The Chemical Database | x | x | https://ull.chemistry.uakron.edu/erd/ | |||
26 | Chemicalland21.com | x | chemicalland21.com | ||||
27 | ChemIDplus | x | x | chem.sis.nlm.nih.gov/chemidplus/ | |||
28 | ChemMol | chemmol.com/ | |||||
29 | ChemSpider | x | x | x | x | www.chemspider.com/ | |
30 | ChemSynthesis | x | x | www.chemsynthesis.com/ | |||
31 | ClinicalTrials | x | clinicaltrials.gov/ | ||||
32 | ChEBICiteXplore | x | www.ebi.ac.uk/citexplore | ||||
33 | CTD | x | x | x | ctd.mdibl.org/ | ||
33 | Chemical Structure Lookup Service | x | x | cactus.nci.nih.gov/cgi-bin/lookup/search | |||
34 | Crystallography Open Database (COD) | x | https://www.crystallography.net/index.php | ||||
35 | Developmental and Reproductive Toxicology Database (DART) | x | x | toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?DARTETIC | |||
36 | Directory of Open Access Journals (DOAJ) | x | www.doaj.org | ||||
37 | DrugBank | x | x | x | x | www.drugbank.ca/ | |
38 | DSSTOX | x | x | x | www.epa.gov/ncct/dsstox/ | ||
39 | EBI Search engine | x | www.ebi.ac.uk/ebisearch | ||||
40 | eChemPortal | x | webnet3.oecd.org/eChemPortal/Home.aspx | ||||
41 | Envirofacts | x | x | www.epa.gov/envirofw/gov/envirofw/ | |||
42 | eMolecules | x | x | x | x | www.emolecules.com/ | |
43 | Enamine Ltd. | x | www.enamine.net/ | ||||
44 | eSamples | x | x | x | x | https://www.e-samples.de | |
45 | ESPACENet | x | www.epo.org | ||||
46 | euSDB | x | www.eusdb.de/en | ||||
47 | Exclusive Chemistry Ltd | x | www.exchemistry.com/ | ||||
48 | FDA | x | www.fda.gov | ||||
49 | Fisher Scientific | x | https://www.fishersci.com/ | ||||
50 | Free patents online | x | x | www.freepatentsonline.com/ | |||
51 | GENE-TOX (Genetic Toxicology Data Bank) | x | x | toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?GENETOX | |||
52 | Google | x | x | www.google.com | |||
53 | Google Books | x | books.google.com/ | ||||
54 | Google Patent Search | x | www.google.com/patents | ||||
55 | Google Scholar | x | scholar.google.de/ | ||||
56 | Catalogue for libraries of Heidelberg University (HEIDI) | x | x | katalog.ub.uni-heidelberg.de | |||
57 | Human Metabolome Database (HMDB) | x | www.hmdb.ca | ||||
58 | Ibridge | x | www.ibridgenetwork.org/ | ||||
59 | IPCS INCHEM | x | www.inchem.org/ | ||||
60 | IRIS (Integrated Risk Information System) | x | x | toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?IRIS | |||
61 | IS Chemical Technology | x | www.ispharm.com/ | ||||
62 | ITER (International Toxicity Estimates for Risk) | x | toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?iter | ||||
63 | KEGG COMPOUND | x | www.genome.jp/kegg/compound/ | ||||
64 | Molport | x | www.molport.com/buy-chemicals | ||||
65 | MSDS Hazcom Library | x | https://www.msdshazcom.com/ | ||||
66 | NCI database | x | x | x | 129.43.27.140/ncidb2/ | ||
67 | Nature Chemical Biology journal | x | www.nature.com/nchembio/index.html | ||||
68 | National Institute of Allergy and Infectious Diseases | x | x | chemdb2.niaid.nih.gov | |||
69 | NIST Chemistry Web Book | x | x | webbook.nist.gov/chemistry/ | |||
70 | Oakwood Chemical | x | www.oakwoodchemical.com/ | ||||
71 | PDB | x | x | www.pdb.org/pdb/home/home.do | |||
72 | PHARMAGATEWAY | x | www.pharmagateway.net | ||||
73 | PharmGKB database | x | www.pharmgkb.org/ | ||||
74 | PLoS ONE | x | www.plosone.org/home.action | ||||
75 | Proceedings of the National Academy of Sciences (PNAS) | x | www.pnas.org/ | ||||
76 | PubChem | x | x | x | x | pubchem.ncbi.nlm.nih.gov/search/search.cgi | |
77 | PubMed | x | x | x | www.ncbi.nlm.nih.gov/pubmed/ | ||
78 | PubMed Central (PMC) | x | x | x | www.ncbi.nlm.nih.gov/pmc/ | ||
79 | Quertle | x | www.quertle.info | ||||
80 | Selleck Chemicals | x | www.selleckchem.com/ | ||||
81 | SigmaAldrich | x | x | x | www.sigmaaldrich.com/united-states.html | ||
82 | SIRI MSDS Index | x | x | hazard.com/msds/ | |||
83 | Specs | x | www.specs.net/snpage.php?snpageid=home | ||||
84 | Toxin and Toxin Target Database (T3DB) | x | x | www.t3db.org | |||
85 | TimTec | x | www.timtec.net/ | ||||
86 | TOXLINE (Toxicology Literature Online) | x | x | toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?TOXLINE | |||
87 | Chemical Carcinogenesis Research Information System | x | x | www.nlm.nih.gov/pubs/factsheets/ccrisfs.html | |||
88 | Hazardous Substances Data Bank (HSDB) | x | x | toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?HSDB | |||
89 | UNC Library Express | x | ncsu.worldcat.org | ||||
90 | Vitas-M Laboratory | x | www.vitasmlab.com/ | ||||
91 | Wikipedia | x | x | x | www.wikipedia.org/ | ||
92 | ZINC | x | zinc.docking.org/ |
Table 2: Data Sources in iScienceSearch.
Have a look at the InChI keys in Figures 6 and 7, and it is clear that the structures in PubChem and REAXYS are different. Checking the InChI key is a convenient method to quickly differentiate complex structures. Complex compounds often have different structures under the same name in different databases. In iScienceSearch it is possible to compare the structures from different databases, which immediately points to a problem and alerts the scientist to define the query carefully.
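A comparison of two InChI keys can also be automated. The Python sketch below relies on the layout of a standard InChIKey (three hyphen-separated blocks of 14, 10 and 1 characters, with the first block hashing the molecular skeleton) and reports whether two keys are identical, share only the skeleton, or differ altogether. The function name `compare_inchikeys` is hypothetical, and only the aspirin key is a real key; the other two are made-up placeholders.

```python
def compare_inchikeys(key_a: str, key_b: str) -> str:
    """Classify the relationship between two InChIKeys."""
    parts_a = key_a.strip().upper().split("-")
    parts_b = key_b.strip().upper().split("-")
    for parts in (parts_a, parts_b):
        # A well-formed key has blocks of 14, 10 and 1 characters.
        if [len(p) for p in parts] != [14, 10, 1]:
            raise ValueError("not a well-formed InChIKey")
    if parts_a == parts_b:
        return "identical"
    if parts_a[0] == parts_b[0]:
        # Same first block: same skeleton, but the remaining layers
        # (for example stereochemistry or protonation) differ.
        return "same skeleton"
    return "different skeleton"


aspirin = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"           # real key for aspirin
placeholder_same = "BSYNRYMUTXBXSQ-AAAAAAAASA-N"  # made-up second block
placeholder_other = "AAAAAAAAAAAAAA-UHFFFAOYSA-N" # made-up first block

print(compare_inchikeys(aspirin, placeholder_same))
print(compare_inchikeys(aspirin, placeholder_other))
```

A result of “different skeleton” for two records filed under the same name is the kind of discrepancy that calls for a closer look at the drawn structures.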
Literature search
There are many systems on the Internet, and a user will limit his search to those sources with which he is familiar. iScienceSearch makes it easy to search over many sources, introducing the user to useful new sources. Each Internet portal to literature, be it ACS [30], KonSearch [31], Heidi [32], etc., has its strengths and weaknesses. Let’s assume a user is fairly familiar with the different data sources. Let’s assume he is Turkish and would like to have a quick overview of which references are in his mother language. Below is the picture from the query “aspirin toxicity review” in KonSearch filtered for Turkish documents. Such a filter is one click in KonSearch, an option on the right side.
iScienceSearch is a meta-search engine. A meta-search engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list or displays them according to their source [33]. iScienceSearch is an ASP.NET web application hosted under Internet Information Server (IIS). All searches in iScienceSearch are executed asynchronously. That allows a high number of searches to be executed independently of each other. It also allows the user to interact with the UI (user interface) while searches are still executing. This means the result grid gets populated with links as soon as one of the searches has found a hit, and it is updated whenever new hits are found. Since the UI is not blocked while the search is executed, the user can already open result links while searching goes on in the background. A progress bar shows the search progress as a percentage of completeness. The search can be canceled at any time.
The chemical drawing tool used in iScienceSearch is JSDraw from Scilligence Corporation [34]. The editor is written in JavaScript, so no Java plugin needs to be installed in the browser. The only requirement is that the browser has JavaScript enabled, which is the default setting in all major browsers.
The query extension (see Extended search) uses the REST-style version of PUG (Power User Gateway), a web interface for accessing PubChem data (https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html), and the Chemical Identifier Resolver from the NCI/CADD group (https://cactus.nci.nih.gov/chemical/structure). For predicting chemical properties (see Additional features), the CAP (Chemical Activity Predictor) web service provided by the NCI/CADD group is used.
iScienceSearch provides one user interface to search many databases on the Internet. The advantage is that one gets a quick overview as to which source contains relevant information about a compound. iScienceSearch is unique as an Internet search engine, because it allows you to search by structure, and not only by text. The extended search makes it possible to widen the query.
With a structure search you find answers in databases that can otherwise only be searched by, for example, CAS Registry Number or text. iScienceSearch provides a short list of links with the number of hits in each source. This makes it easy to pick the most relevant answers.
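The asynchronous behaviour described above can be sketched with Python's `asyncio`. This is not the authors' ASP.NET implementation, only a minimal model of the same pattern: all sources are queried concurrently, results are collected as soon as each source answers, and a progress percentage is reported. `search_source` is a hypothetical stand-in that simulates network latency instead of contacting a real database.

```python
import asyncio
import random
from typing import Callable, Dict, List, Optional, Tuple


async def search_source(name: str, query: str) -> Tuple[str, List[str]]:
    """Stand-in for one remote search; the sleep simulates network latency."""
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return name, [f"{name}: hit for {query!r}"]


async def meta_search(
    query: str,
    sources: List[str],
    on_progress: Optional[Callable[[int], None]] = None,
) -> Dict[str, List[str]]:
    """Query all sources concurrently and collect hits as they arrive."""
    tasks = [asyncio.create_task(search_source(s, query)) for s in sources]
    results: Dict[str, List[str]] = {}
    # as_completed yields in order of completion, not submission, so fast
    # sources reach the result grid without waiting for slow ones.
    for done, finished in enumerate(asyncio.as_completed(tasks), start=1):
        name, hits = await finished
        results[name] = hits
        if on_progress:
            on_progress(100 * done // len(tasks))
    return results


if __name__ == "__main__":
    found = asyncio.run(
        meta_search("aspirin", ["PubChem", "ChEMBL", "DrugBank"], print)
    )
    print(sorted(found))
```

Cancelling a running search would correspond to calling `cancel()` on the tasks that are still pending.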
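For readers who want to try the two services directly, the Python sketch below builds request URLs for PUG REST and the Chemical Identifier Resolver. The URL patterns follow the public documentation of both services; which requests iScienceSearch itself issues is not stated in the text, so the choice of endpoints and the helper names are assumptions. No network call is made here.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
CIR = "https://cactus.nci.nih.gov/chemical/structure"


def pubchem_synonyms_url(name: str) -> str:
    """PUG REST URL that lists the synonyms PubChem knows for a name."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/synonyms/JSON"


def pubchem_inchikey_url(name: str) -> str:
    """PUG REST URL that returns the InChIKey for a compound name."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/InChIKey/JSON"


def cir_url(identifier: str, representation: str = "stdinchikey") -> str:
    """Resolver URL that converts an identifier into another representation."""
    return f"{CIR}/{quote(identifier, safe='')}/{representation}"


print(pubchem_synonyms_url("maslinic acid"))
print(pubchem_inchikey_url("aspirin"))
print(cir_url("aspirin", "cas"))
```

Names are percent-encoded because chemical names frequently contain spaces, commas and brackets.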