LSP Cheminformatics Toolkit
Components
LSP Cheminformatics Server
The server runs in the background and performs the requested computations. It can be queried using a JSON API.
LSP Cheminformatics Server JSON API documentation
R client
The R client gives easy access to the functionality of the Cheminformatics server.
Installation
Installing Anaconda
We recommend installing this package in a separate Anaconda environment. If Anaconda is not available, install it first.
Creating the environment and installing dependencies
Once Anaconda is installed, create a new environment for this package and install RDKit (see here for more help).
conda create -c rdkit -n lspcheminf_env python=3.7 rdkit click flask pandas gunicorn marshmallow apispec
conda activate lspcheminf_env
conda install -c conda-forge molvs
Installing the LSP Cheminformatics tools
conda activate lspcheminf_env
pip install --no-deps 'git+https://github.com/labsyspharm/lsp-cheminformatics.git#egg=lspcheminf&subdirectory=lsp_cheminf_server'
To upgrade to a newer version if the LSP Cheminformatics tools are already installed:
conda activate lspcheminf_env
pip install --no-deps --upgrade 'git+https://github.com/labsyspharm/lsp-cheminformatics.git#egg=lspcheminf&subdirectory=lsp_cheminf_server'
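As a quick sanity check after installation, the sketch below reports which of the packages named above cannot be imported in the active environment. It assumes that each package's import name matches its install name, which this README does not state; `missing_modules` is an illustrative helper, not part of the toolkit.

```python
import importlib.util

# Top-level modules the installation steps above are expected to provide.
# Assumption: import names match the conda/pip package names.
EXPECTED_MODULES = [
    "rdkit", "click", "flask", "pandas", "gunicorn",
    "marshmallow", "apispec", "molvs", "lspcheminf",
]


def missing_modules(names):
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All expected modules are importable.")
```

Run it with the `lspcheminf_env` environment activated, so that the check uses the same Python interpreter as the server.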
Running the server
The JSON API is exposed by running the server in the background. The following command starts the server on 127.0.0.1, port 8000, with four worker processes and a worker timeout of 600 seconds.
conda activate lspcheminf_env
gunicorn --workers=4 -b 127.0.0.1:8000 -t 600 lspcheminf
Once this command is issued, the server keeps running until it is stopped manually with the Ctrl-C key combination.
While the server is running, the documentation for the JSON API is available at http://127.0.0.1:8000/doc.
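To confirm from a script that the server is reachable, one option is to request the documentation page and check the response. The following is a minimal sketch using only the Python standard library; `doc_url` and `server_is_up` are illustrative names, and the default host and port simply mirror the `gunicorn` command above.

```python
import urllib.request


def doc_url(host="127.0.0.1", port=8000):
    """Build the URL of the JSON API documentation page."""
    return f"http://{host}:{port}/doc"


def server_is_up(url, timeout=2.0):
    """Return True if `url` answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


# Example: server_is_up(doc_url()) returns False until gunicorn has started.
```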
Querying the server
Any software capable of sending and receiving JSON can be used to query the server. For convenience, there is an R client that implements the JSON API in simple functions. In this example we use the R client to query for the chemical similarities between a number of compounds.
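For clients other than R, a request can be sent with any HTTP library. The sketch below uses the Python standard library to POST a JSON body and decode the JSON response. The endpoint path and payload fields in the usage comment are placeholders, not the real API; the actual routes and request schemas are listed at http://127.0.0.1:8000/doc while the server is running.

```python
import json
import urllib.request


def build_json_request(base_url, endpoint, payload):
    """Build a POST request that carries `payload` as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + endpoint.lstrip("/"),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url, endpoint, payload, timeout=600):
    """Send `payload` to the server and return the decoded JSON response."""
    request = build_json_request(base_url, endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage; replace the endpoint and fields with ones from /doc:
# result = post_json("http://127.0.0.1:8000", "/some/endpoint", {"field": "value"})
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without a running server.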
Installing the R client
devtools::install_github("labsyspharm/lsp-cheminformatics", subdir = "lsp_cheminf_rclient")
The client only functions while the lspcheminf server is running.
Query for similarities
The R client can be used to query the chemical similarity between compounds. By default the Morgan fingerprinting algorithm is used.
library(lspcheminf)
chemical_similarity(
c("resveratrol" = "InChI=1S/C14H12O3/c15-12-5-3-10(4-6-12)1-2-11-7-13(16)9-14(17)8-11/h1-9,15-17H/b2-1+"),
c(
"tofacitinib" = "InChI=1S/C16H20N6O/c1-11-5-8-22(14(23)3-6-17)9-13(11)21(2)16-12-4-7-18-15(12)19-10-20-16/h4,7,10-11,13H,3,5,8-9H2,1-2H3,(H,18,19,20)/t11-,13+/m1/s1",
"aspirin" = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
)
)
#> # A tibble: 2 x 3
#> query score target
#> <chr> <dbl> <chr>
#> 1 resveratrol 0.0571 tofacitinib
#> 2 resveratrol 0.132 aspirin
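To give some intuition for the score column: fingerprint similarity is commonly reported as a Tanimoto coefficient, the number of bits set in both fingerprints divided by the number of bits set in either. This README does not say which coefficient the server computes, so the sketch below assumes Tanimoto and uses made-up bit positions rather than real fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(union)


# Two illustrative fingerprints sharing 2 of 4 distinct on-bits.
print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```

Under this assumption, a score close to 0 means the two compounds share few fingerprint features, and a score of 1 means identical fingerprints.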
To compute the similarity from topological fingerprints instead, and to pass custom arguments to the RDKit function, set fingerprint_type and fingerprint_args:
chemical_similarity(
c("resveratrol" = "InChI=1S/C14H12O3/c15-12-5-3-10(4-6-12)1-2-11-7-13(16)9-14(17)8-11/h1-9,15-17H/b2-1+"),
c(
"tofacitinib" = "InChI=1S/C16H20N6O/c1-11-5-8-22(14(23)3-6-17)9-13(11)21(2)16-12-4-7-18-15(12)19-10-20-16/h4,7,10-11,13H,3,5,8-9H2,1-2H3,(H,18,19,20)/t11-,13+/m1/s1",
"aspirin" = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
),
fingerprint_type = "topological",
fingerprint_args = list("minPath" = 2, "useHs" = FALSE)
)
#> # A tibble: 2 x 3
#> query score target
#> <chr> <dbl> <chr>
#> 1 resveratrol 0.0967 tofacitinib
#> 2 resveratrol 0.198 aspirin
Query for substructure matches
match_substructure(
c("secondary_amine" = "[H]N(C)C"),
c(
"tofacitinib" = "InChI=1S/C16H20N6O/c1-11-5-8-22(14(23)3-6-17)9-13(11)21(2)16-12-4-7-18-15(12)19-10-20-16/h4,7,10-11,13H,3,5,8-9H2,1-2H3,(H,18,19,20)/t11-,13+/m1/s1",
"aspirin" = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
),
query_identifier = "smiles"
)
#> # A tibble: 1 x 3
#> match query target
#> <list> <chr> <chr>
#> 1 <list [6]> secondary_amine tofacitinib
Six matches for the secondary amine query were found in tofacitinib; aspirin has no match and is therefore absent from the result. The atom indices for each match are stored in the match list column.
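When working with such results outside R, it can help to expand the nested match list into one row per match. The sketch below assumes the result has been converted to a list of records with `query`, `target`, and `match` fields, where `match` holds one list of atom indices per match. The record layout, compound names, and atom indices are made up for illustration and are not output from the server.

```python
def flatten_matches(records):
    """Expand each record into one (query, target, atom_indices) row per match."""
    rows = []
    for record in records:
        for atom_indices in record["match"]:
            rows.append((record["query"], record["target"], tuple(atom_indices)))
    return rows


def count_matches(records):
    """Map each (query, target) pair to its number of matches."""
    return {(r["query"], r["target"]): len(r["match"]) for r in records}


# Illustrative records only; the indices do not describe a real molecule.
example = [
    {"query": "pattern_x", "target": "compound_a", "match": [[0, 1, 2], [4, 5, 6]]},
    {"query": "pattern_x", "target": "compound_b", "match": [[3, 7, 8]]},
]
for row in flatten_matches(example):
    print(row)
```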
Additional functionality
The R client does not yet cover the additional functionality of the JSON API, such as molecule drawing and ID conversion.
Funding
This work was supported by NIH grants U54-HL127365, U24-DK116204 and U54-HL127624.