Data Incubator Project Proposal

Predicting Unknown Adverse Side Effects for Drugs and Beyond

Adverse side effects of drugs are a burden on patients and a primary concern of drug safety. The recently released SIDER 4.1 (Side Effect Resource) database records marketed medicines and their reported adverse drug effects, and it has wide application potential across multiple domains of drug discovery. This proposal aims to predict unknown adverse side effects for both known and new drugs by analyzing the complex drug-side effect network and applying machine learning strategies. In addition, by combining SIDER with other databases, the proposal also covers the prediction of drug-drug and drug-target interactions, which would greatly benefit rational drug administration and drug repurposing.

Data Sources:

Source 1: Drug-Side Effect (SE) data

SIDER 4.1: 1) drug-side effect data (

2) drug-indication data (

Source 2: Drug-Drug Interaction & Drug-Target Interaction data


Drug Interaction API:

In the preliminary data analysis, only the SIDER 4.1 file “meddra_all_se.tsv.gz” was used. Even for this small dataset, careful data cleaning is needed to remove redundant drug-SE pair entries. In the long term, we will integrate drug-indication, drug-target, and drug-drug interaction data to provide a more complete picture of the mechanisms of action and side effects of drugs.
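The cleaning step described above could be sketched as follows. This is a minimal illustration, not the code used for the preliminary analysis: the column positions (`drug_col`, `se_col`), the case folding, and the made-up sample rows are all assumptions, and the real file layout should be checked against the SIDER README.

```python
import csv
import gzip
import io


def unique_drug_se_pairs(rows, drug_col=0, se_col=5):
    """Return the sorted list of unique (drug, side effect) pairs.

    drug_col and se_col are assumed column positions, not confirmed ones.
    """
    pairs = set()
    for row in rows:
        drug = row[drug_col].strip()
        se = row[se_col].strip().lower()  # fold case so "Nausea" == "nausea"
        if drug and se:
            pairs.add((drug, se))
    return sorted(pairs)


def read_tsv_gz(path):
    """Yield rows from a gzip-compressed, tab-separated file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t")


# Tiny in-memory stand-in for the real file (identifiers are made up).
sample = io.StringIO(
    "D1\tx\tc1\tLLT\tc1\tNausea\n"
    "D1\tx\tc1\tPT\tc1\tNausea\n"
    "D1\tx\tc2\tPT\tc2\tHeadache\n"
    "D2\tx\tc1\tPT\tc1\tnausea\n"
)
pairs = unique_drug_se_pairs(csv.reader(sample, delimiter="\t"))
print(pairs)
```

On the real data, `unique_drug_se_pairs(read_tsv_gz("meddra_all_se.tsv.gz"))` would apply the same logic, provided the assumed columns are correct.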

Aim 1: Analysis of Promiscuity of Drug-Side Effect (SE) Network

First, to gain insight into drug-SE associations and visualize them better, I built a drug-SE network graph from the “meddra_all_se.tsv.gz” dataset using the “spring” layout algorithm (Plot 1). Drugs are highly promiscuous in terms of their side effects: the closer a node lies to the center, the higher its degree of connectivity. After removing redundancy, the dataset contains 1,430 drugs and 6,060 side effects that share 162,363 associations. The average degree over all drug and SE nodes is 43.4 (that is, 2 × 162,363 / (1,430 + 6,060)); the maximum degree is 839 for a drug (to SEs) and 1,207 for an SE (to drugs).
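The summary statistics above follow directly from the list of unique drug-SE pairs. The sketch below shows one way to compute them, assuming the input is already de-duplicated; the function name and the toy pairs are hypothetical, and only the three totals in the last line come from the text.

```python
from collections import Counter


def bipartite_stats(pairs):
    """Summarize a drug-SE bipartite network given unique (drug, se) pairs."""
    drug_degree = Counter(drug for drug, _ in pairs)
    se_degree = Counter(se for _, se in pairs)
    n_edges = len(pairs)
    n_nodes = len(drug_degree) + len(se_degree)
    return {
        "n_drugs": len(drug_degree),
        "n_side_effects": len(se_degree),
        "n_associations": n_edges,
        # Each association adds one to the degree of a drug and of an SE.
        "avg_degree": 2 * n_edges / n_nodes,
        "max_drug_degree": max(drug_degree.values()),
        "max_se_degree": max(se_degree.values()),
    }


# Toy network with made-up identifiers.
toy_pairs = [("D1", "nausea"), ("D1", "headache"), ("D2", "nausea"), ("D3", "nausea")]
stats = bipartite_stats(toy_pairs)
print(stats)

# Consistency check of the average degree using the totals reported in the text.
reported_avg = round(2 * 162363 / (1430 + 6060), 1)
print(reported_avg)
```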


Next, I plotted a histogram of the number of SEs per drug as an indicator of drug safety (Plot 2a). The histogram is highly skewed: most drugs have relatively few side effects. The top 10 most dangerous drugs, ranked by number of recorded SEs, are listed as a safety alert. Future work will investigate chemical structure alerts for these highly dangerous drugs in order to avoid potential safety issues in newly designed drug molecules.

I also plotted a histogram of the number of drugs per SE for commonly shared SEs (Plot 2b). Only SEs shared by more than 100 drugs are included. The top 10 most common SEs are listed; their prevalence suggests that they may arise from the body's normal physical or mental reaction to any foreign substance, much as with a placebo. Moreover, most drugs share multiple SEs to varying extents, which matters in at least two ways: 1) drug combinations that share the same severe SEs, such as cardiac issues, can be avoided; 2) drug similarity can be estimated from SE profile similarity, from which we can predict new targets and indications for old drugs with the help of other databases such as DrugBank.
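The two rankings behind Plots 2a and 2b can be sketched as below. The helper names and toy data are hypothetical; the threshold of 100 drugs comes from the text, while breaking ties alphabetically is an assumption made only to keep the output deterministic.

```python
from collections import Counter


def top_drugs(pairs, k=10):
    """Return the k drugs with the most side effects as (drug, count)."""
    counts = Counter(drug for drug, _ in pairs)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


def common_side_effects(pairs, min_drugs=100):
    """Return SEs shared by more than min_drugs drugs, most common first."""
    counts = Counter(se for _, se in pairs)
    shared = [(se, n) for se, n in counts.items() if n > min_drugs]
    return sorted(shared, key=lambda item: (-item[1], item[0]))


# Toy unique pairs with made-up identifiers.
toy_pairs = [
    ("D1", "nausea"), ("D1", "headache"), ("D1", "rash"),
    ("D2", "nausea"), ("D2", "headache"),
    ("D3", "nausea"),
]
drug_ranking = top_drugs(toy_pairs, k=2)
se_ranking = common_side_effects(toy_pairs, min_drugs=1)
print(drug_ranking)
print(se_ranking)
```

The per-drug counts that feed `top_drugs` are the same values whose distribution the histogram in Plot 2a shows.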


Aim 2: Prediction of Drug Targets and Indications from Drug Similarity Calculation

Following the analysis of drug-SE associations, I plan to predict drug targets and indications for known drugs using SE profile similarity. I will first transform the drug-SE association table into a sparse binary matrix of drug SE profiles. Next, I will calculate the similarity between drugs by comparing their SE profiles. The higher the similarity index between two drugs, the more likely they are to share targets and indications. Finally, by combining the calculated drug similarity with a drug-target interaction database (DrugBank) and a drug-indication database (SIDER 4.1), we can predict new drug targets and indications, which would have a great impact on drug repurposing.
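The proposal does not name a similarity index, so the sketch below assumes the Jaccard (Tanimoto) coefficient on binary SE profiles, stored as sets rather than as a sparse matrix. The drug and target identifiers, the `min_similarity` threshold, and the rule of transferring targets from similar drugs are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def se_profiles(pairs):
    """Map each drug to the set of its side effects (a binary profile)."""
    profiles = defaultdict(set)
    for drug, se in pairs:
        profiles[drug].add(se)
    return dict(profiles)


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(profiles):
    """Similarity for every unordered pair of drugs."""
    return {
        (d1, d2): jaccard(profiles[d1], profiles[d2])
        for d1, d2 in combinations(sorted(profiles), 2)
    }


def candidate_targets(drug, profiles, known_targets, min_similarity=0.5):
    """Suggest targets of similar drugs that `drug` is not yet annotated with."""
    own = known_targets.get(drug, set())
    scores = {}
    for other, profile in profiles.items():
        if other == drug:
            continue
        sim = jaccard(profiles[drug], profile)
        if sim >= min_similarity:
            for target in known_targets.get(other, set()) - own:
                scores[target] = max(scores.get(target, 0.0), sim)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


toy_pairs = [
    ("D1", "nausea"), ("D1", "headache"), ("D1", "rash"),
    ("D2", "nausea"), ("D2", "headache"),
    ("D3", "insomnia"),
]
profiles = se_profiles(toy_pairs)
similarity = pairwise_similarity(profiles)
known_targets = {"D1": {"T1"}, "D3": {"T9"}}  # hypothetical annotations
suggestions = candidate_targets("D2", profiles, known_targets)
print(similarity)
print(suggestions)
```

The same transfer rule could be applied to indications by swapping `known_targets` for a drug-to-indication mapping.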


Aim 3: Prediction of Side Effects for Drugs using Machine Learning Methods

Finally, I will predict unknown side effects for both known and novel drugs using different machine learning methods. Since clinical data cannot reveal all side effects of marketed drugs, especially recently approved ones, I will use a recommender system algorithm to predict potential side effects for known drugs. The sparse matrix of drug SE profiles generated in Aim 2 will be factorized to produce predicted SE scores for cells that are currently “0”.
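As a rough illustration of the matrix factorization idea, the sketch below fits a low-rank model to a small dense binary matrix with stochastic gradient descent and then scores the “0” cells. Treating every “0” as an observed negative, the squared-error loss, and all hyperparameters (`rank`, `epochs`, `lr`, `reg`) are assumptions; a real recommender would work on the sparse matrix and would likely weight unobserved cells differently.

```python
import random


def factorize(matrix, rank=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Fit matrix ~ P * Q^T by SGD and return scores for the zero cells."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Small random initial factors for drugs (P) and side effects (Q).
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]

    def predict(i, j):
        return sum(p * q for p, q in zip(P[i], Q[j]))

    def loss():
        return sum((matrix[i][j] - predict(i, j)) ** 2 for i, j in cells)

    history = [loss()]  # squared error before training and after each epoch
    for _ in range(epochs):
        rng.shuffle(cells)
        for i, j in cells:
            err = matrix[i][j] - predict(i, j)
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
        history.append(loss())

    # Only cells that are currently 0 are candidates for new side effects.
    scores = {(i, j): predict(i, j) for i, j in cells if matrix[i][j] == 0}
    return scores, history


# Rows are drugs, columns are side effects (toy data).
toy_matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
scores, history = factorize(toy_matrix)
ranked = sorted(scores.items(), key=lambda item: -item[1])
print(ranked[:3])
```

Zero cells with the highest scores would be the candidate unknown side effects to examine first.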

On the other hand, I will use the known drug-SE sparse matrix to train a multi-task machine learning model that takes the molecular graph of a drug as input. Cross-validation will be used to tune the model's parameters, and the resulting model will then be used to predict the SE profile of a novel drug molecule. This will help guide the safer use of a novel drug before large-population clinical data become available.
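The proposal does not say how the cross-validation folds would be formed. One option, sketched below as an assumption, is to split by drug, so that every evaluation fold contains only molecules the model has never seen, which mirrors the novel-drug use case. The function name, `k`, and `seed` are hypothetical, and the model itself is out of scope here.

```python
import random


def k_fold_drug_splits(drugs, k=5, seed=0):
    """Yield (train_drugs, test_drugs) so each drug is held out exactly once."""
    ordered = sorted(drugs)
    random.Random(seed).shuffle(ordered)
    # Deal the shuffled drugs round-robin into k folds.
    folds = [ordered[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, test


toy_drugs = [f"D{i}" for i in range(10)]  # made-up identifiers
splits = list(k_fold_drug_splits(toy_drugs, k=5))
for train, test in splits:
    print(len(train), "train drugs;", "held out:", test)
```

Each candidate hyperparameter setting would be trained on the `train` drugs and scored on the `test` drugs of every fold, and the setting with the best average score kept.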

