The Butte Lab

In the era of Precision Medicine, data-driven approaches to hypothesis generation are on the rise. Our work begins with the observation that each individual’s immune system varies over time and that, as a result, their diseases and responses to medications differ. The Butte Lab’s interest lies in harnessing publicly accessible clinical trial and research data to gain a systemic understanding of the immune system at the patient level. As an example, Dr. Atul Butte and his team showed that a reanalysis of Rituximab therapy in ANCA-Associated Vasculitis identified granulocyte subsets as a novel early marker of successful treatment in stratified patients (Arthritis Res Ther., 2015). The lab is also leveraging EMR data to develop prediction models for recruiting potential clinical trial candidates.

The price of human genome sequencing is now dropping to a few thousand dollars. In 2007, when few were predicting that actual patients would soon be arriving with sequenced genomes, the Butte Lab began an effort to re-read every past and current paper in human disease genetics to build a master database of disease-associated SNPs. Now termed VARIMED, this database was instrumental in the medical examination of the first patient with a genome sequence (Lancet, 2010) as well as the first healthy family of four (PLoS Genetics, 2011). We have continued to use VARIMED scientifically to study the nature of disease and human migration itself, showing that certain phenotypes like type 1 diabetes may have been protective in the past (PLoS One, 2010), and that type 2 diabetes is unique in showing a loss of risk alleles during human migration (PLoS Genetics, 2012 and 2013). We also continue to analyze medical genomes, now addressing rare “n=1” children who present with previously unknown, likely genetic diagnoses, including one case recently published in Genetics in Medicine (2014) and reported by CNN. VARIMED formed the basis of our recent study showing that abnormalities in genetic traits (e.g. blood levels of various molecules) could precede the development of genetic disorders, published in Science Translational Medicine (2014) and highlighted by Francis Collins in the “NIH Director’s Blog.” More recently, we started integrating genetics data with clinical information to study disease co-morbidity (PLoS Computational Biology, 2016).

Incredible new discoveries are happening in cancer research today, such as detecting cancer cells in the bloodstream or harnessing the immune system to fight cancers. These research methodologies generate enormous amounts of data that can and should be harnessed by researchers and engineers to yield new drugs and diagnostics. Cell lines and mice have been placeholders for studying human cancer for decades, resulting in thousands of mouse models for all cancer types. While results from those studies are chronicled in scientific papers and journals, it is difficult to know how relevant the data from these experimental systems are to the research and development of drugs and diagnostics for actual human cancers. The Butte Lab is actively addressing these important questions by leveraging public genomic data to reveal novel insights related to human cancer. In previous years, the Butte Lab has shown how years of research data can be reused to study cancer, using both ‘omics’ profiles (Cancer Research 2014; Nature Communications 2015; Scientific Reports 2016) and ‘clinical’ profiles (Science Translational Medicine 2014; PSB 2015); matching cancer genomes to established cell lines and vice versa (PSB 2011; BMC Med Genomics 2015); and connecting research models of cancer with actual cancer therapy in humans (Cancer Discovery 2013; Genome Biology 2016).

The analysis of huge genomics and functional genomics datasets, acquired across different experimental platforms and disease phenotypes, has facilitated the discovery of common disease mechanisms. Using newly discovered similarities between diseases, the Butte Lab predicts whether drugs known to be efficacious for one disease could be applied to another, an approach known as drug repositioning. In 2011, we showed how the anti-ulcer drug cimetidine activates apoptosis in lung adenocarcinoma cells in a mouse explant model, and how the anti-epileptic drug topiramate can treat inflammatory bowel disease in a rat model, both starting with publicly available microarray datasets (Science Translational Medicine, 2011). More recently, we showed how tricyclic antidepressants could be efficacious against neuroendocrine tumors such as small cell lung cancer in adults, and potentially pheochromocytoma and neurofibromatosis (Cancer Discovery, 2013). We have shown how computational drug discovery can yield findings that reach clinical trials within two years. Leveraging new big datasets such as TCGA, LINCS, and CCLE, the Butte Lab recently identified similar new indications for drugs to counteract gastrointestinal stromal tumors, Ewing’s sarcoma, and liver cancer.

One direction toward precision medicine is the analysis of Electronic Medical Records (EMRs), also known as Electronic Health Records (EHRs). Today, every medical institution generates enormous amounts of data. Turning that data into meaningful information is challenging because it requires knowledge of several fields, including databases (DB), natural language processing (NLP), machine learning (ML), and medicine. Research in the lab includes prediction of disease progression, drug response, and ‘next disease’, among other topics. We leveraged EMRs/EHRs and integrated them with large-scale genomic information from public resources to identify a new indication for an existing drug. Using a computational pipeline we developed, we identified a new indication for a known asthma drug in ALS (Amyotrophic Lateral Sclerosis, i.e. Lou Gehrig’s disease), along with a novel target (Paik et al, 2015, Sci. Rep).

Research

We solve problems relevant to genomic medicine by developing new methodologies in translational bioinformatics.

Immunology

Genetics

Cancer

Drug Discovery

Electronic Medical Records (EMR)