Accelerating Scientific Research Through Machine Learning and Graph

Miroculus is a molecular diagnostics company that leverages the potential of microRNAs as biomarkers and has created the easiest-to-use, most automated platform for their detection. MicroRNAs are small non-coding RNA molecules whose primary role is to regulate the expression of our genes. Their discovery circulating in body fluids such as blood plasma/serum, urine and saliva has been followed by a multitude of studies providing evidence that detecting specific microRNA molecules can give clues about a person's health status, so they may be used as biomarkers for various conditions.

Loom is an up-to-date snapshot of the scientific literature landscape focused on microRNAs that we built to expedite our own research. As of today, there is no compelling way to access much of the microRNA research. With Loom's easy-to-use, interactive UI, a researcher can quickly locate the relevant sentences across many publications that relate specific microRNAs to her disease or gene of interest. With this tool, our objective is to provide a visually compelling and complete overview of how microRNAs relate to specific diseases and genes.

On the backend, Loom is composed of four microservices. The first is a listener that runs daily and fetches new publications available in the NCBI databases: PubMed for abstracts and PMC for full-text, open-access publications. Then, a natural language processor scans each publication, breaking it down into its constituent sentences and detecting mentions of microRNAs, genes and diseases.
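The sentence-splitting and mention-detection step can be illustrated with a minimal sketch. This is not Loom's actual natural language processor: the regular expressions, the `DISEASES` vocabulary and the function names are all assumptions made for illustration, and a production system would rely on curated vocabularies and a proper sentence tokenizer.

```python
import re

# Hypothetical pattern for microRNA names such as miR-34a, hsa-miR-21-5p or let-7.
MIRNA_RE = re.compile(
    r"\b(?:[a-z]{3}-)?(?:mir|let)-\d+[a-z]?(?:-[35]p)?\b", re.IGNORECASE
)

# Toy vocabulary standing in for a real disease dictionary (assumption).
DISEASES = ("breast cancer", "stomach cancer")


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when the next word is capitalized."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]


def find_mentions(sentence):
    """Return (entity_type, surface_text) pairs found in one sentence."""
    mentions = [("microRNA", m.group(0)) for m in MIRNA_RE.finditer(sentence)]
    lowered = sentence.lower()
    mentions += [("disease", d) for d in DISEASES if d in lowered]
    return mentions


text = (
    "As shown in Fig. 3, DADS inhibited breast cancer growth by "
    "up-regulating MiR-34A expression. Let-7 was unchanged."
)
for sentence in split_sentences(text):
    print(sentence, find_mentions(sentence))
```

The splitter only breaks before a capital letter, so "Fig. 3" stays inside its sentence. Abbreviations followed by a capitalized word would still be split incorrectly, which is one reason this is only a sketch.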

Within each sentence, a machine learning scorer evaluates the strength and type of the relationship between the detected entities on a scale from 0 to 1 and writes the results to a graph database. The UI then queries this graph database in real time to retrieve the sentences and relationships the user is interested in.
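To make the store-and-query flow concrete, here is a small in-memory sketch. It stands in for the graph database rather than reproducing it: the `RelationGraph` class, its method names and the example scores are hypothetical, and the scores are supplied by hand instead of coming from a trained model.

```python
from collections import defaultdict


class RelationGraph:
    """Toy in-memory stand-in for a graph database of scored relationships."""

    def __init__(self):
        # node name -> list of edges touching that node
        self._edges = defaultdict(list)

    def add(self, source, target, sentence, score):
        """Store one relationship with its supporting sentence and score."""
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be between 0 and 1")
        edge = {"source": source, "target": target,
                "sentence": sentence, "score": score}
        # Index the edge under both endpoints so either node can be queried.
        self._edges[source].append(edge)
        self._edges[target].append(edge)

    def query(self, node, min_score=0.0):
        """Return edges touching `node`, strongest relationship first."""
        hits = [e for e in self._edges.get(node, []) if e["score"] >= min_score]
        return sorted(hits, key=lambda e: e["score"], reverse=True)


graph = RelationGraph()
sentence = ("As shown in Fig. 3, DADS inhibited breast cancer growth by "
            "up-regulating MiR-34A expression.")
# Scores below are made-up values for illustration only.
graph.add("miR-34a", "DADS", sentence, 0.4)
graph.add("miR-34a", "breast cancer", sentence, 0.9)

for edge in graph.query("miR-34a", min_score=0.5):
    print(edge["target"], edge["score"])
```

Indexing each edge under both of its endpoints lets a UI start from a microRNA, a gene or a disease and filter by a minimum score, which mirrors the kind of query the description above attributes to the interface.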


  1. Accelerating scientific research through Machine Learning & Graph. Jorge Soto, CTO, Miroculus; Antonio Molins, VP Data Science, Miroculus. San Francisco, 13-14 October 2016
  2. microRNAs
  3. DNA
  4. DNA, mRNA
  5. DNA, mRNA, protein
  6. DNA, mRNA, protein, microRNA
  7. microRNA timeline: 1993, lin-4 in C. elegans; 2000, let-7 in H. sapiens
  8. microRNAs are tissue specific. Timeline: 1993, lin-4 in C. elegans; 2000, let-7 in H. sapiens
  9. microRNA expression across different cancer types (gastrointestinal tract samples; epithelial origin samples). Jun Lu et al. MicroRNA expression profiles classify human cancers. Nature 435, 834-838 (9 June 2005). Timeline: 1993, lin-4 in C. elegans; 2000, let-7 in H. sapiens; 2002, 1st link to cancer
  10. microRNAs found cell-free in biofluids. Timeline: 1993, lin-4 in C. elegans; 2000, let-7 in H. sapiens; 2002, 1st link to cancer; 2008, plasma
  11. microRNA as an ideal biomarker: highly stable; organ/tissue specific; detectable in blood
  12. microRNAs reflect your physiology: red blood cells, liver, muscle, heart
  16. 3 simple steps: sample collection, 20 mins
  17. Assay sensitivity
  18. 3 simple steps: sample collection, 20 mins; insert sample in a cartridge device, 60 mins
  19. Digital microfluidics technology
  20. 3 simple steps: sample collection, 20 mins; insert sample in a cartridge device, 60 mins; automated workflow and data analysis, real time
  21. Tissue vs. circulating microRNA-related publications (adapted from Nair et al., Am J Epid, 2014). Timeline: 2000, let-7 in H. sapiens; 2008, plasma
  22. Breast cancer, miR-34a, DADS
  23. Search, Choose, Retrieve, Read, Learn
  24. Search, Choose; then Retrieve, Read, Learn (repeated six times)
  25. What has the elephant learnt so far? Retrieved 1,000,000+ articles; 192,496,883 lines; 199,639,090 sentences; 111,382,775 concept mentions
  26. What has the elephant learnt so far? “As shown in Fig. 3, DADS inhibited breast cancer growth by up-regulating MiR-34A expression.”
  27. What has the elephant learnt so far? “As shown in Fig. 3, DADS inhibited breast cancer growth by up-regulating MiR-34A expression.” Highlighted: DADS
  28. What has the elephant learnt so far? “As shown in Fig. 3, DADS inhibited breast cancer growth by up-regulating MiR-34A expression.” Highlighted: breast cancer, DADS
  29. What has the elephant learnt so far? “As shown in Fig. 3, DADS inhibited breast cancer growth by up-regulating MiR-34A expression.” Highlighted: breast cancer, miR-34A, DADS
  30. Distant supervision for relationship classification (blog post on MSFT dev site)
  35. [cypher]
  36. [cypher] [cypher]
  37. www.loom.bio
  38. Loom architecture. Listener, I can: connect to NCBI databases (PubMed and PMC) and fetch new publications. NLP, I can: identify when microRNAs are mentioned in relationship to genes or diseases; split the results into sentences. Scorer, I can: score between 0 and 1 the accuracy of the relations between the entities using machine learning. Graph, I can: store the relationships and their score in a graph database; be queried about each node and their relationships
  39. When discovery > validation (Weiland et al., RNA Biology, 2012)
  40. “Most clinical research therefore fails to be useful not because of its findings but because of its design” - JPA Ioannidis, PLOS Medicine, 2016
  41. Unmet clinical need for stomach cancer patients
  42. Clinical study design. Inclusion criteria: individuals suspected of stomach cancer eligible for endoscopies. Collection: all samples collected from 2010 to 2013. Multi-center: samples collected in Chile, Lithuania and Latvia. Cohort distribution: 650 samples including the entire cascade of the disease. Machine-learned model: samples split 50/50 in two groups doubly balanced per country, gender, diagnosis, subtype and stage. In collaboration with:
  43. Proprietary 7-microRNA diagnostic signature
  44. Proprietary 7-microRNA diagnostic signature. Decision boundary set to maximize accuracy for the observed prevalence
  45. Proprietary 7-microRNA diagnostic signature. Robust regardless of stage; good performance across ethnicities. Decision boundary set to maximize accuracy for the observed prevalence
  46. Miroculus test compared to gold standard: without Miroculus vs. with Miroculus (- / +); NPV = 99.8%
  47. Ideal biomarker; cost-effective, simple and accurate detection; future of diagnostics; enabling technology; advanced data analysis
  48. jorge@miroculus.com, antonio@miroculus.com, http://loom.bio