SlideShare a Scribd company logo
1 of 95
Download to read offline
An Introduction to Web Apollo

Manual Annotation Workshop at Kansas State University
Monica Munoz-Torres, PhD | @monimunozto

Berkeley Bioinformatics Open-Source Projects (BBOP)

Genomics Division, Lawrence Berkeley National Laboratory

IX Arthropod Genomics Symposium. Manhattan, KS. 17 June, 2015
2COURSE MATERIAL
Recommended	
  Browsers:	
  Google	
  Chrome,	
  Firefox.	
  
	
  
Exercises	
  file	
  available	
  at	
  Basecamp	
  
	
  
Workshop	
  slides	
  and	
  answers	
  to	
  exercises	
  will	
  be	
  
available	
  on	
  Basecamp	
  next	
  week.	
  
TODAY
OUTLINE

Web	
  Apollo	
  CollaboraBve	
  CuraBon	
  and	
  	
  
InteracBve	
  Analysis	
  of	
  Genomes	
  
3OUTLINE
•  GENOME	
  CURATION	
  
steps	
  involved	
  
•  COMMUNITY	
  BASED	
  CURATION	
  
our	
  experience	
  
	
  
•  APOLLO	
  
empowering	
  collaboraBve	
  curaBon	
  
	
  
•  APOLLO	
  on	
  THE	
  WEB	
  
becoming	
  acquainted	
  
•  PRACTICE	
  
demonstraBon	
  and	
  exercises	
  
4
DURING THIS WORKSHOP

you will

v Understand	
  the	
  process	
  of	
  genome	
  curaBon	
  in	
  the	
  context	
  of	
  
annotaBon:	
  	
  
assembled	
  genome	
  à	
  automated	
  annotaBon	
  à	
  manual	
  annotaBon	
  
v Become	
  familiar	
  with	
  the	
  environment	
  and	
  funcBonality	
  of	
  the	
  Web	
  
Apollo	
  genome	
  annotaBon	
  ediBng	
  tool.	
  
v Learn	
  to	
  idenBfy	
  homologs	
  of	
  known	
  genes	
  of	
  interest	
  in	
  a	
  newly	
  
sequenced	
  genome	
  of	
  interest.	
  
v Learn	
  how	
  to	
  corroborate	
  and	
  modify	
  automaBcally	
  generated	
  gene	
  
models	
  using	
  available	
  biological	
  evidence	
  (in	
  Apollo).	
  
Introduction
5
I INVITE YOU TO:
v  Observe	
  details	
  in	
  figures	
  
v  Listen	
  to	
  explanaBons	
  
v  Ask	
  quesBons	
  at	
  any	
  Bme	
  
v  Use	
  TwiNer	
  &	
  share	
  your	
  thoughts:	
  I	
  am	
  @monimunozto	
  	
  
A	
  few	
  tags	
  &	
  users:	
  #WebApollo	
  #annotaBon	
  #biocuraBon	
  #GMOD	
  
#genome	
  @JBrowseGossip	
  
v  Take	
  brakes:	
  	
  
LBL’s	
  ergo	
  team	
  suggests	
  I	
  should	
  not	
  work	
  at	
  the	
  computer	
  for	
  >45	
  
minutes	
  without	
  a	
  break;	
  neither	
  should	
  you!	
  We	
  will	
  be	
  here	
  for	
  ~2.5	
  
hours:	
  please	
  get	
  up	
  and	
  stretch	
  your	
  neck,	
  arms,	
  and	
  legs	
  as	
  o^en	
  as	
  
you	
  need.	
  
Introduction
I kindly ask that you refrain from:
v  Reading	
  all	
  the	
  text	
  I	
  wrote.	
  	
  
Think	
  of	
  the	
  text	
  on	
  these	
  slides	
  as	
  your	
  “class	
  notes”.	
  You	
  
will	
  use	
  them	
  during	
  exercises.	
  
v  Checking	
  email.	
  
You	
  have	
  my	
  undivided	
  aNenBon,	
  I’d	
  like	
  to	
  receive	
  yours	
  in	
  
exchange.	
  	
  
Warning:	
  If	
  you	
  get	
  *caught*,	
  you	
  will	
  read	
  it	
  out	
  loudly	
  for	
  
everyone	
  to	
  hear,	
  we	
  may	
  contribute	
  to	
  the	
  response.	
  
Introduction
Let Us Get Started
REMEMBER, REMEMBER… 

from intro webinar last week
Web	
  Apollo	
  IntroducDon	
  
Biological	
  concepts	
  to	
  beNer	
  
understand	
  manual	
  annotaBon	
  
8OUTLINE
•  CENTRAL	
  DOGMA	
  
in	
  molecular	
  biology	
  
	
  
•  WHAT	
  IS	
  A	
  GENE?	
  
let’s	
  think	
  computaBonally	
  
•  TRANSCRIPTION	
  
mRNA	
  in	
  detail	
  
	
  
•  TRANSLATION	
  
and	
  many	
  definiBons	
  
•  GENOME	
  CURATION	
  
steps	
  involved	
  
•  WHAT	
  TO	
  LOOK	
  FOR	
  
training	
  the	
  annotators	
  
CURATING GENOMES

steps involved
1  GeneraDon	
  of	
  Gene	
  Models	
  
calling	
  ORFs,	
  one	
  or	
  more	
  
rounds	
  of	
  gene	
  predicBon,	
  etc.	
  
	
  
2  AnnotaDon	
  of	
  gene	
  models	
  
Describing	
  funcBon,	
  
expression	
  paNerns,	
  metabolic	
  
network	
  memberships.	
  
3  	
  	
  Manual	
  annotaDon	
  
CURATING GENOMES 9
10Manual Curation
GENE PREDICTION
v  The	
  idenBficaBon	
  of	
  structural	
  features	
  of	
  the	
  genome.	
  
•  Primarily	
  protein-­‐coding	
  genes.	
  	
  
•  Also	
  transfer	
  RNAs	
  (tRNA),	
  ribosomal	
  RNAs	
  (rRNA),	
  
regulatory	
  moBfs,	
  long	
  and	
  small	
  non-­‐coding	
  RNAs	
  (ncRNA),	
  
repeBBve	
  elements	
  (masked),	
  etc.	
  
11Manual Curation
GENE PREDICTION
v  Methods	
  for	
  discovery:	
  
	
  
1)	
  Ab	
  ini&o:	
  based	
  on	
  DNA	
  
composiBon,	
  deals	
  strictly	
  with	
  
genomic	
  sequences	
  and	
  makes	
  
use	
  of	
  staBsBcal	
  approaches	
  to	
  
search	
  for	
  coding	
  regions	
  and	
  
typical	
  gene	
  signals.	
  	
  
	
  
•  E.g.	
  Augustus,	
  GENSCAN,	
  	
  
geneid,	
  fgenesh,	
  etc.	
  
12
Nucleic Acids 2003 vol. 31 no. 13 3738-3741
Manual Curation
GENE PREDICTION
v  Methods	
  for	
  discovery:	
  
2)	
  Homology-­‐based:	
  evidence-­‐based;	
  finds	
  genes	
  using	
  either	
  similarity	
  
searches	
  in	
  the	
  main	
  databases	
  or	
  experimental	
  data	
  including	
  RNAseq,	
  
expressed	
  sequence	
  tags	
  (ESTs),	
  full-­‐length	
  complementary	
  DNAs	
  (cDNAs),	
  
etc.	
  	
  
•  E.g:	
  SGP2,	
  fgenesh++	
  
13
In	
  some	
  cases	
  algorithms	
  and	
  metrics	
  used	
  to	
  generate	
  
consensus	
  sets	
  may	
  actually	
  reduce	
  the	
  accuracy	
  of	
  the	
  gene’s	
  
representaBon;	
  in	
  such	
  cases	
  it	
  is	
  usually	
  beNer	
  to	
  use	
  an	
  	
  
ab	
  ini&o	
  model	
  to	
  create	
  a	
  new	
  annotaBon.	
  
GENE ANNOTATION
IntegraBon	
  of	
  data	
  from	
  predicBon	
  tools	
  to	
  generate	
  a	
  reliable	
  set	
  of	
  
structural	
  annotaDons:	
  involves	
  ab	
  ini&o	
  predicBons,	
  assessment	
  of	
  
biological	
  evidence	
  to	
  drive	
  the	
  gene	
  predicBon	
  process,	
  and	
  the	
  
synthesis	
  of	
  these	
  results	
  to	
  produce	
  a	
  set	
  of	
  consensus	
  gene	
  models.	
  
	
  
v  Models	
  may	
  be	
  organized	
  using:	
  
v  automaBc	
  integraBon	
  of	
  predicted	
  sets;	
  e.g:	
  GLEAN	
  
v  packaged	
  tools	
  from	
  pipeline;	
  e.g:	
  MAKER	
  
Manual Curation
NOT PERFECT 

automated annotation remains an imperfect art
Unlike	
  the	
  more	
  highly	
  polished	
  genomes	
  of	
  earlier	
  projects,	
  today’s	
  genomes	
  
have:	
  
1.  lower	
  coverage.	
  
2.  more	
  frequent	
  assembly	
  errors	
  and	
  annotaBon	
  of	
  genes	
  across	
  
mulBple	
  scaffolds.	
  
CURATING GENOMES 14
Image: www.BroadInstitute.org
MANUAL ANNOTATION

working concept
	
  
v  Precise	
  elucidaBon	
  of	
  biological	
  features	
  encoded	
  
in	
  the	
  genome	
  requires	
  careful	
  examinaBon	
  and	
  
review.	
  	
  
Schiex	
  et	
  al.	
  Nucleic	
  Acids	
  2003	
  (31)	
  13:	
  3738-­‐3741	
  
Automated Predictions
Experimental Evidence
Manual Curation 15
cDNAs,	
  HMM	
  domain	
  searches,	
  RNAseq,	
  
genes	
  from	
  other	
  species.	
  
MANUAL ANNOTATION

is necessary
v  Evaluate	
  all	
  available	
  evidence	
  and	
  corroborate	
  
or	
  modify	
  genome	
  element	
  predicBons.	
  	
  
v  Determine	
  funcBonal	
  roles	
  through	
  comparaBve	
  
analysis	
  using	
  literature,	
  databases,	
  and	
  
experimental	
  data.	
  
v  Resolve	
  discrepancies	
  and	
  validate	
  automated	
  
gene	
  model	
  hypotheses.	
  
v  Desktop	
  version	
  of	
  Apollo	
  	
  	
  	
  	
  	
  was	
  designed	
  to	
  fit	
  
the	
  manual	
  annotaBon	
  needs	
  of	
  genome	
  projects	
  
such	
  as	
  fruit	
  fly,	
  mouse,	
  zebrafish,	
  human,	
  etc.	
  
Manual Curation 16
Automated Predictions
Curated Gene Models
Official Gene Set
“Incorrect	
  and	
  incomplete	
  genome	
  annota&ons	
  
will	
  poison	
  every	
  experiment	
  that	
  uses	
  them”.	
  
-­‐	
  M.	
  Yandell	
  
BUT, MANUAL CURATION

did not always scale well
A	
  small	
  group	
  of	
  highly	
  
trained	
  experts;	
  e.g.	
  GO	
  
1	
   Museum	
  Model	
  
A	
  few	
  very	
  good	
  biologists	
  
and	
  a	
  few	
  very	
  good	
  
bioinformaBcians	
  camp	
  
together,	
  during	
  intense	
  but	
  
short	
  periods	
  of	
  Bme.	
  
Old-­‐Dme	
  Jamborees	
  2	
  
Researchers	
  work	
  by	
  
themselves,	
  then	
  may	
  or	
  
may	
  not	
  publicize	
  results;	
  …	
  
may	
  be	
  a	
  dead-­‐end	
  with	
  
very	
  few	
  people	
  ever	
  aware	
  
of	
  these	
  results.	
  
CoQage	
  Model	
  3	
  
Elsik	
  et	
  al.	
  2006.	
  Genome	
  Res.	
  16(11):1329-­‐33.	
  
Manual Curation 17
Too	
  many	
  sequences	
  and	
  not	
  enough	
  hands	
  to	
  approach	
  curaBon.	
  
POWER TO THE CURATORS

augment existing tools
Fill	
   in	
   the	
   gap	
   for	
   all	
   the	
   things	
   that	
  
won’t	
   be	
   easy	
   to	
   cover	
   with	
   these	
  
approaches;	
  this	
  will	
  allow	
  researchers	
  
to	
  beNer	
  contribute	
  their	
  efforts.	
  
Give	
  more	
  people	
  the	
  power	
  to	
  curate!	
  
Big	
  data	
  are	
  not	
  a	
  subsBtute	
  for,	
  but	
  
a	
   supplement	
   to	
   tradiBonal	
   data	
  
collecBon	
  and	
  analysis.	
  
The	
  Parable	
  of	
  Google	
  Flu.	
  Lazer	
  et	
  al.	
  2014.	
  Science	
  343	
  (6176):	
  1203-­‐1205.	
  
v Enable	
  more	
  curators	
  to	
  work	
  
v Enable	
  beNer	
  scienBfic	
  publishing	
  
v Credit	
  curators	
  for	
  their	
  work	
  	
  
Manual Curation 18
IMPROVING TOOLS FOR MANUAL ANNOTATION

our plan
“More	
  and	
  more	
  sequences”:	
  more	
  genomes,	
  within	
  populaBons	
  and	
  
across	
  species,	
  are	
  now	
  being	
  sequenced.	
  	
  
	
  
This	
  begs	
  the	
  need	
  for	
  a	
  universally	
  accessible	
  genome	
  curaBon	
  tool:	
  
Manual Curation 19
To	
  produce	
  accurate	
  sets	
  of	
  
genomic	
  features.	
  
To	
  address	
  the	
  need	
  to	
  correct	
  for	
  
more	
  frequent	
  assembly	
  and	
  
automated	
  predicBon	
  errors	
  due	
  
to	
  new	
  sequencing	
  technologies.	
  
GENOME ANNOTATION

an inherently collaborative task
Researchers	
  o^en	
  turn	
  to	
  colleagues	
  for	
  second	
  opinions	
  and	
  insight	
  from	
  those	
  
with	
  experBse	
  in	
  parBcular	
  areas	
  (e.g.,	
  domains,	
  families).	
  To	
  facilitate	
  and	
  
encourage	
  this,	
  we	
  conBnue	
  to	
  improve	
  Apollo.	
  
APOLLO 20
Apollo	
  is	
  a	
  web-­‐based,	
  collaboraBve	
  genomic	
  annotaBon	
  
ediBng	
  plavorm.	
  
We	
  need	
  annota&on	
  edi&ng	
  tools	
  to	
  modify	
  and	
  refine	
  the	
  
precise	
  loca&on	
  and	
  structure	
  of	
  the	
  genome	
  elements	
  that	
  
predic&ve	
  algorithms	
  cannot	
  yet	
  resolve	
  automa&cally.	
  
hNp://GenomeArchitect.org	
  	
  
APOLLO

genome annotation editing tool
21
v  Web	
  based,	
  integrated	
  with	
  JBrowse.	
  
v  Supports	
  real	
  Bme	
  collaboraBon!	
  
v  AutomaBc	
  generaBon	
  of	
  ready-­‐made	
  computable	
  data.	
  	
  
v  Supports	
  annotaBon	
  of	
  genes,	
  	
  pseudogenes,	
  tRNAs,	
  snRNAs,	
  
snoRNAs,	
  ncRNAs,	
  miRNAs,	
  TEs,	
  and	
  repeats.	
  
v  IntuiBve	
  annotaBon,	
  gestures,	
  and	
  pull-­‐down	
  menus	
  to	
  create	
  and	
  
edit	
  transcripts	
  and	
  exons	
  structures,	
  insert	
  comments	
  (CV,	
  freeform	
  
text),	
  GO	
  terms,	
  etc.	
  
APOLLO
NEW APOLLO ARCHITECTURE

simpler, more flexible
APOLLO 22
Web-­‐based	
  client	
  +	
  annotaBon-­‐ediBng	
  engine	
  +	
  server-­‐side	
  data	
  service	
  
REST / JSON
Websockets
Annotation Engine (Server)
Shiro
LDAP
OAuth
JBrowse Data
Organism 2
Annotations
Security
Preferences
Organisms
Tracks
BAM
BED
VCF
GFF3
BigWig
Annotators
Google Web Toolkit (GWT) /
Bootstrap
JBrowse DOJO / jQuery JBrowse Data
Organism 1
Load genomic
evidence for
selected organism
Single Data Store
PostgreSQL, MySQL,
MongoDB, ElasticSearch
Apollo v2.0
We	
  conBnuously	
  train	
  and	
  support	
  hundreds	
  of	
  geographically	
  dispersed	
  
scienBsts	
   from	
   many	
   research	
   communiBes	
   to	
   conduct	
   manual	
  
annotaBons,	
  recovering	
  coding	
  sequences	
  in	
  agreement	
  with	
  all	
  available	
  
biological	
  evidence	
  using	
  Web	
  Apollo.	
  	
  
	
  
v  Gate	
  keeping	
  and	
  monitoring.	
  
v  Tutorials,	
  training	
  workshops,	
  and	
  “geneborees”.	
  
v  Personalized	
  user	
  support.	
  
23
DISPERSED COMMUNITIES
collaborative manual annotation efforts
APOLLO
24
CURATION

how it works
IdenBfies	
  elements	
  that	
  best	
  
represent	
  the	
  underlying	
  biology	
  
(including	
  missing	
  genes)	
  and	
  
eliminates	
  elements	
  that	
  reflect	
  
systemic	
  errors	
  of	
  automated	
  
analyses.	
  
Assigns	
  funcBon	
  through	
  
comparaBve	
  analysis	
  of	
  similar	
  
genome	
  elements	
  from	
  closely	
  
related	
  species	
  using	
  literature,	
  
databases,	
  and	
  researchers’	
  lab	
  
data.	
  
1	
  
2	
  
Examples	
  
Comparing	
  7	
  ant	
  genomes	
  contributed	
  to	
  
beNer	
  understanding	
  evoluBon	
  and	
  
organizaBon	
  of	
  insect	
  socieBes	
  at	
  the	
  
molecular	
  level;	
  e.g.	
  division	
  of	
  labor,	
  
mutualism,	
  chemical	
  communicaBon,	
  etc.	
  
Libbrecht	
  et	
  al.	
  2012.	
  Genome	
  Biology	
  2013,	
  14:212	
  
Queen	
  Bee	
  
Worker	
  Bee	
  
Castes	
  
Larva	
  
Dnmt	
  
RNAi	
  Royal	
  jelly	
  
Kucharski	
  et	
  al.	
  2008.	
  Science	
  (319)	
  5871:	
  1827-­‐1830	
  	
  	
  
Insect	
  
Methylome	
  
Anchoring	
  molecular	
  
markers	
  to	
  reference	
  
genome	
  pointed	
  to	
  
chromosomal	
  
rearrangements	
  &	
  detecBng	
  
signals	
  of	
  adapBve	
  radiaBon	
  
in	
  Heliconius	
  buNerflies.	
  	
  
Joron	
  et	
  al.	
  2011.	
  Nature,	
  477:203-­‐206	
  APOLLO
CURRENT COLLABORATIONS

training and contributions
Partnerships	
  
WEB APOLLO 25
UNIVERSITY
of MISSOURI
National
Agricultural
Library
Nature	
  Reviews	
  Gene&cs	
  2009	
  (10),	
  346-­‐347	
  
Norwegian	
  Spruce	
  hNp://congenie.org/	
  
Phlebotomus	
  papatasi	
  
Tallapoosa	
  darter	
  hNp://darter2.westga.edu/	
  
Wasmania	
  auropunctata	
  
Homo	
  sapiens	
  hg19	
  
Pinus	
  taeda	
  
hIp://dendrome.ucdavis.edu/treegenes/browsers/	
  
LESSONS LEARNED

What	
  we	
  have	
  learned:	
  	
  
•  CollaboraBve	
  work	
  disBlls	
  invaluable	
  knowledge	
  
•  We	
  must	
  enforce	
  strict	
  rules	
  and	
  formats	
  
•  We	
  must	
  evolve	
  with	
  the	
  data	
  
•  A	
  liNle	
  training	
  goes	
  a	
  long	
  way	
  
•  NGS	
  poses	
  addiBonal	
  challenges	
  
PREVIOUSLY WE LEARNED 26
THE COLLABORATIVE CURATION PROCESS AT I5K
1)  In	
  some	
  cases	
  a	
  computaBonally	
  predicted	
  consensus	
  gene	
  set	
  
is	
  generated	
  using	
  mulBple	
  lines	
  of	
  evidence.	
  In	
  other	
  cases,	
  
more	
  than	
  one	
  gene	
  set	
  are	
  made	
  available	
  for	
  analysis:	
  e.g.	
  
Primary	
  Gene	
  Sets:	
  HAZT_v0.5.3-­‐Models,	
  Augustus	
  gene	
  set.	
  
2)  i5K	
   Projects	
   will	
   integrate	
   consensus	
   computaBonal	
  
predicBons	
   with	
   manual	
   annotaBons	
   to	
   produce	
   an	
   updated	
  
Official	
  Gene	
  Set	
  (OGS):	
  
»  If	
  it’s	
  not	
  on	
  either	
  track,	
  it	
  won’t	
  make	
  the	
  OGS!	
  
»  If	
  it’s	
  there	
  and	
  it	
  shouldn’t,	
  it	
  will	
  sBll	
  make	
  the	
  OGS!	
  
27Collaborative Curation at i5K
CONSENSUS SET: REFERENCE AND START POINT
•  Isoforms:	
  drag	
  original	
  and	
  alternaBvely	
  spliced	
  form	
  to	
  ‘User-­‐
created	
  Annota&ons’	
  area.	
  
•  If	
  an	
  annotaBon	
  needs	
  to	
  be	
  removed	
  from	
  the	
  consensus	
  set,	
  
drag	
  it	
  to	
  the	
  ‘User-­‐created	
  Annota&ons’	
  area	
  and	
  label	
  as	
  
‘Delete’	
  on	
  InformaBon	
  Editor.	
  
•  Overlapping	
  interests?	
  Collaborate	
  to	
  reach	
  agreement.	
  
•  Follow	
  guidelines	
  for	
  i5K	
  Pilot	
  Species	
  Projects	
  as	
  shown	
  at	
  
hNp://goo.gl/LRu1VY	
  
	
  
28Collaborative Curation at i5K
Apollo	
  
Sort
30Becoming Acquainted with Web Apollo.
30
WEB APOLLO

the sequence selection window

NavigaBon	
  tools:	
  
pan	
  and	
  zoom	
   Search	
  box:	
  go	
  to	
  
a	
  scaffold	
  or	
  a	
  
gene	
  model.	
  	
  
Grey	
  bar	
  of	
  coordinates	
  
indicates	
  locaBon.	
  You	
  can	
  also	
  
select	
  here	
  in	
  order	
  to	
  zoom	
  to	
  
a	
  sub-­‐region.	
  
‘View’:	
  change	
  
color	
  by	
  CDS,	
  
toggle	
  strands,	
  
set	
  highlight.	
  
‘File’:	
  
Upload	
  your	
  own	
  
evidence:	
  GFF3,	
  BAM,	
  
BigWig,	
  VCF*.	
  Add	
  
combinaBon	
  and	
  
sequence	
  search	
  
tracks.	
  
‘Tools’:	
  	
  
Use	
  BLAT	
  to	
  query	
  the	
  
genome	
  with	
  a	
  protein	
  
or	
  DNA	
  sequence.	
  
Available Tracks
Evidence	
  Tracks	
  Area	
  
‘User-­‐created	
  AnnotaBons’	
  Track	
  
Login
31
WEB APOLLO

graphical user interface (GUI) for editing annotations
Becoming Acquainted with Web Apollo.
In	
  addiBon	
  to	
  protein-­‐coding	
  gene	
  annotaBon	
  that	
  you	
  know	
  and	
  love.	
  
•  Non-­‐coding	
  genes:	
  ncRNAs,	
  miRNAs,	
  repeat	
  regions,	
  and	
  TEs	
  
•  Sequence	
  alteraBons	
  (less	
  coverage	
  =	
  more	
  fragmentaBon)	
  
•  VisualizaBon	
  of	
  stage	
  and	
  cell-­‐type	
  specific	
  transcripBon	
  data	
  as	
  
coverage	
  plots,	
  heat	
  maps,	
  and	
  alignments	
  
32	
32	
WEB APOLLO

additional functionality
Becoming Acquainted with Web Apollo.
1.  Select	
  a	
  chromosomal	
  region	
  of	
  interest,	
  e.g.	
  scaffold.	
  
2.  Select	
  appropriate	
  evidence	
  tracks.	
  
3.  Determine	
  whether	
  a	
  feature	
  in	
  an	
  exisBng	
  evidence	
  track	
  will	
  provide	
  a	
  
reasonable	
  gene	
  model	
  to	
  start	
  working.	
  
-­‐  If	
  yes:	
  select	
  and	
  drag	
  the	
  feature	
  to	
  the	
  ‘User-­‐created	
  AnnotaBons’	
  
area,	
  creaDng	
  an	
  iniDal	
  gene	
  model.	
  If	
  necessary	
  use	
  ediBng	
  funcBons	
  
to	
  adjust	
  the	
  gene	
  model.	
  
-­‐  If	
  not:	
  let’s	
  talk.	
  
4.  Check	
  your	
  edited	
  gene	
  model	
  for	
  integrity	
  and	
  accuracy	
  by	
  comparing	
  it	
  
with	
  available	
  homologs.	
  
Becoming Acquainted with Web Apollo
33 |
Always	
  remember:	
  when	
  annotaBng	
  gene	
  models	
  using	
  Web	
  Apollo,	
  
you	
  are	
  looking	
  at	
  a	
  ‘frozen’	
  version	
  of	
  the	
  genome	
  assembly	
  and	
  you	
  
will	
  not	
  be	
  able	
  to	
  modify	
  the	
  assembly	
  itself.	
  
33	
GENERAL PROCESS OF CURATION

steps to remember
Choose	
  (click	
  or	
  drag)	
  appropriate	
  evidence	
  tracks	
  from	
  the	
  list	
  on	
  the	
  le^.	
  	
  
Click	
  on	
  an	
  exon	
  to	
  select	
  it.	
  Double	
  click	
  on	
  an	
  exon	
  or	
  single	
  click	
  on	
  an	
  intron	
  to	
  select	
  the	
  enBre	
  
gene.	
  
Select	
  &	
  drag	
  any	
  elements	
  from	
  an	
  evidence	
  track	
  into	
  the	
  curaBon	
  area:	
  these	
  are	
  editable	
  and	
  
considered	
  the	
  curated	
  version	
  of	
  the	
  gene.	
  Other	
  opBons	
  for	
  elements	
  in	
  evidence	
  tracks	
  
available	
  from	
  right-­‐click	
  menu.	
  
If	
  you	
  select	
  an	
  exon	
  or	
  a	
  gene,	
  then	
  every	
  track	
  is	
  automaBcally	
  searched	
  for	
  exons	
  with	
  exactly	
  the	
  
same	
  co-­‐ordinates	
  as	
  what	
  you	
  selected.	
  Matching	
  edges	
  are	
  highlighted	
  red.	
  
Hovering	
  over	
  an	
  annotaBon	
  in	
  progress	
  brings	
  up	
  an	
  informaBon	
  pop-­‐up.	
  
34 | 34	
Becoming Acquainted with Web Apollo.
USER NAVIGATION
Right-­‐click	
  menu:	
  
•  With	
  the	
  excepBon	
  of	
  deleBng	
  a	
  model,	
  
all	
  edits	
  can	
  be	
  reversed	
  with	
  ‘Undo’	
  
opBon.	
  ‘Redo’	
  also	
  available.	
  All	
  changes	
  
are	
  immediately	
  saved	
  and	
  available	
  to	
  all	
  
users	
  in	
  real	
  Bme.	
  
•  ‘Get	
  sequence’	
  retrieves	
  pepBde,	
  cDNA,	
  
CDS,	
  and	
  genomic	
  sequences.	
  
•  You	
  can	
  select	
  an	
  exon	
  and	
  select	
  
‘Delete’.	
  You	
  can	
  create	
  an	
  intron,	
  flip	
  the	
  
direcBon,	
  change	
  the	
  start	
  or	
  split	
  the	
  
gene.	
  	
  
35 | 35	
USER NAVIGATION
Becoming Acquainted with Web Apollo.
Right-­‐click	
  menu:	
  
•  If	
  you	
  select	
  two	
  gene	
  models,	
  you	
  can	
  
join	
  them	
  using	
  ‘Merge’,	
  and	
  you	
  may	
  also	
  
‘Split’	
  a	
  model.	
  
•  You	
  can	
  select	
  ‘Duplicate’,	
  for	
  example	
  to	
  
annotate	
  isoforms.	
  
•  Set	
  translaBon	
  start,	
  annotate	
  
selenocysteine-­‐containing	
  proteins,	
  
match	
  edges	
  of	
  annotaBon	
  to	
  those	
  of	
  
evidence	
  tracks.	
  
36 | 36	
USER NAVIGATION
Becoming Acquainted with Web Apollo.
37	
AnnotaBons,	
  annotaBon	
  edits,	
  and	
  History:	
  stored	
  in	
  a	
  centralized	
  database.	
  
37	
USER NAVIGATION
Becoming Acquainted with Web Apollo.
38	
The	
  AnnotaBon	
  InformaBon	
  Editor	
  
DBXRefs	
  are	
  database	
  crossed	
  references:	
  if	
  
you	
  have	
  reason	
  to	
  believe	
  that	
  this	
  gene	
  is	
  
linked	
  to	
  a	
  gene	
  in	
  a	
  public	
  database	
  (including	
  
your	
  own),	
  then	
  add	
  it	
  here.	
  
38	
USER NAVIGATION
Becoming Acquainted with Web Apollo.
39	
The	
  AnnotaBon	
  InformaBon	
  Editor	
  
•  Add	
  PubMed	
  IDs	
  
•  Include	
  GO	
  terms	
  as	
  appropriate	
  
from	
  any	
  of	
  the	
  three	
  ontologies	
  
•  Write	
  comments	
  staBng	
  how	
  you	
  
have	
  validated	
  each	
  model.	
  
39	
USER NAVIGATION
Becoming Acquainted with Web Apollo.
40 |
•  ‘Zoom	
  to	
  base	
  level’	
  opBon	
  reveals	
  the	
  
DNA	
  Track.	
  
•  Change	
  color	
  of	
  exons	
  by	
  CDS	
  from	
  
the	
  ‘View’	
  menu.	
  
•  The	
  reference	
  DNA	
  sequence	
  is	
  visible	
  
in	
  both	
  direcBons	
  as	
  are	
  the	
  protein	
  
translaBons	
  in	
  all	
  six	
  frames.	
  You	
  can	
  
toggle	
  either	
  direcBon	
  to	
  display	
  only	
  
3	
  frames.	
  	
  
Zoom	
  in/out	
  with	
  keyboard:	
  
shi^	
  +	
  arrow	
  keys	
  up/down	
  
40	
USER NAVIGATION
Becoming Acquainted with Web Apollo.
Web Apollo User Guide
(Fragment)
http://genomearchitect.org/web_apollo_user_guide
In	
  a	
  “simple	
  case”	
  the	
  predicted	
  gene	
  model	
  is	
  correct	
  or	
  nearly	
  
correct,	
  and	
  this	
  model	
  is	
  supported	
  by	
  evidence	
  that	
  
completely	
  or	
  mostly	
  agrees	
  with	
  the	
  predicBon.	
  	
  
Evidence	
  that	
  extends	
  beyond	
  the	
  predicted	
  model	
  is	
  assumed	
  to	
  
be	
  non-­‐coding	
  sequence.	
  	
  
	
  
The	
  following	
  secBons	
  describe	
  simple	
  modificaBons.	
  	
  
	
  
42 | 42	
ANNOTATING SIMPLE CASES
Becoming Acquainted with Web Apollo.
Select	
  and	
  drag	
  the	
  putaBve	
  new	
  exon	
  from	
  a	
  track,	
  and	
  add	
  it	
  directly	
  to	
  an	
  
annotated	
  transcript	
  in	
  the	
  ‘User-­‐created	
  AnnotaBons’	
  area.	
  	
  
•  Click	
  the	
  exon,	
  hold	
  your	
  finger	
  on	
  the	
  mouse	
  buNon,	
  and	
  drag	
  the	
  cursor	
  
unBl	
  it	
  touches	
  the	
  receiving	
  transcript.	
  A	
  dark	
  green	
  highlight	
  indicates	
  it	
  is	
  
okay	
  to	
  release	
  the	
  mouse	
  buNon.	
  	
  
•  When	
   released,	
   the	
   addiBonal	
   exon	
   becomes	
   aNached	
   to	
   the	
   receiving	
  
transcript.	
  
43 |
•  A	
   confirmaBon	
   box	
   will	
  
warn	
   you	
   if	
   the	
   receiving	
  
transcript	
   is	
   not	
   on	
   the	
  
same	
   strand	
   as	
   the	
  
feature	
   where	
   the	
   new	
  
exon	
  originated.	
  
43	
ADDING EXONS
Becoming Acquainted with Web Apollo.
Each	
  Bme	
  you	
  add	
  an	
  exon	
  region,	
  whether	
  by	
  extension	
  or	
  adding	
  
an	
  exon,	
  Web	
  Apollo	
  recalculates	
  the	
  longest	
  ORF,	
  idenBfying	
  
‘Start’	
  and	
  ‘Stop’	
  signals	
  and	
  allowing	
  you	
  to	
  determine	
  
whether	
  a	
  ‘Stop’	
  codon	
  has	
  been	
  incorporated	
  a^er	
  each	
  
ediBng	
  step.	
  
44 |
Web	
  Apollo	
  demands	
  that	
  an	
  exon	
  already	
  exists	
  as	
  an	
  evidence	
  in	
  one	
  of	
  the	
  tracks.	
  
You	
  could	
  provide	
  a	
  text	
  file	
  in	
  GFF	
  format	
  and	
  select	
  File	
  à	
  Open.	
  GFF	
  is	
  a	
  simple	
  text	
  
file	
  delimited	
  by	
  TABs,	
  one	
  line	
  for	
  each	
  genomic	
  ‘feature’:	
  column	
  1	
  is	
  the	
  name	
  of	
  
the	
  scaffold;	
  then	
  some	
  text	
  (irrelevant),	
  then	
  ‘exon’,	
  then	
  start,	
  stop,	
  strand	
  as	
  +	
  or	
  -­‐,	
  
a	
  dot,	
  another	
  dot,	
  and	
  Name=some	
  name	
  
Example:	
  
scaffold_88 	
  Qratore	
  exon	
  21 	
  2111	
  + 	
  . 	
  . 	
  Name=bob	
  
scaffold_88 	
  Qratore	
  exon	
  2201	
  5111	
  + 	
  . 	
  . 	
  Name=rad	
  
44	
ADDING EXONS
Becoming Acquainted with Web Apollo.
Gene	
  predicBons	
  may	
  or	
  may	
  not	
  include	
  UTRs.	
  If	
  
transcript	
  alignment	
  data	
  are	
  available	
  and	
  
extend	
  beyond	
  your	
  original	
  annotaBon,	
  you	
  
may	
  extend	
  or	
  add	
  UTRs.	
  	
  
1.  PosiBon	
  the	
  cursor	
  at	
  the	
  beginning	
  of	
  the	
  
exon	
  that	
  needs	
  to	
  be	
  extended	
  and	
  ‘Zoom	
  
to	
  base	
  level’.	
  	
  
2.  Place	
  the	
  cursor	
  over	
  the	
  edge	
  of	
  the	
  exon	
  
unBl	
  it	
  becomes	
  a	
  black	
  arrow	
  then	
  click	
  and	
  
drag	
  the	
  edge	
  of	
  the	
  exon	
  to	
  the	
  new	
  
coordinate	
  posiBon	
  that	
  includes	
  the	
  UTR.	
  	
  
45 |
View	
  zoomed	
  to	
  base	
  level.	
  The	
  DNA	
  track	
  
and	
  annotaBon	
  track	
  are	
  visible.	
  The	
  DNA	
  
track	
  includes	
  the	
  sense	
  strand	
  (top)	
  and	
  
anB-­‐sense	
   strand	
   (boNom).	
   The	
   six	
  
reading	
  frames	
  flank	
  the	
  DNA	
  track,	
  with	
  
the	
   three	
   forward	
   frames	
   above	
   and	
   the	
  
three	
   reverse	
   frames	
   below.	
   The	
   User-­‐
created	
   AnnotaBon	
   track	
   shows	
   the	
  
terminal	
  end	
  of	
  an	
  annotaBon.	
  The	
  green	
  
rectangle	
   highlights	
   the	
   locaBon	
   of	
   the	
  
nucleoBde	
  residues	
  in	
  the	
  ‘Stop’	
  signal.	
  
To	
  add	
  a	
  new	
  spliced	
  UTR	
  to	
  an	
  exisBng	
  
annotaBon	
  follow	
  the	
  procedure	
  for	
  adding	
  
an	
  exon.	
  
45	
ADDING UTRs
Becoming Acquainted with Web Apollo.
1.  Zoom	
  in	
  sufficiently	
  to	
  clearly	
  resolve	
  each	
  exon	
  as	
  a	
  disBnct	
  
rectangle.	
  	
  
2.  Two	
  exons	
  from	
  different	
  tracks	
  sharing	
  the	
  same	
  start	
  and/or	
  end	
  
coordinates	
  will	
  display	
  a	
  red	
  bar	
  to	
  indicate	
  the	
  matching	
  edges.	
  
3.  SelecBng	
  the	
  whole	
  annotaBon	
  or	
  one	
  exon	
  at	
  a	
  Bme,	
  use	
  this	
  ‘edge-­‐
matching’	
  funcBon	
  and	
  scroll	
  along	
  the	
  length	
  of	
  the	
  annotaBon,	
  
verifying	
  exon	
  boundaries	
  against	
  available	
  data.	
  Use	
  square	
  [	
  ]	
  
brackets	
  to	
  scroll	
  from	
  exon	
  to	
  exon.	
  
4.  Note	
  if	
  there	
  are	
  cDNA	
  /	
  RNAseq	
  reads	
  that	
  lack	
  one	
  or	
  more	
  of	
  the	
  
annotated	
  exons	
  or	
  include	
  addiBonal	
  exons.	
  	
  
	
  
46 | 46	
EXON STRUCTURE INTEGRITY
Becoming Acquainted with Web Apollo.
To	
  modify	
  an	
  exon	
  boundary	
  and	
  match	
  
data	
   in	
   the	
   evidence	
   tracks:	
   select	
  
both	
   the	
   offending	
   exon	
   and	
   the	
  
feature	
  with	
  the	
  expected	
  boundary,	
  
then	
  right	
  click	
  on	
  the	
  annotaBon	
  to	
  
select	
  ‘Set	
  3’	
  end’	
  or	
  ‘Set	
  5’	
  end’	
  as	
  
appropriate.	
  
	
  
47 |
In	
  some	
  cases	
  all	
  the	
  data	
  may	
  disagree	
  with	
  the	
  annotaBon,	
  in	
  other	
  cases	
  
some	
  data	
  support	
  the	
  annotaBon	
  and	
  some	
  of	
  the	
  data	
  support	
  one	
  or	
  
more	
  alternaBve	
  transcripts.	
  Try	
  to	
  annotate	
  as	
  many	
  alternaBve	
  transcripts	
  
as	
  are	
  well	
  supported	
  by	
  the	
  data.	
  
47	
EXON STRUCTURE INTEGRITY
Becoming Acquainted with Web Apollo.
Flags	
  non-­‐canonical	
  
splice	
  sites.	
  
SelecBon	
  of	
  features	
  and	
  sub-­‐
features	
  
Edge-­‐matching	
  
Evidence	
  Tracks	
  Area	
  
‘User-­‐created	
  AnnotaBons’	
  Track	
  
The	
  ediBng	
  logic	
  in	
  the	
  server:	
  	
  
§  selects	
  longest	
  ORF	
  as	
  CDS	
  
§  flags	
  non-­‐canonical	
  splice	
  sites	
  
48	
EDITING LOGIC
Becoming Acquainted with Web Apollo.
Zoom	
  to	
  base	
  level	
  to	
  review	
  non-­‐
canonical	
  splice	
  site	
  warnings.	
  
These	
  do	
  not	
  necessarily	
  need	
  to	
  
be	
  corrected,	
  but	
  should	
  be	
  
flagged	
  with	
  the	
  appropriate	
  
comment.	
  	
  
	
  
49 |
Exon/intron	
  juncBon	
  possible	
  error	
  
Original	
  model	
  
Curated	
  model	
  
Non-­‐canonical	
   splices	
   are	
   indicated	
   by	
   an	
  
orange	
   circle	
   with	
   a	
   white	
   exclamaBon	
   point	
  
inside,	
   placed	
   over	
   the	
   edge	
   of	
   the	
   offending	
  
exon.	
  	
  
Most	
   insects,	
   have	
   a	
   valid	
   non-­‐canonical	
   site	
  
GC-­‐AG.	
   Other	
   non-­‐canonical	
   splice	
   sites	
   are	
  
unverified.	
  Web	
  Apollo	
  flags	
  GC	
  splice	
  donors	
  
as	
  non-­‐canonical.	
  
Canonical	
  splice	
  sites:	
  
3’-­‐…exon]GA	
  /	
  TG[exon…-­‐5’	
  
5’-­‐…exon]GT	
  /	
  AG[exon…-­‐3’	
  
reverse	
  strand,	
  not	
  reverse-­‐complemented:	
  
forward	
  strand	
  
49	
SPLICE SITES
Becoming Acquainted with Web Apollo.
Some	
   gene	
   predicBon	
   algorithms	
   do	
   not	
   recognize	
  
GC	
   splice	
   sites,	
   thus	
   the	
   intron/exon	
   juncBon	
  
may	
   be	
   incorrect.	
   For	
   example,	
   one	
   such	
   gene	
  
predicBon	
  algorithm	
  may	
  ignore	
  a	
  true	
  GC	
  donor	
  
and	
  select	
  another	
  non-­‐canonical	
  splice	
  site	
  that	
  
is	
  less	
  frequently	
  observed	
  in	
  nature.	
  	
  
Therefore,	
   if	
   upon	
   inspecBon	
   you	
   find	
   a	
   non-­‐
canonical	
   splice	
   site	
   that	
   is	
   rarely	
   observed	
   in	
  
nature,	
  you	
  may	
  wish	
  to	
  search	
  the	
  region	
  for	
  a	
  
more	
   frequent	
   in-­‐frame	
   non-­‐canonical	
   splice	
  
site,	
  such	
  as	
  a	
  GC	
  donor.	
  If	
  there	
  is	
  an	
  in-­‐frame	
  
site	
   close	
   that	
   is	
   more	
   likely	
   to	
   be	
   the	
   correct	
  
splice	
   donor,	
   you	
   may	
   make	
   this	
   adjustment	
  
while	
  zoomed	
  at	
  base	
  level.	
  	
  
	
  
50 |
Exon/intron junction possible error
Original model
Curated model
Use	
  RNA-­‐Seq	
  data	
  to	
  
make	
  a	
  decision.	
  
Canonical	
  splice	
  sites:	
  
3’-­‐…exon]GA	
  /	
  TG[exon…-­‐5’	
  
5’-­‐…exon]GT	
  /	
  AG[exon…-­‐3’	
  
reverse	
  strand,	
  not	
  reverse-­‐complemented:	
  
forward	
  strand	
  
50	
SPLICE SITES
keep this in mind
Becoming Acquainted with Web Apollo.
Web	
  Apollo	
  calculates	
  the	
  longest	
  possible	
  open	
  
reading	
  frame	
  (ORF)	
  that	
  includes	
  canonical	
  
‘Start’	
  and	
  ‘Stop’	
  signals	
  within	
  the	
  predicted	
  
exons.	
  	
  
If	
  it	
  appears	
  to	
  have	
  calculated	
  an	
  incorrect	
  ‘Start’	
  
signal,	
  you	
  may	
  modify	
  it	
  selecBng	
  an	
  in-­‐frame	
  
‘Start’	
  codon	
  further	
  up	
  or	
  downstream,	
  
depending	
  on	
  evidence	
  (protein	
  database,	
  
addiBonal	
  evidence	
  tracks).	
  An	
  upstream	
  ‘Start’	
  
codon	
  may	
  be	
  present	
  outside	
  the	
  predicted	
  
gene	
  model,	
  within	
  a	
  region	
  supported	
  by	
  
another	
  evidence	
  track.	
  	
  
51 | 51	
‘START’ AND ‘STOP’ SITES
Becoming Acquainted with Web Apollo.
Note	
  that	
  the	
  ‘Start’	
  codon	
  may	
  also	
  be	
  located	
  in	
  a	
  non-­‐predicted	
  exon	
  further	
  
upstream.	
  If	
  you	
  cannot	
  idenBfy	
  that	
  exon,	
  add	
  the	
  appropriate	
  note	
  in	
  the	
  
transcript’s	
  ‘Comments’	
  secBon.	
  
In	
  very	
  rare	
  cases,	
  the	
  actual	
  ‘Start’	
  codon	
  may	
  be	
  non-­‐canonical	
  (non-­‐ATG).	
  	
  
In	
  some	
  cases,	
  a	
  ‘Stop’	
  codon	
  may	
  not	
  be	
  automaBcally	
  idenBfied.	
  Check	
  to	
  see	
  
if	
  there	
  are	
  data	
  supporBng	
  a	
  3’	
  extension	
  of	
  the	
  terminal	
  exon	
  or	
  addiBonal	
  
3’	
  exons	
  with	
  valid	
  splice	
  sites.	
  	
  
52 | 52	
‘START’ AND ‘STOP’ SITES
keep this in mind
Becoming Acquainted with Web Apollo.
Evidence	
  may	
  support	
  joining	
  two	
  or	
  more	
  different	
  gene	
  models.	
  Warning:	
  protein	
  
alignments	
  may	
  have	
  incorrect	
  splice	
  sites	
  and	
  lack	
  non-­‐conserved	
  regions!	
  
1.  Drag	
  and	
  drop	
  each	
  gene	
  model	
  to	
  ‘User-­‐created	
  AnnotaBons’	
  area.	
  Shi^	
  click	
  to	
  
select	
  an	
  intron	
  from	
  each	
  gene	
  model	
  and	
  right	
  click	
  to	
  select	
  the	
  ‘Merge’	
  opBon	
  
from	
  the	
  menu.	
  	
  
2.  Drag	
  supporBng	
  evidence	
  tracks	
  over	
  the	
  candidate	
  models	
  to	
  corroborate	
  overlap,	
  or	
  
review	
  edge	
  matching	
  and	
  coverage	
  across	
  models.	
  
3.  Check	
  the	
  resulBng	
  translaBon	
  by	
  querying	
  a	
  protein	
  database	
  e.g.	
  UniProt.	
  Record	
  
the	
  IDs	
  of	
  both	
  starBng	
  gene	
  models	
  in	
  ‘DBXref’	
  and	
  add	
  comments	
  to	
  record	
  that	
  this	
  
annotaBon	
  is	
  the	
  result	
  of	
  a	
  merge.	
  
54 | 54	
Red	
  lines	
  around	
  exons:	
  
‘edge-­‐matching’	
  allows	
  annotators	
  to	
  confirm	
  whether	
  the	
  
evidence	
  is	
  in	
  agreement	
  without	
  examining	
  each	
  exon	
  at	
  the	
  
base	
  level.	
  
COMPLEX CASES
merge two gene predictions on the same scaffold
Becoming Acquainted with Web Apollo.
One	
  or	
  more	
  splits	
  may	
  be	
  recommended	
  when	
  different	
  
segments	
  of	
  the	
  predicted	
  protein	
  align	
  to	
  two	
  or	
  more	
  
different	
  families	
  of	
  protein	
  homologs,	
  and	
  the	
  predicted	
  
protein	
  does	
  not	
  align	
  to	
  any	
  known	
  protein	
  over	
  its	
  enBre	
  
length.	
  Transcript	
  data	
  may	
  support	
  a	
  split	
  (if	
  so,	
  verify	
  that	
  it	
  is	
  
not	
  a	
  case	
  of	
  alternaBve	
  transcripts).	
  	
  
55 | 55	
COMPLEX CASES
split a gene prediction
Becoming Acquainted with Web Apollo.
DNA	
  Track	
  
‘User-­‐created	
  AnnotaDons’	
  Track	
  
56	
COMPLEX CASES
frameshifts, single-base errors, and selenocysteines
Becoming Acquainted with Web Apollo.
1.  Web	
  Apollo	
  allows	
  annotators	
  to	
  make	
  single	
  base	
  modificaBons	
  or	
  frameshi^s	
  that	
  are	
  
reflected	
  in	
  the	
  sequence	
  and	
  structure	
  of	
  any	
  transcripts	
  overlapping	
  the	
  modificaBon.	
  Note	
  
that	
  these	
  manipulaBons	
  do	
  NOT	
  change	
  the	
  underlying	
  genomic	
  sequence.	
  	
  
2.  If	
  you	
  determine	
  that	
  you	
  need	
  to	
  make	
  one	
  of	
  these	
  changes,	
  zoom	
  in	
  to	
  the	
  nucleoBde	
  level	
  
and	
  right	
  click	
  over	
  a	
  single	
  nucleoBde	
  on	
  the	
  genomic	
  sequence	
  to	
  access	
  a	
  menu	
  that	
  
provides	
  opBons	
  for	
  creaBng	
  inserBons,	
  deleBons	
  or	
  subsBtuBons.	
  	
  
3.  The	
  ‘Create	
  Genomic	
  InserBon’	
  feature	
  will	
  require	
  you	
  to	
  enter	
  the	
  necessary	
  string	
  of	
  
nucleoBde	
  residues	
  that	
  will	
  be	
  inserted	
  to	
  the	
  right	
  of	
  the	
  cursor’s	
  current	
  locaBon.	
  The	
  
‘Create	
  Genomic	
  DeleBon’	
  opBon	
  will	
  require	
  you	
  to	
  enter	
  the	
  length	
  of	
  the	
  deleBon,	
  starBng	
  
with	
  the	
  nucleoBde	
  where	
  the	
  cursor	
  is	
  posiBoned.	
  The	
  ‘Create	
  Genomic	
  SubsBtuBon’	
  feature	
  
asks	
  for	
  the	
  string	
  of	
  nucleoBde	
  residues	
  that	
  will	
  replace	
  the	
  ones	
  on	
  the	
  DNA	
  track.	
  
4.  Once	
  you	
  have	
  entered	
  the	
  modificaBons,	
  Web	
  Apollo	
  will	
  recalculate	
  the	
  corrected	
  transcript	
  
and	
  protein	
  sequences,	
  which	
  will	
  appear	
  when	
  you	
  use	
  the	
  right-­‐click	
  menu	
  ‘Get	
  Sequence’	
  
opBon.	
  Since	
  the	
  underlying	
  genomic	
  sequence	
  is	
  reflected	
  in	
  all	
  annotaBons	
  that	
  include	
  the	
  
modified	
  region	
  you	
  should	
  alert	
  the	
  curators	
  of	
  your	
  organisms	
  database	
  using	
  the	
  
‘Comments’	
  secBon	
  to	
  report	
  the	
  CDS	
  edits.	
  	
  
5.  In	
  special	
  cases	
  such	
  as	
  selenocysteine	
  containing	
  proteins	
  (read-­‐throughs),	
  right-­‐click	
  over	
  the	
  
offending/premature	
  ‘Stop’	
  signal	
  and	
  choose	
  the	
  ‘Set	
  readthrough	
  stop	
  codon’	
  opBon	
  from	
  
the	
  menu.	
  
	
  57 | 57	
COMPLEX CASES
frameshifts, single-base errors, and selenocysteines
Becoming Acquainted with Web Apollo.
Follow	
  our	
  checklist	
  unBl	
  you	
  are	
  happy	
  with	
  the	
  annotaBon!	
  
Then:	
  
–  Comment	
  to	
  validate	
  your	
  annotaBon,	
  even	
  if	
  you	
  made	
  no	
  
changes	
  to	
  an	
  exisBng	
  model.	
  Your	
  comments	
  mean	
  you	
  looked	
  
at	
  the	
  curated	
  model	
  and	
  are	
  happy	
  with	
  it;	
  think	
  of	
  it	
  as	
  a	
  vote	
  
of	
  confidence.	
  
–  Or	
  add	
  a	
  comment	
  to	
  inform	
  the	
  community	
  of	
  unresolved	
  
issues	
  you	
  think	
  this	
  model	
  may	
  have.	
  
58 | 58	
Always	
  Remember:	
  Web	
  Apollo	
  curaBon	
  is	
  a	
  community	
  effort	
  so	
  
please	
  use	
  comments	
  to	
  communicate	
  the	
  reasons	
  for	
  your	
  	
  
annotaBon	
  (your	
  comments	
  will	
  be	
  visible	
  to	
  everyone).	
  
COMPLETING THE ANNOTATION
Becoming Acquainted with Web Apollo.
To	
  find	
  the	
  gene	
  region	
  you	
  wish	
  to	
  annotate,	
  you	
  may	
  use:	
  
a)  a	
  protein	
  sequence	
  of	
  a	
  homolog	
  from	
  another	
  species	
  
b)  a	
  sequence	
  from	
  a	
  similar	
  gene	
  in	
  species	
  of	
  interest	
  (e.g.	
  another	
  gene	
  family	
  member)	
  
c)  on	
  your	
  own,	
  you	
  aligned	
  your	
  gene	
  models	
  or	
  transcriptomic	
  data	
  to	
  the	
  genome.	
  
d)  you	
  used	
  high	
  quality	
  proteins	
  and/or	
  gene	
  family	
  alignments	
  (mulB	
  or	
  single	
  species)	
  
and	
  are	
  able	
  to	
  idenBfy	
  conserved	
  domains.	
  
OpDon	
  1	
  –	
  You	
  have	
  a	
  sequence	
  but	
  don’t	
  know	
  where	
  it	
  is	
  in	
  this	
  genome:	
  
•  Use	
  BLAT	
  in	
  the	
  Apollo	
  window,	
  or	
  BLAST	
  at	
  NAL’s	
  i5k	
  BLAST	
  server,	
  available	
  at:	
  hNp://i5k.nal.usda.gov/blastn	
  	
  	
  
•  You	
  may	
  also	
  use	
  other	
  tools	
  for	
  annotaBon	
  and	
  contribute	
  your	
  data	
  from	
  those	
  efforts.	
  
OpDon	
  2	
  –	
  The	
  genome	
  has	
  already	
  been	
  annotated	
  with	
  your	
  sequences	
  and	
  you	
  have	
  a	
  gene	
  
idenBfier	
  that	
  has	
  been	
  indexed	
  in	
  Apollo.	
  	
  
•  That	
  is,	
  you	
  know	
  where	
  to	
  look,	
  so	
  type	
  the	
  ID	
  in	
  the	
  Search	
  box	
  of	
  Apollo.	
  
•  Apollo	
  autocompletes	
  using	
  a	
  case-­‐insensiBve	
  search	
  anchored	
  on	
  the	
  le^-­‐hand	
  side	
  of	
  the	
  word.	
  For	
  
example	
  “HaGR”	
  will	
  show	
  all	
  “hagr”	
  objects	
  (up	
  to	
  30).	
  
•  Choose	
  one	
  of	
  the	
  genes	
  and	
  click	
  “Go”.	
  
•  You	
  can	
  do	
  that	
  with	
  Domains,	
  Alignments	
  or	
  Gene	
  names	
  provided	
  to	
  you	
  (if	
  they	
  have	
  been	
  indexed).	
  
OpDon	
  3	
  –	
  Find	
  genes	
  based	
  on	
  funcBonal	
  ontology	
  terms	
  or	
  network	
  membership	
  idenBfiers.	
  
Becoming Acquainted with Web Apollo.
HOW TO BEGIN
1.  Select	
  the	
  chromosomal	
  region	
  of	
  interest,	
  e.g.	
  scaffold.	
  
2.  Select	
  appropriate	
  evidence	
  tracks.	
  
3.  Determine	
  whether	
  a	
  feature	
  in	
  an	
  exisBng	
  evidence	
  track	
  will	
  provide	
  a	
  
reasonable	
  gene	
  model	
  to	
  start	
  working.	
  
-­‐  If	
  yes:	
  select	
  and	
  drag	
  the	
  feature	
  to	
  the	
  ‘User-­‐created	
  AnnotaBons’	
  
area,	
  creaDng	
  an	
  iniDal	
  gene	
  model.	
  If	
  necessary	
  use	
  ediBng	
  funcBons	
  
to	
  adjust	
  the	
  gene	
  model.	
  
-­‐  Nothing	
  available	
  to	
  you?	
  Let’s	
  have	
  a	
  talk.	
  
4.  Check	
  your	
  edited	
  gene	
  model	
  for	
  integrity	
  and	
  accuracy	
  by	
  comparing	
  it	
  
with	
  available	
  homologs.	
  
60 |
Always	
  remember:	
  when	
  annotaBng	
  gene	
  models	
  using	
  Apollo,	
  you	
  
are	
  looking	
  at	
  a	
  ‘frozen’	
  version	
  of	
  the	
  genome	
  assembly	
  and	
  you	
  will	
  
not	
  be	
  able	
  to	
  modify	
  the	
  assembly	
  itself.	
  
60	
Becoming Acquainted with Web Apollo.
GENERAL PROCESS OF CURATION
61CURATING GENOMES
WHAT ANNOTATORS SHOULD LOOK FOR

pay attention to these details
v  AnnotaDng	
  a	
  simple	
  case:	
  WHEN	
  “The	
  official	
  predicBon	
  is	
  correct,	
  or	
  nearly	
  
correct,	
  assuming	
  that	
  no	
  aligned	
  data	
  extends	
  beyond	
  the	
  gene	
  model	
  and	
  
if	
  so,	
  it	
  is	
  not	
  likely	
  to	
  be	
  coding	
  sequence,	
  and/or	
  the	
  gene	
  predicBon	
  
matches	
  what	
  you	
  know	
  about	
  the	
  gene”:	
  
a.  Can	
  you	
  add	
  UTRs?	
  	
  
b.  Check	
  exon	
  structures.	
  
c.  Check	
  splice	
  sites:	
  …]5’-­‐GT/AG-­‐3’[…	
  
d.  Check	
  ‘start’	
  and	
  ‘stop’	
  sites.	
  
e.  Check	
  the	
  predicted	
  protein	
  product(s).	
  
f.  If	
  the	
  protein	
  product	
  sBll	
  does	
  not	
  look	
  correct,	
  go	
  on	
  to	
  “AnnotaBng	
  
more	
  complex	
  cases”.	
  	
  
62CURATING GENOMES
WHAT ANNOTATORS SHOULD LOOK FOR

continued
v  AddiDonal	
  funcDonality.	
  You	
  may	
  also	
  need	
  to	
  learn	
  how	
  to:	
  
a.  Get	
  genomic	
  sequence	
  	
  
b.  Merge	
  exons	
  	
  
c.  Add/Delete	
  an	
  exon	
  	
  
d.  Create	
  an	
  exon	
  de	
  novo	
  (within	
  an	
  intron	
  or	
  outside	
  exisBng	
  
annotaBons).	
  
e.  Right/apple-­‐click	
  on	
  a	
  feature	
  to	
  get	
  feature	
  ID	
  and	
  addiBonal	
  
informaBon	
  	
  
f.  Looking	
  up	
  homolog	
  descripBons	
  going	
  to	
  the	
  accession	
  web	
  page	
  at	
  
UniProt/Swissprot	
  	
  
63CURATING GENOMES
WHAT ANNOTATORS SHOULD LOOK FOR

continued
v  AnnotaDng	
  more	
  complex	
  cases:	
  	
  
a.  Incomplete	
  annotaBon:	
  protein	
  integrity	
  checks,	
  indicate	
  gaps,	
  missing	
  5’	
  
sequences	
  or	
  missing	
  3’	
  sequences.	
  	
  
b.  Merge	
  of	
  2	
  gene	
  predicBons	
  on	
  same	
  scaffold	
  	
  
c.  Merge	
  of	
  2	
  gene	
  predicBons	
  on	
  different	
  scaffolds	
  (uh-­‐oh!).	
  
d.  Split	
  of	
  a	
  gene	
  predicBon	
  	
  
e.  Frameshi^s,	
  Selenocysteine,	
  single-­‐base	
  errors,	
  and	
  other	
  inconvenient	
  
phenomena	
  	
  
64CURATING GENOMES
WHAT ANNOTATORS SHOULD LOOK FOR

continued
v  Adding	
  important	
  project	
  informaDon	
  in	
  the	
  form	
  of	
  Canned	
  and/or	
  
Customized	
  Comments:	
  
a.  NCBI	
  ID,	
  RefSeq	
  ID,	
  gene	
  symbol(s),	
  common	
  name(s),	
  synonyms,	
  top	
  
BLAST	
  hits	
  (GenBank	
  IDs),	
  orthologs	
  with	
  species	
  names,	
  and	
  anything	
  
else	
  you	
  can	
  think	
  of,	
  because	
  you	
  are	
  the	
  expert.	
  
b.  Type	
  of	
  annotaBon	
  (e.g.:	
  whether	
  or	
  not	
  the	
  gene	
  model	
  was	
  changed)	
  	
  
c.  Data	
  source	
  (for	
  example	
  if	
  the	
  Fgeneshpp	
  predicted	
  gene	
  was	
  the	
  
starBng	
  point	
  for	
  your	
  annotaBon)	
  
d.  The	
  kinds	
  of	
  changes	
  you	
  made	
  to	
  the	
  gene	
  model,	
  e.g.:	
  split,	
  merge	
  
e.  FuncBonal	
  descripBon	
  
f.  Whether	
  you	
  would	
  like	
  for	
  your	
  MOD	
  curator	
  to	
  check	
  the	
  annotaBon	
  
g.  Whether	
  part	
  of	
  your	
  gene	
  is	
  on	
  a	
  different	
  scaffold.	
  
1.  Can	
  you	
  add	
  UTRs	
  (e.g.:	
  via	
  RNA-­‐Seq)?	
  
2.  Check	
  exon	
  structures	
  
3.  Check	
  splice	
  sites:	
  most	
  splice	
  sites	
  display	
  these	
  
residues	
  …]5’-­‐GT/AG-­‐3’[…	
  
4.  Check	
  ‘Start’	
  and	
  ‘Stop’	
  sites	
  
5.  Check	
  the	
  predicted	
  protein	
  product(s)	
  
–  Align	
  it	
  against	
  relevant	
  genes/gene	
  family.	
  
–  blastp	
  against	
  NCBI’s	
  RefSeq	
  or	
  nr	
  
6.  If	
  the	
  protein	
  product	
  sBll	
  does	
  not	
  look	
  correct	
  
then	
  check:	
  
–  Are	
  there	
  gaps	
  in	
  the	
  genome?	
  
–  Merge	
  of	
  2	
  gene	
  predicBons	
  on	
  the	
  same	
  
scaffold	
  
–  Merge	
  of	
  2	
  gene	
  predicBons	
  from	
  different	
  
scaffolds	
  	
  
–  Split	
  a	
  gene	
  predicBon	
  
–  Frameshi^s	
  	
  
–  error	
  in	
  the	
  genome	
  assembly?	
  
–  Selenocysteine,	
  single-­‐base	
  errors,	
  and	
  other	
  
inconvenient	
  phenomena	
  
65 | 65	
7.  Finalize	
  annotaBon	
  by	
  adding:	
  
–  Important	
  project	
  informaBon	
  in	
  the	
  form	
  of	
  
canned	
  and/or	
  customized	
  comments	
  
–  IDs	
  from	
  GenBank	
  (via	
  DBXRef),	
  gene	
  symbol(s),	
  
common	
  name(s),	
  synonyms,	
  top	
  BLAST	
  hits	
  (with	
  
GenBank	
  IDs),	
  orthologs	
  with	
  species	
  names,	
  and	
  
everything	
  else	
  you	
  can	
  think	
  of,	
  because	
  you	
  are	
  
the	
  expert.	
  
–  Whether	
  your	
  model	
  replaces	
  one	
  or	
  more	
  models	
  
from	
  the	
  official	
  gene	
  set	
  (so	
  it	
  can	
  be	
  deleted).	
  
–  The	
  kinds	
  of	
  changes	
  you	
  made	
  to	
  the	
  gene	
  model	
  
of	
  interest,	
  if	
  any.	
  E.g.:	
  splits,	
  merges,	
  whether	
  the	
  
5’	
  or	
  3’	
  ends	
  had	
  to	
  be	
  modified	
  to	
  include	
  ‘Start’	
  or	
  
‘Stop’	
  codons,	
  addiBonal	
  exons	
  had	
  to	
  be	
  added,	
  or	
  
non-­‐canonical	
  splice	
  sites	
  were	
  accepted.	
  
–  Any	
  funcBonal	
  assignments	
  that	
  you	
  think	
  are	
  of	
  
interest	
  to	
  the	
  community	
  (e.g.	
  via	
  BLAST,	
  RNA-­‐Seq	
  
data,	
  literature	
  searches,	
  etc.)	
  
THE CHECK LIST
for accuracy and integrity
Becoming Acquainted with Web Apollo.
Example	
  
Apollo Example
-­‐	
  Introductory	
  demonstraBon	
  using	
  the	
  Hyalella	
  azteca	
  genome	
  
(amphipod	
  crustacean).	
  
Example 67
A	
  public	
  Apollo	
  Demo	
  using	
  the	
  Honey	
  Bee	
  genome	
  is	
  available	
  at	
  	
  
hNp://genomearchitect.org/WebApolloDemo	
  
What do we know about this genome?
•  Currently	
  publicly	
  available	
  data	
  at	
  NCBI:	
  
•  >37,000	
   	
  nucleoBde	
  seqsà	
  scaffolds,	
  mitochondrial	
  genes	
  
•  300	
   	
  amino	
  acid	
  seqsà	
  mitochondrion	
  
•  53 	
   	
  ESTs	
  
•  0	
   	
   	
  conserved	
  domains	
  idenBfied	
  
•  0 	
   	
  “gene”	
  entries	
  submiNed	
  
	
  
•  Data	
  at	
  i5K	
  Workspace@NAL	
  
-­‐	
  10,832	
  scaffolds,	
  23,288	
  transcripts,	
  12,906	
  proteins	
  
Example 68
PubMed Search: 

what’s new?
Example 69
PubMed Search: what’s new?
Example 70
“Ten	
  populaBons	
  (3	
  laboratory	
  cultures,	
  7	
  California	
  
water	
  bodies)	
  differed	
  by	
  at	
  least	
  550-­‐fold	
  in	
  sensiBvity	
  
to	
  pyrethroids.”	
  	
  
“By	
  sequencing	
  the	
  primary	
  pyrethroid	
  target	
  site,	
  the	
  
voltage-­‐gated	
  sodium	
  channel	
  (vgsc),	
  we	
  show	
  that	
  
point	
  mutaBons	
  and	
  their	
  spread	
  in	
  natural	
  populaBons	
  
were	
  responsible	
  for	
  differences	
  in	
  pyrethroid	
  
sensiBvity.”	
  
“The	
  finding	
  that	
  a	
  non-­‐target	
  aquaBc	
  species	
  has	
  
acquired	
  resistance	
  to	
  pesBcides	
  used	
  only	
  on	
  terrestrial	
  
pests	
  is	
  troubling	
  evidence	
  of	
  the	
  impact	
  of	
  chronic	
  
pesBcide	
  transport	
  from	
  land-­‐based	
  applicaBons	
  into	
  
aquaBc	
  systems.”	
  
How many sequences for our gene of
interest?
Example 71
•  Para,	
  (voltage-­‐gated	
  sodium	
  channel	
  alpha	
  
subunit;	
  Nasonia	
  vitripennis).	
  	
  
•  NaCP60E	
  (Sodium	
  channel	
  protein	
  60	
  E;	
  D.	
  
melanogaster).	
  
•  MF:	
  voltage-­‐gated	
  caBon	
  channel	
  acBvity	
  
(IDA,	
  GO:0022843).	
  
•  BP:	
  olfactory	
  behavior	
  (IMP,	
  GO:0042048),	
  
sodium	
  ion	
  transmembrane	
  transport	
  
(ISS,GO:0035725).	
  
•  CC:	
  voltage-­‐gated	
  sodium	
  channel	
  complex	
  
(IEA,	
  GO:0001518).	
  
And	
  what	
  do	
  we	
  know	
  about	
  them?	
  
BLAST at i5K 

https://i5k.nal.usda.gov/blast
Example 72
>vgsc-­‐Segment3-­‐DomainII	
  
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
BLAST at i5K 

https://i5k.nal.usda.gov/blast
Example 73
>vgsc-­‐Segment3-­‐DomainII	
  
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
BLAST at i5K 

https://i5k.nal.usda.gov/blast
Example 74
BLAST at i5K: 

high-scoring segment pairs (hsp) in “BLAST+ Results” track
Example 75
Available Tracks
Example 76
Creating a new gene model: drag and drop
Example 77
•  Web Apollo automatically calculates the longest open reading
frame (ORF). In this case, the ORF includes the high-scoring
segment pairs (hsp).
Get Sequence
Example 78
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Flanking sequences (other gene models) vs. NCBI nr
Example 79
In	
  this	
  case,	
  two	
  gene	
  
models	
  at	
  5’	
  end.	
  
Review alignments
Example 80
HaztTmpM006232	
  
HaztTmpM006233	
  
HaztTmpM006234	
  
Hypothesis for vgsc gene model
Example 81
Editing: merge
Example 82
Merge	
  by	
  dropping	
  an	
  
exon	
  or	
  gene	
  model	
  
onto	
  another.	
  
Merge	
  by	
  selecBng	
  
two	
  exons	
  (holding	
  
down	
  “Shi^”)	
  and	
  
using	
  the	
  right	
  click	
  
menu.	
  
Editing: correct boundaries, delete exons
Example 83
Modify	
  exon	
  /	
  intron	
  
boundary	
  by	
  dragging	
  
the	
  end	
  of	
  the	
  exon	
  to	
  
the	
  nearest	
  canonical	
  
splice	
  site.	
  
Delete	
  first	
  exon	
  from	
  
M006233	
  
Editing: set translation start, modify boundary
Example 84
Set	
  translaBon	
  start	
  
Modify	
  intron	
  /	
  
exon	
  boundary	
  
(here	
  and	
  at	
  
coord.	
  78,999)	
  
Finished model
Example 85
Corroborate	
  integrity	
  and	
  accuracy	
  of	
  the	
  model:	
  	
  
-­‐	
  Start	
  and	
  Stop	
  
-­‐	
  Exon	
  structure	
  and	
  splice	
  sites	
  …]5’-­‐GT/AG-­‐3’[…	
  
-­‐	
  Check	
  the	
  predicted	
  protein	
  product	
  on	
  NCBI	
  nr	
  
Information Editor
•  DBXRefs:	
  e.g.	
  NP_001128389.1,	
  N.	
  
vitripennis,	
  RefSeq	
  
•  PubMed	
  idenBfier:	
  PMID:	
  24065824	
  
•  Gene	
  Ontology	
  IDs:	
  GO:0022843,	
  GO:
0042048,	
  GO:0035725,	
  GO:0001518.	
  
•  Comments.	
  
•  Name,	
  Symbol.	
  	
  
•  Approve	
  /	
  Delete	
  radio	
  buNon.	
  
Example 86
Comments	
  
(if	
  applicable)	
  
Demo	
  
APOLLO

demonstration
DEMO 88
See	
  Apollo	
  DemonstraBon	
  Video	
  at:	
  
hNps://youtu.be/VgPtAP_fvxY	
  	
  	
  
Exercises
Live	
  DemonstraBon	
  using	
  the	
  Apis	
  mellifera	
  genome.	
  
89
1.	
  Evidence	
  in	
  support	
  of	
  protein	
  coding	
  gene	
  
models.	
  
	
  	
  
1.1	
  Consensus	
  Gene	
  Sets:	
  
Official	
  Gene	
  Set	
  v3.2	
  
Official	
  Gene	
  Set	
  v1.0	
  
	
  
1.2	
  Consensus	
  Gene	
  Sets	
  comparison:	
  
OGSv3.2	
  genes	
  that	
  merge	
  OGSv1.0	
  and	
  
RefSeq	
  genes	
  
OGSv3.2	
  genes	
  that	
  split	
  OGSv1.0	
  and	
  RefSeq	
  
genes	
  
	
  
1.3	
  Protein	
  Coding	
  Gene	
  PredicDons	
  Supported	
  by	
  
Biological	
  Evidence:	
  
NCBI	
  Gnomon	
  
Fgenesh++	
  with	
  RNASeq	
  training	
  data	
  
Fgenesh++	
  without	
  RNASeq	
  training	
  data	
  
NCBI	
  RefSeq	
  Protein	
  Coding	
  Genes	
  and	
  Low	
  Quality	
  
Protein	
  Coding	
  Genes	
  
1.4	
  Ab	
  ini&o	
  protein	
  coding	
  gene	
  predicDons:	
  
Augustus	
  Set	
  12,	
  Augustus	
  Set	
  9,	
  Fgenesh,	
  GeneID,	
  
N-­‐SCAN,	
  SGP2	
  
	
  
1.5	
  Transcript	
  Sequence	
  Alignment:	
  
NCBI	
  ESTs,	
  Apis	
  cerana	
  RNA-­‐Seq,	
  Forager	
  Bee	
  Brain	
  
Illumina	
  ConBgs,	
  Nurse	
  Bee	
  Brain	
  Illumina	
  ConBgs,	
  
Forager	
  RNA-­‐Seq	
  reads,	
  Nurse	
  RNA-­‐Seq	
  reads,	
  
Abdomen	
  454	
  ConBgs,	
  Brain	
  and	
  Ovary	
  454	
  
ConBgs,	
  Embryo	
  454	
  ConBgs,	
  Larvae	
  454	
  ConBgs,	
  
Mixed	
  Antennae	
  454	
  ConBgs,	
  Ovary	
  454	
  ConBgs	
  
Testes	
  454	
  ConBgs,	
  Forager	
  RNA-­‐Seq	
  HeatMap,	
  
Forager	
  RNA-­‐Seq	
  XY	
  Plot,	
  Nurse	
  RNA-­‐Seq	
  
HeatMap,	
  Nurse	
  RNA-­‐Seq	
  XY	
  Plot	
  	
  
Becoming Acquainted with Web Apollo.
Exercises (continued)
Live	
  DemonstraBon	
  using	
  the	
  Apis	
  mellifera	
  genome.	
  
90
1.	
  Evidence	
  in	
  support	
  of	
  protein	
  coding	
  gene	
  
models	
  (ConDnued).	
  
	
  
1.6	
  Protein	
  homolog	
  alignment:	
  
Acep_OGSv1.2	
  
Aech_OGSv3.8	
  
Cflo_OGSv3.3	
  
Dmel_r5.42	
  
Hsal_OGSv3.3	
  
Lhum_OGSv1.2	
  
Nvit_OGSv1.2	
  
Nvit_OGSv2.0	
  
Pbar_OGSv1.2	
  
Sinv_OGSv2.2.3	
  
Znev_OGSv2.1	
  
Metazoa_Swissprot	
  
	
  
	
  
2.	
  Evidence	
  in	
  support	
  of	
  non	
  protein	
  coding	
  gene	
  
models	
  
	
  
2.1	
  Non-­‐protein	
  coding	
  gene	
  predicDons:	
  
NCBI	
  RefSeq	
  Noncoding	
  RNA	
  
NCBI	
  RefSeq	
  miRNA	
  
	
  
2.2	
  Pseudogene	
  predicDons:	
  
NCBI	
  RefSeq	
  Pseudogene	
  
Becoming Acquainted with Web Apollo.
Web Apollo Workshop Instances

Demo	
  1:	
  hNp://genomes.missouri.edu:8080/Amel_4.5_demo_1	
  	
  	
  
	
  
Demo	
  2:	
  hNp://genomes.missouri.edu:8080/Amel_4.5_demo_2	
  	
  
	
  
Workshop	
  DocumentaBon	
  can	
  be	
  found	
  at:	
  
Basecamp	
  
	
  
Web	
  Apollo	
  instance	
  for	
  Diaphorina	
  citri	
  	
  
hNps://apollo.nal.usda.gov/diacit/selectTrack.jsp	
  
	
  
Register	
  for	
  i5K	
  Workspace@NAL	
  at:	
  
hNps://i5k.nal.usda.gov/web-­‐apollo-­‐registraBon	
  
FUTURE PLANS

interactive analysis and curation of variants
v  InteracBve	
  exploraBon	
  of	
  VCF	
  
files	
  (e.g.	
  from	
  GATK,	
  VAAST)	
  in	
  
addiBon	
  to	
  BAM	
  and	
  GVF.	
  
	
  
MulBple	
  tracks	
  in	
  one:	
  
visualizaBon	
  of	
  geneBc	
  alteraBons	
  
and	
  populaBon	
  frequency	
  of	
  
variants.	
  
WEB APOLLO 92
1	
  
1	
  
2	
  
v  Clinical	
  applicaBons:	
  analysis	
  of	
  
Copy	
  Number	
  VariaBons	
  for	
  
regulatory	
  effects;	
  overlaying	
  
display	
  of	
  the	
  regulatory	
  domains.	
  
Philips-­‐Creminis	
  and	
  Corces.	
  2013.	
  Cell	
  50	
  (4):461-­‐474	
  
2	
  
TADs:	
  topologically	
  associaBng	
  domains	
  
FUTURE PLANS

educational tools
We	
  are	
  working	
  with	
  educators	
  to	
  make	
  Web	
  Apollo	
  part	
  of	
  their	
  curricula.	
  
WEB APOLLO 93
Lecture	
  Series.	
  
In	
  the	
  classroom.	
  
At	
  the	
  lab.	
  
Classroom	
  exercises:	
  from	
  
genome	
  sequence	
  to	
  
hypothesis.	
  
CuraBon	
  group	
  dedicated	
  
to	
  producing	
  educaBon	
  
materials	
  for	
  non-­‐model	
  
organism	
  communiBes.	
  
Our	
  team	
  provides	
  online	
  
documentaBon,	
  hands-­‐on	
  
training,	
  and	
  rapid	
  
response	
  to	
  users.	
  
JOIN US

Footer 94
http://GenomeArchitect.org/
Please bring your suggestions,
requests, and contributions to:
Nathan Dunn
Apollo Technical Lead
Eric Yao
JBrowse, UC Berkeley
Deepak Unni
Colin Diesh
Apollo Developers
Elsik Lab, University of Missouri
Suzi Lewis
Principal Investigator
Berkeley	
  BOP	
  
•  Berkeley	
  BioinformaDcs	
  Open-­‐source	
  Projects	
  (BBOP),	
  
Berkeley	
  Lab:	
  Web	
  Apollo	
  and	
  Gene	
  Ontology	
  teams.	
  
Suzanna	
  E.	
  Lewis	
  (PI).	
  
•  §	
  Chris&ne	
  G.	
  Elsik	
  (PI).	
  University	
  of	
  Missouri.	
  	
  
•  *	
  Ian	
  Holmes	
  (PI).	
  University	
  of	
  California	
  Berkeley.	
  
•  Arthropod	
  genomics	
  community:	
  i5K	
  Steering	
  
CommiNee	
  (esp.	
  Sue	
  Brown	
  (Kansas	
  State)),	
  Alexie	
  
Papanicolaou	
  (UWS),	
  BGI,	
  Oliver	
  Niehuis	
  (1KITE	
  
hNp://www.1kite.org/),	
  and	
  the	
  Honey	
  Bee	
  Genome	
  
Sequencing	
  ConsorBum.	
  
•  Apollo	
  is	
  supported	
  by	
  NIH	
  grants	
  5R01GM080203	
  from	
  
NIGMS,	
  and	
  5R01HG004483	
  from	
  NHGRI;	
  by	
  Contract	
  
No.	
  60-­‐8260-­‐4-­‐005	
  from	
  the	
  NaBonal	
  Agricultural	
  Library	
  
(NAL)	
  at	
  the	
  United	
  States	
  Department	
  of	
  Agriculture	
  
(USDA);	
  and	
  by	
  the	
  Director,	
  Office	
  of	
  Science,	
  Office	
  of	
  
Basic	
  Energy	
  Sciences,	
  of	
  the	
  U.S.	
  Department	
  of	
  Energy	
  
under	
  Contract	
  No.	
  DE-­‐AC02-­‐05CH11231.	
  
•  Insect	
  images	
  used	
  with	
  permission:	
  
hNp://AlexanderWild.com	
  
•  For	
  your	
  aQenDon,	
  thank	
  you!	
  
Thank you. 95
Web	
  Apollo	
  
Nathan	
  Dunn	
  
Colin	
  Diesh	
  §	
  
Deepak	
  Unni	
  §	
  	
  
	
  
Gene	
  Ontology	
  
Chris	
  Mungall	
  
Seth	
  Carbon	
  
Heiko	
  Dietze	
  
	
  
BBOP	
  
Web	
  Apollo:	
  hNp://GenomeArchitect.org	
  	
  
i5K:	
  hNp://arthropodgenomes.org/wiki/i5K	
  
GO:	
  hNp://GeneOntology.org	
  
Thanks!	
  
NAL	
  at	
  USDA	
  
Monica	
  Poelchau	
  
Christopher	
  Childers	
  
Gary	
  Moore	
  
HGSC	
  at	
  BCM	
  
fringy	
  Richards	
  
Dan	
  Hughes	
  
Kim	
  Worley	
  
	
  
JBrowse	
   	
   	
   	
  	
  Eric	
  Yao	
  *	
  

More Related Content

What's hot

Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyMelanie Courtot
 
BITS: Basics of sequence analysis
BITS: Basics of sequence analysisBITS: Basics of sequence analysis
BITS: Basics of sequence analysisBITS
 
TGAC Browser bosc 2014
TGAC Browser bosc 2014TGAC Browser bosc 2014
TGAC Browser bosc 2014Anil Thanki
 
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...VHIR Vall d’Hebron Institut de Recerca
 
Munoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ssMunoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ssMonica Munoz-Torres
 
The ensembl database
The ensembl databaseThe ensembl database
The ensembl databaseAshfaq Ahmad
 
Gene Ontology Project
Gene Ontology ProjectGene Ontology Project
Gene Ontology Projectvaibhavdeoda
 
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...RussellHanson
 
Web Apollo Workshop University of Exeter
Web Apollo Workshop University of ExeterWeb Apollo Workshop University of Exeter
Web Apollo Workshop University of ExeterMonica Munoz-Torres
 
Light Intro to the Gene Ontology
Light Intro to the Gene OntologyLight Intro to the Gene Ontology
Light Intro to the Gene Ontologynniiicc
 
Genevestigator
GenevestigatorGenevestigator
GenevestigatorBITS
 
20100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_020100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_0Computer Science Club
 

What's hot (20)

Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontology
 
BITS: Basics of sequence analysis
BITS: Basics of sequence analysisBITS: Basics of sequence analysis
BITS: Basics of sequence analysis
 
Ensembl Browser Workshop
Ensembl Browser WorkshopEnsembl Browser Workshop
Ensembl Browser Workshop
 
TGAC Browser bosc 2014
TGAC Browser bosc 2014TGAC Browser bosc 2014
TGAC Browser bosc 2014
 
Ensembl annotation
Ensembl annotationEnsembl annotation
Ensembl annotation
 
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
 
Munoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ssMunoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ss
 
The ensembl database
The ensembl databaseThe ensembl database
The ensembl database
 
Genome scale-data as networks
Genome scale-data as networksGenome scale-data as networks
Genome scale-data as networks
 
The Crop Ontology - Harmonizing Semantics for Agricultural Field Data, by Eli...
The Crop Ontology - Harmonizing Semantics for Agricultural Field Data, by Eli...The Crop Ontology - Harmonizing Semantics for Agricultural Field Data, by Eli...
The Crop Ontology - Harmonizing Semantics for Agricultural Field Data, by Eli...
 
Gene Ontology Project
Gene Ontology ProjectGene Ontology Project
Gene Ontology Project
 
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
 
Web Apollo Workshop University of Exeter
Web Apollo Workshop University of ExeterWeb Apollo Workshop University of Exeter
Web Apollo Workshop University of Exeter
 
Protein-protein interaction networks
Protein-protein interaction networksProtein-protein interaction networks
Protein-protein interaction networks
 
Light Intro to the Gene Ontology
Light Intro to the Gene OntologyLight Intro to the Gene Ontology
Light Intro to the Gene Ontology
 
Genevestigator
GenevestigatorGenevestigator
Genevestigator
 
Ensembl genome
Ensembl genomeEnsembl genome
Ensembl genome
 
Kirmitzoglou_PhD_Final
Kirmitzoglou_PhD_FinalKirmitzoglou_PhD_Final
Kirmitzoglou_PhD_Final
 
Biological networks
Biological networksBiological networks
Biological networks
 
20100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_020100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_0
 

Viewers also liked

JBrowse & Apollo Overview - for AGR
JBrowse & Apollo Overview - for AGRJBrowse & Apollo Overview - for AGR
JBrowse & Apollo Overview - for AGRMonica Munoz-Torres
 
Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityMonica Munoz-Torres
 
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015Apollo: Scalable & collaborative curation of genomes - Biocuration 2015
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015Monica Munoz-Torres
 
Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07Monica Munoz-Torres
 
Apollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityApollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityMonica Munoz-Torres
 
Apollo : A workshop for the Manakin Research Coordination Network
Apollo: A workshop for the Manakin Research Coordination NetworkApollo: A workshop for the Manakin Research Coordination Network
Apollo : A workshop for the Manakin Research Coordination NetworkMonica Munoz-Torres
 
Introduction to Apollo - i5k Research Community – Calanoida (copepod)
Introduction to Apollo - i5k Research Community – Calanoida (copepod)Introduction to Apollo - i5k Research Community – Calanoida (copepod)
Introduction to Apollo - i5k Research Community – Calanoida (copepod)Monica Munoz-Torres
 
Investigations on Synthesis, Purification and Characterization of Indium Anti...
Investigations on Synthesis, Purification and Characterization of Indium Anti...Investigations on Synthesis, Purification and Characterization of Indium Anti...
Investigations on Synthesis, Purification and Characterization of Indium Anti...AM Publications
 
Apollo Exercises Kansas State University 2015
Apollo Exercises Kansas State University 2015Apollo Exercises Kansas State University 2015
Apollo Exercises Kansas State University 2015Monica Munoz-Torres
 
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcional
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcionalCONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcional
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcionalMonica Munoz-Torres
 
Diapositivas tuberculosis y salud publica
Diapositivas tuberculosis y salud publica Diapositivas tuberculosis y salud publica
Diapositivas tuberculosis y salud publica CesarArgus96
 
Gds quick reference guide
Gds quick reference guideGds quick reference guide
Gds quick reference guidemaman_supir
 
Expert System to Determine the Priority Scale of Case in Laboratory of Forens...
Expert System to Determine the Priority Scale of Case in Laboratory of Forens...Expert System to Determine the Priority Scale of Case in Laboratory of Forens...
Expert System to Determine the Priority Scale of Case in Laboratory of Forens...AM Publications
 
Doppler medios diagnostico y cuidados de enfermeria
Doppler  medios diagnostico y cuidados de enfermeria Doppler  medios diagnostico y cuidados de enfermeria
Doppler medios diagnostico y cuidados de enfermeria CesarArgus96
 

Viewers also liked (20)

JBrowse & Apollo Overview - for AGR
JBrowse & Apollo Overview - for AGRJBrowse & Apollo Overview - for AGR
JBrowse & Apollo Overview - for AGR
 
Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research community
 
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015Apollo: Scalable & collaborative curation of genomes - Biocuration 2015
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015
 
PAINT Family PTHR13451-MUS81
PAINT Family PTHR13451-MUS81PAINT Family PTHR13451-MUS81
PAINT Family PTHR13451-MUS81
 
Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07
 
Apollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityApollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research Community
 
Apollo : A workshop for the Manakin Research Coordination Network
Apollo: A workshop for the Manakin Research Coordination NetworkApollo: A workshop for the Manakin Research Coordination Network
Apollo : A workshop for the Manakin Research Coordination Network
 
Introduction to Apollo - i5k Research Community – Calanoida (copepod)
Introduction to Apollo - i5k Research Community – Calanoida (copepod)Introduction to Apollo - i5k Research Community – Calanoida (copepod)
Introduction to Apollo - i5k Research Community – Calanoida (copepod)
 
Apolo Taller en BIOS
Apolo Taller en BIOS Apolo Taller en BIOS
Apolo Taller en BIOS
 
Introduction to Apollo for i5k
Introduction to Apollo for i5kIntroduction to Apollo for i5k
Introduction to Apollo for i5k
 
Investigations on Synthesis, Purification and Characterization of Indium Anti...
Investigations on Synthesis, Purification and Characterization of Indium Anti...Investigations on Synthesis, Purification and Characterization of Indium Anti...
Investigations on Synthesis, Purification and Characterization of Indium Anti...
 
Apollo Exercises Kansas State University 2015
Apollo Exercises Kansas State University 2015Apollo Exercises Kansas State University 2015
Apollo Exercises Kansas State University 2015
 
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcional
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcionalCONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcional
CONSORCIO ONTOLOGÍA DE GENES: herramientas para anotación funcional
 
Diapositivas tuberculosis y salud publica
Diapositivas tuberculosis y salud publica Diapositivas tuberculosis y salud publica
Diapositivas tuberculosis y salud publica
 
Gds quick reference guide
Gds quick reference guideGds quick reference guide
Gds quick reference guide
 
Expert System to Determine the Priority Scale of Case in Laboratory of Forens...
Expert System to Determine the Priority Scale of Case in Laboratory of Forens...Expert System to Determine the Priority Scale of Case in Laboratory of Forens...
Expert System to Determine the Priority Scale of Case in Laboratory of Forens...
 
Raviz Hotels and Resorts
Raviz Hotels and ResortsRaviz Hotels and Resorts
Raviz Hotels and Resorts
 
Windowns
WindownsWindowns
Windowns
 
Flags
FlagsFlags
Flags
 
Doppler medios diagnostico y cuidados de enfermeria
Doppler  medios diagnostico y cuidados de enfermeria Doppler  medios diagnostico y cuidados de enfermeria
Doppler medios diagnostico y cuidados de enfermeria
 

Similar to Web Apollo Manual Annotation Workshop Introduction

Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014Monica Munoz-Torres
 
Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.Monica Munoz-Torres
 
Genome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTKGenome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTKMonica Munoz-Torres
 
Introduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinisIntroduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinisMonica Munoz-Torres
 
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesApollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesMonica Munoz-Torres
 
Introduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityIntroduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityMonica Munoz-Torres
 
An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera
An introduction to Web Apollo for i5K Pilot Species Projects - HemipteraAn introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera
An introduction to Web Apollo for i5K Pilot Species Projects - HemipteraMonica Munoz-Torres
 
Introduction to Web Apollo for the i5K pilot species.
Introduction to Web Apollo for the i5K pilot species.Introduction to Web Apollo for the i5K pilot species.
Introduction to Web Apollo for the i5K pilot species.Monica Munoz-Torres
 
Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.Monica Munoz-Torres
 
Web Apollo Tutorial for Medfly Research Community
Web Apollo Tutorial for Medfly Research CommunityWeb Apollo Tutorial for Medfly Research Community
Web Apollo Tutorial for Medfly Research CommunityMonica Munoz-Torres
 
Three's a crowd-source: Observations on Collaborative Genome Annotation
Three's a crowd-source: Observations on Collaborative Genome AnnotationThree's a crowd-source: Observations on Collaborative Genome Annotation
Three's a crowd-source: Observations on Collaborative Genome AnnotationMonica Munoz-Torres
 
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...Nathan Dunn
 
Jillian ms defense-4-14-14-ja-novid3
Jillian ms defense-4-14-14-ja-novid3Jillian ms defense-4-14-14-ja-novid3
Jillian ms defense-4-14-14-ja-novid3Jillian Aurisano
 

Similar to Web Apollo Manual Annotation Workshop Introduction (20)

Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014
 
Genome Curation using Apollo
Genome Curation using ApolloGenome Curation using Apollo
Genome Curation using Apollo
 
Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.
 
Genome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTKGenome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTK
 
Introduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinisIntroduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinis
 
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesApollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
 
Introduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityIntroduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research Community
 
An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera
An introduction to Web Apollo for i5K Pilot Species Projects - HemipteraAn introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera
An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera
 
Introduction to Web Apollo for the i5K pilot species.
Introduction to Web Apollo for the i5K pilot species.Introduction to Web Apollo for the i5K pilot species.
Introduction to Web Apollo for the i5K pilot species.
 
Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.
 
rheumatoid arthritis
rheumatoid arthritisrheumatoid arthritis
rheumatoid arthritis
 
Web Apollo Tutorial for Medfly Research Community
Web Apollo Tutorial for Medfly Research CommunityWeb Apollo Tutorial for Medfly Research Community
Web Apollo Tutorial for Medfly Research Community
 
2014 naples
2014 naples2014 naples
2014 naples
 
Three's a crowd-source: Observations on Collaborative Genome Annotation
Three's a crowd-source: Observations on Collaborative Genome AnnotationThree's a crowd-source: Observations on Collaborative Genome Annotation
Three's a crowd-source: Observations on Collaborative Genome Annotation
 
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...
Genome annotation with open source software: Apollo, Jbrowse and the GO in Ga...
 
2014 villefranche
2014 villefranche2014 villefranche
2014 villefranche
 
2014 ucl
2014 ucl2014 ucl
2014 ucl
 
Microbiology Assignment Help
Microbiology Assignment HelpMicrobiology Assignment Help
Microbiology Assignment Help
 
Jillian ms defense-4-14-14-ja-novid3
Jillian ms defense-4-14-14-ja-novid3Jillian ms defense-4-14-14-ja-novid3
Jillian ms defense-4-14-14-ja-novid3
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 

More from Monica Munoz-Torres

Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriMonica Munoz-Torres
 
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...Monica Munoz-Torres
 
Gene Ontology Consortium: Website & COmmunity
Gene Ontology Consortium: Website & COmmunityGene Ontology Consortium: Website & COmmunity
Gene Ontology Consortium: Website & COmmunityMonica Munoz-Torres
 
Essential Requirements for Community Annotation Tools
Essential Requirements for Community Annotation ToolsEssential Requirements for Community Annotation Tools
Essential Requirements for Community Annotation ToolsMonica Munoz-Torres
 
Data Visualization And Annotation Workshop at Biocuration 2015
Data Visualization And Annotation Workshop at Biocuration 2015Data Visualization And Annotation Workshop at Biocuration 2015
Data Visualization And Annotation Workshop at Biocuration 2015Monica Munoz-Torres
 
Apollo: developers call 2015-02-05
Apollo: developers call 2015-02-05Apollo: developers call 2015-02-05
Apollo: developers call 2015-02-05Monica Munoz-Torres
 

More from Monica Munoz-Torres (6)

Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citri
 
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
 
Gene Ontology Consortium: Website & COmmunity
Gene Ontology Consortium: Website & COmmunityGene Ontology Consortium: Website & COmmunity
Gene Ontology Consortium: Website & COmmunity
 
Essential Requirements for Community Annotation Tools
Essential Requirements for Community Annotation ToolsEssential Requirements for Community Annotation Tools
Essential Requirements for Community Annotation Tools
 
Data Visualization And Annotation Workshop at Biocuration 2015
Data Visualization And Annotation Workshop at Biocuration 2015Data Visualization And Annotation Workshop at Biocuration 2015
Data Visualization And Annotation Workshop at Biocuration 2015
 
Apollo: developers call 2015-02-05
Apollo: developers call 2015-02-05Apollo: developers call 2015-02-05
Apollo: developers call 2015-02-05
 

Recently uploaded

Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 

Recently uploaded (20)

Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 

Web Apollo Manual Annotation Workshop Introduction

  • 1. An Introduction to Web Apollo
 Manual Annotation Workshop at Kansas State University Monica Munoz-Torres, PhD | @monimunozto
 Berkeley Bioinformatics Open-Source Projects (BBOP)
 Genomics Division, Lawrence Berkeley National Laboratory
 IX Arthropod Genomics Symposium. Manhattan, KS. 17 June, 2015
  • 2. 2COURSE MATERIAL Recommended  Browsers:  Google  Chrome,  Firefox.     Exercises  file  available  at  Basecamp     Workshop  slides  and  answers  to  exercises  will  be   available  on  Basecamp  next  week.   TODAY
  • 3. OUTLINE
 Web  Apollo  CollaboraBve  CuraBon  and     InteracBve  Analysis  of  Genomes   3OUTLINE •  GENOME  CURATION   steps  involved   •  COMMUNITY  BASED  CURATION   our  experience     •  APOLLO   empowering  collaboraBve  curaBon     •  APOLLO  on  THE  WEB   becoming  acquainted   •  PRACTICE   demonstraBon  and  exercises  
  • 4. 4 DURING THIS WORKSHOP
 you will
 v Understand  the  process  of  genome  curaBon  in  the  context  of   annotaBon:     assembled  genome  à  automated  annotaBon  à  manual  annotaBon   v Become  familiar  with  the  environment  and  funcBonality  of  the  Web   Apollo  genome  annotaBon  ediBng  tool.   v Learn  to  idenBfy  homologs  of  known  genes  of  interest  in  a  newly   sequenced  genome  of  interest.   v Learn  how  to  corroborate  and  modify  automaBcally  generated  gene   models  using  available  biological  evidence  (in  Apollo).   Introduction
  • 5. 5 I INVITE YOU TO: v  Observe  details  in  figures   v  Listen  to  explanaBons   v  Ask  quesBons  at  any  Bme   v  Use  TwiNer  &  share  your  thoughts:  I  am  @monimunozto     A  few  tags  &  users:  #WebApollo  #annotaBon  #biocuraBon  #GMOD   #genome  @JBrowseGossip   v  Take  brakes:     LBL’s  ergo  team  suggests  I  should  not  work  at  the  computer  for  >45   minutes  without  a  break;  neither  should  you!  We  will  be  here  for  ~2.5   hours:  please  get  up  and  stretch  your  neck,  arms,  and  legs  as  o^en  as   you  need.   Introduction
  • 6. I kindly ask that you refrain from: v  Reading  all  the  text  I  wrote.     Think  of  the  text  on  these  slides  as  your  “class  notes”.  You   will  use  them  during  exercises.   v  Checking  email.   You  have  my  undivided  aNenBon,  I’d  like  to  receive  yours  in   exchange.     Warning:  If  you  get  *caught*,  you  will  read  it  out  loudly  for   everyone  to  hear,  we  may  contribute  to  the  response.   Introduction
  • 7. Let Us Get Started
  • 8. REMEMBER, REMEMBER… 
 from intro webinar last week Web  Apollo  IntroducDon   Biological  concepts  to  beNer   understand  manual  annotaBon   8OUTLINE •  CENTRAL  DOGMA   in  molecular  biology     •  WHAT  IS  A  GENE?   let’s  think  computaBonally   •  TRANSCRIPTION   mRNA  in  detail     •  TRANSLATION   and  many  definiBons   •  GENOME  CURATION   steps  involved   •  WHAT  TO  LOOK  FOR   training  the  annotators  
  • 9. CURATING GENOMES
 steps involved 1  GeneraDon  of  Gene  Models   calling  ORFs,  one  or  more   rounds  of  gene  predicBon,  etc.     2  AnnotaDon  of  gene  models   Describing  funcBon,   expression  paNerns,  metabolic   network  memberships.   3     Manual  annotaDon   CURATING GENOMES 9
  • 10. 10Manual Curation GENE PREDICTION v  The  idenBficaBon  of  structural  features  of  the  genome.   •  Primarily  protein-­‐coding  genes.     •  Also  transfer  RNAs  (tRNA),  ribosomal  RNAs  (rRNA),   regulatory  moBfs,  long  and  small  non-­‐coding  RNAs  (ncRNA),   repeBBve  elements  (masked),  etc.  
  • 11. 11Manual Curation GENE PREDICTION v  Methods  for  discovery:     1)  Ab  ini&o:  based  on  DNA   composiBon,  deals  strictly  with   genomic  sequences  and  makes   use  of  staBsBcal  approaches  to   search  for  coding  regions  and   typical  gene  signals.       •  E.g.  Augustus,  GENSCAN,     geneid,  fgenesh,  etc.  
  • 12. 12 Nucleic Acids 2003 vol. 31 no. 13 3738-3741 Manual Curation GENE PREDICTION v  Methods  for  discovery:   2)  Homology-­‐based:  evidence-­‐based;  finds  genes  using  either  similarity   searches  in  the  main  databases  or  experimental  data  including  RNAseq,   expressed  sequence  tags  (ESTs),  full-­‐length  complementary  DNAs  (cDNAs),   etc.     •  E.g:  SGP2,  fgenesh++  
  • 13. 13 In  some  cases  algorithms  and  metrics  used  to  generate   consensus  sets  may  actually  reduce  the  accuracy  of  the  gene’s   representaBon;  in  such  cases  it  is  usually  beNer  to  use  an     ab  ini&o  model  to  create  a  new  annotaBon.   GENE ANNOTATION IntegraBon  of  data  from  predicBon  tools  to  generate  a  reliable  set  of   structural  annotaDons:  involves  ab  ini&o  predicBons,  assessment  of   biological  evidence  to  drive  the  gene  predicBon  process,  and  the   synthesis  of  these  results  to  produce  a  set  of  consensus  gene  models.     v  Models  may  be  organized  using:   v  automaBc  integraBon  of  predicted  sets;  e.g:  GLEAN   v  packaged  tools  from  pipeline;  e.g:  MAKER   Manual Curation
  • 14. NOT PERFECT 
 automated annotation remains an imperfect art Unlike  the  more  highly  polished  genomes  of  earlier  projects,  today’s  genomes   have:   1.  lower  coverage.   2.  more  frequent  assembly  errors  and  annotaBon  of  genes  across   mulBple  scaffolds.   CURATING GENOMES 14 Image: www.BroadInstitute.org
  • 15. MANUAL ANNOTATION
 working concept   v  Precise  elucidaBon  of  biological  features  encoded   in  the  genome  requires  careful  examinaBon  and   review.     Schiex  et  al.  Nucleic  Acids  2003  (31)  13:  3738-­‐3741   Automated Predictions Experimental Evidence Manual Curation 15 cDNAs,  HMM  domain  searches,  RNAseq,   genes  from  other  species.  
  • 16. MANUAL ANNOTATION
 is necessary v  Evaluate  all  available  evidence  and  corroborate   or  modify  genome  element  predicBons.     v  Determine  funcBonal  roles  through  comparaBve   analysis  using  literature,  databases,  and   experimental  data.   v  Resolve  discrepancies  and  validate  automated   gene  model  hypotheses.   v  Desktop  version  of  Apollo            was  designed  to  fit   the  manual  annotaBon  needs  of  genome  projects   such  as  fruit  fly,  mouse,  zebrafish,  human,  etc.   Manual Curation 16 Automated Predictions Curated Gene Models Official Gene Set “Incorrect  and  incomplete  genome  annota&ons   will  poison  every  experiment  that  uses  them”.   -­‐  M.  Yandell  
  • 17. BUT, MANUAL CURATION
 did not always scale well A  small  group  of  highly   trained  experts;  e.g.  GO   1   Museum  Model   A  few  very  good  biologists   and  a  few  very  good   bioinformaBcians  camp   together,  during  intense  but   short  periods  of  Bme.   Old-­‐Dme  Jamborees  2   Researchers  work  by   themselves,  then  may  or   may  not  publicize  results;  …   may  be  a  dead-­‐end  with   very  few  people  ever  aware   of  these  results.   CoQage  Model  3   Elsik  et  al.  2006.  Genome  Res.  16(11):1329-­‐33.   Manual Curation 17 Too  many  sequences  and  not  enough  hands  to  approach  curaBon.  
  • 18. POWER TO THE CURATORS
 augment existing tools Fill   in   the   gap   for   all   the   things   that   won’t   be   easy   to   cover   with   these   approaches;  this  will  allow  researchers   to  beNer  contribute  their  efforts.   Give  more  people  the  power  to  curate!   Big  data  are  not  a  subsBtute  for,  but   a   supplement   to   tradiBonal   data   collecBon  and  analysis.   The  Parable  of  Google  Flu.  Lazer  et  al.  2014.  Science  343  (6176):  1203-­‐1205.   v Enable  more  curators  to  work   v Enable  beNer  scienBfic  publishing   v Credit  curators  for  their  work     Manual Curation 18
  • 19. IMPROVING TOOLS FOR MANUAL ANNOTATION
 our plan “More  and  more  sequences”:  more  genomes,  within  populaBons  and   across  species,  are  now  being  sequenced.       This  begs  the  need  for  a  universally  accessible  genome  curaBon  tool:   Manual Curation 19 To  produce  accurate  sets  of   genomic  features.   To  address  the  need  to  correct  for   more  frequent  assembly  and   automated  predicBon  errors  due   to  new  sequencing  technologies.  
  • 20. GENOME ANNOTATION
 an inherently collaborative task Researchers  o^en  turn  to  colleagues  for  second  opinions  and  insight  from  those   with  experBse  in  parBcular  areas  (e.g.,  domains,  families).  To  facilitate  and   encourage  this,  we  conBnue  to  improve  Apollo.   APOLLO 20 Apollo  is  a  web-­‐based,  collaboraBve  genomic  annotaBon   ediBng  plavorm.   We  need  annota&on  edi&ng  tools  to  modify  and  refine  the   precise  loca&on  and  structure  of  the  genome  elements  that   predic&ve  algorithms  cannot  yet  resolve  automa&cally.   hNp://GenomeArchitect.org    
  • 21. APOLLO
 genome annotation editing tool 21 v  Web  based,  integrated  with  JBrowse.   v  Supports  real  Bme  collaboraBon!   v  AutomaBc  generaBon  of  ready-­‐made  computable  data.     v  Supports  annotaBon  of  genes,    pseudogenes,  tRNAs,  snRNAs,   snoRNAs,  ncRNAs,  miRNAs,  TEs,  and  repeats.   v  IntuiBve  annotaBon,  gestures,  and  pull-­‐down  menus  to  create  and   edit  transcripts  and  exons  structures,  insert  comments  (CV,  freeform   text),  GO  terms,  etc.   APOLLO
  • 22. NEW APOLLO ARCHITECTURE
 simpler, more flexible APOLLO 22 Web-­‐based  client  +  annotaBon-­‐ediBng  engine  +  server-­‐side  data  service   REST / JSON Websockets Annotation Engine (Server) Shiro LDAP OAuth JBrowse Data Organism 2 Annotations Security Preferences Organisms Tracks BAM BED VCF GFF3 BigWig Annotators Google Web Toolkit (GWT) / Bootstrap JBrowse DOJO / jQuery JBrowse Data Organism 1 Load genomic evidence for selected organism Single Data Store PostgreSQL, MySQL, MongoDB, ElasticSearch Apollo v2.0
  • 23. We  conBnuously  train  and  support  hundreds  of  geographically  dispersed   scienBsts   from   many   research   communiBes   to   conduct   manual   annotaBons,  recovering  coding  sequences  in  agreement  with  all  available   biological  evidence  using  Web  Apollo.       v  Gate  keeping  and  monitoring.   v  Tutorials,  training  workshops,  and  “geneborees”.   v  Personalized  user  support.   23 DISPERSED COMMUNITIES collaborative manual annotation efforts APOLLO
  • 24. 24 CURATION
 how it works IdenBfies  elements  that  best   represent  the  underlying  biology   (including  missing  genes)  and   eliminates  elements  that  reflect   systemic  errors  of  automated   analyses.   Assigns  funcBon  through   comparaBve  analysis  of  similar   genome  elements  from  closely   related  species  using  literature,   databases,  and  researchers’  lab   data.   1   2   Examples   Comparing  7  ant  genomes  contributed  to   beNer  understanding  evoluBon  and   organizaBon  of  insect  socieBes  at  the   molecular  level;  e.g.  division  of  labor,   mutualism,  chemical  communicaBon,  etc.   Libbrecht  et  al.  2012.  Genome  Biology  2013,  14:212   Queen  Bee   Worker  Bee   Castes   Larva   Dnmt   RNAi  Royal  jelly   Kucharski  et  al.  2008.  Science  (319)  5871:  1827-­‐1830       Insect   Methylome   Anchoring  molecular   markers  to  reference   genome  pointed  to   chromosomal   rearrangements  &  detecBng   signals  of  adapBve  radiaBon   in  Heliconius  buNerflies.     Joron  et  al.  2011.  Nature,  477:203-­‐206  APOLLO
  • 25. CURRENT COLLABORATIONS
 training and contributions Partnerships   WEB APOLLO 25 UNIVERSITY of MISSOURI National Agricultural Library Nature  Reviews  Gene&cs  2009  (10),  346-­‐347   Norwegian  Spruce  hNp://congenie.org/   Phlebotomus  papatasi   Tallapoosa  darter  hNp://darter2.westga.edu/   Wasmania  auropunctata   Homo  sapiens  hg19   Pinus  taeda   hIp://dendrome.ucdavis.edu/treegenes/browsers/  
  • 26. LESSONS LEARNED
 What  we  have  learned:     •  CollaboraBve  work  disBlls  invaluable  knowledge   •  We  must  enforce  strict  rules  and  formats   •  We  must  evolve  with  the  data   •  A  liNle  training  goes  a  long  way   •  NGS  poses  addiBonal  challenges   PREVIOUSLY WE LEARNED 26
  • 27. THE COLLABORATIVE CURATION PROCESS AT I5K 1)  In  some  cases  a  computaBonally  predicted  consensus  gene  set   is  generated  using  mulBple  lines  of  evidence.  In  other  cases,   more  than  one  gene  set  are  made  available  for  analysis:  e.g.   Primary  Gene  Sets:  HAZT_v0.5.3-­‐Models,  Augustus  gene  set.   2)  i5K   Projects   will   integrate   consensus   computaBonal   predicBons   with   manual   annotaBons   to   produce   an   updated   Official  Gene  Set  (OGS):   »  If  it’s  not  on  either  track,  it  won’t  make  the  OGS!   »  If  it’s  there  and  it  shouldn’t,  it  will  sBll  make  the  OGS!   27Collaborative Curation at i5K
  • 28. CONSENSUS SET: REFERENCE AND START POINT •  Isoforms:  drag  original  and  alternaBvely  spliced  form  to  ‘User-­‐ created  Annota&ons’  area.   •  If  an  annotaBon  needs  to  be  removed  from  the  consensus  set,   drag  it  to  the  ‘User-­‐created  Annota&ons’  area  and  label  as   ‘Delete’  on  InformaBon  Editor.   •  Overlapping  interests?  Collaborate  to  reach  agreement.   •  Follow  guidelines  for  i5K  Pilot  Species  Projects  as  shown  at   hNp://goo.gl/LRu1VY     28Collaborative Curation at i5K
  • 30. Sort 30Becoming Acquainted with Web Apollo. 30 WEB APOLLO
 the sequence selection window

  • 31. NavigaBon  tools:   pan  and  zoom   Search  box:  go  to   a  scaffold  or  a   gene  model.     Grey  bar  of  coordinates   indicates  locaBon.  You  can  also   select  here  in  order  to  zoom  to   a  sub-­‐region.   ‘View’:  change   color  by  CDS,   toggle  strands,   set  highlight.   ‘File’:   Upload  your  own   evidence:  GFF3,  BAM,   BigWig,  VCF*.  Add   combinaBon  and   sequence  search   tracks.   ‘Tools’:     Use  BLAT  to  query  the   genome  with  a  protein   or  DNA  sequence.   Available Tracks Evidence  Tracks  Area   ‘User-­‐created  AnnotaBons’  Track   Login 31 WEB APOLLO
 graphical user interface (GUI) for editing annotations Becoming Acquainted with Web Apollo.
  • 32. In  addiBon  to  protein-­‐coding  gene  annotaBon  that  you  know  and  love.   •  Non-­‐coding  genes:  ncRNAs,  miRNAs,  repeat  regions,  and  TEs   •  Sequence  alteraBons  (less  coverage  =  more  fragmentaBon)   •  VisualizaBon  of  stage  and  cell-­‐type  specific  transcripBon  data  as   coverage  plots,  heat  maps,  and  alignments   32 32 WEB APOLLO
 additional functionality Becoming Acquainted with Web Apollo.
  • 33. 1.  Select  a  chromosomal  region  of  interest,  e.g.  scaffold.   2.  Select  appropriate  evidence  tracks.   3.  Determine  whether  a  feature  in  an  exisBng  evidence  track  will  provide  a   reasonable  gene  model  to  start  working.   -­‐  If  yes:  select  and  drag  the  feature  to  the  ‘User-­‐created  AnnotaBons’   area,  creaDng  an  iniDal  gene  model.  If  necessary  use  ediBng  funcBons   to  adjust  the  gene  model.   -­‐  If  not:  let’s  talk.   4.  Check  your  edited  gene  model  for  integrity  and  accuracy  by  comparing  it   with  available  homologs.   Becoming Acquainted with Web Apollo 33 | Always  remember:  when  annotaBng  gene  models  using  Web  Apollo,   you  are  looking  at  a  ‘frozen’  version  of  the  genome  assembly  and  you   will  not  be  able  to  modify  the  assembly  itself.   33 GENERAL PROCESS OF CURATION
 steps to remember
  • 34. Choose  (click  or  drag)  appropriate  evidence  tracks  from  the  list  on  the  le^.     Click  on  an  exon  to  select  it.  Double  click  on  an  exon  or  single  click  on  an  intron  to  select  the  enBre   gene.   Select  &  drag  any  elements  from  an  evidence  track  into  the  curaBon  area:  these  are  editable  and   considered  the  curated  version  of  the  gene.  Other  opBons  for  elements  in  evidence  tracks   available  from  right-­‐click  menu.   If  you  select  an  exon  or  a  gene,  then  every  track  is  automaBcally  searched  for  exons  with  exactly  the   same  co-­‐ordinates  as  what  you  selected.  Matching  edges  are  highlighted  red.   Hovering  over  an  annotaBon  in  progress  brings  up  an  informaBon  pop-­‐up.   34 | 34 Becoming Acquainted with Web Apollo. USER NAVIGATION
  • 35. Right-­‐click  menu:   •  With  the  excepBon  of  deleBng  a  model,   all  edits  can  be  reversed  with  ‘Undo’   opBon.  ‘Redo’  also  available.  All  changes   are  immediately  saved  and  available  to  all   users  in  real  Bme.   •  ‘Get  sequence’  retrieves  pepBde,  cDNA,   CDS,  and  genomic  sequences.   •  You  can  select  an  exon  and  select   ‘Delete’.  You  can  create  an  intron,  flip  the   direcBon,  change  the  start  or  split  the   gene.     35 | 35 USER NAVIGATION Becoming Acquainted with Web Apollo.
  • 36. Right-­‐click  menu:   •  If  you  select  two  gene  models,  you  can   join  them  using  ‘Merge’,  and  you  may  also   ‘Split’  a  model.   •  You  can  select  ‘Duplicate’,  for  example  to   annotate  isoforms.   •  Set  translaBon  start,  annotate   selenocysteine-­‐containing  proteins,   match  edges  of  annotaBon  to  those  of   evidence  tracks.   36 | 36 USER NAVIGATION Becoming Acquainted with Web Apollo.
  • 37. 37 AnnotaBons,  annotaBon  edits,  and  History:  stored  in  a  centralized  database.   37 USER NAVIGATION Becoming Acquainted with Web Apollo.
  • 38. 38 The  AnnotaBon  InformaBon  Editor   DBXRefs  are  database  crossed  references:  if   you  have  reason  to  believe  that  this  gene  is   linked  to  a  gene  in  a  public  database  (including   your  own),  then  add  it  here.   38 USER NAVIGATION Becoming Acquainted with Web Apollo.
  • 39. 39 The  AnnotaBon  InformaBon  Editor   •  Add  PubMed  IDs   •  Include  GO  terms  as  appropriate   from  any  of  the  three  ontologies   •  Write  comments  staBng  how  you   have  validated  each  model.   39 USER NAVIGATION Becoming Acquainted with Web Apollo.
  • 40. 40 | •  ‘Zoom  to  base  level’  opBon  reveals  the   DNA  Track.   •  Change  color  of  exons  by  CDS  from   the  ‘View’  menu.   •  The  reference  DNA  sequence  is  visible   in  both  direcBons  as  are  the  protein   translaBons  in  all  six  frames.  You  can   toggle  either  direcBon  to  display  only   3  frames.     Zoom  in/out  with  keyboard:   shi^  +  arrow  keys  up/down   40 USER NAVIGATION Becoming Acquainted with Web Apollo.
  • 41. Web Apollo User Guide (Fragment) http://genomearchitect.org/web_apollo_user_guide
  • 42. In  a  “simple  case”  the  predicted  gene  model  is  correct  or  nearly   correct,  and  this  model  is  supported  by  evidence  that   completely  or  mostly  agrees  with  the  predicBon.     Evidence  that  extends  beyond  the  predicted  model  is  assumed  to   be  non-­‐coding  sequence.       The  following  secBons  describe  simple  modificaBons.       42 | 42 ANNOTATING SIMPLE CASES Becoming Acquainted with Web Apollo.
  • 43. Select  and  drag  the  putaBve  new  exon  from  a  track,  and  add  it  directly  to  an   annotated  transcript  in  the  ‘User-­‐created  AnnotaBons’  area.     •  Click  the  exon,  hold  your  finger  on  the  mouse  buNon,  and  drag  the  cursor   unBl  it  touches  the  receiving  transcript.  A  dark  green  highlight  indicates  it  is   okay  to  release  the  mouse  buNon.     •  When   released,   the   addiBonal   exon   becomes   aNached   to   the   receiving   transcript.   43 | •  A   confirmaBon   box   will   warn   you   if   the   receiving   transcript   is   not   on   the   same   strand   as   the   feature   where   the   new   exon  originated.   43 ADDING EXONS Becoming Acquainted with Web Apollo.
  • 44. Each  Bme  you  add  an  exon  region,  whether  by  extension  or  adding   an  exon,  Web  Apollo  recalculates  the  longest  ORF,  idenBfying   ‘Start’  and  ‘Stop’  signals  and  allowing  you  to  determine   whether  a  ‘Stop’  codon  has  been  incorporated  a^er  each   ediBng  step.   44 | Web  Apollo  demands  that  an  exon  already  exists  as  an  evidence  in  one  of  the  tracks.   You  could  provide  a  text  file  in  GFF  format  and  select  File  à  Open.  GFF  is  a  simple  text   file  delimited  by  TABs,  one  line  for  each  genomic  ‘feature’:  column  1  is  the  name  of   the  scaffold;  then  some  text  (irrelevant),  then  ‘exon’,  then  start,  stop,  strand  as  +  or  -­‐,   a  dot,  another  dot,  and  Name=some  name   Example:   scaffold_88  Qratore  exon  21  2111  +  .  .  Name=bob   scaffold_88  Qratore  exon  2201  5111  +  .  .  Name=rad   44 ADDING EXONS Becoming Acquainted with Web Apollo.
  • 45. Gene  predicBons  may  or  may  not  include  UTRs.  If   transcript  alignment  data  are  available  and   extend  beyond  your  original  annotaBon,  you   may  extend  or  add  UTRs.     1.  PosiBon  the  cursor  at  the  beginning  of  the   exon  that  needs  to  be  extended  and  ‘Zoom   to  base  level’.     2.  Place  the  cursor  over  the  edge  of  the  exon   unBl  it  becomes  a  black  arrow  then  click  and   drag  the  edge  of  the  exon  to  the  new   coordinate  posiBon  that  includes  the  UTR.     45 | View  zoomed  to  base  level.  The  DNA  track   and  annotaBon  track  are  visible.  The  DNA   track  includes  the  sense  strand  (top)  and   anB-­‐sense   strand   (boNom).   The   six   reading  frames  flank  the  DNA  track,  with   the   three   forward   frames   above   and   the   three   reverse   frames   below.   The   User-­‐ created   AnnotaBon   track   shows   the   terminal  end  of  an  annotaBon.  The  green   rectangle   highlights   the   locaBon   of   the   nucleoBde  residues  in  the  ‘Stop’  signal.   To  add  a  new  spliced  UTR  to  an  exisBng   annotaBon  follow  the  procedure  for  adding   an  exon.   45 ADDING UTRs Becoming Acquainted with Web Apollo.
  • 46. 1.  Zoom  in  sufficiently  to  clearly  resolve  each  exon  as  a  disBnct   rectangle.     2.  Two  exons  from  different  tracks  sharing  the  same  start  and/or  end   coordinates  will  display  a  red  bar  to  indicate  the  matching  edges.   3.  SelecBng  the  whole  annotaBon  or  one  exon  at  a  Bme,  use  this  ‘edge-­‐ matching’  funcBon  and  scroll  along  the  length  of  the  annotaBon,   verifying  exon  boundaries  against  available  data.  Use  square  [  ]   brackets  to  scroll  from  exon  to  exon.   4.  Note  if  there  are  cDNA  /  RNAseq  reads  that  lack  one  or  more  of  the   annotated  exons  or  include  addiBonal  exons.       46 | 46 EXON STRUCTURE INTEGRITY Becoming Acquainted with Web Apollo.
  • 47. To  modify  an  exon  boundary  and  match   data   in   the   evidence   tracks:   select   both   the   offending   exon   and   the   feature  with  the  expected  boundary,   then  right  click  on  the  annotaBon  to   select  ‘Set  3’  end’  or  ‘Set  5’  end’  as   appropriate.     47 | In  some  cases  all  the  data  may  disagree  with  the  annotaBon,  in  other  cases   some  data  support  the  annotaBon  and  some  of  the  data  support  one  or   more  alternaBve  transcripts.  Try  to  annotate  as  many  alternaBve  transcripts   as  are  well  supported  by  the  data.   47 EXON STRUCTURE INTEGRITY Becoming Acquainted with Web Apollo.
  • 48. Flags  non-­‐canonical   splice  sites.   SelecBon  of  features  and  sub-­‐ features   Edge-­‐matching   Evidence  Tracks  Area   ‘User-­‐created  AnnotaBons’  Track   The  ediBng  logic  in  the  server:     §  selects  longest  ORF  as  CDS   §  flags  non-­‐canonical  splice  sites   48 EDITING LOGIC Becoming Acquainted with Web Apollo.
  • 49. Zoom  to  base  level  to  review  non-­‐ canonical  splice  site  warnings.   These  do  not  necessarily  need  to   be  corrected,  but  should  be   flagged  with  the  appropriate   comment.       49 | Exon/intron  juncBon  possible  error   Original  model   Curated  model   Non-­‐canonical   splices   are   indicated   by   an   orange   circle   with   a   white   exclamaBon   point   inside,   placed   over   the   edge   of   the   offending   exon.     Most   insects,   have   a   valid   non-­‐canonical   site   GC-­‐AG.   Other   non-­‐canonical   splice   sites   are   unverified.  Web  Apollo  flags  GC  splice  donors   as  non-­‐canonical.   Canonical  splice  sites:   3’-­‐…exon]GA  /  TG[exon…-­‐5’   5’-­‐…exon]GT  /  AG[exon…-­‐3’   reverse  strand,  not  reverse-­‐complemented:   forward  strand   49 SPLICE SITES Becoming Acquainted with Web Apollo.
  • 50. Some   gene   predicBon   algorithms   do   not   recognize   GC   splice   sites,   thus   the   intron/exon   juncBon   may   be   incorrect.   For   example,   one   such   gene   predicBon  algorithm  may  ignore  a  true  GC  donor   and  select  another  non-­‐canonical  splice  site  that   is  less  frequently  observed  in  nature.     Therefore,   if   upon   inspecBon   you   find   a   non-­‐ canonical   splice   site   that   is   rarely   observed   in   nature,  you  may  wish  to  search  the  region  for  a   more   frequent   in-­‐frame   non-­‐canonical   splice   site,  such  as  a  GC  donor.  If  there  is  an  in-­‐frame   site   close   that   is   more   likely   to   be   the   correct   splice   donor,   you   may   make   this   adjustment   while  zoomed  at  base  level.       50 | Exon/intron junction possible error Original model Curated model Use  RNA-­‐Seq  data  to   make  a  decision.   Canonical  splice  sites:   3’-­‐…exon]GA  /  TG[exon…-­‐5’   5’-­‐…exon]GT  /  AG[exon…-­‐3’   reverse  strand,  not  reverse-­‐complemented:   forward  strand   50 SPLICE SITES keep this in mind Becoming Acquainted with Web Apollo.
  • 51. Web  Apollo  calculates  the  longest  possible  open   reading  frame  (ORF)  that  includes  canonical   ‘Start’  and  ‘Stop’  signals  within  the  predicted   exons.     If  it  appears  to  have  calculated  an  incorrect  ‘Start’   signal,  you  may  modify  it  selecBng  an  in-­‐frame   ‘Start’  codon  further  up  or  downstream,   depending  on  evidence  (protein  database,   addiBonal  evidence  tracks).  An  upstream  ‘Start’   codon  may  be  present  outside  the  predicted   gene  model,  within  a  region  supported  by   another  evidence  track.     51 | 51 ‘START’ AND ‘STOP’ SITES Becoming Acquainted with Web Apollo.
  • 52. Note  that  the  ‘Start’  codon  may  also  be  located  in  a  non-­‐predicted  exon  further   upstream.  If  you  cannot  idenBfy  that  exon,  add  the  appropriate  note  in  the   transcript’s  ‘Comments’  secBon.   In  very  rare  cases,  the  actual  ‘Start’  codon  may  be  non-­‐canonical  (non-­‐ATG).     In  some  cases,  a  ‘Stop’  codon  may  not  be  automaBcally  idenBfied.  Check  to  see   if  there  are  data  supporBng  a  3’  extension  of  the  terminal  exon  or  addiBonal   3’  exons  with  valid  splice  sites.     52 | 52 ‘START’ AND ‘STOP’ SITES keep this in mind Becoming Acquainted with Web Apollo.
  • 53.
  • 54. Evidence  may  support  joining  two  or  more  different  gene  models.  Warning:  protein   alignments  may  have  incorrect  splice  sites  and  lack  non-­‐conserved  regions!   1.  Drag  and  drop  each  gene  model  to  ‘User-­‐created  AnnotaBons’  area.  Shi^  click  to   select  an  intron  from  each  gene  model  and  right  click  to  select  the  ‘Merge’  opBon   from  the  menu.     2.  Drag  supporBng  evidence  tracks  over  the  candidate  models  to  corroborate  overlap,  or   review  edge  matching  and  coverage  across  models.   3.  Check  the  resulBng  translaBon  by  querying  a  protein  database  e.g.  UniProt.  Record   the  IDs  of  both  starBng  gene  models  in  ‘DBXref’  and  add  comments  to  record  that  this   annotaBon  is  the  result  of  a  merge.   54 | 54 Red  lines  around  exons:   ‘edge-­‐matching’  allows  annotators  to  confirm  whether  the   evidence  is  in  agreement  without  examining  each  exon  at  the   base  level.   COMPLEX CASES merge two gene predictions on the same scaffold Becoming Acquainted with Web Apollo.
  • 55. One  or  more  splits  may  be  recommended  when  different   segments  of  the  predicted  protein  align  to  two  or  more   different  families  of  protein  homologs,  and  the  predicted   protein  does  not  align  to  any  known  protein  over  its  enBre   length.  Transcript  data  may  support  a  split  (if  so,  verify  that  it  is   not  a  case  of  alternaBve  transcripts).     55 | 55 COMPLEX CASES split a gene prediction Becoming Acquainted with Web Apollo.
  • 56. DNA  Track   ‘User-­‐created  AnnotaDons’  Track   56 COMPLEX CASES frameshifts, single-base errors, and selenocysteines Becoming Acquainted with Web Apollo.
  • 57. 1.  Web  Apollo  allows  annotators  to  make  single  base  modificaBons  or  frameshi^s  that  are   reflected  in  the  sequence  and  structure  of  any  transcripts  overlapping  the  modificaBon.  Note   that  these  manipulaBons  do  NOT  change  the  underlying  genomic  sequence.     2.  If  you  determine  that  you  need  to  make  one  of  these  changes,  zoom  in  to  the  nucleoBde  level   and  right  click  over  a  single  nucleoBde  on  the  genomic  sequence  to  access  a  menu  that   provides  opBons  for  creaBng  inserBons,  deleBons  or  subsBtuBons.     3.  The  ‘Create  Genomic  InserBon’  feature  will  require  you  to  enter  the  necessary  string  of   nucleoBde  residues  that  will  be  inserted  to  the  right  of  the  cursor’s  current  locaBon.  The   ‘Create  Genomic  DeleBon’  opBon  will  require  you  to  enter  the  length  of  the  deleBon,  starBng   with  the  nucleoBde  where  the  cursor  is  posiBoned.  The  ‘Create  Genomic  SubsBtuBon’  feature   asks  for  the  string  of  nucleoBde  residues  that  will  replace  the  ones  on  the  DNA  track.   4.  Once  you  have  entered  the  modificaBons,  Web  Apollo  will  recalculate  the  corrected  transcript   and  protein  sequences,  which  will  appear  when  you  use  the  right-­‐click  menu  ‘Get  Sequence’   opBon.  Since  the  underlying  genomic  sequence  is  reflected  in  all  annotaBons  that  include  the   modified  region  you  should  alert  the  curators  of  your  organisms  database  using  the   ‘Comments’  secBon  to  report  the  CDS  edits.     5.  In  special  cases  such  as  selenocysteine  containing  proteins  (read-­‐throughs),  right-­‐click  over  the   offending/premature  ‘Stop’  signal  and  choose  the  ‘Set  readthrough  stop  codon’  opBon  from   the  menu.    57 | 57 COMPLEX CASES frameshifts, single-base errors, and selenocysteines Becoming Acquainted with Web Apollo.
  • 58. Follow  our  checklist  unBl  you  are  happy  with  the  annotaBon!   Then:   –  Comment  to  validate  your  annotaBon,  even  if  you  made  no   changes  to  an  exisBng  model.  Your  comments  mean  you  looked   at  the  curated  model  and  are  happy  with  it;  think  of  it  as  a  vote   of  confidence.   –  Or  add  a  comment  to  inform  the  community  of  unresolved   issues  you  think  this  model  may  have.   58 | 58 Always  Remember:  Web  Apollo  curaBon  is  a  community  effort  so   please  use  comments  to  communicate  the  reasons  for  your     annotaBon  (your  comments  will  be  visible  to  everyone).   COMPLETING THE ANNOTATION Becoming Acquainted with Web Apollo.
  • 59. To  find  the  gene  region  you  wish  to  annotate,  you  may  use:   a)  a  protein  sequence  of  a  homolog  from  another  species   b)  a  sequence  from  a  similar  gene  in  species  of  interest  (e.g.  another  gene  family  member)   c)  on  your  own,  you  aligned  your  gene  models  or  transcriptomic  data  to  the  genome.   d)  you  used  high  quality  proteins  and/or  gene  family  alignments  (mulB  or  single  species)   and  are  able  to  idenBfy  conserved  domains.   OpDon  1  –  You  have  a  sequence  but  don’t  know  where  it  is  in  this  genome:   •  Use  BLAT  in  the  Apollo  window,  or  BLAST  at  NAL’s  i5k  BLAST  server,  available  at:  hNp://i5k.nal.usda.gov/blastn       •  You  may  also  use  other  tools  for  annotaBon  and  contribute  your  data  from  those  efforts.   OpDon  2  –  The  genome  has  already  been  annotated  with  your  sequences  and  you  have  a  gene   idenBfier  that  has  been  indexed  in  Apollo.     •  That  is,  you  know  where  to  look,  so  type  the  ID  in  the  Search  box  of  Apollo.   •  Apollo  autocompletes  using  a  case-­‐insensiBve  search  anchored  on  the  le^-­‐hand  side  of  the  word.  For   example  “HaGR”  will  show  all  “hagr”  objects  (up  to  30).   •  Choose  one  of  the  genes  and  click  “Go”.   •  You  can  do  that  with  Domains,  Alignments  or  Gene  names  provided  to  you  (if  they  have  been  indexed).   OpDon  3  –  Find  genes  based  on  funcBonal  ontology  terms  or  network  membership  idenBfiers.   Becoming Acquainted with Web Apollo. HOW TO BEGIN
  • 60. 1.  Select  the  chromosomal  region  of  interest,  e.g.  scaffold.   2.  Select  appropriate  evidence  tracks.   3.  Determine  whether  a  feature  in  an  exisBng  evidence  track  will  provide  a   reasonable  gene  model  to  start  working.   -­‐  If  yes:  select  and  drag  the  feature  to  the  ‘User-­‐created  AnnotaBons’   area,  creaDng  an  iniDal  gene  model.  If  necessary  use  ediBng  funcBons   to  adjust  the  gene  model.   -­‐  Nothing  available  to  you?  Let’s  have  a  talk.   4.  Check  your  edited  gene  model  for  integrity  and  accuracy  by  comparing  it   with  available  homologs.   60 | Always  remember:  when  annotaBng  gene  models  using  Apollo,  you   are  looking  at  a  ‘frozen’  version  of  the  genome  assembly  and  you  will   not  be  able  to  modify  the  assembly  itself.   60 Becoming Acquainted with Web Apollo. GENERAL PROCESS OF CURATION
  • 61. 61CURATING GENOMES WHAT ANNOTATORS SHOULD LOOK FOR
 pay attention to these details v  AnnotaDng  a  simple  case:  WHEN  “The  official  predicBon  is  correct,  or  nearly   correct,  assuming  that  no  aligned  data  extends  beyond  the  gene  model  and   if  so,  it  is  not  likely  to  be  coding  sequence,  and/or  the  gene  predicBon   matches  what  you  know  about  the  gene”:   a.  Can  you  add  UTRs?     b.  Check  exon  structures.   c.  Check  splice  sites:  …]5’-­‐GT/AG-­‐3’[…   d.  Check  ‘start’  and  ‘stop’  sites.   e.  Check  the  predicted  protein  product(s).   f.  If  the  protein  product  sBll  does  not  look  correct,  go  on  to  “AnnotaBng   more  complex  cases”.    
  • 62. 62CURATING GENOMES WHAT ANNOTATORS SHOULD LOOK FOR
 continued v  AddiDonal  funcDonality.  You  may  also  need  to  learn  how  to:   a.  Get  genomic  sequence     b.  Merge  exons     c.  Add/Delete  an  exon     d.  Create  an  exon  de  novo  (within  an  intron  or  outside  exisBng   annotaBons).   e.  Right/apple-­‐click  on  a  feature  to  get  feature  ID  and  addiBonal   informaBon     f.  Looking  up  homolog  descripBons  going  to  the  accession  web  page  at   UniProt/Swissprot    
  • 63. 63CURATING GENOMES WHAT ANNOTATORS SHOULD LOOK FOR
 continued v  AnnotaDng  more  complex  cases:     a.  Incomplete  annotaBon:  protein  integrity  checks,  indicate  gaps,  missing  5’   sequences  or  missing  3’  sequences.     b.  Merge  of  2  gene  predicBons  on  same  scaffold     c.  Merge  of  2  gene  predicBons  on  different  scaffolds  (uh-­‐oh!).   d.  Split  of  a  gene  predicBon     e.  Frameshi^s,  Selenocysteine,  single-­‐base  errors,  and  other  inconvenient   phenomena    
  • 64. 64CURATING GENOMES WHAT ANNOTATORS SHOULD LOOK FOR
 continued v  Adding  important  project  informaDon  in  the  form  of  Canned  and/or   Customized  Comments:   a.  NCBI  ID,  RefSeq  ID,  gene  symbol(s),  common  name(s),  synonyms,  top   BLAST  hits  (GenBank  IDs),  orthologs  with  species  names,  and  anything   else  you  can  think  of,  because  you  are  the  expert.   b.  Type  of  annotaBon  (e.g.:  whether  or  not  the  gene  model  was  changed)     c.  Data  source  (for  example  if  the  Fgeneshpp  predicted  gene  was  the   starBng  point  for  your  annotaBon)   d.  The  kinds  of  changes  you  made  to  the  gene  model,  e.g.:  split,  merge   e.  FuncBonal  descripBon   f.  Whether  you  would  like  for  your  MOD  curator  to  check  the  annotaBon   g.  Whether  part  of  your  gene  is  on  a  different  scaffold.  
  • 65. 1.  Can  you  add  UTRs  (e.g.:  via  RNA-­‐Seq)?   2.  Check  exon  structures   3.  Check  splice  sites:  most  splice  sites  display  these   residues  …]5’-­‐GT/AG-­‐3’[…   4.  Check  ‘Start’  and  ‘Stop’  sites   5.  Check  the  predicted  protein  product(s)   –  Align  it  against  relevant  genes/gene  family.   –  blastp  against  NCBI’s  RefSeq  or  nr   6.  If  the  protein  product  sBll  does  not  look  correct   then  check:   –  Are  there  gaps  in  the  genome?   –  Merge  of  2  gene  predicBons  on  the  same   scaffold   –  Merge  of  2  gene  predicBons  from  different   scaffolds     –  Split  a  gene  predicBon   –  Frameshi^s     –  error  in  the  genome  assembly?   –  Selenocysteine,  single-­‐base  errors,  and  other   inconvenient  phenomena   65 | 65 7.  Finalize  annotaBon  by  adding:   –  Important  project  informaBon  in  the  form  of   canned  and/or  customized  comments   –  IDs  from  GenBank  (via  DBXRef),  gene  symbol(s),   common  name(s),  synonyms,  top  BLAST  hits  (with   GenBank  IDs),  orthologs  with  species  names,  and   everything  else  you  can  think  of,  because  you  are   the  expert.   –  Whether  your  model  replaces  one  or  more  models   from  the  official  gene  set  (so  it  can  be  deleted).   –  The  kinds  of  changes  you  made  to  the  gene  model   of  interest,  if  any.  E.g.:  splits,  merges,  whether  the   5’  or  3’  ends  had  to  be  modified  to  include  ‘Start’  or   ‘Stop’  codons,  addiBonal  exons  had  to  be  added,  or   non-­‐canonical  splice  sites  were  accepted.   –  Any  funcBonal  assignments  that  you  think  are  of   interest  to  the  community  (e.g.  via  BLAST,  RNA-­‐Seq   data,  literature  searches,  etc.)   THE CHECK LIST for accuracy and integrity Becoming Acquainted with Web Apollo.
  • 67. Apollo Example -­‐  Introductory  demonstraBon  using  the  Hyalella  azteca  genome   (amphipod  crustacean).   Example 67 A  public  Apollo  Demo  using  the  Honey  Bee  genome  is  available  at     hNp://genomearchitect.org/WebApolloDemo  
  • 68. What do we know about this genome? •  Currently  publicly  available  data  at  NCBI:   •  >37,000    nucleoBde  seqsà  scaffolds,  mitochondrial  genes   •  300    amino  acid  seqsà  mitochondrion   •  53    ESTs   •  0      conserved  domains  idenBfied   •  0    “gene”  entries  submiNed     •  Data  at  i5K  Workspace@NAL   -­‐  10,832  scaffolds,  23,288  transcripts,  12,906  proteins   Example 68
  • 69. PubMed Search: 
 what’s new? Example 69
  • 70. PubMed Search: what’s new? Example 70 “Ten  populaBons  (3  laboratory  cultures,  7  California   water  bodies)  differed  by  at  least  550-­‐fold  in  sensiBvity   to  pyrethroids.”     “By  sequencing  the  primary  pyrethroid  target  site,  the   voltage-­‐gated  sodium  channel  (vgsc),  we  show  that   point  mutaBons  and  their  spread  in  natural  populaBons   were  responsible  for  differences  in  pyrethroid   sensiBvity.”   “The  finding  that  a  non-­‐target  aquaBc  species  has   acquired  resistance  to  pesBcides  used  only  on  terrestrial   pests  is  troubling  evidence  of  the  impact  of  chronic   pesBcide  transport  from  land-­‐based  applicaBons  into   aquaBc  systems.”  
  • 71. How many sequences for our gene of interest? Example 71 •  Para,  (voltage-­‐gated  sodium  channel  alpha   subunit;  Nasonia  vitripennis).     •  NaCP60E  (Sodium  channel  protein  60  E;  D.   melanogaster).   •  MF:  voltage-­‐gated  caBon  channel  acBvity   (IDA,  GO:0022843).   •  BP:  olfactory  behavior  (IMP,  GO:0042048),   sodium  ion  transmembrane  transport   (ISS,GO:0035725).   •  CC:  voltage-­‐gated  sodium  channel  complex   (IEA,  GO:0001518).   And  what  do  we  know  about  them?  
  • 72. BLAST at i5K 
 https://i5k.nal.usda.gov/blast Example 72 >vgsc-­‐Segment3-­‐DomainII   RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  • 73. BLAST at i5K 
 https://i5k.nal.usda.gov/blast Example 73 >vgsc-­‐Segment3-­‐DomainII   RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  • 74. BLAST at i5K 
 https://i5k.nal.usda.gov/blast Example 74
  • 75. BLAST at i5K: 
 high-scoring segment pairs (hsp) in “BLAST+ Results” track Example 75
  • 77. Creating a new gene model: drag and drop Example 77 •  Web Apollo automatically calculates the longest open reading frame (ORF). In this case, the ORF includes the high-scoring segment pairs (hsp).
  • 79. Flanking sequences (other gene models) vs. NCBI nr Example 79 In  this  case,  two  gene   models  at  5’  end.  
  • 80. Review alignments Example 80 HaztTmpM006232   HaztTmpM006233   HaztTmpM006234  
  • 81. Hypothesis for vgsc gene model Example 81
  • 82. Editing: merge Example 82 Merge  by  dropping  an   exon  or  gene  model   onto  another.   Merge  by  selecBng   two  exons  (holding   down  “Shi^”)  and   using  the  right  click   menu.  
  • 83. Editing: correct boundaries, delete exons Example 83 Modify  exon  /  intron   boundary  by  dragging   the  end  of  the  exon  to   the  nearest  canonical   splice  site.   Delete  first  exon  from   M006233  
  • 84. Editing: set translation start, modify boundary Example 84 Set  translaBon  start   Modify  intron  /   exon  boundary   (here  and  at   coord.  78,999)  
  • 85. Finished model Example 85 Corroborate  integrity  and  accuracy  of  the  model:     -­‐  Start  and  Stop   -­‐  Exon  structure  and  splice  sites  …]5’-­‐GT/AG-­‐3’[…   -­‐  Check  the  predicted  protein  product  on  NCBI  nr  
  • 86. Information Editor •  DBXRefs:  e.g.  NP_001128389.1,  N.   vitripennis,  RefSeq   •  PubMed  idenBfier:  PMID:  24065824   •  Gene  Ontology  IDs:  GO:0022843,  GO: 0042048,  GO:0035725,  GO:0001518.   •  Comments.   •  Name,  Symbol.     •  Approve  /  Delete  radio  buNon.   Example 86 Comments   (if  applicable)  
  • 88. APOLLO
 demonstration DEMO 88 See  Apollo  DemonstraBon  Video  at:   hNps://youtu.be/VgPtAP_fvxY      
  • 89. Exercises Live  DemonstraBon  using  the  Apis  mellifera  genome.   89 1.  Evidence  in  support  of  protein  coding  gene   models.       1.1  Consensus  Gene  Sets:   Official  Gene  Set  v3.2   Official  Gene  Set  v1.0     1.2  Consensus  Gene  Sets  comparison:   OGSv3.2  genes  that  merge  OGSv1.0  and   RefSeq  genes   OGSv3.2  genes  that  split  OGSv1.0  and  RefSeq   genes     1.3  Protein  Coding  Gene  PredicDons  Supported  by   Biological  Evidence:   NCBI  Gnomon   Fgenesh++  with  RNASeq  training  data   Fgenesh++  without  RNASeq  training  data   NCBI  RefSeq  Protein  Coding  Genes  and  Low  Quality   Protein  Coding  Genes   1.4  Ab  ini&o  protein  coding  gene  predicDons:   Augustus  Set  12,  Augustus  Set  9,  Fgenesh,  GeneID,   N-­‐SCAN,  SGP2     1.5  Transcript  Sequence  Alignment:   NCBI  ESTs,  Apis  cerana  RNA-­‐Seq,  Forager  Bee  Brain   Illumina  ConBgs,  Nurse  Bee  Brain  Illumina  ConBgs,   Forager  RNA-­‐Seq  reads,  Nurse  RNA-­‐Seq  reads,   Abdomen  454  ConBgs,  Brain  and  Ovary  454   ConBgs,  Embryo  454  ConBgs,  Larvae  454  ConBgs,   Mixed  Antennae  454  ConBgs,  Ovary  454  ConBgs   Testes  454  ConBgs,  Forager  RNA-­‐Seq  HeatMap,   Forager  RNA-­‐Seq  XY  Plot,  Nurse  RNA-­‐Seq   HeatMap,  Nurse  RNA-­‐Seq  XY  Plot     Becoming Acquainted with Web Apollo.
  • 90. Exercises (continued) Live  DemonstraBon  using  the  Apis  mellifera  genome.   90 1.  Evidence  in  support  of  protein  coding  gene   models  (ConDnued).     1.6  Protein  homolog  alignment:   Acep_OGSv1.2   Aech_OGSv3.8   Cflo_OGSv3.3   Dmel_r5.42   Hsal_OGSv3.3   Lhum_OGSv1.2   Nvit_OGSv1.2   Nvit_OGSv2.0   Pbar_OGSv1.2   Sinv_OGSv2.2.3   Znev_OGSv2.1   Metazoa_Swissprot       2.  Evidence  in  support  of  non  protein  coding  gene   models     2.1  Non-­‐protein  coding  gene  predicDons:   NCBI  RefSeq  Noncoding  RNA   NCBI  RefSeq  miRNA     2.2  Pseudogene  predicDons:   NCBI  RefSeq  Pseudogene   Becoming Acquainted with Web Apollo.
  • 91. Web Apollo Workshop Instances
 Demo  1:  hNp://genomes.missouri.edu:8080/Amel_4.5_demo_1         Demo  2:  hNp://genomes.missouri.edu:8080/Amel_4.5_demo_2       Workshop  DocumentaBon  can  be  found  at:   Basecamp     Web  Apollo  instance  for  Diaphorina  citri     hNps://apollo.nal.usda.gov/diacit/selectTrack.jsp     Register  for  i5K  Workspace@NAL  at:   hNps://i5k.nal.usda.gov/web-­‐apollo-­‐registraBon  
  • 92. FUTURE PLANS
 interactive analysis and curation of variants v  InteracBve  exploraBon  of  VCF   files  (e.g.  from  GATK,  VAAST)  in   addiBon  to  BAM  and  GVF.     MulBple  tracks  in  one:   visualizaBon  of  geneBc  alteraBons   and  populaBon  frequency  of   variants.   WEB APOLLO 92 1   1   2   v  Clinical  applicaBons:  analysis  of   Copy  Number  VariaBons  for   regulatory  effects;  overlaying   display  of  the  regulatory  domains.   Philips-­‐Creminis  and  Corces.  2013.  Cell  50  (4):461-­‐474   2   TADs:  topologically  associaBng  domains  
  • 93. FUTURE PLANS
 educational tools We  are  working  with  educators  to  make  Web  Apollo  part  of  their  curricula.   WEB APOLLO 93 Lecture  Series.   In  the  classroom.   At  the  lab.   Classroom  exercises:  from   genome  sequence  to   hypothesis.   CuraBon  group  dedicated   to  producing  educaBon   materials  for  non-­‐model   organism  communiBes.   Our  team  provides  online   documentaBon,  hands-­‐on   training,  and  rapid   response  to  users.  
  • 94. JOIN US
 Footer 94 http://GenomeArchitect.org/ Please bring your suggestions, requests, and contributions to: Nathan Dunn Apollo Technical Lead Eric Yao JBrowse, UC Berkeley Deepak Unni Colin Diesh Apollo Developers Elsik Lab, University of Missouri Suzi Lewis Principal Investigator Berkeley  BOP  
  • 95. •  Berkeley  BioinformaDcs  Open-­‐source  Projects  (BBOP),   Berkeley  Lab:  Web  Apollo  and  Gene  Ontology  teams.   Suzanna  E.  Lewis  (PI).   •  §  Chris&ne  G.  Elsik  (PI).  University  of  Missouri.     •  *  Ian  Holmes  (PI).  University  of  California  Berkeley.   •  Arthropod  genomics  community:  i5K  Steering   CommiNee  (esp.  Sue  Brown  (Kansas  State)),  Alexie   Papanicolaou  (UWS),  BGI,  Oliver  Niehuis  (1KITE   hNp://www.1kite.org/),  and  the  Honey  Bee  Genome   Sequencing  ConsorBum.   •  Apollo  is  supported  by  NIH  grants  5R01GM080203  from   NIGMS,  and  5R01HG004483  from  NHGRI;  by  Contract   No.  60-­‐8260-­‐4-­‐005  from  the  NaBonal  Agricultural  Library   (NAL)  at  the  United  States  Department  of  Agriculture   (USDA);  and  by  the  Director,  Office  of  Science,  Office  of   Basic  Energy  Sciences,  of  the  U.S.  Department  of  Energy   under  Contract  No.  DE-­‐AC02-­‐05CH11231.   •  Insect  images  used  with  permission:   hNp://AlexanderWild.com   •  For  your  aQenDon,  thank  you!   Thank you. 95 Web  Apollo   Nathan  Dunn   Colin  Diesh  §   Deepak  Unni  §       Gene  Ontology   Chris  Mungall   Seth  Carbon   Heiko  Dietze     BBOP   Web  Apollo:  hNp://GenomeArchitect.org     i5K:  hNp://arthropodgenomes.org/wiki/i5K   GO:  hNp://GeneOntology.org   Thanks!   NAL  at  USDA   Monica  Poelchau   Christopher  Childers   Gary  Moore   HGSC  at  BCM   fringy  Richards   Dan  Hughes   Kim  Worley     JBrowse          Eric  Yao  *