Dealing with the Data Deluge:
   What can the Robotics
   Community Teach us?
     Making our pipelines organic,
       adaptable, and scalable

            Darin London
Part I. The Challenges of
NextGen Sequencing Data
Datasets

 50+ Cell Lines
 Each sequenced with up to 2 different technologies
 (DNaseHS and FAIRE) and 3 different ChIP-Seq antibodies
 (CTCF, PolII, c-Myc), as well as a Control (Input) for
 comparison
 Most involved multiple biological replicates, and some
 biological replicates were sequenced multiple times to
 create technical replicates of the same biological sample
 1.3 Gb zipped raw data per Cell_line-Technology-Replicate
 on average
 351 Gb zipped raw sequence data analyzed (and
 counting...)
Some characteristics of NextGen
Sequencing Data
 heterogeneous in time:
    comes in batches by lane and sample
     the order in which samples are submitted does not
     determine the order in which their data arrive
 heterogeneous in size:
    some samples will produce more data than others
    size affects timing of most computational tasks
 heterogeneous in quality:
    some data will not merit being run through the entire
    pipeline
    some data may merit extra analysis
Part II. A Tale of Two Robots
Meet Shakey

http://www.ai.sri.com/movies/Shakey.ram

   The first fully autonomous robot able to reason about its
   surroundings
   Pioneered many algorithms to model data from multiple
   sensors into a central world map, apply one or more plans
   of action, and determine appropriate action behaviors to
   achieve these plans
   If science is the 'Art of the Soluble', then Shakey
   demonstrated the solubility of autonomous robotics to the
   world.
That being said...




The autonomous systems roving on Mars, fighting in Afghanistan,
and cleaning our floors do not share much in common with Shakey.
These systems descend from more practical approaches
pioneered in the 1980s by Rodney Brooks and others




In 1986, he introduced the world to Allen, a Behavior-based robot
based on the Subsumption Architecture
Behavior-based Robots
  Attempt to mimic biological actions, rather than human
  cognition
  Built out of many small modules
  Modules act autonomously: each continuously senses the
  environment for specific signals and immediately performs a
  specific action based on that sensory input
  Modules arranged hierarchically, with higher layer
  modules able to mask (subsume) the input or output
  of lower layer modules (lower layer modules are not
  aware that they are being subsumed)
  There is no central planning module
  The intelligence of the system is completely
  distributed throughout all the smaller subsystems, each
  designed to achieve certain parts of the overall task list
  opportunistically, acting whenever the environment becomes
  favorable for the action it is designed to perform
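
A toy sketch, not from the original slides, of the layering just described: two hypothetical behavior modules in which the higher 'avoid' layer subsumes the output of the lower 'wander' layer whenever an obstacle is sensed, with no central planner involved.

#!/usr/bin/perl
# Toy subsumption sketch; module names and the %sensors hash are illustrative only.
use strict;
use warnings;

sub wander {               # lower layer: always proposes a default action
    return 'move_forward';
}

sub avoid {                # higher layer: only acts when its trigger is sensed
    my ($sensors) = @_;
    return unless $sensors->{obstacle};
    return 'turn_left';
}

my %sensors = ( obstacle => 1 );

# The higher layer masks (subsumes) the lower layer's output when it fires;
# the lower layer never knows it has been overridden.
my $action = avoid(\%sensors) || wander();
print "$action\n";         # prints "turn_left" while an obstacle is sensed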
Cost-Benefit Analysis
 Benefits of Behavior-based robots over AI:
    Easier and cheaper to build
    Scale better with existing technology
    More easily adaptable, new behaviors emerge with the
    addition of modules with little or no change to other
    modules
    More fault tolerant, partial behaviors tend to persist even
    when many modules fail to act
 Deficiencies of Behavior-based robots:
    'Higher order' reasoning and logic functions are too complex
    to implement with simple modules alone
    No capacity to learn from mistakes except through
    changes, addition, or subtraction of modules
Part III. Making our
Bioinformatics Pipelines
Organic, and Adaptable
Many Bioinformatics Pipelines
resemble Shakey by:
  Involving centralized controller systems which control every
  aspect of pipeline behavior
  Mixing the logic for selecting tasks from a list together with
  the logic for performing these tasks
Except that, unlike Shakey, many
pipelines:
  Have little or no knowledge of their computing environment
  Have no, or very little, capacity to:
     perform tasks in different orders, opportunistically
     temporarily re-focus their work on smaller subsets of the
     total task list
     run tasks in parallel
     etc.
  Lack intelligent points for human agent inclusion
  Are subject to human will at every level
Behavior-based Pipelines are ideal for dealing with
heterogeneous data efficiently
They are Modular
 Much like Object Oriented Programming
 Failure is easy to diagnose and fix
 Failure in one module does not (necessarily) impact other
 module actions
 Failure in one module does not (necessarily) require other
 modules to be rerun, or require complex skipping logic in
 the pipeline code
They are Adaptable

 New analyses should simply require plugging in a new
 module, with minimal or no 'rewiring' of other modules
 Reanalyses should simply require removing certain outputs, and
 possibly resetting the completion state of a particular task;
 all downstream tasks should then either react to the presence of
 new data, or require only minimal state manipulation to rerun
 themselves
 Modules can be augmented, or replaced as needed, with
 little or no change to other modules, as long as their original
 functionality is maintained or assumed by another module
They are Scalable

 Modules can be deployed onto as many different machines
 as are available (servers, nodes on a cluster, nodes in a
 cloud) to expand throughput
 Modules with high resource requirements can be deployed
 onto separate machines from those with low resource
 requirements
 Modules can be grouped together on different machines, or
 sets of machines, according to functionality, or data
 proximity
They act Autonomously

 Individual modules can 'react' to data to produce information
 as soon as the data is made available in the 'environment'
 Datasets can be moved through the pipeline at different
 rates
 Modules do not require humans to manage them, but,
 instead, react and respond to different human inputs at
 many different places
 Humans are really just another intelligent agent in the
 system
They can act Opportunistically

  Modules can be tied into multiple task-management
  systems
     overall dataset-task list
     priority dataset-task list
     machine specific dataset-task lists
     manual intervention
  The priority system can be set to take precedence over the
  overall system, but if priority datasets get backlogged, the
  system can still opportunistically process items in the overall
  system until the backlog is cleared, and the priority system
  can then regain the focus of some or all machines in the
  system
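
A minimal sketch, not from the slides, of the precedence idea above; the three helper subs are hypothetical stand-ins for the real task-list queries and agent launches.

#!/usr/bin/perl
# Sketch of opportunistic task selection between a priority list and the overall list.
use strict;
use warnings;

sub get_priority_tasks { return () }                          # stub: would query the priority dataset-task list
sub get_overall_tasks  { return ('some_cell_line:DNaseHS:align') }  # stub: would query the overall dataset-task list
sub launch_agent_for   { print "would launch agent for $_[0]\n" }

my @priority = get_priority_tasks();
my @overall  = get_overall_tasks();

# Priority datasets take precedence, but if none are currently runnable the
# machine opportunistically works the overall list instead of sitting idle.
my @work = @priority ? @priority : @overall;
launch_agent_for($_) for @work;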
They are sensitive to their computing
environment, and knowledgeable of the
resources they need to work
 Modules should know how much memory, file system
 space, etc. they need
 Modules should know about other modules that would
 compete with them for scarce resources
 This may run counter to the ethos of platform neutrality but,
 for instance, if you are running on Red Hat/CentOS you can parse
 /proc/meminfo for memory information (my $meminfo =
 YAML::LoadFile('/proc/meminfo')), ps for information on other
 processes running in the environment, df for filesystem
 information, etc.
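
The slides that follow show the /proc/meminfo and df checks in full; as a complement, here is a hedged sketch of the ps idea above, checking whether a hypothetical competing program (bwa is used as the example) is already running before an agent commits to a resource-hungry task.

#!/usr/bin/perl
# Sketch only: defer if a competing process is already running.
# The competitor name and the maximum count are illustrative assumptions.
use strict;
use warnings;

my $competitor      = 'bwa';
my $max_competitors = 1;

# Count processes whose command name matches the competitor.
open(my $ps, '-|', 'ps', '-eo', 'comm') or die "Couldn't run ps: $!\n";
my $running = grep { /^\Q$competitor\E/ } <$ps>;
close $ps;

if ($running >= $max_competitors) {
    print STDERR "$running $competitor process(es) already running; deferring\n";
    exit 1;   # a later runner pass will try this task again
}
print "Clear to run\n";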
These systems have other advantages:
 They make it easy to get up and running with 1-2 modules
 tested on a small dataset, which can then be applied to all
 other datasets available, and yet to come
 They allow for 'partial solutions', e.g. some data will always
 be produced even if the entire pipeline is not finished (what
 pipeline is ever 'finished', anyway), or if one or more parts of
 the pipeline are discovered to have bugs
 New modules can be created, tested against 1 or more
 datasets, and then 'released to the wild' so that they can
 autonomously fill in the gaps for all previously received data,
 and then analyze all data received in the future
 Buggy modules can be pulled out of the pipeline, fixed and
 tested in the same way
Part IV. The IGSP Encode
          Pipeline
Pipeline designed to generate data for
the Encyclopedia of DNA Elements
(EncODE)
http://www.genome.gov/10005107

For both EncODE and non EncODE cell_lines and treatments:
    Automates movement of data from sequencing staging to IGSP server
    Aligns raw sequence files to hg19 using bwa (previously hg18 using
    maq); a hedged sketch of this step follows this list
    Generates feature density distributions of whole-genome sequence data
    aligned to hg19
    Generates visual tracks of data in the IGSP internal UCSC Genome
    Browser
    Generates submission tarballs of bam, peaks, parzen bigWigs, and
    base count bigWigs to be submitted to UCSC
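
As referenced above, a hedged sketch of what the bwa alignment step might shell out to, using the standard single-end bwa/samtools commands of the era; the file names are placeholders, and in the real pipeline this work is wrapped by the resource checks and spreadsheet bookkeeping shown in the later slides.

#!/usr/bin/perl
# Illustrative single-end bwa alignment wrapper; reference and read paths are placeholders.
use strict;
use warnings;

my $ref   = 'hg19.fa';            # bwa-indexed reference
my $reads = 'sequence.fastq.gz';  # raw reads for one Cell_line-Technology-Replicate
my $out   = 'sequence';

my @steps = (
    "bwa aln -t 4 $ref $reads > $out.sai",
    "bwa samse $ref $out.sai $reads > $out.sam",
    "samtools view -bS $out.sam | samtools sort - $out.sorted",
    "samtools index $out.sorted.bam",
);

for my $cmd (@steps) {
    print STDERR "Running $cmd\n";
    system($cmd) == 0 or die "Step failed ($cmd): exit $?\n";
}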
Compute Infrastructure
 4 CentOS Compute Nodes: 8 core (2.50 GHz, dual quad core
 procs), 32GB 1066 MHz RAM, Primary 120GB HDD,
 Secondary 250GB HDD
 Duke Shared Cluster Resource: 19 high priority Encode
 nodes, each with 8 cores and 16 GB RAM
 Compute nodes connected to DSCR via NFS mounted
 volume provided by a Netapp NAS array of 42 15k 450GB
 FC disks exported through a 10G Fibre-E link
 Raw Data and analytical output stored on two NFS
 mounted volumes provided by a Netapp NAS array of 14
 7.2k SATA disks, 1TB and 750GB in size
 Each compute node contains its own, locally mounted 230G
 scratch directory to minimize NFS read-write concurrency
 issues
Pipeline composed of many different
agents, each falling into one of three
categories:
 Runner Agents: These simply read through a list of datasets and
 tasks to be done on each dataset, and launch the necessary
 processing agents required to accomplish each task on the
 dataset. They do not care whether it is possible for the agent to
 accomplish the task on the dataset
 Processing Agents: These are small programs designed to perform
 a specific processing task on a given dataset. In addition, they are
 designed to know when it is possible to perform the task (based on
 prerequisites), whether the resources (memory, storage space,
 etc) required for it to run are available, and whether other
 programs which are running on the system will compete with it in
 ways which adversely affect its performance
Main Task List

Composed of a set of worksheets in a Google
Spreadsheet. This has a number of advantages:
   Allows people all over the world to keep track of what has
   been done, and what remains to be done
   Since the Google Spreadsheet API is also available to
   agents on any internet connected computer, it can be used
   by runner and processing agents on any number of servers
The third type of agent in this system
is the human
The Google Spreadsheet model makes it very easy to plug
humans into the overall logic of the system:
   arguments, variables, and state switches can be
   communicated to an agent using meta-fields on the
   worksheet. The values for these fields can be filled in by
   humans, or other computer agents
   processing agents can be coded to require prerequisite
   meta-fields which require a human to switch on before they
   run
   processing agents can write data to information fields upon
   completion, failure, or both. This might include changing the
   state of prerequisite fields required by other agents
   processes requiring human intervention can be replaced by
   computational logic over time, as the logic becomes
   formalized into one or more agents
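
As an example of the second point above, a hedged sketch reusing the constructor shown on the following slides; the agent name and the 'manual_qc_ok' column are hypothetical, but the prerequisites mechanism is the one the real agents use.

#!/usr/bin/perl
# Sketch: this agent's task stays blocked until a human sets the hypothetical
# 'manual_qc_ok' column to a true value for the bound row.
use strict;
use warnings;
use Google::Spreadsheet::Agent;

my ($cell_line, $technology, $replicate) = @ARGV;

my $google_agent = Google::Spreadsheet::Agent->new(
    agent_name      => 'peak_call',    # hypothetical task column
    page_name       => $technology,
    bind_key_fields => { cellline => $cell_line, technology => $technology, replicate => $replicate },
    prerequisites   => ['manual_qc_ok'],
);

# run_my only invokes the code ref once the row is ready and all prerequisite
# fields are true; the human 'agent' flips manual_qc_ok after eyeballing the data.
$google_agent->run_my(sub { print "running peak_call for $cell_line\n"; return 1 });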
Part V.
      Google::Spreadsheet::Agent
http://search.cpan.org/~dmlond/Google-Spreadsheet-Agent-0.01
#!/usr/bin/perl
use strict;
use File::Basename;            # provides basename(), used below
use Getopt::Std;
use Google::Spreadsheet::Agent;
# usually other modules are used

my $goal = basename($0);
$goal =~ s/_agent\.pl//;

my $cell_line = shift or die "cell_line\n";
my $technology = shift or die "technology\n";
my $replicate = shift or die "replicate\n";
my $google_page = ($replicate =~ m/.*_TP.*/) ? 'combined' : $technology;

my %opts;
getopts('dr:P:', \%opts);
my $debug = $opts{d};
my $data_root;                 # default data root is set elsewhere in the full script
$data_root = $opts{r} if ($opts{r});
$google_page = $opts{P} if ($opts{P});

my $prerequisites = [];
$prerequisites->[0] = ($replicate =~ m/.*_TP.*/) ? 'combined' : 'aligned';

my $google_agent = Google::Spreadsheet::Agent->new(
      agent_name => $goal, page_name => $google_page, debug => $debug,
      max_selves => 3,
      bind_key_fields => { cellline => $cell_line, technology => $technology, replicate => $replicate },
      prerequisites => $prerequisites
);
$google_agent->run_my(\&agent_code);
exit;
my $min_gigs = 18; # start with an 18G /scratch2 availability requirement
my $gigs_avail = &get_scratch_availability or exit(1);
exit if ($gigs_avail < $min_gigs);

sub get_scratch_availability {
  my $opened = open (my $df_in, '-|', 'df', '-h', '/scratch2');
  unless ($opened) {
     print STDERR "Couldn't check scratch2 usage: $!\n";
     return;
  }
  my $in = <$df_in>; # skip the header line
  $in = <$df_in>;
  chomp $in;
  close $df_in;
  my $gigs_avail = (split /\s+/, $in)[3]; # the 'Avail' column of df -h
  $gigs_avail =~ s/\D+$//;                # strip the trailing unit (e.g. G)
  return $gigs_avail;
}

use YAML::Any qw/LoadFile/;
my $min_mem = 16; # requires about 16-18G memory to run
exit if (&get_available_memory <= $min_mem);
sub get_available_memory {
  my $info = LoadFile('/proc/meminfo') or die "Couldn't load meminfo: $!\n";
  my $free_mem = $info->{MemFree};
  $free_mem =~ s/\D+$//;       # values look like '123456 kB'; strip the unit
  my $buffers = $info->{Buffers};
  my $cached = $info->{Cached};
  $buffers =~ s/\D+$//;
  $cached =~ s/\D+$//;
  $free_mem += $buffers + $cached; # treat buffers/cache as reclaimable
  $free_mem /= (1024*1024);        # kB -> GB
  return $free_mem;
}
sub agent_code {
  my $entry = shift;
  my $build = $entry->{build}; # genome build (e.g. hg19) from the spreadsheet row
  my $replicate_root = join('/', $data_root, $cell_line, $technology, 'sequence_'.$replicate);
  my $db_name = getDBName($replicate_root);
  my $scratch_root = $replicate_root;
  $scratch_root =~ s{$data_root}{/scratch2}; # rebase the path onto the local scratch disk

    my $helper_command = join(' ', join('/', $generic_apps_dir, 'parzen_fseq_helper.pl'),
                $replicate_root, join('/', $replicate_root, 'bwa_'.$entry->{build}, 'sequence.final.bed'),
                $cell_line, $technology, $entry->{sex}, $entry->{build}, $db_name
    );

    print STDERR "Running ${helper_command}\n";
    `$helper_command`;
    if ($?) {
        print STDERR "Problem running parzen_helper: $!\n";
        return;
    }

    my $parzen_track_name = $db_name . "_parzen";
    my $scratch_parzen_dir = join('/', $scratch_root, 'parzen_'.$build);
    my $parzen_dir = join('/', $replicate_root, 'parzen_'.$build);
    $parzen_dir =~ s/sata2/sata4/;

    my $wiggle_helper = join(' ', join('/', $generic_apps_dir, 'parzen_wiggle_helper.pl'),
               $build, $parzen_track_name, $parzen_dir, $scratch_parzen_dir
    );

    print STDERR "Running ${wiggle_helper}\n";
    `$wiggle_helper`;
     if ($?) {
        print STDERR "Problem running wiggle_helper: $!\n";
        return;
    }
    return 1;
}
#!/usr/bin/perl
use FindBin;
use Google::Spreadsheet::Agent;

my $google_agent = Google::Spreadsheet::Agent->new(
         agent_name => 'agent_runner',
         page_name => 'all',
         bind_key_fields => { cellline => 'all', technology => 'all', replicate => 'all' }
);

# iterate through each page on the database, get runnable rows, and run each runnable on the row
foreach my $page_name ( map { $_->title } $google_agent->google_db->worksheets ) {
   foreach my $runnable_row (
        grep {
             $_->content->{ready} && !$_->content->{complete}
         } $google_agent->google_db->worksheet({ title => $page_name })->rows
    ){
        foreach my $goal (keys %{$runnable_row->content}) {
           next if ($runnable_row->content->{$goal}); # any existing value (r, 1, F) causes it to skip

            # some of these will skip because they are fields without agents
            my $goal_agent = $FindBin::Bin.'/../agent_bin/'.$goal.'_agent.pl';
            next unless (-x $goal_agent);

             my @cmd = ($goal_agent);
             foreach my $query_field ( sort {
                $google_agent->config->{key_fields}->{$a}->{rank} <=> $google_agent->config->{key_fields}->{$b}->{rank}
               } keys %{$google_agent->config->{key_fields}} ) {
                   next unless ($runnable_row->content->{$query_field});
                   push @cmd, $runnable_row->content->{$query_field};
             }
            system( join(' ', @cmd).'&');
            sleep 5;
        }
   }
}
exit;
Future Plans
1. Making inter-lab communication more concrete and automatic
2. Each server can have its own 'task' view of a particular
   google spreadsheet worksheet, in that it can have its own
   unique set of executable agent_bin scripts tied to a set of
   fields that systems on other servers would ignore
3. Put some of the runner code, and requirements checking
   routines into Google::Spreadsheet::Agent for version 1.1
Acknowledgements

The Institute for Genome Sciences and Policy (IGSP)
The Encode Consortium
Terry Furey
Alan Boyle
Greg Crawford
Mark DeLong
Rob Wagner
Peyton Vaughn
Darrin Mann
Alan Cowles

More Related Content

What's hot

How to be a bioinformatician
How to be a bioinformaticianHow to be a bioinformatician
How to be a bioinformaticianChristian Frech
 
Seminar Report on Google File System
Seminar Report on Google File SystemSeminar Report on Google File System
Seminar Report on Google File SystemVishal Polley
 
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File SystemFredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File SystemFredrick Ishengoma
 
Dynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using ContainersDynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using ContainersIRJET Journal
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331Fengchang Xie
 
Performance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memoryPerformance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memoryZongYing Lyu
 
Sector Sphere 2009
Sector Sphere 2009Sector Sphere 2009
Sector Sphere 2009lilyco
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemAnand Kulkarni
 
Architecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory systemArchitecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory systemZongYing Lyu
 
Predicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensemblesPredicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensemblesVarad Meru
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukAndrii Vozniuk
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file systemsrikanthhadoop
 
Talon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategyTalon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategySaptarshi Chatterjee
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating systemMoeez Ahmad
 

What's hot (20)

Taming Snakemake
Taming SnakemakeTaming Snakemake
Taming Snakemake
 
How to be a bioinformatician
How to be a bioinformaticianHow to be a bioinformatician
How to be a bioinformatician
 
Seminar Report on Google File System
Seminar Report on Google File SystemSeminar Report on Google File System
Seminar Report on Google File System
 
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File SystemFredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File System
 
Hadoop
HadoopHadoop
Hadoop
 
Kosmos Filesystem
Kosmos FilesystemKosmos Filesystem
Kosmos Filesystem
 
Google File System
Google File SystemGoogle File System
Google File System
 
Hadoop
HadoopHadoop
Hadoop
 
Dynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using ContainersDynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using Containers
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331
 
Performance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memoryPerformance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memory
 
Hadoop
HadoopHadoop
Hadoop
 
Sector Sphere 2009
Sector Sphere 2009Sector Sphere 2009
Sector Sphere 2009
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Architecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory systemArchitecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory system
 
Predicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensemblesPredicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensembles
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Talon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategyTalon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategy
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating system
 

Viewers also liked

Goozzy presentation for Venture Summit East 2010
Goozzy presentation for Venture Summit East 2010Goozzy presentation for Venture Summit East 2010
Goozzy presentation for Venture Summit East 2010alarin
 
Thesis_AnoukKon_421037_1662016
Thesis_AnoukKon_421037_1662016Thesis_AnoukKon_421037_1662016
Thesis_AnoukKon_421037_1662016anoukkonQompas
 
iPad integration through a differentiation lens
iPad integration through a differentiation lensiPad integration through a differentiation lens
iPad integration through a differentiation lensKevin Amboe
 
Chap013 sales management
Chap013 sales managementChap013 sales management
Chap013 sales managementHee Young Shin
 
Empúries
EmpúriesEmpúries
Empúriesrnota
 
Camera buying guidelines
Camera buying guidelinesCamera buying guidelines
Camera buying guidelinesThomas Klose
 
What is ineighbourtv?
What is ineighbourtv?What is ineighbourtv?
What is ineighbourtv?Social iTV
 
Shannon Smith Cv 201109
Shannon Smith Cv 201109Shannon Smith Cv 201109
Shannon Smith Cv 201109shagsa
 
C:\Fakepath\消費者行動論(小松崎班)
C:\Fakepath\消費者行動論(小松崎班)C:\Fakepath\消費者行動論(小松崎班)
C:\Fakepath\消費者行動論(小松崎班)yahohsoaho
 
iTunesU: iGlue for iPad Learning
iTunesU: iGlue for iPad LearningiTunesU: iGlue for iPad Learning
iTunesU: iGlue for iPad LearningKevin Amboe
 
ITGM8. Сергей Атрощенков (Еpam) Buzzword driven development и место тестировщ...
ITGM8. Сергей Атрощенков (Еpam) Buzzword driven development и место тестировщ...ITGM8. Сергей Атрощенков (Еpam) Buzzword driven development и место тестировщ...
ITGM8. Сергей Атрощенков (Еpam) Buzzword driven development и место тестировщ...SPB SQA Group
 
Progress Of Interoperability
Progress Of InteroperabilityProgress Of Interoperability
Progress Of Interoperabilityrobtepas
 
Скрипт Каталог товаров - Модуль Catalog
Скрипт Каталог товаров - Модуль CatalogСкрипт Каталог товаров - Модуль Catalog
Скрипт Каталог товаров - Модуль CatalogАльберт Коррч
 
Utilizing a Terrestrial Invasive Species Rapid Response Team in the Adirondac...
Utilizing a Terrestrial Invasive Species Rapid Response Team in the Adirondac...Utilizing a Terrestrial Invasive Species Rapid Response Team in the Adirondac...
Utilizing a Terrestrial Invasive Species Rapid Response Team in the Adirondac...Cary Institute of Ecosystem Studies
 
Designing and implementing synergies; Coordinating investment in Research and...
Designing and implementing synergies; Coordinating investment in Research and...Designing and implementing synergies; Coordinating investment in Research and...
Designing and implementing synergies; Coordinating investment in Research and...Dimitri Corpakis
 
ITGM8. Илья Коробицын (Grid Dinamics) Автоматизатор, копай глубже, копай шире!
ITGM8. Илья Коробицын (Grid Dinamics) Автоматизатор, копай глубже, копай шире!ITGM8. Илья Коробицын (Grid Dinamics) Автоматизатор, копай глубже, копай шире!
ITGM8. Илья Коробицын (Grid Dinamics) Автоматизатор, копай глубже, копай шире!SPB SQA Group
 
Manual de prácticas ejemplares en euskara / Une pratique adéquate et exemplai...
Manual de prácticas ejemplares en euskara / Une pratique adéquate et exemplai...Manual de prácticas ejemplares en euskara / Une pratique adéquate et exemplai...
Manual de prácticas ejemplares en euskara / Une pratique adéquate et exemplai...Bai Euskarari Ziurtagiriaren Elkartea
 
Twitter e suas APIs de Streaming - Campus Party Brasil 7
Twitter e suas APIs de Streaming - Campus Party Brasil 7Twitter e suas APIs de Streaming - Campus Party Brasil 7
Twitter e suas APIs de Streaming - Campus Party Brasil 7Luis Cipriani
 

Viewers also liked (20)

Goozzy presentation for Venture Summit East 2010
Goozzy presentation for Venture Summit East 2010Goozzy presentation for Venture Summit East 2010
Goozzy presentation for Venture Summit East 2010
 
United Way of Greater Toledo SEM Presentation
United Way of Greater Toledo SEM PresentationUnited Way of Greater Toledo SEM Presentation
United Way of Greater Toledo SEM Presentation
 
Thesis_AnoukKon_421037_1662016
Thesis_AnoukKon_421037_1662016Thesis_AnoukKon_421037_1662016
Thesis_AnoukKon_421037_1662016
 
iPad integration through a differentiation lens
iPad integration through a differentiation lensiPad integration through a differentiation lens
iPad integration through a differentiation lens
 
Chap013 sales management
Chap013 sales managementChap013 sales management
Chap013 sales management
 
Empúries
EmpúriesEmpúries
Empúries
 
Camera buying guidelines
Camera buying guidelinesCamera buying guidelines
Camera buying guidelines
 
What is ineighbourtv?
What is ineighbourtv?What is ineighbourtv?
What is ineighbourtv?
 
Shannon Smith Cv 201109
Shannon Smith Cv 201109Shannon Smith Cv 201109
Shannon Smith Cv 201109
 
C:\Fakepath\消費者行動論(小松崎班)
C:\Fakepath\消費者行動論(小松崎班)C:\Fakepath\消費者行動論(小松崎班)
C:\Fakepath\消費者行動論(小松崎班)
 
iTunesU: iGlue for iPad Learning
iTunesU: iGlue for iPad LearningiTunesU: iGlue for iPad Learning
iTunesU: iGlue for iPad Learning
 
ITGM8. Сергей Атрощенков (Еpam) Buzzword driven development и место тестировщ...
ITGM8. Сергей Атрощенков (Еpam) Buzzword driven development и место тестировщ...ITGM8. Сергей Атрощенков (Еpam) Buzzword driven development и место тестировщ...
ITGM8. Сергей Атрощенков (Еpam) Buzzword driven development и место тестировщ...
 
Progress Of Interoperability
Progress Of InteroperabilityProgress Of Interoperability
Progress Of Interoperability
 
Invasive Species and Water Resources
Invasive Species and Water ResourcesInvasive Species and Water Resources
Invasive Species and Water Resources
 
Скрипт Каталог товаров - Модуль Catalog
Скрипт Каталог товаров - Модуль CatalogСкрипт Каталог товаров - Модуль Catalog
Скрипт Каталог товаров - Модуль Catalog
 
Utilizing a Terrestrial Invasive Species Rapid Response Team in the Adirondac...
Utilizing a Terrestrial Invasive Species Rapid Response Team in the Adirondac...Utilizing a Terrestrial Invasive Species Rapid Response Team in the Adirondac...
Utilizing a Terrestrial Invasive Species Rapid Response Team in the Adirondac...
 
Designing and implementing synergies; Coordinating investment in Research and...
Designing and implementing synergies; Coordinating investment in Research and...Designing and implementing synergies; Coordinating investment in Research and...
Designing and implementing synergies; Coordinating investment in Research and...
 
ITGM8. Илья Коробицын (Grid Dinamics) Автоматизатор, копай глубже, копай шире!
ITGM8. Илья Коробицын (Grid Dinamics) Автоматизатор, копай глубже, копай шире!ITGM8. Илья Коробицын (Grid Dinamics) Автоматизатор, копай глубже, копай шире!
ITGM8. Илья Коробицын (Grid Dinamics) Автоматизатор, копай глубже, копай шире!
 
Manual de prácticas ejemplares en euskara / Une pratique adéquate et exemplai...
Manual de prácticas ejemplares en euskara / Une pratique adéquate et exemplai...Manual de prácticas ejemplares en euskara / Une pratique adéquate et exemplai...
Manual de prácticas ejemplares en euskara / Une pratique adéquate et exemplai...
 
Twitter e suas APIs de Streaming - Campus Party Brasil 7
Twitter e suas APIs de Streaming - Campus Party Brasil 7Twitter e suas APIs de Streaming - Campus Party Brasil 7
Twitter e suas APIs de Streaming - Campus Party Brasil 7
 

Similar to London bosc2010

CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfKishaKiddo
 
Distributed system notes unit I
Distributed system notes unit IDistributed system notes unit I
Distributed system notes unit INANDINI SHARMA
 
01 - Introduction to Distributed Systems
01 - Introduction to Distributed Systems01 - Introduction to Distributed Systems
01 - Introduction to Distributed SystemsDilum Bandara
 
From Simulation to Online Gaming: the need for adaptive solutions
From Simulation to Online Gaming: the need for adaptive solutions From Simulation to Online Gaming: the need for adaptive solutions
From Simulation to Online Gaming: the need for adaptive solutions Gabriele D'Angelo
 
Distributed operating system(os)
Distributed operating system(os)Distributed operating system(os)
Distributed operating system(os)Dinesh Modak
 
DISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docxDISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docxvinaypandey170
 
Dynamic Load Calculation in A Distributed System using centralized approach
Dynamic Load Calculation in A Distributed System using centralized approachDynamic Load Calculation in A Distributed System using centralized approach
Dynamic Load Calculation in A Distributed System using centralized approachIJARIIT
 
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner) Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner) Puppet
 
Linux Assignment 3
Linux Assignment 3Linux Assignment 3
Linux Assignment 3Diane Allen
 
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...Soumya Banerjee
 

Similar to London bosc2010 (20)

Cluster computing
Cluster computingCluster computing
Cluster computing
 
CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdf
 
Distributed system notes unit I
Distributed system notes unit IDistributed system notes unit I
Distributed system notes unit I
 
01 - Introduction to Distributed Systems
01 - Introduction to Distributed Systems01 - Introduction to Distributed Systems
01 - Introduction to Distributed Systems
 
Wiki 2
Wiki 2Wiki 2
Wiki 2
 
From Simulation to Online Gaming: the need for adaptive solutions
From Simulation to Online Gaming: the need for adaptive solutions From Simulation to Online Gaming: the need for adaptive solutions
From Simulation to Online Gaming: the need for adaptive solutions
 
MSB-Distributed systems goals
MSB-Distributed systems goalsMSB-Distributed systems goals
MSB-Distributed systems goals
 
Distributed operating system(os)
Distributed operating system(os)Distributed operating system(os)
Distributed operating system(os)
 
Chapter 3 chapter reading task
Chapter 3 chapter reading taskChapter 3 chapter reading task
Chapter 3 chapter reading task
 
Clusters
ClustersClusters
Clusters
 
DISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docxDISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docx
 
Dynamic Load Calculation in A Distributed System using centralized approach
Dynamic Load Calculation in A Distributed System using centralized approachDynamic Load Calculation in A Distributed System using centralized approach
Dynamic Load Calculation in A Distributed System using centralized approach
 
4.Process.ppt
4.Process.ppt4.Process.ppt
4.Process.ppt
 
OS .pptx
OS .pptxOS .pptx
OS .pptx
 
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner) Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
 
Linux Assignment 3
Linux Assignment 3Linux Assignment 3
Linux Assignment 3
 
Wk6a
Wk6aWk6a
Wk6a
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
 
Dos unit3
Dos unit3Dos unit3
Dos unit3
 
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
 

More from BOSC 2010

Mercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkMercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkBOSC 2010
 
Langmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsLangmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsBOSC 2010
 
Schultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-servicesSchultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-servicesBOSC 2010
 
Swertz bosc2010 molgenis
Swertz bosc2010 molgenisSwertz bosc2010 molgenis
Swertz bosc2010 molgenisBOSC 2010
 
Rice bosc2010 emboss
Rice bosc2010 embossRice bosc2010 emboss
Rice bosc2010 embossBOSC 2010
 
Morris bosc2010 evoker
Morris bosc2010 evokerMorris bosc2010 evoker
Morris bosc2010 evokerBOSC 2010
 
Kono bosc2010 pathway_projector
Kono bosc2010 pathway_projectorKono bosc2010 pathway_projector
Kono bosc2010 pathway_projectorBOSC 2010
 
Kanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenisKanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenisBOSC 2010
 
Gautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductorGautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductorBOSC 2010
 
Gardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasfGardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasfBOSC 2010
 
Friedberg bosc2010 iprstats
Friedberg bosc2010 iprstatsFriedberg bosc2010 iprstats
Friedberg bosc2010 iprstatsBOSC 2010
 
Fields bosc2010 bio_perl
Fields bosc2010 bio_perlFields bosc2010 bio_perl
Fields bosc2010 bio_perlBOSC 2010
 
Chapman bosc2010 biopython
Chapman bosc2010 biopythonChapman bosc2010 biopython
Chapman bosc2010 biopythonBOSC 2010
 
Bonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBOSC 2010
 
Puton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rnaPuton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rnaBOSC 2010
 
Bader bosc2010 cytoweb
Bader bosc2010 cytowebBader bosc2010 cytoweb
Bader bosc2010 cytowebBOSC 2010
 
Talevich bosc2010 bio-phylo
Talevich bosc2010 bio-phyloTalevich bosc2010 bio-phylo
Talevich bosc2010 bio-phyloBOSC 2010
 
Zmasek bosc2010 aptx
Zmasek bosc2010 aptxZmasek bosc2010 aptx
Zmasek bosc2010 aptxBOSC 2010
 
Wilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiWilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiBOSC 2010
 
Venkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitVenkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitBOSC 2010
 

More from BOSC 2010 (20)

Mercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkMercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_framework
 
Langmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsLangmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomics
 
Schultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-servicesSchultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-services
 
Swertz bosc2010 molgenis
Swertz bosc2010 molgenisSwertz bosc2010 molgenis
Swertz bosc2010 molgenis
 
Rice bosc2010 emboss
Rice bosc2010 embossRice bosc2010 emboss
Rice bosc2010 emboss
 
Morris bosc2010 evoker
Morris bosc2010 evokerMorris bosc2010 evoker
Morris bosc2010 evoker
 
Kono bosc2010 pathway_projector
Kono bosc2010 pathway_projectorKono bosc2010 pathway_projector
Kono bosc2010 pathway_projector
 
Kanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenisKanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenis
 
Gautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductorGautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductor
 
Gardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasfGardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasf
 
Friedberg bosc2010 iprstats
Friedberg bosc2010 iprstatsFriedberg bosc2010 iprstats
Friedberg bosc2010 iprstats
 
Fields bosc2010 bio_perl
Fields bosc2010 bio_perlFields bosc2010 bio_perl
Fields bosc2010 bio_perl
 
Chapman bosc2010 biopython
Chapman bosc2010 biopythonChapman bosc2010 biopython
Chapman bosc2010 biopython
 
Bonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_ruby
 
Puton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rnaPuton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rna
 
Bader bosc2010 cytoweb
Bader bosc2010 cytowebBader bosc2010 cytoweb
Bader bosc2010 cytoweb
 
Talevich bosc2010 bio-phylo
Talevich bosc2010 bio-phyloTalevich bosc2010 bio-phylo
Talevich bosc2010 bio-phylo
 
Zmasek bosc2010 aptx
Zmasek bosc2010 aptxZmasek bosc2010 aptx
Zmasek bosc2010 aptx
 
Wilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiWilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadi
 
Venkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitVenkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkit
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

London bosc2010

  • 1. Dealing with the Data Deluge: What can the Robotics Community Teach us? Making our pipelines organic, adaptable, and scalable Darin London
  • 2. Part I. The Challenges of NextGen Sequencing Data
  • 3. Datasets 50+ Cell Lines Each sequenced with up to 2 different technologies (DNaseHS and FAIRE) and 3 different ChIP-Seq antibodies (CTCF, PolII, c-Myc), as well as a Control (Input) for comparison Most involved multiple biological replicates, and some biological replicates were sequenced multiple times to create technical replicates of the same biological sample 1.3 Gb zipped raw data per Cell_line-Technology-Replicate on average 351 Gb zipped raw sequence data analyzed (and counting...)
  • 4.
  • 5. Some characteristics of NextGen Sequencing Data heterogeneous in time: comes in batches by lane and sample order of date of sample submission does not fix the order of date of receipt of data heterogeneous in size: some samples will produce more data than others size affects timing of most computational tasks heterogeneous in quality: some data will not merit being run through the entire pipeline some data may merit extra analysis
  • 6.
  • 7.
  • 8. Part II. A Tale of Two Robots
  • 9. Meet Shakey http://www.ai.sri.com/movies/Shakey.ram The first fully autonomous robot able to reason about its surroundings Pioneered many algorithms to model data from multiple sensors into a central world map, apply one or more plans of action, and determine appropriate action behaviors to achieve these plans If science is the 'Art of the Soluble' then Shakey demonstrated the solubility of autonomous robotics to the world.
  • 10. That being said... The autonomous systems roving on mars, fighting in Afghanistan, and cleaning our floors do not share much in common with Shakey.
  • 11. These systems descend from more practical approaches pioneered in the 1980s by Rodney Brooks and others In 1986, he introduced the world to Allen, a Behavior-based robot based on the Subsumption Architecture
  • 12. Behavior-based Robots Attempt to mimic biological actions, rather than human cognition Built out of many small modules Modules act autonomously by continuously sensing the environment for specific signals, and immediately perform a specific action based on that sensory input Modules arranged hierarchically, with higher layer modules able to mask (subsume) the input or output of lower layer modules (lower layer modules are not aware that they are being subsumed) There is no central planning module The intelligence of the system is completely distributed throughout all the smaller subsystems, each designed to achieve certain parts of the overall task list opportunistically as the environment becomes favorable to it acting as it is designed to act
  • 13. Cost-Benefit Analysis Benefits of Behavior-based robots over AI: Easier and cheaper to build Scale better with existing technology More easily adaptable, new behaviors emerge with the addition of modules with little or no change to other modules More fault tolerant, partial behaviors tend to persist even when many modules fail to act Deficiencies of Behavior-based robots: 'Higher order' reasoning and logic functions are too complex No capacity to learn from mistakes except through changes, addition, or subtraction of modules
  • 14. Part III. Making our Bioinformatics Pipelines Organic, and Adaptable
  • 15. Many Bioinformatics Pipelines resemble Shakey by: Involving centralized controller systems which control every aspect of pipeline behavior Mixing the logic for selecting tasks from a list together with the logic for performing these tasks
  • 16. Except that, unlike Shakey, many pipelines: Have little or no knowledge of their computing environment Have no, or very little, capacity to: perform tasks in different orders, opportunistically temporarily re-focus their work on smaller subsets of the total task list run tasks in parallel etc. Lack intelligent points for human agent inclusion Are subject to human will at every level
  • 17. Behavior Based Pipelines are ideal for dealing with heterogeneous data efficiently
  • 18. They are Modular Much like Object Oriented Programming Failure is easy to diagnose and fix Failure in one module does not (necessarily) impact other module actions Failure in one module does not (necessarily) require other modules to be rerun, or require complex skipping logic in the pipeline code
  • 19. They are Adaptable New analyses should simply require plugging in a new module, with minimal or no 'rewiring' of other modules Reanalyses should simply require the removal of certain outputs, and possibly a reset of the completion state of a particular task to accomplish, and all downstream tasks should either react to the presence of new data, or require minimal state manipulation to get them to rerun themselves Modules can be augmented, or replaced as needed, with little or no change to other modules, as long as their original functionality is maintained or assumed by another module
  • 20. They are Scalable Modules can be deployed onto as many different machines as are available (servers, nodes on a cluster, nodes in a cloud) to expand throughput Modules with high resource requirements can be deployed onto separate machines from those with low resource requirements Modules can be grouped together on different machines, or sets of machines, according to functionality, or data proximity
  • 21. They act Autonomously Individual modules can 'react' to data to produce information as soon as the data is made available in the 'environment' Datasets can be moved through the pipeline at different rates Modules do not require humans to manage them, but, instead, react and respond to different human inputs at many different places Humans are really just another intelligent agent in the system
  • 22. They can act Opportunistically Modules can be tied into multiple task-management systems overall dataset-task list priority dataset-task list machine specific dataset-task lists manual intervention The priority system can be set to take precedence over the overall system, but if priority datasets get backlogged, the system can still opportunistically process items in the overall system until the backlog is cleared, and the priority system can then regain the focus of some or all machines in the system
  • 23. They are sensitive to their computing environment, and knowledgeable of the resources they need to work Modules should know how much memory, file system space, etc. they need Modules should know about other modules that would compete with them for scarce resources This may run counter to the ethos of platform nutrality, but, for instance (if you are running on redhat/centos) you can parse /proc/meminfo for memory information (my $meminfo = YAML::LoadFile('/proc/meminfo')), ps for information on other processes running in the environment, df for filesystem information, etc.
  • 24. These systems have other advantages: They make it easy to get up and running with 1-2 modules tested on a small dataset, which can then be applied to all other datasets available, and yet to come They allow for 'partial solutions', e.g. some data will always be produced even if the entire pipeline is not finished (what pipeline is ever 'finished', anyway), or if one or more parts of the pipeline are discovered to have bugs New modules can be created, tested against 1 or more datasets, and then 'released to the wild' so that they can autonomously fill in the gaps for all previously received data, and then analyze all data received in the future Buggy modules can be pulled out of the pipeline, fixed and tested in the same way
  • 25. Part IV. The IGSP Encode Pipeline
• 26. Pipeline designed to generate data for the Encyclopedia of DNA Elements (ENCODE) http://www.genome.gov/10005107
   For both ENCODE and non-ENCODE cell lines and treatments, it:
      Automates movement of data from sequencing staging to the IGSP server
      Aligns raw sequence files to hg19 using bwa (previously hg18 using maq) (see the sketch below)
      Generates feature density distributions of whole-genome sequence data aligned to hg19
      Generates visual tracks of the data in the IGSP internal UCSC Genome Browser
      Generates submission tarballs of bam, peaks, parzen bigWigs, and base count bigWigs to be submitted to UCSC
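A minimal sketch of the alignment step, assuming single-end reads, bwa 0.5.x-era commands, and placeholder paths; the pipeline's actual alignment wrapper is not shown in these slides:

#!/usr/bin/perl
# Hypothetical bwa wrapper for one single-end fastq; paths and options are placeholders.
use strict;
use warnings;

my ($ref, $fastq, $out_prefix) = ('/data/genomes/hg19/hg19.fa', 'sequence.fastq.gz', 'sequence');

# bwa aln produces a .sai, bwa samse turns it into SAM,
# then samtools (0.1.x-style) sorts and indexes a BAM
run("bwa aln -t 4 $ref $fastq > $out_prefix.sai");
run("bwa samse $ref $out_prefix.sai $fastq > $out_prefix.sam");
run("samtools view -bS $out_prefix.sam | samtools sort - $out_prefix.sorted");
run("samtools index $out_prefix.sorted.bam");

sub run {
    my ($cmd) = @_;
    print STDERR "Running: $cmd\n";
    system($cmd) == 0 or die "Command failed ($cmd): $?\n";
}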
• 27. Compute Infrastructure
   4 CentOS compute nodes: 8 cores (2.50 GHz, dual quad-core procs), 32GB 1066MHz RAM, primary 120GB HDD, secondary 250GB HDD
   Duke Shared Cluster Resource (DSCR): 19 high-priority Encode nodes, each with 8 cores and 16GB RAM
   Compute nodes connected to the DSCR via an NFS-mounted volume provided by a NetApp NAS array of 42 15k 450GB FC disks, exported through a 10G Fibre-E link
   Raw data and analytical output stored on two NFS-mounted volumes provided by a NetApp NAS array of 14 7.2k SATA disks, 1TB and 750GB in size
   Each compute node contains its own locally mounted 230GB scratch directory to minimize NFS read-write concurrency issues
• 28. Pipeline composed of many different agents, each falling into one of three categories:
   Runner Agents: These simply read through a list of datasets and the tasks to be done on each dataset, and launch the processing agents required to accomplish each task on each dataset. They do not care whether it is currently possible for an agent to accomplish its task on the dataset
   Processing Agents: These are small programs designed to perform a specific processing task on a given dataset. In addition, they are designed to know when it is possible to perform the task (based on prerequisites), whether the resources (memory, storage space, etc.) required for them to run are available, and whether other programs running on the system would compete with them in ways that adversely affect their performance
• 29. Main Task List
   Composed of a set of worksheets in a Google Spreadsheet. This has a number of advantages:
      Allows people all over the world to keep track of what has been done, and what remains to be done
      Since the Google Spreadsheet API is also available to agents on any internet-connected computer, it can be used by runner and processing agents on any number of servers (see the sketch below)
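For illustration, a minimal sketch of reading such a task worksheet with Net::Google::Spreadsheets (a CPAN client for the Google Spreadsheets API); the credentials, spreadsheet title, and worksheet name are placeholders, while the ready/complete and key-field column names match those used by the agents later in this deck:

#!/usr/bin/perl
# Hypothetical sketch of reading a task worksheet; names and credentials are placeholders.
use strict;
use warnings;
use Net::Google::Spreadsheets;

my $service = Net::Google::Spreadsheets->new(
    username => 'pipeline@example.com',
    password => 'secret',
);

my $spreadsheet = $service->spreadsheet({ title => 'encode_pipeline' });
my $worksheet   = $spreadsheet->worksheet({ title => 'DNaseHS' });

# each row's content is a hashref of column header => cell value
for my $row ($worksheet->rows) {
    my $content = $row->content;
    next unless $content->{ready} && !$content->{complete};
    printf "%s / %s / %s still has work to do\n",
        $content->{cellline}, $content->{technology}, $content->{replicate};
}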
• 30. The third type of agent in this system is the human
   The Google Spreadsheet model makes it very easy to plug humans into the overall logic of the system:
      arguments, variables, and state switches can be communicated to an agent using meta-fields on the worksheet. The values for these fields can be filled in by humans, or by other computer agents
      processing agents can be coded to require prerequisite meta-fields that a human must switch on before they run
      processing agents can write data to information fields upon completion, failure, or both. This might include changing the state of prerequisite fields required by other agents
      processes requiring human intervention can be replaced by computational logic over time, as the logic becomes formalized into one or more agents
  • 31. Part V. Google::Spreadsheet::Agent http://search.cpan.org/~dmlond/Google-Spreadsheet-Agent-0.01
• 32. #!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use Getopt::Std;
use Google::Spreadsheet::Agent;
# usually other modules are used

# the goal (spreadsheet column) this agent fills is taken from the script name,
# e.g. parzen_agent.pl -> 'parzen'
my $goal = basename($0);
$goal =~ s/_agent\.pl$//;

my $cell_line  = shift or die "cell_line\n";
my $technology = shift or die "technology\n";
my $replicate  = shift or die "replicate\n";
my $google_page = ($replicate =~ m/.*_TP.*/) ? 'combined' : $technology;

my %opts;
getopts('dr:P:', \%opts);
my $debug = $opts{d};
my $data_root;                       # set to the pipeline data root in the full script
$data_root   = $opts{r} if ($opts{r});
$google_page = $opts{P} if ($opts{P});

my $prerequisites = [];
$prerequisites->[0] = ($replicate =~ m/.*_TP.*/) ? 'combined' : 'aligned';

my $google_agent = Google::Spreadsheet::Agent->new(
    agent_name      => $goal,
    page_name       => $google_page,
    debug           => $debug,
    max_selves      => 3,
    bind_key_fields => {
        cellline   => $cell_line,
        technology => $technology,
        replicate  => $replicate,
    },
    prerequisites   => $prerequisites,
);
$google_agent->run_my(\&agent_code);
exit;
• 33. my $min_gigs = 18; # start with an 18G /scratch2 availability requirement
my $gigs_avail = &get_scratch_availability or exit(1);
exit if ($gigs_avail < $min_gigs);

sub get_scratch_availability {
    my $opened = open(my $df_in, '-|', 'df', '-h', '/scratch2');
    unless ($opened) {
        print STDERR "Couldn't check scratch2 usage $!\n";
        return;
    }
    my $in = <$df_in>; # skip the header line
    $in = <$df_in>;
    chomp $in;
    close $df_in;
    my $gigs_avail = (split /\s+/, $in)[3]; # the 'Avail' column
    $gigs_avail =~ s/\D+$//;                # strip the trailing 'G'
    return $gigs_avail;
}

use YAML::Any qw/LoadFile/;
my $min_mem = 16; # requires about 16-18G memory to run
exit if (&get_available_memory <= $min_mem);

sub get_available_memory {
    my $info = LoadFile('/proc/meminfo') or die "Couldn't load meminfo $!\n";
    my $free_mem = $info->{MemFree};
    $free_mem =~ s/\D+$//;
    my $buffers = $info->{Buffers};
    my $cached  = $info->{Cached};
    $buffers =~ s/\D+$//;
    $cached  =~ s/\D+$//;
    $free_mem += $buffers + $cached; # free + buffers + cached, in kB
    $free_mem /= (1024*1024);        # convert kB to GB
    return $free_mem;
}
• 34. sub agent_code {
    my $entry = shift;               # the bound spreadsheet row for this dataset
    my $build = $entry->{build};
    # $data_root, $generic_apps_dir, and getDBName() are defined earlier in the full script
    my $replicate_root = join('/', $data_root, $cell_line, $technology, 'sequence_'.$replicate);
    my $db_name = getDBName($replicate_root);
    my $scratch_root = $replicate_root;
    $scratch_root =~ s/$data_root/\/scratch2/;

    my $helper_command = join(' ',
        join('/', $generic_apps_dir, 'parzen_fseq_helper.pl'),
        $replicate_root,
        join('/', $replicate_root, 'bwa_'.$entry->{build}, 'sequence.final.bed'),
        $cell_line, $technology, $entry->{sex}, $entry->{build}, $db_name
    );
    print STDERR "Running ${helper_command}\n";
    `$helper_command`;
    if ($?) {
        print STDERR "Problem running parzen_helper $!\n";
        return;
    }

    my $parzen_track_name = $db_name . "_parzen";
    my $scratch_parzen_dir = join('/', $scratch_root, 'parzen_'.$build);
    my $parzen_dir = join('/', $replicate_root, 'parzen_'.$build);
    $parzen_dir =~ s/sata2/sata4/;   # parzen output goes to the sata4 volume rather than sata2
    my $wiggle_helper = join(' ',
        join('/', $generic_apps_dir, 'parzen_wiggle_helper.pl'),
        $build, $parzen_track_name, $parzen_dir, $scratch_parzen_dir
    );
    print STDERR "Running ${wiggle_helper}\n";
    `$wiggle_helper`;
    if ($?) {
        print STDERR "Problem running wiggle_helper $!\n";
        return;
    }
    return 1; # a true return value signals success
}
• 35. #!/usr/bin/perl
use strict;
use warnings;
use FindBin;
use Google::Spreadsheet::Agent;

my $google_agent = Google::Spreadsheet::Agent->new(
    agent_name      => 'agent_runner',
    page_name       => 'all',
    bind_key_fields => { cellline => 'all', technology => 'all', replicate => 'all' }
);

# iterate through each page in the spreadsheet, get runnable rows, and run each runnable goal on the row
foreach my $page_name ( map { $_->title } $google_agent->google_db->worksheets ) {
    foreach my $runnable_row (
        grep { $_->content->{ready} && !$_->content->{complete} }
        $google_agent->google_db->worksheet({ title => $page_name })->rows
    ) {
        my $row_content = $runnable_row->content;
        foreach my $goal (keys %{$row_content}) {
            next if ($row_content->{$goal}); # any value (r, 1, F) causes it to skip
            # some of these will skip because they are fields without agents
            my $goal_agent = $FindBin::Bin.'/../agent_bin/'.$goal.'_agent.pl';
            next unless (-x $goal_agent);
            my @cmd = ($goal_agent);
            foreach my $query_field (
                sort {
                    $google_agent->config->{key_fields}->{$a}->{rank}
                      <=>
                    $google_agent->config->{key_fields}->{$b}->{rank}
                } keys %{$google_agent->config->{key_fields}}
            ) {
                next unless ($row_content->{$query_field});
                push @cmd, $row_content->{$query_field};
            }
            system( join(' ', @cmd).' &' ); # launch the processing agent in the background
            sleep 5;
        }
    }
}
exit;
• 36. Future Plans
   1. Make inter-lab communication more concrete and automatic
   2. Give each server its own 'task' view of a particular Google Spreadsheet worksheet, in that it can have its own unique set of executable agent_bin scripts tied to a set of fields that systems on other servers would ignore
   3. Put some of the runner code and requirements-checking routines into Google::Spreadsheet::Agent for version 1.1
• 37. Acknowledgements
   The Institute for Genome Sciences and Policy (IGSP)
   The Encode Consortium
   Terry Furey
   Alan Boyle
   Greg Crawford
   Mark DeLong
   Rob Wagner
   Peyton Vaughn
   Darrin Mann
   Alan Cowles