Talk at RubyKaigi 2015.
Plugin architecture is a well-known technique for making a program extensible. Ruby has good language features for plugins, and RubyGems.org is an excellent platform for distributing them. However, building a plugin architecture is harder than writing code without one: you have to deal with the plugin loader, packaging, a loosely-coupled API, and performance. Loading two versions of a gem is still an unsolved challenge in Ruby, whereas Java has a solution.
I have designed open-source software such as Fluentd and Embulk. Both provide most of their functionality through plugins. I will talk about their plugin-based architecture.
12. Benefits of Plugin Architecture
> Plugins bring many features
> Plugins keep core software simple
> Plugins are easy to test
> Plugins build an active developer community
> “…if it’s designed well”.
17. Plugin Architecture Design Patterns
a) Traditional Extensible Software Architecture
b) Plugin-based Software Architecture
18. Traditional Extensible Software Architecture
[Diagram: a host application with two plugins attached]
• Register plugins to extension points.
• To add more extensibility, add more extension points.
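As a rough illustration of the traditional pattern, the sketch below shows a host that owns the control flow and calls plugins only at fixed extension points. The class name `HostApplication` and the extension point names `:filter` and `:format` are hypothetical and do not come from the talk.

```ruby
# Hypothetical host application; all names here are illustrative.
class HostApplication
  EXTENSION_POINTS = [:filter, :format].freeze

  def initialize
    # One list of registered plugins per extension point.
    @plugins = Hash.new { |hash, point| hash[point] = [] }
  end

  # Register a plugin (any object responding to #call) to an extension point.
  def register(point, plugin)
    unless EXTENSION_POINTS.include?(point)
      raise ArgumentError, "unknown extension point: #{point}"
    end
    @plugins[point] << plugin
    self
  end

  # The host owns the control flow and calls plugins only at fixed points.
  def process(line)
    line = @plugins[:filter].reduce(line) { |text, plugin| plugin.call(text) }
    formatter = @plugins[:format].last || ->(text) { text }
    formatter.call(line)
  end
end

app = HostApplication.new
app.register(:filter, ->(text) { text.strip })
app.register(:format, ->(text) { "[log] #{text}" })
result = app.process("  hello  ")
```

Adding a new kind of extensibility here means editing the host itself (a new entry in `EXTENSION_POINTS` and a new call site in `process`), which is the limitation this slide points at.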
20. Plugin-based software architecture
• Application as a network of plugins.
> Plugins: provide features.
> Core: framework to implement plugins.
• More flexibility != More complexity.
• The application must be designed to be modular.
> It’s hard to design :(
> Optimizing performance is difficult :(
• A loosely-coupled API often makes performance worse.
21. Design Pattern 1: Dependency Injection
Each component publishes an API. A component is an interface or a class.
[Diagram: the core as a set of connected classes and interfaces]
22. Design Pattern 1: Dependency Injection
When the application runs, a DI container replaces objects with plugins.
[Diagram: the core with most of its classes replaced by plugins]
23. Design Pattern 1: Dependency Injection
When testing the application, replace classes with mocks for unit tests.
[Diagram: the core with its classes replaced by dummies around the plugin under test]
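24. Dependency Injection (Java)
    // Assuming Google Guice, which the Binder and @Inject usage suggests.
    import com.google.inject.Binder;
    import com.google.inject.Inject;

    public interface Store
    {
        void store(String data);
    }

    public class Module
    {
        @Inject
        Module(Store store)
        {
            store.store("data");
        }
    }

    public class DummyStore
            implements Store
    {
        public void store(String data) { }
    }

    // Fully qualified because the class above is also named Module.
    public class MainModule
            implements com.google.inject.Module
    {
        public void configure(Binder binder)
        {
            // interface → implementation mapping
            binder.bind(Store.class)
                .to(DummyStore.class);
        }
    }
From the source code, the implementation is a black box. It is replaced at runtime.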
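26. Dependency Injection (Ruby)
I want to do this: keyword arguments, with a {:keyword => class} mapping at runtime.
    # Injector is the wished-for API, not an existing library.
    # "Module" is kept from the slide; real code needs another name,
    # because this reopens Ruby's built-in Module class.
    class Module
      def initialize(store: DummyStore.new)
        store.store("data")
      end
    end

    class DummyStore
      def store(data)
      end
    end

    class DBStore
      def initialize(db: DBM.new)
        @db = db
      end

      def store(data)
        @db.insert(data)
      end
    end

    # Replace one dependency:
    injector = Injector.new.
      bind(store: DBStore)
    object = injector.get(Module)

    # Replace a dependency of a dependency:
    injector = Injector.new.
      bind(store: DBStore).
      bind(db: SqliteDBImpl)
    object = injector.get(Module)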
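The {:keyword => class} mapping could be sketched roughly as follows. This `Injector` is a hypothetical, minimal implementation written for illustration, not an existing gem. It inspects the keyword parameters of `initialize` and fills each bound keyword with a recursively built instance. `App` and `MemoryDB` are stand-in names, used here so the example does not reopen Ruby's built-in `Module` and does not need a real database.

```ruby
# Hypothetical Injector: a sketch of the wished-for API, not a real library.
class Injector
  def initialize
    @bindings = {}
  end

  # Register {keyword => class} mappings; returns self so calls can be chained.
  def bind(mapping)
    @bindings.merge!(mapping)
    self
  end

  # Build an instance of klass. Every keyword parameter of its constructor
  # that has a binding receives a recursively built instance of the bound class.
  # Unbound keywords keep the default written in the constructor.
  def get(klass)
    kwargs = {}
    klass.instance_method(:initialize).parameters.each do |type, name|
      next unless [:key, :keyreq].include?(type)
      kwargs[name] = get(@bindings[name]) if @bindings.key?(name)
    end
    kwargs.empty? ? klass.new : klass.new(**kwargs)
  end
end

# Stand-in for a database client (assumption, replaces SqliteDBImpl).
class MemoryDB
  attr_reader :rows

  def initialize
    @rows = []
  end

  def insert(data)
    @rows << data
  end
end

class DummyStore
  def store(data)
  end
end

class DBStore
  attr_reader :db

  def initialize(db: MemoryDB.new)
    @db = db
  end

  def store(data)
    @db.insert(data)
  end
end

# Stand-in for the slide's "Module" class.
class App
  attr_reader :store

  def initialize(store: DummyStore.new)
    @store = store
    store.store("data")
  end
end

injector = Injector.new.
  bind(store: DBStore).
  bind(db: MemoryDB)
app = injector.get(App)
```

Without any bindings, `Injector.new.get(App)` falls back to the `DummyStore` default, so the defaults in the constructor signatures double as the test configuration.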
29. Design Pattern 3: Combination
Dependency Injection + Plugin Loader
[Diagram: core classes partly replaced by plugins, with a plugin loader that loads further plugins]
30. Plugin Architecture Design Patterns
a) Traditional Extensible Software Architecture
b) Plugin-based Software Architecture
> Dependency Injection (DI)
> Dynamic Plugin Loader
> Combination of those
There are trade-offs
> Choose the best solution for each project
32. What’s Fluentd?
> Data collector for unified logging layer
> Streaming data transfer based on JSON
> Written in C & Ruby
> Plugin Marketplace on RubyGems
> http://www.fluentd.org/plugins
> Working in production
> http://www.fluentd.org/testimonials
37. Example Fluentd configuration
    # logs from a file
    <source>
      type tail
      path /var/log/httpd.log
      pos_file /tmp/pos_file
      format apache2
      tag web.access
    </source>

    # logs from client libraries
    <source>
      type forward
      port 24224
    </source>

    # store logs to ES and S3
    <match web.*>
      type copy
      <store>
        type elasticsearch
        logstash_format true
      </store>
      <store>
        type s3
        bucket s3-event-archive
      </store>
    </match>

    <match metrics.*>
      type nagios
      host watch-server
    </match>
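The `<match>` blocks route events to output plugins by tag. The following is a simplified sketch of that idea, not Fluentd's actual router: the class `TagRouter` and its methods are hypothetical, and only the `*` wildcard (one tag part) is supported.

```ruby
# Hypothetical tag router; a simplified illustration, not Fluentd's code.
class TagRouter
  def initialize
    @rules = []
  end

  # Register an output block for tags matching the pattern.
  def match(pattern, &output)
    @rules << [compile(pattern), output]
    self
  end

  # Send the event to the first rule whose pattern matches the tag.
  # Returns false when no rule matches.
  def emit(tag, record)
    _, output = @rules.find { |regexp, _| regexp.match?(tag) }
    return false unless output
    output.call(tag, record)
    true
  end

  private

  # "web.*" becomes /\Aweb\.[^.]+\z/: "*" stands for exactly one tag part.
  def compile(pattern)
    parts = pattern.split(".").map do |part|
      part == "*" ? "[^.]+" : Regexp.escape(part)
    end
    /\A#{parts.join('\.')}\z/
  end
end

web_events = []
metric_events = []

router = TagRouter.new
router.match("web.*") { |tag, record| web_events << [tag, record] }
router.match("metrics.*") { |tag, record| metric_events << [tag, record] }

router.emit("web.access", { "path" => "/" })
router.emit("metrics.cpu", { "load" => 0.5 })
unmatched = router.emit("other.event", {})
```

In this sketch the blocks stand in for output plugins, so the router itself never needs to know what an output does with an event.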
44. Fluentd’s Plugin Architecture
• Fluentd is a plugin-based event collector.
> Fluentd core: takes care of message routing between plugins.
> Plugins: do all other things!
• 300+ plugins released on RubyGems.org
• Fluentd loads plugins using the Gem API.
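A dynamic plugin loader of this kind can be sketched with a naming convention plus `require`. The code below is an illustration under assumptions, not Fluentd's loader: the `MiniCollector` module, the `in_<type>.rb` file layout, and the registry are hypothetical. A temporary directory on `$LOAD_PATH` stands in for an installed plugin gem. With real gems, RubyGems' `require` would activate whichever installed gem contains the requested file.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical core; names are illustrative only.
module MiniCollector
  module Plugin
    REGISTRY = {}

    # Called by a plugin file when it is loaded.
    def self.register_input(type, klass)
      REGISTRY[type] = klass
    end

    # Load the plugin for `type` on demand, using a file naming convention.
    def self.new_input(type)
      require "mini_collector/plugin/in_#{type}" unless REGISTRY.key?(type)
      REGISTRY.fetch(type).new
    end
  end
end

# Simulate an installed plugin gem: a directory on the load path that
# contains a file following the naming convention.
plugin_root = Dir.mktmpdir
plugin_dir = File.join(plugin_root, "mini_collector", "plugin")
FileUtils.mkdir_p(plugin_dir)
File.write(File.join(plugin_dir, "in_hello.rb"), <<~RUBY)
  class HelloInput
    def read
      "hello"
    end
  end
  MiniCollector::Plugin.register_input("hello", HelloInput)
RUBY
$LOAD_PATH.unshift(plugin_root)

# The core only knows the type name that appears in the configuration.
input = MiniCollector::Plugin.new_input("hello")
message = input.read
```

The core never references `HelloInput` by name, so new plugins can be published and installed without changing the core.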
49. Use case 1: Sync MySQL to Elasticsearch
MySQL → kuromoji → Elasticsearch
> embulk-input-mysql
> embulk-filter-kuromoji
> embulk-output-elasticsearch
50. Use case 2: Load from S3 to Analytics
csv.gz on S3 → Treasure Data + BigQuery + Redshift
> Input: embulk-input-s3, embulk-decoder-gzip, embulk-parser-csv
> Output: embulk-output-td, embulk-output-bigquery, embulk-output-redshift
> Executor: embulk-executor-mapreduce
51. Use case 3: Embulk as a Service at Treasure Data
REST API to load/export data to/from Treasure Data
62. Embulk
• Embulk is a plugin-based parallel bulk data loader.
• Guess plugins suggest which plugins are necessary and how to configure them.
• Executor plugins run plugins in parallel.
• Embulk core takes care of message passing between plugins.
• Embulk loads plugins using JRuby and the Gem API.
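The division of work between input, filter, output, and executor plugins can be sketched as below. This is a toy model under stated assumptions, not Embulk's API: every class and method name (`RangeInput`, `DoubleFilter`, `ArrayOutput`, `ThreadExecutor`, `tasks`, `run`, `apply`, `write`) is hypothetical.

```ruby
# Toy plugin pipeline; none of these names come from Embulk.

# Input plugin: splits the work into tasks and reads one task at a time.
class RangeInput
  def tasks
    [(1..3), (4..6)]
  end

  def run(task)
    task.to_a
  end
end

# Filter plugin: transforms one record.
class DoubleFilter
  def apply(record)
    record * 2
  end
end

# Output plugin: collects batches; a mutex guards concurrent writes.
class ArrayOutput
  attr_reader :records

  def initialize
    @records = []
    @lock = Mutex.new
  end

  def write(batch)
    @lock.synchronize { @records.concat(batch) }
  end
end

# Executor plugin: decides how tasks run. Here, one thread per task.
class ThreadExecutor
  def run(input, filter, output)
    threads = input.tasks.map do |task|
      Thread.new do
        batch = input.run(task).map { |record| filter.apply(record) }
        output.write(batch)
      end
    end
    threads.each(&:join)
  end
end

output = ArrayOutput.new
ThreadExecutor.new.run(RangeInput.new, DoubleFilter.new, output)
loaded = output.records.sort
```

Because the executor is itself a plugin in this model, swapping the thread-based executor for a different one would not require changes to the input, filter, or output plugins.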
64. Header of embulk.jar
    : <<BAT
    @echo off
    setlocal
    set this=%~f0
    set java_args=
    rem ...
    java %java_args% -jar %this% %args%
    exit /b %ERRORLEVEL%
    BAT
    # ...
    exec java $java_args -jar "$0" "$@"
    exit 127
    PK...
65. embulk.jar is a shell script
> #!/bin/sh is optional. A file without it is still treated as a shell script.
> “:” is a command that does nothing.
> The lines from “<<BAT” to “BAT” are a heredoc passed as the argument of the “:” command, so the shell skips them.
> exec java $java_args -jar "$0" "$@" runs java -jar $0.
> The shell script exits at “exit 127” (the following data is ignored).
66. embulk.jar is a bat file
> In a .bat file, “:” means a comment line, so “: <<BAT” is ignored.
> The .bat file exits at “exit /b %ERRORLEVEL%” (the following lines are ignored).
67. embulk.jar is a jar file
> The jar (zip) data starts at “PK...”.
> The jar (zip) format ignores the header before it, because the file entries are listed in the footer.
69. Pitfalls & Challenges
• Plugin version conflicts
• Performance impact due to loosely-coupled API
70. Plugin Version Conflicts
[Diagram: Embulk Core on the Java Runtime]
> embulk-input-s3.jar needs aws-sdk.jar v1.9
> embulk-output-redshift.jar needs aws-sdk.jar v1.10
Version conflicts!
71. Multiple Classloaders in JVM
[Diagram: Embulk Core on the Java Runtime]
> Class Loader 1: embulk-input-s3.jar with aws-sdk.jar v1.9
> Class Loader 2: embulk-output-redshift.jar with aws-sdk.jar v1.10
Isolated environments
72. Version conflicts in a JRuby Runtime
[Diagram: Embulk Core on the Java Runtime, with one JRuby Runtime]
> embulk-input-sfdc.gem needs httpclient 2.5.0
> embulk-input-marketo.gem needs httpclient 2.6.0
Version conflicts!
73. Multiple JRuby Runtime?
[Diagram: Fluentd Core on the Java Runtime]
> Sub VM 1?: fluent-plugin-sql.gem with activerecord ~> 3.4
> Sub VM 2?: fluent-plugin-presto.gem (?) with activerecord ~> 4.2
Isolated environments?
74. Version conflicts in Fluentd
[Diagram: Fluentd Core on the CRuby Runtime]
> fluent-plugin-sql.gem needs activerecord ~> 3.4
> fluent-plugin-presto.gem (?) needs activerecord ~> 4.2
Version conflicts!
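The conflict on this slide can be checked with RubyGems' own requirement classes. The sketch below uses the two requirements from the slide. The list of candidate versions is made up for illustration. Since only one version of a gem can be activated in a Ruby process, the two plugins can coexist only if a single version satisfies both requirements.

```ruby
require "rubygems"

# Requirements taken from the slide.
sql_needs    = Gem::Requirement.new("~> 3.4")  # >= 3.4 and < 4.0
presto_needs = Gem::Requirement.new("~> 4.2")  # >= 4.2 and < 5.0

# Hypothetical candidate versions, for illustration only.
candidates = %w[3.4.0 3.9.1 4.2.0 4.2.5].map { |v| Gem::Version.new(v) }

# Versions that would let both plugins load in the same Ruby process.
usable = candidates.select do |version|
  sql_needs.satisfied_by?(version) && presto_needs.satisfied_by?(version)
end

if usable.empty?
  puts "Version conflict: no single activerecord version satisfies both plugins"
end
```

The two ranges do not overlap at all, so no candidate list could resolve this. The plugins would need isolated environments, which is what the multiple class loaders provide on the JVM.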
75. Challenges
• Version conflicts are not completely solved.
• Java can use multiple ClassLoaders.
• I haven’t figured out how to do the same thing in Ruby.
• I don’t have clear ideas for solving the performance impact.
• Write more code to learn?
77. “How did I build Plugin Architecture?”
• I built Fluentd using a dynamic plugin loader.
• “Plugin calls Plugins”
• Most features are provided by the ecosystem of plugins.
• I built Embulk using a combination of:
• Dependency Injection,
• JRuby to implement a Dynamic Plugin Loader,
• Java VM and nested ClassLoaders to load multiple versions of plugins.
• But some problems are not solved yet:
• Version conflicts in a Ruby VM.
• Design patterns of plugins AND high performance.
78. What’s Next?
• You build plugin-based software architecture!
• And you’ll tell me how you did it :-)
• I’m working on another project: a distributed workflow engine
• Java VM + Python
Thank You!
Sadayuki Furuhashi
Founder & Software Architect