Synergy on the Blockchain! whitepaper
A blockchain-based collaborative platform for developing
Data Science and Artificial Intelligence products
White Paper
September 15, 2017
Written By: Yousef Fadila
Executive Summary
● Synergy uses blockchain distributed trust technology to build a smart
contract powered research and development platform for developing
state-of-the-art machine learning and artificial intelligence products.
Collaborators and researchers are incentivized to improve one another’s
solutions in rounds and receive SNG tokens in exchange, which drives
innovation and continuous improvement.
● Synergy provides a fair and decentralized development tournament
platform that tracks and rewards all contributors to the final solution.
● Synergy organizes all developed products in a web-based analytics and
modeling platform. A product can be available in two modes: (1) as a
model architecture with optimally tuned parameters, to be trained from
scratch with user-specific data, and (2) as a pre-built model that is
ready for deployment. The platform lets users build AI solutions as a
flow of drag-and-drop components, which makes them available to
non-technical users. As a result, these users can quickly build
state-of-the-art data science products and AI agents.
● The Synergy analytics and modeling platform features smooth deployment
of data science products and AI agents. Deployment can be done by
simply dragging and dropping a deployment component into the flow. As
a result, the model becomes accessible through REST queries.
● Subscriptions to the Synergy analytics and modeling platform are fully
managed through smart contracts using SNG tokens. The smart
contract fairly distributes tokens to all contributors and therefore
provides perpetual rewards to all developers and contributors. This is
an extra incentive for our community of developers and data scientists to
keep participating in Synergy competitions and improving models.
Abstract
The value of data science and the power of machine learning are growing
rapidly. It is only a matter of time before businesses that do not use their data
efficiently lose to their competitors. Data mining and machine learning are advanced
sciences that can predict customer behavior patterns, logistics and
distribution issues, future trends and more. Without data mining and machine
learning technologies, a company is at a significant disadvantage: it won’t
know what its competitors know about future trends and the current market.
Data scientist has been called “The Sexiest Job of the 21st Century” [1]. There
is no doubt that almost every business needs to integrate data science into its
decision-making process. However, the veracity and variety of big data [2], and
the need for it in different fields, make it very challenging to hire the right data
scientists. Furthermore, many companies are moving to data-based decision-making
models, and as a result there aren’t enough data scientists to meet the
increasing demand in the current market. The rapid innovation in the field makes
it even more challenging for companies to keep up with the state-of-the-art techniques
needed to evaluate their data science models. Synergy addresses this by providing a
rich data analytics and model-building platform, featuring human-on-demand
machine learning and reusable off-the-shelf models. Synergy provides a fully
transparent and decentralized data science competition management platform
powered by public smart contracts on a public blockchain. The platform is able to
track all contributors to the final solution in all rounds, to encourage collaboration
that improves other solutions, and to build state-of-the-art models that meet the
sponsor’s needs.
[1] Harvard Business Review, https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
[2] 4-Vs-of-big-data, http://www.ibmbigdatahub.com/infographic/four-vs-big-data
Table of Contents
1. Introduction
1.1. What is Data Science?
1.2. Why Data Science Matters?
1.3. Why Companies Need Data Science Tools?
1.4. How Synergy is Addressing the Market Needs?
2. Synergy Analytics and Modeling Platform Overview
2.1. Synergy Analytics and Modeling Platform
2.2. ZeroDriver artificial intelligence and machine learning modeling
2.2.1. Smooth Deployment of Data Science and AI Apps
2.3. Human-on-demand Machine Learning
2.3.1. Host a Competition in a Platform that Drives Innovation.
2.3.1.1. Submission and Reward
2.3.2. Perpetual Compensations
2.4. Summary of Platform Top Features:
3. Why Blockchain?
3.1. What Problem does a Blockchain Solve?
3.2. What are Smart Contracts?
3.3. How Synergy is utilizing Blockchain Technology?
4. High-level Roadmap
4.1. 2018-2019 Roadmap of Synergy Competition Host Platform
4.2. 2018-2019 Roadmap of Synergy Analytics and Modeling Platform
5. Token Info
5.1. Token Distribution
1. Introduction
1.1. What is Data Science?
Data science is an interdisciplinary field concerned with scientific methods and
processes for extracting insights from data. It employs techniques and theories
drawn from many fields within the broad areas of mathematics, statistics,
information science, and computer science, in particular from the subdomains of
machine learning, classification, cluster analysis, data mining, databases, and
visualization. Data scientists usually have advanced degrees and training in
statistics, mathematics, business, computer science and information management.
Today, many more organizations are opening their doors to big data and
unlocking its power. Advances in big data technologies have brought low-cost
storage and cheap computing power, which makes it feasible for businesses to collect
big data about external entities, such as competitors and market trends, or for
internal use, such as the company’s operations. However, data without the right
people to process it, or without the right tools to extract insight from it, is worth
nothing. The real business value lies in processing and analyzing data, and that is
where a data scientist steps into the spotlight.
1.2. Why Data Science Matters?
Data science can add significant value to businesses by bringing statistics and
insights to all business processes. It helps in making better decisions through
measuring, tracking and modeling performance metrics. Furthermore, data science
supports forecasting future trends, identifying opportunities, discovering current
flaws and outliers, predicting future behavioral patterns, providing personalized
experiences through recommendation systems and identifying target audiences.
1.3. Why Companies Need Data Science Tools?
The number of devices sending data is growing rapidly while the cost of storing
data continues to decline. Companies today collect tremendous amounts of data.
However, they still struggle to extract business value from their data. Traditional
tools are becoming obsolete because they can’t handle the current scale of data.
In the current fast-growing market, there is a need to continually improve current
tools and develop new ones to add automation and to maximize the benefits that
data can bring to companies. Luckily, the rapid innovation in the field has produced
many tools that fit the new scale and allow quick modeling and prototyping. But
that also means that new talent and skills are required to run these tools efficiently
and build better models.
Today, the role of a data scientist is one of the most in-demand positions. This
means that data scientists can be difficult to find and also expensive to hire and
retain. At Synergy we try to solve this by providing two things: (1) a platform to
outsource data science and artificial intelligence research and development
(human-on-demand machine learning), and (2) a zero-code model-building and
model-tuning (selecting optimal parameters) platform. This innovation replaces the
skills that data scientists bring to developing machine learning models, such as
feature selection, model selection and model tuning for specific datasets.
1.4. How Synergy is Addressing the Market Needs?
In the current market, there is a growing demand for data scientists and AI products
that is likely to continue for the foreseeable future. Almost no company could survive
without integrating data science products and AI. For some companies, off-the-shelf
products could be sufficient, while for many others they may not be. In addition,
many companies may not be able to hire enough engineers to build complete AI
products internally. Synergy addresses all of these needs by offering (1) off-the-shelf
products through the Synergy analytical and modeling platform and (2)
human-on-demand machine learning for companies who don’t want to use
off-the-shelf products.
The Synergy platform deploys products developed through human-on-demand
machine learning to the analytical and modeling platform, allowing non-technical
users and companies to access state-of-the-art models developed by talented data
scientists.
In addition to addressing the market need of organizations, Synergy also meets
the market need of developers, freelance data scientists and researchers to
monetize their work with assurance that credit is traced back to them. Synergy is
able to provide a fair reward to all contributors to the final solution by tracking all
forks and improvements to the solution in all rounds. Synergy also guarantees a
perpetual reward for contributors after their models are deployed in the Synergy
analytical and modeling platform. By using a blockchain distributed trust
network and the power of smart contracts, Synergy is able to both (1) incentivize
researchers and developers without trust concerns and (2) guarantee the
competition’s sponsor that its minimum acceptance quality is met before the
promised reward is distributed. This happens in a fully transparent mode, thanks to
blockchain technology.
2. Synergy Analytics and Modeling Platform Overview
Value extraction from business data is a crucial mission for every company: data
mining and machine learning can predict customer behavior patterns,
product matching, logistics and inventory needs. However, the shortage of data
scientists in the market, and the high demand that makes hiring an experienced
data scientist very expensive, cause many businesses to lag behind. They are forced
to use outdated analytics tools or simple machine learning models that
cannot maximize the value of business data.
We, at Synergy, believe that for many companies, especially small and medium
sized companies, better technology can be affordable. As companies prefer to
invest less for more, we are proposing a rich data analytics platform combined
with a ZeroDriver machine learning modeling platform which makes machine
learning accessible to non-expert users. We do this by auto-tuning the process
from data cleaning and feature selection to model evaluation. The platform is
backed by a community of data scientists who are incentivized to build reusable
off-the-shelf models in a collaborative manner, improving each other’s solutions.
To summarize, the Synergy Platform consists of:
1) A component-rich data analytics platform combined with ZeroDriver artificial
intelligence and machine learning modeling.
2) A fully transparent and decentralized human-on-demand machine learning
development platform with rewards, powered by smart contracts on a public
blockchain.
2.1. Synergy Analytics and Modeling Platform
Synergy aims to build a component-rich data analytics web platform backed by a
community of highly talented data scientists and data engineers who are
incentivized to continuously improve these components. Synergy aims to bring
predictive intelligence to judgments made by decision makers by offering a wide
range of techniques and algorithms, all without writing a single line of code.
Synergy relies on several open-source platforms to act as infrastructure for the
Analytics and Modeling Platform. The modeling platform PoC (proof of concept)
was built on Apache Zeppelin, but we are still considering other candidates for the
final solution. Apache Zeppelin is an open-source, award-winning web-based notebook
that enables data-driven and interactive data analytics in the browser. The
following is a screenshot of the advanced analytics page of Synergy’s PoC, for
illustration purposes only.
A screenshot of Synergy’s proof of concept - advanced analytics page.
2.2. ZeroDriver artificial intelligence and machine learning modeling
In the ZeroDriver artificial intelligence approach, the user defines a flow by
chaining components, starting with a data connection component and ending with
the model evaluation component. In addition, the user defines a measurement factor
(evaluation metric) such as accuracy, precision, F1 measure or any other custom
measure that can be calculated in the model evaluation component. The Synergy
platform then tunes the model parameters to maximize the measurement value
(optimal tuning).
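The tuning step described above can be pictured as a search over candidate parameter values that keeps whichever combination scores best on the chosen metric. The following is only a minimal sketch under stated assumptions, not the platform's implementation: the names `tune`, `threshold_model` and `accuracy` are hypothetical, the search is an exhaustive grid search (the white paper does not say which search strategy is used), and the toy data is invented for illustration.

```python
from itertools import product


def accuracy(predictions, labels):
    """Evaluation metric: fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def tune(build_model, grid, data, labels, metric):
    """Try every parameter combination in `grid` and keep the best-scoring one."""
    best_params, best_score = None, float("-inf")
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        model = build_model(**params)
        score = metric([model(x) for x in data], labels)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def threshold_model(threshold):
    """Hypothetical one-parameter model: predict 1 when the input reaches the threshold."""
    return lambda x: int(x >= threshold)


# Toy data, invented for this sketch.
data = [0.1, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 1]
best_params, best_score = tune(
    threshold_model, {"threshold": [0.2, 0.5, 0.8]}, data, labels, accuracy
)
```

For brevity the sketch scores candidates on the same data it tunes on; a real flow would evaluate on held-out data in the model evaluation component.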
A screenshot of Synergy’s proof of concept - build a flow page. The right side shows samples of
configurable components that can be used to build a flow.
2.2.1. Smooth Deployment of Data Science and AI Apps
Synergy’s artificial intelligence and machine learning modeling platform enables
smooth deployment of pre-built or user-developed models as web apps. Users
initiate the deployment process by adding an API access component to the
flow and connecting it to the model that should respond. Synergy allows full
customization of the flow by allowing users to add input filtering components or
output customizing components at any stage in the flow.
[Figure: example flow built from the components Data, Pre-Processing, Models,
Evaluation and Output, ending in the selected model. Users would be able to drag
and drop components from the Synergy catalog.]
The platform enables many advanced features, among them the bagging and
aggregation features. The aggregation feature allows users to build AI apps that
are composed of multiple models rather than one model. This is achieved by
enabling the user to build a flow that forwards the API queries to multiple models.
For example, suppose a user wants to deploy an image classification model that
distinguishes cats from dogs. Let’s assume that there is a pre-built model in
Synergy’s catalog for this task and that the user has created two additional models
using different architectures for the same purpose. In many cases, no single model
is more accurate than the others on all inputs. In such a case, a user can consider
a multiple-model approach to build a more accurate app. Instead of
building an app that consists of one model, the user can decide to deploy all three
models to respond to API queries with a policy, such as a majority vote, to agree
on the final answer. Synergy enables building such a scenario with drag-and-drop
components, without writing a single line of code.
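The majority-vote policy in the cats-versus-dogs example can be sketched in a few lines. This is an illustrative sketch only: the three stand-in models and the `majority_vote` helper are hypothetical, and on the platform the same behavior would be assembled from drag-and-drop components rather than written as code.

```python
from collections import Counter


def majority_vote(models, query):
    """Forward the query to every model and return the most common answer."""
    votes = [model(query) for model in models]
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Three hypothetical stand-in models; real ones would classify the image content.
def catalog_model(image):
    return "cat"


def user_model_1(image):
    return "dog"


def user_model_2(image):
    return "cat"


answer = majority_vote([catalog_model, user_model_1, user_model_2], "photo.jpg")
```

With three models and two classes a tie cannot occur; an even number of models would need an explicit tie-breaking rule.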
2.3. Human-on-demand Machine Learning
2.3.1. Host a Competition in a Platform that Drives Innovation.
Building state-of-the-art machine learning models and data science products can’t
be done without collaboration between mathematicians, scientists, and
domain-specific researchers. Synergy incentivizes collaboration in a decentralized,
iterative process to build better data science and artificial intelligence models.
Synergy is composed of a research and development environment with a reward
system and an evaluation platform where developed models can be evaluated on
new datasets. Synergy features multi-stage, multi-round, fully transparent, smart
contract powered competitions that encourage collaborators to build and expand
upon each other’s work. The Synergy platform tracks all contributions in all stages
to the final solution.
A screenshot of Synergy’s PoC - host a competition page. The proof of concept features a
user-friendly interface to build and deploy contracts on the blockchain.
2.3.1.1. Submission and Reward
Let’s look at this example to illustrate how it works:
⇒ A new competition is published to develop an artificial intelligence based trading
algorithm for cryptocurrencies.
Reward rules (all rules are set in a smart contract that escrows all tokens):
1) 1000 SNG tokens to the final solution.
2) 1000 SNG tokens for the top two solutions in each round.
3) 5000 SNG tokens for all contributors in the chain leading to the final solution.
4) Number of rounds: three.
5) Evaluation metric: return on investment on cryptocurrency data.
6) Minimum accepted quality: the minimum value of the assigned metric that the
final solution must reach in order to be accepted and for rewards to be distributed.
In round 1, the submitters of solutions A and B will share the 1000-token reward.
In round 2, all solutions should be forked from round 1 solutions with
enhancements. The submitters of solutions E and F will share the second-round
reward of 1000 tokens.
In round 3 (final), the submitter of solution K will be rewarded with 1000 tokens.
The 5000 tokens for all contributors in the chain leading to the final solution will be
shared by the submitters of B, F and K.
⇒ Tokens are distributed from the smart contract of the competition to the
submitters’ SNG wallets.
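The arithmetic of this example can be checked with a short sketch. It assumes that every shared reward is split equally among the submitters who share it; the white paper only says the rewards are "shared", so the equal split, the function names and the use of exact fractions are assumptions made for illustration.

```python
from fractions import Fraction

ROUND_REWARD = 1000   # shared by the top two solutions of a non-final round
FINAL_REWARD = 1000   # paid to the final solution
CHAIN_REWARD = 5000   # shared by the chain leading to the final solution


def share_equally(amount, submitters, payouts):
    """Assumption: a shared reward is divided equally among its recipients."""
    part = Fraction(amount, len(submitters))
    for submitter in submitters:
        payouts[submitter] = payouts.get(submitter, 0) + part


def competition_payouts(round_winners, final_solution, chain):
    """Total payout per solution for the rounds, the final prize and the chain."""
    payouts = {}
    for winners in round_winners:
        share_equally(ROUND_REWARD, winners, payouts)
    share_equally(FINAL_REWARD, [final_solution], payouts)
    share_equally(CHAIN_REWARD, chain, payouts)
    return payouts


# Rounds 1 and 2 are won by (A, B) and (E, F); K is the final solution,
# and B -> F -> K is the chain of forks that led to it.
payouts = competition_payouts([["A", "B"], ["E", "F"]], "K", ["B", "F", "K"])
total = sum(payouts.values())
```

Under these assumptions the sponsor escrows 8000 SNG in total: 1000 for each of the first two rounds, 1000 for the final solution and 5000 for the chain.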
2.3.2. Perpetual Compensations
At the end of each competition, if the solution can be reused for different problems,
Synergy or a contributor will wrap the solution with the Synergy catalog API and add
it for later use by the Synergy analytics and modeling platform. In addition, a smart
contract is created to define all contributors to the final solution for future rewards.
The smart contract can receive tokens and automatically distributes them to
all contributors according to the predefined rules. The contributors will earn
tokens perpetually, based on their model’s usage rate.
To illustrate, let’s continue with the previous example and assume that the winner,
solution K, has the potential to be applied to trading different commodities. Let’s
assume that the rules of perpetual compensation are:
1. 60%, split equally, to the chain leading to the final solution (B and F in the example).
2. 30% to the submitter of the final solution.
3. 10% to the contributor who worked on the API wrapping.
The deployed smart contract has an attached ERC20 payment address that
automatically distributes 60% of the received SNG tokens to the submitters of B and
F, 30% to the submitter of K, and 10% to the submitter of the solution to the catalog
(or to the Synergy company if the wrapping and submitting was done by an
internal employee). To learn more about how it works, see section 3.3. How Synergy
is utilizing Blockchain Technology?
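The 60/30/10 rule above amounts to a fixed split applied to every payment the contract receives. The sketch below mirrors that rule in Python for illustration; it is not contract code. The function name, the "wrapper" label and the use of whole-token integer division are assumptions of this sketch.

```python
def split_perpetual(received, chain=("B", "F"), final="K", wrapper="wrapper"):
    """Split a received payment: 60% to the chain, 30% to the final, 10% to the wrapper.

    Amounts are whole tokens; integer division is used, so any remainder
    would stay undistributed in this sketch.
    """
    payouts = {}
    chain_pool = received * 60 // 100
    for submitter in chain:
        payouts[submitter] = chain_pool // len(chain)  # equal split within the chain
    payouts[final] = received * 30 // 100
    payouts[wrapper] = received * 10 // 100
    return payouts


shares = split_perpetual(1000)
```

For a payment of 1000 SNG this gives 300 each to the submitters of B and F, 300 to the submitter of K and 100 to whoever wrapped the solution for the catalog.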
2.4. Summary of Platform Top Features:
Synergy Analytics and Modeling Platform
● Automated multiple modeling
● Support for multiple data sources
● Multiple visualization methods
● Multiple analytics methods
● Rich data pre-processing library
● Build a flow through drag-and-drop components
● Code-less, optimally tuned mode
● Pre-built statistical models
● Pre-built classical machine learning models
● Acquire state-of-the-art models from the models submitted to Synergy’s
competitions
Synergy Competition Host Platform
● Human-on-demand: host a competition or challenge in a single click
● The competition host can set minimum acceptance quality criteria that the
proposed solutions have to meet in order to be rewarded
● Multi-round, multi-stage competitions to maximize the performance of the
proposed model
● Based on distributed trust blockchain technology, which encourages data
scientists to participate without trust concerns
● The winning models can generate perpetual compensation for contributors
3. Why Blockchain?
3.1. What Problem does a Blockchain Solve?
Blockchain technology allows the exchange of digital assets without a trusted
intermediary. This technology enables a distributed trust network with no single
trusted arbiter to verify trust and the transfer of value. It transfers power and control
from one entity to many entities, enabling safe, fast and cheaper transactions
even though we may not know the entities we are dealing with. A blockchain
lets us agree on the state of the system, even if we don’t all trust each other.
3.2. What are Smart Contracts?
Smart contracts are self-executing pieces of logic that run on the blockchain. A
smart contract contains logic that two parties agree on and that can’t be altered by
either party. Smart contracts can also act as a payment recipient with defined
logic about when and how to distribute received assets.
3.3. How Synergy is utilizing Blockchain Technology?
Synergy platform consists of two major services:
(1) Human-on-Demand machine learning development platform with rewards.
(2) Component-Rich data analytics and modeling platform.
First, Synergy leverages blockchain to provide a fully transparent, incentivized
research and development platform. This enables trustworthy transactions in a
trustless world. Collaborators and developers can expand on each other’s solutions
without fear of losing credit or of manipulation. Collaborators do not need to trust the
sponsor of the challenge, as the SNG tokens are escrowed in the smart contract.
The sponsor of the challenge does not need to trust the developers, as the contract
will not execute the payment if the final solution does not meet the minimum
accepted quality, which is the minimum value of the assigned metric that the
final solution must reach in order to be accepted and for rewards to be distributed.
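The two-sided guarantee described above (tokens locked up front, payment only if the quality bar is met) can be modeled with a small state machine. The class below is a plain Python sketch for illustration and not the Solidity contract: the class name, method names and in-memory balances are hypothetical, and what happens to the escrowed tokens when the bar is not met is not specified in this document, so the sketch simply leaves them escrowed.

```python
class CompetitionEscrow:
    """Toy model of a competition contract that escrows the sponsor's tokens."""

    def __init__(self, sponsor, min_quality):
        self.sponsor = sponsor
        self.min_quality = min_quality  # minimum accepted value of the metric
        self.escrowed = 0
        self.balances = {}              # stand-in for submitters' SNG wallets

    def deposit(self, sender, amount):
        """The sponsor locks the reward up front, so submitters need not trust them."""
        if sender != self.sponsor:
            raise PermissionError("only the sponsor funds the competition")
        self.escrowed += amount

    def settle(self, final_score, rewards):
        """Pay out only if the final solution meets the minimum accepted quality."""
        if final_score < self.min_quality:
            return False                # no payment is executed
        if sum(rewards.values()) > self.escrowed:
            raise ValueError("rewards exceed the escrowed amount")
        for submitter, amount in rewards.items():
            self.balances[submitter] = self.balances.get(submitter, 0) + amount
            self.escrowed -= amount
        return True


escrow = CompetitionEscrow(sponsor="sponsor", min_quality=0.10)
escrow.deposit("sponsor", 8000)
rejected = escrow.settle(final_score=0.05, rewards={"K": 1000})
accepted = escrow.settle(final_score=0.12, rewards={"K": 1000})
```

The first settlement attempt is refused because the score is below the bar; the second one pays the submitter of K from the escrowed balance.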
Secondly, Synergy uses blockchain technology to provide a fully automatic and
transparent way to give perpetual rewards to the developers of the off-the-shelf
models and analytical components. Subscriptions to the data analytics and
modeling platform are paid to a global smart contract address that splits all
received SNG tokens between the company and the models’ developers. This is done
through a hierarchy of smart contracts with ERC20 addresses attached to them.
Each contract distributes all SNG tokens sent to its address to the contributors to
the off-the-shelf models, based on model usage ratios. The following figure shows
how Synergy provides perpetual rewards to all developers using smart contracts on
a public blockchain.
The figure illustrates Synergy’s hierarchy of smart contracts that manages subscriptions to the
Synergy Analytics and Modeling Platform. All contracts are deployed on a public blockchain
and can be audited by the public.
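The hierarchy can be read as two levels of proportional splitting: a global contract divides each subscription payment between the company and the models according to usage, and each model's contract divides its part among that model's contributors. The sketch below illustrates that reading in Python. The company share, the usage figures, the contributor weights and all names are assumptions, since the white paper does not specify them.

```python
from fractions import Fraction


def split(amount, weights):
    """Divide `amount` in proportion to integer weights (exact fractions)."""
    total = sum(weights.values())
    return {key: amount * Fraction(weight, total) for key, weight in weights.items()}


def distribute_subscription(amount, company_share, usage, contributors):
    """Global contract: company cut first, then per-model and per-contributor splits."""
    company_cut = amount * company_share
    payouts = {"company": company_cut}
    per_model = split(amount - company_cut, usage)       # by model usage ratio
    for model, model_amount in per_model.items():
        for who, value in split(model_amount, contributors[model]).items():
            payouts[who] = payouts.get(who, 0) + value   # per-model contract
    return payouts


# Hypothetical figures: 20% to the company, model m1 used three times as much as m2.
payouts = distribute_subscription(
    amount=1000,
    company_share=Fraction(1, 5),
    usage={"m1": 3, "m2": 1},
    contributors={"m1": {"alice": 1, "bob": 1}, "m2": {"alice": 1}},
)
```

In this example the company keeps 200 tokens, model m1 receives 600 (split between alice and bob) and model m2 receives 200 (all to alice).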
4. High-level Roadmap
Synergy was started in Q2 2017 by a group of Fulbright scholars and data
engineers at WPI and Brown University. The team built the first proof of concept
during Q3 2017, and the idea attracted a lot of interest. Later, during Q4
2017, Synergy’s core and advisory teams were expanded with highly talented
technologists and business leaders who have accumulated 100+ years of
experience in engineering, technology management, entrepreneurship and
marketing.
During Q4 2017, we surveyed a group of R&D managers in our network. The
survey shows that businesses with mature AI integration have an increasing demand
for a decentralized competition host platform rather than for the Synergy Analytics
and Modeling Platform. One of the respondents explained that he would be willing to
outsource parts of his development or model improvement efforts to the Synergy
community even in a beta release, because its risk-free nature, thanks to the “pay
only for a quality solution” feature, means it does not involve internal bureaucracy.
Conversely, switching to a different cloud-based analytics platform is a complex
process inside the organization that involves training efforts, risk assessment and
top management approval.
The result of the survey made us adopt an agile go-to-market strategy which allows
us to respond quickly to growing market needs. We decided to deploy
the platform in two phases. In the first phase, we plan to launch the Synergy
competition host platform only. That means the Synergy Analytics and Modeling
Platform would have only one function: human-on-demand machine learning,
which allows hosting a competition or a development challenge. This gives Synergy
quick access to the market and lets it start building a community of
data scientists and data engineers.
We believe that being backed by a strong, diverse technical community will also
boost the development of the Synergy Analytics and Modeling Platform. We can
outsource part of the development to our community, who will be willing to
participate in the development in exchange for perpetual rewards once their
models are deployed. Furthermore, the risk-free nature of the Synergy competition
host platform is likely to motivate many R&D managers to outsource part of their
modeling work to our community. This will allow us to increase our initial offer of
off-the-shelf models when launching the Synergy Analytics and Modeling Platform
in the second phase.
4.1. 2018-2019 Roadmap of Synergy Competition Host Platform
The Synergy team has scheduled the beta release of the essential smart contracts
that power the Synergy Competition Host Platform for the end of Q3 2018. The
smart contracts will be developed using Solidity [3] on an Ethereum testnet.
During Q1 2019, Synergy will release a Graphical User Interface (GUI) to provide
seamless interaction with the Ethereum public blockchain.
During Q2 2019, after completing testing and fixing on the testnet, Synergy will deploy
the contracts to the Ethereum mainnet and the GUI to a separate domain.
4.2. 2018-2019 Roadmap of Synergy Analytics and Modeling Platform
With the focus on the Synergy Competition Host Platform during phase one, there will
be no releases of the Synergy Analytics and Modeling Platform during 2018. The
Synergy team is currently working with industrial partners to integrate their
requirements into the project roadmap. The alpha release of the Synergy Analytics
and Modeling Platform is scheduled for Q3 2019. The release will feature data
connector components that support various proprietary and open-source databases,
pre-processing components and classical machine learning models. The following
release will feature big data analytics and resource allocation for the algorithms.
Storage and computational power will be provided either by cloud computing,
such as AWS, or by exchanging SNG tokens for other ERC20 tokens to buy
resources (for example, filecoin.io, which provides a decentralized storage network).
[3] http://solidity.readthedocs.io/en/develop/index.html
Synergy adopts the agile methodology for all internal development. Feedback from
our industrial partners will be integrated in all scrums, which means that the
roadmap could be shifted to meet future market needs and trends.
5. Token Info
Token type: ERC-20 | Ticker: SNG | Total supply: 1 Billion
The SNG token is an ERC-20 token that acts as a utility token. Its main purpose
is to interact with Synergy’s distributed network.
The SNG token serves the following three segments of participants in the
Synergy network:
1) Parties who want to sponsor data science, machine learning or artificial
intelligence challenges.
2) Organizations who want to use the services of the Synergy Advanced
Analytics and Modeling platform.
3) Data scientists, developers, and teams who want to submit their solutions to
the platform.
○ Submitting, in general, doesn’t consume tokens, but in order to
protect the competition from mass frivolous-submission attacks,
sponsors may require that only accounts with tokens in their wallets
be able to participate in the competition. In some cases, to protect
against multiple submissions from the same user, sponsors may also
require participants to deposit a certain amount of tokens, which is
returned to participants after the competition due date.
The SNG token is designed to achieve two main objectives:
(1) To guarantee non-discriminatory access to Synergy services. For
instance, the tokens enable unrestricted access to consume, evaluate and
integrate state-of-the-art AI solutions.
(2) To enable building and growing a community of talented data scientists and
data engineers to develop innovative solutions.
The economic logic of the SNG token is designed to support continuous demand
for the token from the three segments of participants in the Synergy network. After the
initial token generation event, Synergy will not take an official position on the value
of the SNG token, nor will it be formally involved in secondary trading or valuation of
SNG tokens. The future value of the token is left to the market, in full independence
from us.