TJBot is a DIY open source kit that allows you to build your own programmable cardboard robot powered by Watson. It consists of a cardboard cutout (which can also be 3D printed or laser cut), a Raspberry Pi, and a variety of add-ons, including an RGB LED light, a microphone, a servo motor, and a camera. This presentation provides an overview of how Watson cognitive services are leveraged to create capabilities within TJBot, and how to build simple applications for TJBot using Node.js.
Embedding Intelligence in Everyday Objects with TJBot
1. IBM Research
Victor Dibia
Embedding Intelligence in
Everyday Objects with TJBot.
An open source DIY project powered
by Watson Cognitive Services.
Human – Agent Collaboration Lab, IBM Research
dibiavc@us.ibm.com
@vykthur | github.com/victordibia
Feb 20, 2017
2. IBM Research
TJBot: What and Why?
- Open source DIY project to get you engaged
with Watson Services
3. IBM Research
What is TJBot?
- A cardboard robot
- Simple, approachable
- Open Source (design, code)
- Cognitive (IBM Watson services)
- Extensible (prototyping platform)
Components: Raspberry Pi, LED, Camera,
Microphone, Speaker, Servo.
6. IBM Research
Recipes.
Step-by-step instructions + code (Node.js) to help you prototype capabilities for TJBot powered by Watson services.
http://www.instructables.com/member/TJBot/
7. IBM Research
Project Goals
How can we make it easier to engage a
community of enthusiasts experimenting
with embodied cognition – the idea of
embedding intelligence in everyday
objects within the physical world?
8. IBM Research
Project Goals
Design principle – Approachable Design
- Use of familiar material (cardboard) that can be
altered with ease.
- Simplified part assembly: no soldering or
adhesive required.
- Simplified programming model and language
interface (JavaScript).
9. IBM Research
Project Outcome
A prototyping platform to help
democratize Embodied Cognition.
Target communities:
- Makers
- Developers
- Students (Education and Learning)
10. IBM Research
How Does Watson
Enable TJBot?
Listen
Watson Speech to Text service converts spoken words to text that can be analyzed.
Speak
Watson Text to Speech service converts text to sound using various voices.
Understand Emotions
Watson Tone Analyzer service can infer the emotion within text, e.g., it can tell if a message contains emotions like happiness, sadness, or anger.
Understand Conversations
Watson Conversation service can respond to users in a way that simulates a conversation between humans.
See
Watson Visual Recognition service can understand the content of an image and describe it.
11. IBM Research
TJBot
Sensors: LED, Speakers, Camera, Servo Motor Arm, Microphone
Example Capabilities: Listen, Speak, Shine, Show emotion, Wave, See
Example Watson Services: Speech to Text, Tone Analyzer, Visual Recognition, Conversation, Text to Speech, Sentiment Analysis
Example Use Cases: Virtual Agents (eldercare, home care), Education (language learning)
14. IBM Research
IBM Watson Cognitive Services.
Take your first step into the cognitive era with our variety of smart services.
- Natural interaction
- Semi-structured data processing
- Trained and continuously improved via machine learning and deep
learning.
- RESTful API services with SDKs for Node.js, Java, and Python.
18. IBM Research
Speech to Text
Converts audio voice into written text.
• Transcription
• Voice-controlled applications: supports custom language models
https://speech-to-text-demo.mybluemix.net/
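As a sketch of how a Node.js app might talk to the service directly, the helper below builds the HTTP request options for a recognize call. The endpoint path and Basic-auth scheme reflect the 2017-era Watson API; the credentials here are placeholders, not real values.

```javascript
// Sketch: build HTTP request options for a Watson Speech to Text
// "recognize" call. Hostname/path reflect the 2017-era endpoint;
// username/password are placeholder service credentials.
function buildRecognizeRequest(username, password, contentType) {
  var auth = Buffer.from(username + ':' + password).toString('base64');
  return {
    method: 'POST',
    hostname: 'stream.watsonplatform.net',
    path: '/speech-to-text/api/v1/recognize',
    headers: {
      'Authorization': 'Basic ' + auth,
      'Content-Type': contentType // e.g. 'audio/wav'
    }
  };
}

var req = buildRecognizeRequest('apiuser', 'apipass', 'audio/wav');
console.log(req.path); // -> /speech-to-text/api/v1/recognize
```

In practice the Watson Node.js SDK wraps this plumbing; the sketch just shows what is happening underneath.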
19. IBM Research
Text To Speech
Converts written text into natural-sounding audio in a variety of languages and voices.
• Customize and control the pronunciation of specific words to deliver a seamless voice interaction that caters to your audience.
• Interactive voice based applications.
https://text-to-speech-demo.mybluemix.net/
20. IBM Research
Tone Analyzer
Uses linguistic analysis to detect three types
of tones in written text: emotions, social
tendencies, and writing style.
• Understand emotional context in conversations or
communications
• Tailor interactions based on sentiment.
https://tone-analyzer-demo.mybluemix.net/
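One way TJBot can act on tone is to map the dominant emotion to an LED color. The tone names below match Tone Analyzer's emotion categories; the color mapping itself is a made-up example, not part of any service.

```javascript
// Sketch: map Tone Analyzer emotion scores to an LED color.
// The color assignments are illustrative, not prescribed anywhere.
var TONE_COLORS = { joy: 'yellow', sadness: 'blue', anger: 'red', fear: 'purple', disgust: 'green' };

function colorForTones(tones) {
  // tones: [{ tone_id: 'joy', score: 0.8 }, ...] as returned by the service
  var top = tones.reduce(function (a, b) { return b.score > a.score ? b : a; });
  return TONE_COLORS[top.tone_id] || 'white';
}

console.log(colorForTones([
  { tone_id: 'joy', score: 0.81 },
  { tone_id: 'sadness', score: 0.12 }
])); // -> yellow
```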
21. IBM Research
Visual Recognition
Understands the contents of images: tags visual concepts, finds human faces, approximates age and gender, and finds similar images in a collection.
• Train the service by creating your own custom concepts.
Use Visual Recognition to detect a dress type in retail,
identify spoiled fruit in inventory, and more.
https://visual-recognition-demo.mybluemix.net/
22. IBM Research
AlchemyLanguage
Analyzes text to help you understand its
concepts, entities, keywords, sentiment,
and more.
• Additionally, you can create a custom model for some APIs
to get specific results that are tailored to your domain.
https://alchemy-language-demo.mybluemix.net/
23. IBM Research
Conversation
Quickly build, test and deploy a bot or virtual
agent across mobile devices, messaging
platforms like Slack or even on a physical robot.
• Visual dialog builder to help you create natural conversations
between your apps and users, without any coding experience
required.
https://conversation-demo.mybluemix.net/
27. IBM Research
Libraries Used
Depends on several npm packages.
- RGB LED – ws281x library
- Servo – pigpio software PWM library
- Microphone – mic library
- Speaker – aplay library
- Camera – raspistill wrapper
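LED driver libraries like ws281x typically take a packed 24-bit integer rather than a color name. The small helper below (hypothetical, not part of any listed library) shows the conversion a recipe would do before handing a value to the driver.

```javascript
// Sketch: convert a named color to the packed 0xRRGGBB integer format
// that ws281x-style LED libraries typically expect. The name table is
// an illustrative subset.
var NAMED = { red: [255, 0, 0], green: [0, 255, 0], blue: [0, 0, 255], white: [255, 255, 255], off: [0, 0, 0] };

function packColor(name) {
  var rgb = NAMED[name] || NAMED.off; // unknown names turn the LED off
  return (rgb[0] << 16) | (rgb[1] << 8) | rgb[2];
}

console.log(packColor('red').toString(16)); // -> ff0000
```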
28. IBM Research
Code Walkthrough: Control the LED on TJBot using voice.
- http://www.instructables.com/id/Use-Your-Voice-to-Control-a-Light-With-Watson/
29. IBM Research
TJBot Library [Beta]
- Experimental work to encapsulate basic
functions of the bot.
- https://github.com/ibmtjbot/tjbotlib
30. IBM Research
The TJBot Library
Encapsulates basic functions of TJBot such as listening, speaking, LED color changes, waving, and seeing.
31. IBM Research
The TJBot Library
tj.listen(transcriptCallback)
tj.speak("text")
tj.converse()
tj.see()
tj.shine("red")
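To give a feel for how these calls compose, here is a hardware-free sketch of the transcript parsing a voice-controlled "shine" command needs. In a real recipe the helper's result would be fed from tj.listen() into tj.shine() (shown as comments); the helper itself is an illustrative standalone function.

```javascript
// Sketch: pull a recognized color word out of a speech transcript.
// The color list is illustrative.
var COLORS = ['red', 'green', 'blue', 'yellow', 'purple', 'white'];

function colorFromTranscript(text) {
  var words = text.toLowerCase().split(/\s+/);
  for (var i = 0; i < words.length; i++) {
    if (COLORS.indexOf(words[i]) >= 0) return words[i];
  }
  return null; // no color mentioned
}

// In a recipe (requires the tjbot library and service credentials):
// tj.listen(function (msg) {
//   var color = colorFromTranscript(msg);
//   if (color) tj.shine(color);
// });

console.log(colorFromTranscript('turn the light red please')); // -> red
```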
32. IBM Research
Code Walkthrough: Control the LED using the TJBot library.
- https://github.com/ibmtjbot/recipes
34. IBM Research
Improving Accuracy
How do we improve interaction (voice)
accuracy? Improving Speech-to-Text
models may not be enough!
- Customized language models?
- Intent Matching?
- Multi-turn conversations?
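To make the intent-matching idea concrete, here is a minimal keyword-based matcher, one way to make noisy transcripts usable even when exact phrasing varies. The intent names and keyword lists are illustrative, not from any Watson workspace.

```javascript
// Sketch: minimal keyword-based intent matching over a transcript.
// Intents and keywords are made-up examples.
var INTENTS = {
  play_music: ['play', 'music', 'song'],
  change_light: ['shine', 'light', 'color'],
  wave: ['wave', 'arm']
};

function matchIntent(transcript) {
  var words = transcript.toLowerCase().split(/\s+/);
  var best = null, bestScore = 0;
  Object.keys(INTENTS).forEach(function (intent) {
    var score = INTENTS[intent].filter(function (kw) {
      return words.indexOf(kw) >= 0;
    }).length;
    if (score > bestScore) { bestScore = score; best = intent; }
  });
  return best; // null when nothing matches
}

console.log(matchIntent('play me some music')); // -> play_music
```

The Conversation service does this far more robustly with trained intents; the sketch shows the basic shape of the problem.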
35. IBM Research
Bot "Interruptibility"
When and how should the robot be interrupted (while performing an activity like speaking, waving, etc.)?
- Vision? (monitoring a user's facial expression, a raised hand)
- A hardware button or sensor?
36. IBM Research
Latency Tolerance
Latency can severely degrade quality
of interaction. How do we minimize its
effect?
- Managing and ordering service responses
- Leveraging cues to provide additional information
- Balancing capabilities: cloud vs. local processing.
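The "managing and ordering service responses" point can be sketched as a small reorder buffer: parallel cloud calls are tagged with a sequence number and buffered until earlier results arrive, so the bot reacts in request order. This is an illustrative pattern, not code from the TJBot library.

```javascript
// Sketch: deliver out-of-order async results in request order.
function makeReorderBuffer(deliver) {
  var next = 0, pending = {};
  return function (seq, payload) {
    pending[seq] = payload;
    // Flush every consecutive result starting from the next expected seq.
    while (pending.hasOwnProperty(next)) {
      deliver(pending[next]);
      delete pending[next];
      next++;
    }
  };
}

var out = [];
var push = makeReorderBuffer(function (p) { out.push(p); });
push(1, 'tone result'); // arrives first, but is buffered
push(0, 'stt result');  // delivers 0, then the buffered 1
console.log(out); // -> ['stt result', 'tone result']
```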
38. IBM Research
Next Steps
A three-pronged approach:
- Conduct basic research that addresses open issues.
- Make TJBot simpler and easier to use (tjbot library, visual programming tool).
- Build and sustain the TJBot community.
Presentation Overview
Hi all, thanks for coming to this session, and thanks to the organisers for letting us share this project with this awesome audience. I am new to this meetup group, and I look forward to attending more events in the near future! By show of hands, just to know the audience – how many of us
I am a researcher with the Human Agent Collaboration group over at IBM Research, Yorktown. We are a group of individuals with backgrounds in software engineering, HCI, behavioural psychology, cognitive science, robotics, and mathematics. Pretty diverse group. Some of the projects we have worked on include cognitive M&A, designing interfaces for wearables and wearable apps, learning and navigation systems for robots, etc.
My background has elements of software engineering, HCI and behavioural psychology.
=====
Today, I am going to talk about a relatively new project some members of our group have worked on: TJBot. This effort is co-led by my colleague Maryam Ashoori, and our current work focuses on designing novel interfaces for cognitive application development.
What are the issues that arise?
What are viable use cases?
What are applicable design patterns?
I'll start with an overview of the TJBot project and why we are working on it. I'll also go over Watson services, which are a key aspect of this project. Finally, I'll go over technical details for programming TJBot and discuss some open issues we hope to learn more about as the project proceeds.
Take away …
- Additional information on Watson services
- How Watson can be embedded in objects to enable capabilities
- How you can start prototyping embodied cognition apps with TJBot
I'm not a developer advocate, so there might be service-specific questions I cannot answer at the moment, but if you leave your contact or tweet at me, I'll ensure I follow up.
After we came up with the initial design,
Simplicity
Hardware
- LED and hardware connection
- Snap-to-build approach: no tools, glue, or adhesive required.
Software
- JavaScript, the most widely used programming language
- Node.js and the Watson Developer Cloud SDK
At our research group we care about questions related to understanding the various ways in which we as humans can work with machines in a symbiotic computing environment. We explore how intelligence can be embedded into spaces (rooms), objects, avatars, icons, etc. A subdomain of this area has to do with the design of cognitive objects and embodied cognition. It is nice to develop web-based or mobile-phone-based cognitive apps (e.g. chatbots), but embodied cognition goes further: embedding intelligence in everyday objects, and understanding how physical attributes integrate with intelligent services to craft engaging user experiences.
Embodied cognition is a research program comprising an array of methods from diverse theoretical fields (e.g., philosophy, neuroscience, psychology, etc.) held together by the key assumption that the body functions as a constituent of the mind rather than a passive perceiver and actor serving the mind.
Helping bridge gaps for two groups of people, makers and developers, and also serving as a tool for curriculum development around cognitive application development.
Embodied Cognition
When we simplify the process sufficiently, what would people create and
The library is meant to encapsulate basic capabilities of the robot:
- Avoid code repetition
- Streamline programming paradigms and patterns
- Free developers from the nitty-gritty of hardware control so they can focus on being creative about use cases
Embodied Cognition
Natural interaction with humans
- For example, speech and emotions
- Not only used by web applications, but also in the physical world
Processing of semi-structured data and large amounts of data
- For example, language classification and image recognition
Trained and continuously improved via machine and deep learning
- For example, search (retrieve and rank)
Leverages context to improve service quality
Embodied Cognition
Transcription. Use Speech to Text to create voice-controlled applications, and even customize the model to improve accuracy for the language and content you care about most, such as product names, sensitive subjects, or names of individuals.
Text to Speech converts written text into natural-sounding audio in a variety of languages and voices. You can customize and control the pronunciation of specific words to deliver a seamless voice interaction that caters to your audience. Use Text to Speech to develop interactive toys for children, automate call center interactions, and communicate directions hands-free. https://text-to-speech-demo.mybluemix.net/
Tone Analyzer uses linguistic analysis to detect three types of tones in written text: emotions, social tendencies, and writing style. Use the Tone Analyzer service to understand emotional context of conversations and communications. Use this insight to respond in an appropriate manner.
https://tone-analyzer-demo.mybluemix.net/
Visual Recognition understands the contents of images - visual concepts tag the image, find human faces, approximate age and gender, and find similar images in a collection. You can also train the service by creating your own custom concepts. Use Visual Recognition to detect a dress type in retail, identify spoiled fruit in inventory, and more.
AlchemyLanguage analyzes text to help you understand its concepts, entities, keywords, sentiment, and more. Additionally, you can create a custom model for some APIs to get specific results that are tailored to your domain. https://alchemy-language-demo.mybluemix.net/
Raspberry Pi Node.js library
Improving accuracy. Improving the accuracy of speech-to-text services is an ongoing effort, but this environment can be really noisy, so accurate speech models may not be enough. Goal: Are there good ways to break down parts of an interaction such that we keep it natural but minimize errors?
"TJBot, play Smooth Criminal by Michael Jackson from Spotify."
"Play me some music."
"Sure, what song would you like?"
Solution:
- Intent-based matching
- Multi-turn conversation as a way to improve voice-based accuracy
Interrupt:
Goal: When and how should we interrupt the robot?
Should we build in approaches closely tied to how humans interrupt each other, e.g. an uncomfortable expression, a raised hand, a frowning face?
Solutions?
- Hardware interrupt?
- Vision-based interrupt? (raised hand, frowning face)
Latency.
Latency can be a real problem, especially for a humanoid robot. He's cute, but that will only take him so far.
A primary source of latency has to do with making calls to cognitive services.
Solution:
- Tools to manage response arrival
- Leverage gestures (light, arm) to cue users
- Balancing computation: how much can we put on the Pi itself without degrading performance?
Initial feedback has been positive (survey and Twitter data).
Individuals appear to want one before they even know what it does. Engagement... check!
Wide availability: we have seen people download our 3D-print and laser-cut designs in places like South Africa, Pakistan, Hong Kong, Chile, Brazil, and Italy.
We want to take this further and make TJBot available to more people:
- Solve some of the open problems and encapsulate what we learn in the TJBot library.
- Build and sustain the TJBot community.