Game of Drones: Using IoT, Machine Learning, Drones, and Networking to Solve World Hunger
Drones are increasingly used in a variety of commercial and consumer scenarios, from agricultural drones (which provide farmers with crop and irrigation patterns) to consumer drones (which follow you around as you engage in action sports) to drone racing. Drones are outfitted with a large number of sensors (cameras, accelerometers, gyros, etc.) and can continuously stream these signals in real time for analysis.
This talk introduces the landscape of the various drone technologies that are currently available, and shows you how to acquire and analyze the real-time signals from the drones to design intelligent applications in an IoT pipeline. We will demonstrate how to leverage machine learning models that perform real-time facial detection along with predictions of age, gender, emotion, and object recognition using the signals acquired from the drones. You will walk away understanding the basics of how to develop applications that utilize and visualize these real-time insights.
This talk includes fun with drones, how to tackle the problem of world hunger, and some Game of Thrones silliness. It is targeted at data scientists, students, researchers, and IT professionals who have an interest in building intelligent applications using drones and machine learning. It will be a fun and exciting exploration as we demonstrate a drone with the power of recognizing faces, ages, genders, emotions, and objects. You will learn how to leverage these same machine learning models to imbue intelligence into drones or other applications.
7. 2,000 acres in upstate NY: horticulture, animal farming, dairy, etc.
100 acres of farmland in Carnation, WA: rented out to small farmers, primarily horticulture.
11. Fusing it all together
(Diagram: data from sensors and UAVs is fused into a spatio-temporal view of the farm, which drives ag services such as yield estimation, precision irrigation, pest infestation detection, and fertilizer application.)
12. A machine learning model based on probabilistic graphical models that embed Gaussian processes is used to extrapolate from the sensor data points to the full territory. This model seeks to balance spatial and visual smoothness:
• Since we are measuring physical properties of the soil and the environment, the sensor readings for locations that are nearby should be similar (spatial smoothness).
• Areas that look similar should have similar sensor values. For example, a recently irrigated area has more moisture and hence looks darker (visual smoothness).
15. Drone Auto-pilot App
Features:
• Simple user interface
• Supports two flight modes: point-selection mode and area-selection mode
• Estimates flight time
• Stores path history and telemetry
• Transfers video from the drone to the IoT edge
• Supports DJI Phantom 2 and Inspire 1
(Screenshots: point-selection mode and area-selection mode.)
18. Area Coverage Algorithm
The drone stops and goes, or reduces its speed, at each waypoint, so paths with fewer waypoints are better.
(Figures: two lawn-mower paths covering the same area, one with 8 waypoints and one with 16.)
19. Omni-directional vs. directional drones
• Omni-directional: front and side look similar and generate similar drag.
• Directional: front generates low drag; side generates high drag.
(Figure: front and side views of each type; an omni-directional drone can be given directionality by attaching a paper panel, as indicated in the figure.)
20. Yaw Control Algorithm
Given a path and wind information, the drone changes its yaw in order to save energy.
(Figure: annotated flight path showing the drone exploiting wind for acceleration, exploiting wind for deceleration, avoiding wind for acceleration, exploiting drag for deceleration, and avoiding drag while cruising.)
22. Snapshot of available wireless technologies
(Figure: range vs. throughput chart, not to scale and based on very rough estimates, plotting Wi-Fi HaLow, 3G/LTE, NB-IoT, LTE Cat 0, WiMAX, and TV white spaces.)
23. What are TV White Spaces?
White spaces are unoccupied TV channels.
(Figure: spectrum occupancy plot, power in dBm vs. frequency from 0 to 7000 MHz; TV bands at 54-88 MHz, 170-216 MHz, and 470-698 MHz; ISM/Wi-Fi bands at 2400-2500 MHz and 5180-5845 MHz; wireless microphones; in the 470-700 MHz region, the gaps between occupied channels (around -60 dBm) and the noise floor (around -100 dBm) are the "white spaces".)
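To make the idea concrete, here is a toy Python sketch under the assumption of the US UHF channel plan (channels 14-51, 6 MHz each, spanning 470-698 MHz): the white spaces at a location are simply the channels that a geolocation database does not report as occupied there. The occupied-channel set below is made up.

    # Hypothetical answer from a geolocation database for one location.
    occupied_at_location = {14, 15, 20, 33, 38, 44}

    def white_space_channels(occupied):
        # US UHF TV channels 14-51; white spaces are the unoccupied ones.
        return [ch for ch in range(14, 52) if ch not in occupied]

    def channel_to_mhz(ch):
        lo = 470 + (ch - 14) * 6   # channel 14 starts at 470 MHz, 6 MHz wide
        return (lo, lo + 6)

    for ch in white_space_channels(occupied_at_location)[:5]:
        print("channel", ch, "free at", channel_to_mhz(ch), "MHz")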
24. Mawingu Project
A collaboration between Kenya's Ministry of Information and Communications, Microsoft, and Mawingu Networks: a pilot delivering low-cost wireless broadband access to previously unserved locations near Nanyuki.
To maximize coverage and bandwidth while keeping costs to a minimum, the Mawingu network relies on a combination of "license-exempt" wireless technologies, including Wi-Fi and TVWS.
This is the first deployment of solar-powered base stations together with TVWS to deliver high-speed Internet access to areas currently lacking even basic electricity. The base stations also allow end-users to charge their devices.
26. Roll your own with REST APIs
Easy: simple to add, with just a few lines of code required; integrate into the language and platform of your choice.
Flexible: a breadth of offerings helps you find the right API for your app.
Tested: built by experts in their field from Microsoft Research, Bing, and Azure Machine Learning, with quality documentation, sample code, and community support.
(Call to action: get a key.)
28. Vision
• Computer Vision API: distill actionable information from images.
• Video API: analyze, edit, and process videos within your app.
• Face API: detect, identify, analyze, organize, and tag faces in photos.
• Emotion API: personalize experiences with emotion recognition.
29. How do I use them?
POST https://api.projectoxford.ai/vision/v1.0/analyze?visualFeatures=Description,Tags&subscription-key=<Your subscription key>

Response:
{
  "tags": [
    { "name": "outdoor", "score": 0.976 },
    { "name": "bird", "score": 0.95 }
  ],
  "description": {
    "tags": [ "outdoor", "bird" ],
    "captions": [
      { "text": "partridge in a pear tree", "confidence": 0.96 }
    ]
  }
}
Demo setup checklist:
• Use HDMI for audio output
• Turn the volume up
• Start VS
• Start the AR app
• Connect to the drone
In Mexico City, Uber ads are delivered by drones. Picture from:
https://www.technologyreview.com/s/602662/ubers-ad-toting-drones-are-heckling-drivers-stuck-in-traffic/
Food production needs to double by 2050 to feed the world's growing population.
Source: http://www.un.org/press/en/2009/gaef3242.doc.htm
It also turns out that in order to do precision agriculture, you need lots of sensors.
Sensors are expensive….
Many of the existing systems work out at $1,000 a sensor.
That is too pricey for most rich-world farmers, let alone those in poor countries where productivity gains are most needed.
The sensors themselves, which probe things like moisture, temperature and acidity in the soil, and which are scattered all over the farm, are fairly cheap, and can be powered with inexpensive solar panels.
The cost comes in getting data from sensor to farmer. Few rural farms enjoy perfect mobile-phone coverage, and Wi-Fi networks do not have the range to cover entire fields.
So most precision-agriculture systems rely on sensors that connect to custom cellular base stations, which can cost tens of thousands of dollars, or to satellites, which require pricey antennas and data plans.
Two setups implementing this
Another interesting problem that our farmer friends tell us about is weeds. Farmers pay people to walk around the farm in a zigzag fashion and take photos. Now, we have the ability to fly low and create interesting views. Not only that, we can tell the farmer exactly where the drone is looking, so that he doesn't have to spend so much money anymore. Of course, the next step is automated weed detection, but we aren't there yet.
We can zoom in and see that the details are correct. You can now see the animals grazing and the grass. If I zoom in more, I can even see some cow shit here.
This is a picture from a farm in New York, from a farmer who farms about a thousand acres. He really wants to know how his cattle graze the fields and whether he should take them to a different area tomorrow. If you look at this picture, it is quite clear: the area here looks pretty barren, while the area here looks green. In fact, you can look at the cow shit; if there is enough cow shit at a place, the cattle have probably had enough.
From https://blog.acolyer.org/2017/04/25/farmbeats-an-iot-platform-for-data-driven-agriculture:
Given the orthomosaic and the sensor readings, the final challenge is to create precision agriculture maps for the whole farm: for example, moisture maps, pH maps, and temperature maps.
A machine learning model based on probabilistic graphical models that embed Gaussian processes is used to extrapolate from the sensor data points to the full territory. This model seeks to balance spatial and visual smoothness:
Since we are measuring physical properties of the soil and the environment, the sensor readings for locations that are nearby should be similar (spatial smoothness).
Areas that look similar should have similar sensor values. For example, a recently irrigated area has more moisture and hence looks darker (visual smoothness).
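As a rough illustration of the idea (a sketch, not the FarmBeats implementation), the snippet below fits a Gaussian process whose inputs combine each sensor's coordinates with a visual feature sampled from the orthomosaic, so that predictions are pulled together both for nearby locations and for similar-looking ones. All data values and kernel length scales here are invented for the example.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # Training data: (x, y) sensor locations, a brightness feature from the
    # orthomosaic at each location, and the moisture reading at each sensor.
    sensor_xy = np.array([[10, 20], [40, 25], [15, 60], [70, 80]], dtype=float)
    sensor_brightness = np.array([[0.3], [0.7], [0.4], [0.6]])
    moisture = np.array([0.42, 0.18, 0.39, 0.22])

    # One pair of RBF length scales for space, another for appearance:
    # nearby points (spatial smoothness) and similar-looking points
    # (visual smoothness) both pull predictions toward each other.
    X = np.hstack([sensor_xy, sensor_brightness])
    kernel = RBF(length_scale=[20.0, 20.0, 0.2])
    gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-3).fit(X, moisture)

    # Extrapolate to a grid over the whole farm (random brightness is a
    # stand-in here; in practice it is read from the orthomosaic pixels).
    grid_xy = np.array([[x, y] for x in range(0, 100, 10)
                        for y in range(0, 100, 10)], dtype=float)
    grid_brightness = np.random.rand(len(grid_xy), 1)
    moisture_map, moisture_std = gp.predict(
        np.hstack([grid_xy, grid_brightness]), return_std=True)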
ON THE Dancing Crow farm in Washington, sunflowers and squashes soak up the rich autumn sunshine beside a row of solar panels. This bucolic smallholding provides organic vegetables to the farmers' markets of Seattle. But it is also home to an experiment by Microsoft, a big computing firm, that it hopes will transform agriculture further afield. For the past year, the firm's engineers have been developing a suite of technologies there to slash the cost of "precision agriculture", which aims to use sensors and clever algorithms to deliver water, fertilisers and pesticides only to crops that actually need them.
Precision agriculture is one of the technologies that could help to feed a world whose population is forecast to hit almost 10 billion by 2050. If farmers can irrigate only when necessary, and avoid excessive pesticide use, they should be able to save money and boost their output.
But existing systems work out at $1,000 a sensor. That is too pricey for most rich-world farmers, let alone those in poor countries where productivity gains are most needed. The sensors themselves, which probe things like moisture, temperature and acidity in the soil, and which are scattered all over the farm, are fairly cheap, and can be powered with inexpensive solar panels. The cost comes in getting data from sensor to farmer. Few rural farms enjoy perfect mobile-phone coverage, and Wi-Fi networks do not have the range to cover entire fields. So most precision-agriculture systems rely on sensors that connect to custom cellular base stations, which can cost tens of thousands of dollars, or to satellites, which require pricey antennas and data plans.
In contrast, the sensors at Dancing Crow employ unoccupied slices of the UHF and VHF radio frequencies used for TV broadcasts, slotting data between channels. Many countries are experimenting with this so-called "white space" to unlock extra bandwidth for mobile phones. In cities, tiny slices of the white-space spectrum sell for millions of dollars. But in the sparsely populated countryside, says Ranveer Chandra, a Microsoft researcher, there is unlicensed space galore.
The farmer's house is connected to the internet in the usual way. A special white-space base station relays that signal to a shed elsewhere on the farm that sports an ordinary TV aerial. Individual sensors talk to the shed using TV transceivers with a range of more than 8km—enough for all but the biggest farms. And those transceivers are cheap: "We've already built sensors for less than $100," says Mr Chandra. "Our aim is to get them to under $15."
Microsoft is not the only organisation hoping to make agricultural sensors practical. Researchers at the University of Applied Sciences in Mannheim, for instance, have developed a sensor network that relies on a technology called software-defined radio, which uses computers to simulate an ultra-flexible, very sensitive radio receiver. And scientists at the University of Nebraska-Lincoln are working on sensors that communicate with radio waves that propagate through the soil rather than the air, and which draw their power from the vibrations generated by farm vehicles moving about on the surface.
But although such sensor data are useful, they cannot tell you everything. To fill in the gaps, Dancing Crow uses a drone. These are getting cheaper (a basic model costs $1,000), but they require some skill to fly, and their small batteries mean limited flight times. So Microsoft's team wrote an autopilot that lets a farmer outline a plot to survey, works out the most efficient route, and sends the drone on its way, reducing the time taken to cover a farm by over 25%.
The resulting imagery contains useful information on growing conditions, crop health and insect pests, but interpreting it properly is beyond most farmers. So Microsoft also developed software that runs on an ordinary laptop, and can stitch together individual pictures into a single panoramic view of the entire farm. Sensor data can be laid atop this view, and the computer can then extrapolate a handful of sensor readings into predicted values for moisture, acidity and so on at any given point.
When the nearby Snoqualmie River rises up to flood Dancing Crow farm in a couple of months, as it does most winters, Mr Chandra plans to take his technologies to India. For the very poorest farmers, even a cheap drone will be beyond their budget. He wants to see if a lower-tech solution will work just as well—simply attaching a smartphone to a $5 helium balloon and walking it through the fields.
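As a rough stand-in for the stitching step described above, an off-the-shelf panorama stitcher can be driven from a few lines of Python. This sketch uses OpenCV rather than the actual FarmBeats software, and the file names are placeholders.

    import cv2

    # SCANS mode suits flat, top-down imagery such as drone passes over a field.
    images = [cv2.imread(f) for f in ["pass1.jpg", "pass2.jpg", "pass3.jpg"]]
    stitcher = cv2.Stitcher_create(cv2.Stitcher_SCANS)
    status, panorama = stitcher.stitch(images)
    if status == cv2.Stitcher_OK:
        cv2.imwrite("farm_panorama.jpg", panorama)
    else:
        print("Stitching failed with status", status)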
First of all, it provides a simple user interface.
If a user sets waypoints or an area of interest and then pushes the start button, the drone starts to fly. After the drone completes its mission, it returns to its home position. The app provides other features such as flight time estimation, storing path history and telemetry, and transferring video from the drone to the IoT edge.
Let's look at the demo. This is the drone.
The app asks whether to store the new path. As you know, the monitoring job is repeated every day or every week.
If the new path is stored, the user does not need to set the path again; he can just select the path from the history.
In the area coverage mode, the drone covers the area in a lawn-mower sweeping pattern.
Since the drone stops and goes at every waypoint, it is important to reduce the number of waypoints.
Let’s look at the examples. In the first figure, the path has only 8 waypoints while the path in the second figure has 16 waypoints. It is obvious that the first one is better than the second one.
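To make this concrete, here is a minimal sketch of a lawn-mower sweep that places waypoints only at the turns, together with the kind of flight-time estimate the app reports. The swath width, cruise speed, and per-waypoint stop penalty are assumed values for illustration, not the app's actual parameters.

    import math

    def lawn_mower_waypoints(width_m, height_m, swath_m):
        # Back-and-forth passes over a rectangle, one waypoint per turn.
        # Fewer waypoints means fewer stop-and-go events.
        waypoints, y, left_to_right = [], 0.0, True
        while y <= height_m:
            xs = (0.0, width_m) if left_to_right else (width_m, 0.0)
            waypoints += [(xs[0], y), (xs[1], y)]
            left_to_right = not left_to_right
            y += swath_m
        return waypoints

    def estimate_flight_time(waypoints, cruise_mps=10.0, stop_penalty_s=3.0):
        # Distance at cruise speed plus a fixed penalty per waypoint stop.
        dist = sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))
        return dist / cruise_mps + stop_penalty_s * len(waypoints)

    path = lawn_mower_waypoints(width_m=200.0, height_m=100.0, swath_m=25.0)
    print(len(path), "waypoints,", round(estimate_flight_time(path)), "s estimated")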
There are two drone types.
In the case of an omni-directional drone, the front and side look similar and have similar air resistance. In the case of a directional drone, however, the front generates low drag while the side generates high drag.
This figure illustrates our yaw control algorithm.
If wind blows from this direction, the drone first exploits the wind for acceleration. While it moves forward, it tries to minimize drag.
If the drone wants to stop, it changes its yaw like this to exploit drag for deceleration. Likewise, if the drone wants to accelerate, it changes its yaw like this to avoid wind.
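This yaw choice can be made mechanical. Below is a hypothetical sketch with a toy anisotropic drag model: the drag coefficients, the 15-degree candidate yaw grid, and the force model are all assumptions for illustration, not the actual algorithm.

    import math

    def wind_force_along_path(yaw_deg, wind_vec, path_dir, c_front=0.2, c_side=1.0):
        # Toy model: wind hitting the nose is scaled by a low drag
        # coefficient, wind hitting the side by a high one; return the signed
        # force component along the direction of travel (+ pushes, - brakes).
        yaw = math.radians(yaw_deg)
        nose = (math.cos(yaw), math.sin(yaw))
        along = wind_vec[0] * nose[0] + wind_vec[1] * nose[1]
        side = -wind_vec[0] * nose[1] + wind_vec[1] * nose[0]
        fx = c_front * along * nose[0] - c_side * side * nose[1]
        fy = c_front * along * nose[1] + c_side * side * nose[0]
        return fx * path_dir[0] + fy * path_dir[1]

    def choose_yaw(wind_vec, path_dir, mode):
        # 'accelerate': maximize forward push; 'decelerate': maximize braking.
        candidates = range(0, 360, 15)
        best = max if mode == "accelerate" else min
        return best(candidates, key=lambda y: wind_force_along_path(y, wind_vec, path_dir))

    # Wind blowing east, drone traveling east: the drone turns broadside to
    # exploit the tailwind for acceleration.
    print(choose_yaw(wind_vec=(3.0, 0.0), path_dir=(1.0, 0.0), mode="accelerate"))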
Microsoft Research was amongst the first to:
Build TV white space radios
Design WhiteFi, a Wi-Fi like protocol for TVWS
Demo the world’s first urban WhiteFi network using geolocation DB on MSFT campus in 2009
It turns out that, as a result of this work, the chief minister of Andhra Pradesh reached out to Microsoft about how they could leverage this to transform agriculture….
Why choose these APIs? They work, and it’s easy.
Easy: The APIs are easy to implement because of the simple REST calls. Being REST APIs, they share a common way to implement, and you can get started with all of them for free simply by going to one place, one website, www.microsoft.com/cognitive. (You don't have to hunt around to different places.)
Flexible: We've got a breadth of intelligence and knowledge APIs, so developers will be able to find the intelligence feature they need; and importantly, they all work on whatever language, framework, or platform developers choose. So devs can integrate them into their apps (iOS, Android, Windows) using the tools they know and love (such as Python or Node.js).
Tested: Tap into an ever-growing collection of powerful AI algorithms developed by experts. Developers can trust the quality and expertise built into each API by experts in their field from Microsoft's research organization, Bing, and Azure Machine Learning, and these capabilities are used across many Microsoft first-party products such as Cortana, Bing, and Skype.
What are Cognitive Services? Microsoft Cognitive Services are a new collection of intelligence and knowledge APIs that enable developers to ultimately build smarter apps.
NOTES: key concepts we are trying to convey in this above statement:
That we are bringing together Intelligence (Oxford) and Knowledge from the corpus of the web (Bing)
That cognitive = human perception and understanding, enabling your apps to see the world around them, to hear and talk back with the users—to have a human side.
What are Microsoft Cognitive Services?
Microsoft Cognitive Services is a new collection of intelligent APIs that allow systems to see, hear, speak, understand and interpret our needs using natural methods of communication. Developers can use these APIs to make their applications more intelligent, engaging and discoverable. To try Cognitive Services for free, visit www.microsoft.com/cognitive.
With Cognitive Services, developers can easily add intelligent features – such as emotion and sentiment detection, vision and speech recognition, knowledge, search and language understanding – into their applications. The collection will continuously improve, adding new APIs and updating existing ones.
Cognitive Services includes:
Vision: From faces to feelings, allow apps to understand images and video
Speech: Hear and speak to users by filtering noise, identifying speakers, and understanding intent
Language: Process text and learn how to recognize what users want
Knowledge: Tap into rich knowledge amassed from the web, academia, or your own data
Search: Access billions of web pages, images, videos, and news with the power of Bing APIs
Vision
Computer Vision API: available as a free trial on the website microsoft.com/cognitive. There are also SDKs and samples available on GitHub or through NuGet, Maven, and CocoaPods for select platforms to make development easier. It's important to note that this is not client-side running code, but light wrappers around the REST calls to make integration easy.
A photo app would use this as a way to tag user photos and make it easier for users to search through their collections. An assistive app would use this as a way to describe the surroundings to visually impaired users. It works really well on both indoor and outdoor images; it can recognize common household objects, and it can describe outdoor scenes. However, we did not train on aerial images (say, from drones) or on many close-ups (so pictures where we zoomed in extremely on the subject won't do well). We also do really well recognizing celebrities (as long as most of the face is visible and they were facing the camera).
Face API: Some potential uses for this technology include facial login, photo tagging, and home monitoring. Or attribute detection to know age, gender, facial hair, etc.
Emotion API: is available in the Azure marketplace, as a free trial on the website microsoft.com/cognitive. See Computer Vision description.
Build an app that responds to moods. Using facial expressions, this cloud-based API can detect happiness, neutrality, sadness, contempt, anger, disgust, fear, and surprise. The AI understands these emotions based on universal facial expressions, and it functions cross-culturally, so your app will work around the world. Some use cases: an advertising company wants to test user response to an ad; a TV studio wants to track responses to a pilot.
Video API: available as a free trial on the website microsoft.com/cognitive. See the Computer Vision description.
It brings Microsoft's state-of-the-art video processing algorithms to developers. With the Video API, developers can analyze and automatically edit videos: stabilize them, create motion thumbnails, track faces, and detect motion. Use cases:
Stabilization: if you have multiple action videos, use the stabilization algorithm to make them less shaky and easier to watch. You can also use it as a first step before applying the other video APIs.
Face tracking: track faces in a video to do A/B testing in a retail setting, or combine Video API face tracking with capabilities in the Face API to search through surveillance, crime, or media footage for a certain person. It works best for frontal faces and currently cannot detect small, side, or partial faces.
Motion detection: instead of having to watch long clips of surveillance footage, the API will tell you what time motion occurred and its duration. It detects motion on a stationary background (e.g. a fixed camera); current limitations include night-vision videos, semi-transparent objects, and small objects.
Video thumbnail: take a long video, such as a keynote presentation, and automatically create a short preview clip of the talk.
This example is calling Vision API to get tags and description.
https://www.microsoft.com/cognitive-services/en-us/computer-vision-api/documentation/HowToCallVisionAPI
POST https://api.projectoxford.ai/vision/v1.0/analyze?visualFeatures=Description,Tags&subscription-key=<Your subscription key>
Security: can do this without subscription key in the URL (in header instead)
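For reference, a minimal Python sketch of the same call with the key sent in the Ocp-Apim-Subscription-Key header (the standard Cognitive Services mechanism) rather than in the URL; the image URL is a placeholder, and the requests library is assumed.

    import requests

    resp = requests.post(
        "https://api.projectoxford.ai/vision/v1.0/analyze",
        params={"visualFeatures": "Description,Tags"},
        headers={"Ocp-Apim-Subscription-Key": "<Your subscription key>"},
        json={"url": "https://example.com/some-image.jpg"},  # placeholder image
    )
    resp.raise_for_status()
    for tag in resp.json()["tags"]:
        print(tag["name"], tag["score"])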
Click on links and show this data from the website
Facial detection constraints and excellent documentation at https://dev.projectoxford.ai/docs/services/563879b61984550e40cbbe8d/operations/563879b61984550f30395236
DJI drone: also connects via SSID. The phone talks over Wi-Fi to the controller, which uses a proprietary protocol to communicate with the drone.
Using an LTE/4G data connection together with Wi-Fi is one workaround for the phone.
Images can be huge, which takes time and costs money.