After buying a set of Sonos-compatible speakers at IKEA, I was disappointed there's no support for playing audio from a popular video streaming service. They stream Internet radio, podcasts and what not. Well, not that service I want it to play!
Determined - and not knowing how deep the rabbit hole would be - I ventured on a trip that included network sniffing on my access point, learning about UPnP and running a web server on my phone (without knowing how to write anything Android), learning how MP4 audio is packaged (and has to be re-packaged). This ultimately resulted in an Android app for personal use, which does what I initially wanted: play audio from that popular video streaming service on Sonos.
Join me for this story about an adventure that has no practical use, probably violates Terms of Service, but was fun to build!
Nerd sniping myself into a rabbit hole... Streaming online audio to a Sonos speaker
1. Nerd sniping myself into a rabbit hole...
Streaming online audio to a Sonos speaker
Maarten Balliauw
@maartenballiauw
2. Disclaimer
I will share bits of source code where they matter, but will
not be sharing the full application.
I have built this application for personal and learning use,
and I do not intend to share it.
Don’t ask, the answer is no.
4. January 2020
“Let’s replace our old speakers with new and shiny!”
Requirements:
“Smart” speakers that can stream from the Internet
2 for living room, 1-2 for home office
9. Now what…
Searched online for solutions…
…all I found was excuses.
Legal, patents, … Don’t really care as a consumer!
12. Nerd Sniping
1. The act of presenting someone, often
a mathematician/physicist with a time consuming problem or
challenge (often impossible to solve or complete) in the hopes
of it appealing to a person's obsessive tendencies.
Urban Dictionary
And https://xkcd.com/356
16. Connect to speakers
Connect to one speaker
Play an MP3 from webserver
This seems very, very promising!
#!/usr/bin/env python
from soco import SoCo

if __name__ == '__main__':
    # IP address of one speaker on the local network
    sonos = SoCo('192.168.1.102')
    sonos.play_uri('http://host/file.mp3')
https://python-soco.com/
17. Get MP4 URL of online video
Video metadata endpoint, used by web player
Returns urlencoded data about video (with JSON sprinkled in)
https://www.youtube.com/watch?v=-zJoP2qPgTg
https://www.youtube.com/get_video_info?video_id=-zJoP2qPgTg
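The relationship between the two URLs above can be sketched in a few lines of Python. This is only an illustration: the helper name `metadata_url` is made up here, and it assumes the video ID always sits in the `v` query parameter of the watch URL.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def metadata_url(watch_url: str) -> str:
    """Build the get_video_info URL for a watch URL (hypothetical helper)."""
    query = parse_qs(urlparse(watch_url).query)
    video_id = query["v"][0]  # assumption: the ID is in the "v" parameter
    return "https://www.youtube.com/get_video_info?" + urlencode({"video_id": video_id})


print(metadata_url("https://www.youtube.com/watch?v=-zJoP2qPgTg"))
```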
18. Get MP4 URL of online video
{
"responseContext":{ },
"playabilityStatus":{ },
"streamingData":{
"expiresInSeconds":"21540",
"formats":[
{
"itag":18,
"url":"https://r3---sn-uxaxoxu-cg0k.googlevideo.com/videoplayback?expire=1599574779\u0026ei=mz5XX6-fKdOIgQeBp6PgDw\u0026
"mimeType":"video/mp4;+codecs=\"avc1.42001E,+mp4a.40.2\"",
"bitrate":234221,
"width":640,
"height":360,
"lastModified":"1586173729658545",
"contentLength":"92057863",
"quality":"medium",
19. Get MP4 URL of online video
More reading by Alexey Golub
https://tyrrrz.me/blog/reverse-engineering-youtube
Signed videos
Video player JS code contains decryption routine as JavaScript
Need to evaluate that to be able to access video (or Regex the cipher)
Too much hassle to write manually!
There exist scripts & libraries in many programming languages
In summary: we have the MP4 URL now.
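For unsigned videos, pulling the stream URLs out of the metadata response could look roughly like the sketch below. It assumes the urlencoded response carries the JSON document in a field named `player_response`. That field name, the function name `stream_urls` and the sample data are assumptions for illustration, not taken from the slides. Signed videos have no plain `url` entry and are simply skipped here.

```python
import json
from urllib.parse import parse_qs, urlencode


def stream_urls(raw_response: str) -> list:
    """Return (itag, mimeType, url) for each entry in streamingData.formats."""
    fields = parse_qs(raw_response)
    # Assumption: the JSON document lives in a field named "player_response".
    player = json.loads(fields["player_response"][0])
    formats = player.get("streamingData", {}).get("formats", [])
    # Signed videos carry a cipher instead of a "url"; skip those entries.
    return [(f["itag"], f["mimeType"], f["url"]) for f in formats if "url" in f]


# Made-up sample response, shaped like the JSON shown on the slide.
sample_json = {
    "streamingData": {
        "formats": [
            {
                "itag": 18,
                "url": "https://example.invalid/videoplayback?expire=1&ei=abc",
                "mimeType": 'video/mp4; codecs="avc1.42001E, mp4a.40.2"',
            },
            {"itag": 22, "mimeType": "video/mp4"},  # no plain URL (signed)
        ]
    }
}
sample = urlencode({"status": "ok", "player_response": json.dumps(sample_json)})
print(stream_urls(sample))
```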
20. Send URL to speakers
#!/usr/bin/env python
from soco import SoCo

if __name__ == '__main__':
    # Same call as before, now pointing at an MP4 instead of an MP3
    sonos = SoCo('192.168.1.102')
    sonos.play_uri('http://host/file.mp4')
https://python-soco.com/
Expected:
Actual:
22. Side track: speaker webserver
Anything useful to find?
http://192.168.1.123:1400/status
http://192.168.1.123:1400/support/review
http://192.168.1.123:1400/tools.htm
https://bsteiner.info/articles/hidden-sonos-interface
24. 💡 Maybe it’s that SoCo library!
“Because maybe 65 contributors have it wrong!”
The official application can send a stream to the speakers...
...can I listen on the network and see what the request looks like?
25. Sniffing the network
Wireshark https://www.wireshark.org/
Sniff traffic that passes your computer’s network adapter
Traffic does not pass my computer :-/
Phone on wifi, speaker on wifi, computer on wifi – huh?
Turns out the access point does not just send all traffic to all devices
💡💡 Unifi access point is *nix
tcpdump there?
26. Sniffing the network
🤓 On my Windows box, connected to wired network
Run Ubuntu
SSH into access point and run tcpdump
Pipe data back to Windows
Command annotations: access point IP, capture the ethernet side, traffic from/to my phone's IP
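The exact command from the slide is not reproduced in the text, so here is a sketch of how such a pipeline is commonly assembled. The SSH user, the IP addresses and the interface name `eth0` are all assumptions. The helper only builds the command line and does not run anything.

```python
import shlex


def remote_capture_command(ap_host: str, phone_ip: str, interface: str = "eth0") -> str:
    """Build a shell pipeline: tcpdump on the access point, Wireshark locally."""
    # -U: flush every packet, -s 0: capture full packets, -w -: write pcap to stdout
    tcpdump = ["tcpdump", "-i", interface, "-U", "-s", "0", "-w", "-", "host", phone_ip]
    ssh = ["ssh", ap_host, shlex.join(tcpdump)]
    # -k: start capturing immediately, -i -: read pcap data from stdin
    wireshark = ["wireshark", "-k", "-i", "-"]
    return shlex.join(ssh) + " | " + shlex.join(wireshark)


# Hypothetical addresses: access point at .2, phone at .50
print(remote_capture_command("admin@192.168.1.2", "192.168.1.50"))
```

The `host` filter keeps the capture limited to traffic from and to the phone, which matters on an access point that carries traffic for every device in the house.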
30. Replaying SOAP payload
Tried MP3 URLs and MP4 URLs
MP3 worked, MP4 did not
The SoCo library did not have any issues...
Searched around for DIDL-Lite in payload
Seems speakers use good old UPnP
http://www.upnp.org/schemas/av/didl-lite-v2.xsd
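The captured payload itself is not part of this text. As a rough sketch, a standard UPnP AVTransport `SetAVTransportURI` request can be built as below. The function name is made up. The commonly documented Sonos control path `/MediaRenderer/AVTransport/Control` on port 1400 is mentioned only as an assumption, and the DIDL-Lite metadata is left empty.

```python
from xml.sax.saxutils import escape

AVTRANSPORT = "urn:schemas-upnp-org:service:AVTransport:1"


def set_av_transport_uri(uri: str, metadata: str = ""):
    """Return (headers, body) for a UPnP SetAVTransportURI SOAP request."""
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
        's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
        "<s:Body>"
        f'<u:SetAVTransportURI xmlns:u="{AVTRANSPORT}">'
        "<InstanceID>0</InstanceID>"
        f"<CurrentURI>{escape(uri)}</CurrentURI>"
        # DIDL-Lite metadata travels as escaped XML inside this element
        f"<CurrentURIMetaData>{escape(metadata)}</CurrentURIMetaData>"
        "</u:SetAVTransportURI>"
        "</s:Body></s:Envelope>"
    )
    headers = {
        "Content-Type": 'text/xml; charset="utf-8"',
        "SOAPACTION": f'"{AVTRANSPORT}#SetAVTransportURI"',
    }
    return headers, body


headers, body = set_av_transport_uri("http://host/file.mp3?a=1&b=2")
print(headers["SOAPACTION"])
```

Posting this to the speaker (assumed: `http://<speaker-ip>:1400/MediaRenderer/AVTransport/Control`) and following up with a `Play` action is what a replay boils down to.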
31. 💡 Maybe it’s the MP4 format!
https://support.sonos.com/s/article/79?language=en_US
32. Our (potential) options...
Download MP4, push it to speaker as a local file
or
Proxy MP4 and do on-the-fly transcoding to MP3
Send MP3 URL as “Internet Radio”
or
Investigate MP4 and see if they indeed use AAC
Send AAC URL as “Internet Radio”
34. MP4
MPEG-4 Part 14 or MP4 is a digital multimedia container
format most commonly used to store video and audio, but
it can also be used to store other data such
as subtitles and still images. (…) allows streaming (…)
Wikipedia
Can we extract this to a separate file?
MP4 file
Header
Video 1
Video N
Audio 1
Audio N
Subtitles
MP4 file (optimized for streaming)
Header
Video 1 (short)
Audio 1 (short)
Video N (short)
Audio N (short)
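To make the container idea concrete: an MP4 file is a sequence of boxes, each starting with a 4-byte big-endian size and a 4-byte type. The sketch below walks only the top-level boxes of a byte string. The function name is made up, and the sample bytes are synthetic, not a real video.

```python
import struct


def top_level_boxes(data: bytes):
    """Yield (type, offset, size) for each top-level box in an MP4 byte string."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the data
            size = len(data) - offset
        if size < header:
            break  # malformed box; stop rather than loop forever
        yield box_type.decode("ascii", "replace"), offset, size
        offset += size


# Synthetic example: ftyp (file type), moov (header/metadata), mdat (media data)
sample = (
    struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00" * 4
    + struct.pack(">I4s", 8, b"moov")
    + struct.pack(">I4s", 12, b"mdat") + b"\x01\x02\x03\x04"
)
print(list(top_level_boxes(sample)))
```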
35. FFmpeg to the rescue!
“A complete, cross-platform solution to record, convert and stream
audio and video.”
Swiss army knife for video/audio formats.
ffmpeg -i original.mp4 -c:a copy output-aac.m4a
Extracts the audio track from MP4 container
Use SoCo to send file to speakers.
https://ffmpeg.org/
Expected: Actual:
37. deadf00d
“How I hacked Sonos and YouTube in the same day.”
https://www.deadf00d.com/post/how-i-hacked-sonos-and-youtube-the-same-day.html
@deadf0od - https://twitter.com/deadf0od
“HEY, kAn 1 Dm J00?
w0rK1n' 0N 51M1Lar 7h1n' AnD wE M19h7 8E a8le 70 HElP EaCH 07heR.”
...
“It’s AAC, but in ADTS format. Each atom needs a header in every frame!”
39. MP4 to AAC with ADTS
ffmpeg -i original.mp4 -acodec copy -f adts -vn output-adts.aac
Extracts the audio track from MP4 container
Adds ADTS headers
Use SoCo to send file to speakers.
Expected: Actual:
41. Let the app building start!
1. Connect to speakers ✅
2. Get MP4 URL of online video ✅
3. (new) Extract MP4 audio track to ADTS ✅
4. Send URL to speakers ✅
5. Enjoy music!
47. Research…
Android-specific
Activity (main screen to handle everything)
Intent (ACTION_SEND to receive data from others)
Libraries (or code)
YouTube metadata extractor (get audio URL)
Sonos communication library (discover speakers, send URL to speakers)
Webserver (reverse proxy)
Something to do the MP4 to AAC (ADTS) conversion
49. And checking the intent…
class MainActivity : CoroutineScope, AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        // Text shared via ACTION_SEND; null when the app was opened directly
        val url = intent?.extras?.getString(Intent.EXTRA_TEXT)
        // ...
    }
}
51. SSDP Simple Service Discovery Protocol
Used to discover printers, routers, Sonos, ...
Send UDP datagram as multicast/broadcast
M-SEARCH * HTTP/1.1
HOST: 239.255.255.250:1900
MAN: "ssdp:discover"
MX: 1
ST: urn:schemas-upnp-org:device:ZonePlayer:1
Devices send back HTTP response as UDP
Supported services, endpoint URL, ...
https://en.wikipedia.org/wiki/Simple_Service_Discovery_Protocol
https://github.com/vmichalak/ssdp-client
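A minimal sketch of the discovery step, using only the standard library. `m_search` builds the datagram shown above. `discover` sends it to the multicast group and collects the `LOCATION` header of each reply. Both function names and the timeout are illustrative choices, not part of the app.

```python
import socket

SSDP_ADDR = ("239.255.255.250", 1900)


def m_search(search_target: str, mx: int = 1) -> bytes:
    """Build an SSDP M-SEARCH datagram for the given search target (ST)."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR[0]}:{SSDP_ADDR[1]}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",
        f"ST: {search_target}",
        "",
        "",  # the request ends with an empty line
    ]
    return "\r\n".join(lines).encode("ascii")


def discover(search_target: str, timeout: float = 2.0) -> dict:
    """Send one M-SEARCH and collect LOCATION headers keyed by responder IP."""
    found = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(m_search(search_target), SSDP_ADDR)
        try:
            while True:
                data, (ip, _port) = sock.recvfrom(65507)
                for line in data.decode("utf-8", "replace").split("\r\n"):
                    name, _, value = line.partition(":")
                    if name.strip().upper() == "LOCATION":
                        found[ip] = value.strip()
        except socket.timeout:
            pass  # no more replies within the timeout
    return found


print(m_search("urn:schemas-upnp-org:device:ZonePlayer:1").decode("ascii"))
```

Calling `discover("urn:schemas-upnp-org:device:ZonePlayer:1")` on a network with Sonos speakers should return one entry per speaker that answers within the timeout.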
52. Android – Running webserver
Prevent server stop when application is closed or device goes to sleep
<service
    android:name=".RunEmbeddedWebServerService"
    android:enabled="true"
    android:exported="true" />

class RunEmbeddedWebServerService : CoroutineScope, Service() {
    // Ktor server embedded in the service, listening on a fixed port
    private val server = embeddedServer(Netty, 36362) {
        routing {
            get("/{videoId}.mp4") { /* ... */ }
        }
    }
    override fun onStartCommand(...): Int {
        server.start(wait = false)
        return START_NOT_STICKY
    }
    override fun onDestroy() {
        server.stop(0, 1000)
        super.onDestroy()
    }
    // ...
}
https://developer.android.com/guide/components/services#Types-of-services
63. In summary…
Learned a lot of random things along the way ✅
There is so much knowledge out there!
Talk to people (thanks, deadF00d!)
You can build anything!
Even if it seems impossible at first!
My wife and I built a house and still had a set of old PC speakers. Time for a replacement!
We mostly play streaming music: Spotify, TuneIn, SoundCloud, YouTube, ...
Requirements: smart, so that works
2 for living room, 1 or 2 for home office, and multiroom would be even better!
Unfortunately, it being January, everything out there was full-price. Not that we are cheap, but in terms of value proposition we do find 2k for some speakers a bit skewed.
But then, we passed an IKEA store, and saw... Symfonisk.
They are a collaboration with Sonos. Essentially Sonos speakers; rumour has it that it’s the same hardware as that Sonos from earlier, at 1/5 the price.
Same software, 100% compatible with that Sonos stuff.
And as you can see on the picture, I can even put my glasses on top of it.
Listened to it in the store, and seemed fine. DEAL! We walked out with those speakers.
So we took them home (click)
Read the manual (click)
Followed all instructions, so we took a seat (click)
And installed the app!
It was brilliant. We were able to tune a local radio station on TuneIn, push a playlist from Spotify to those speakers, and listen downstairs and upstairs.
We were almost in love with this new setup. SO GOOD.
There’s this video streaming site that lets me cast videos to my TV, and very often there is good music to be found as well.
This is where the honeymoon phase with our new speakers ended.
There is no support to cast video, or at least the audio of a video, to our smart speakers. NOT VERY SMART!
Now what? And yes this slide looks bad. But this was our feeling!
Searched online, but all I could find were excuses. These two companies are fighting over patents, and generally not playing nice together.
I don’t care as a consumer!
But no way around it, this is what it was going to be.
In one of my searches, I did find an app which seemed to sort of do what I was after.
Except, yet another party to give permissions to. Why can’t I just use the “Share” button in that other app?
But then I found this page on the Sonos website.
Supported audio formats: MP4.
And that streaming website is MP4.
How hard can it be to push the video/audio URL to the device and be done with it?
Mention colleague usually snipes, although I have become proficient at sniping myself. Which is not ideal.
Putting those steps in an app should be easy once the individual steps work.
So a few Google searches later, I found this Python library, SoCo.
It supports writing Python, or use command line to do things like discover speakers, change volume, play music, ...
This seemed great! So I went off and installed a Python environment.
That did not work...
Actual is wrong, it did say “pop” about a second after a send.
HUH? Now what?
Check some Sonos web UI, do some pinging/traceroute/...
Irony: finding a license free picture of a rabbit hole was quite the rabbit hole in itself.
The SoCo library did not have any issues, it seems. Who would have thought.
UPnP! Not sure what that knowledge brings, but it does mean there might be more documentation out there.
Sonos does not support MP4 for streaming... Only from the local library.
Should I sniff payloads to see what is sent if I play MP4 from my phone library?
Seems like a good fallback in case needed, but ideally don’t want to download the full video first, then upload to the speaker. I want close to instant!
Maybe I could transcode MP4 to MP3 on the fly? Rabbit hole is deep enough as it is...
AAC does seem to work for Internet radio. And AAC is the audio compression format used in many MP4 files... Could this be an option?
Let’s talk a bit about containers.
Explain container format. Analogy: a ZIP file (but it’s not a ZIP file).
Frustration. But, we’re now so deep in the rabbit hole, this SHOULD work, right? RIGHT????
What do developers do when something does not work? Google!
All sites I found had “visited” color, damned! But on page 25 of some search I did, I found something...
I found a hacker! Who was investigating the same stuff... Since then, he did elaborate on the entire process, but back then he was around the same stage as I was in this investigation.
“hey, can i dm you? working on similar thing and we might be able to help each other.”
We started chatting, and at some point he says to me:
“It’s AAC, but in ADTS format. Each atom needs a header in every frame!”
WAT?
All we need now is an app!
Also, I know NOTHING about Android development
Went with Android Studio, at least I know the general workings of the IDE, as it’s the same base IDE as IntelliJ and Rider.
Started with empty activity. It’s debatable, but I prefer clean templates that I can add incremental things to, as opposed to full-blown templates that I have no idea what they are doing.
Could have picked Java, but chose Kotlin. It’s the de-facto Android language nowadays, and it’s very similar to C# - my “mother tongue” in programming languages, so to speak.
Also fully compatible with Java, so can use any libraries out there and when needed, even mix languages in one project.
NOW WHAT?
Decided to start with the most important thing...
Behold, the UI design! This is also the final design of the app.
Activity already present, added the intent filter to accept URL data with https and something that looks like YouTube.
Based on a null check, we can do other things. So with that out of the way, we can start building!
Explain coroutine scope, it’s to enable async/await like features in Kotlin for this class.
Went to packages tab (package search plugin), and started adding random things that matched what I was after.
I knew Ktor from a colleague who is building it; it is a webserver/client framework.
Turns out there are many packages that can help run FFmpeg, even on Android!
They are wrappers, and “sort of” run command line.
Which means we would have to download MP4 first, convert to ADTS, then stream to device.
Workable for 2-3 minute songs, but for a 1h30 DJ set that is 250 MB, it’s not ideal. We’re also messing up temporary storage on the device and all.
I want close to zero delay between playing and the stream starting!
Explain MP4 is a set of boxes, lots of pointers. But we will try to better visualize on the next slide.
Explain logic of injecting ADTS header for each frame
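The header injection can be sketched as follows. This is not the app's `AacAdtsWriter`, just a minimal illustration of the standard 7-byte ADTS header (no CRC). The defaults assume AAC-LC, 44.1 kHz and stereo, whereas real code would read those values from the MP4 header. The function names are made up.

```python
def adts_header(payload_len: int, object_type: int = 2,
                freq_index: int = 4, channels: int = 2) -> bytes:
    """Build a 7-byte ADTS header for one raw AAC frame.

    Assumed defaults: object_type 2 = AAC-LC, freq_index 4 = 44100 Hz, stereo.
    """
    frame_len = payload_len + 7  # the length field includes the header itself
    if frame_len > 0x1FFF:
        raise ValueError("frame too large for the 13-bit ADTS length field")
    profile = object_type - 1  # ADTS stores the object type minus one
    return bytes([
        0xFF,                                   # syncword, high 8 bits
        0xF1,                                   # syncword low 4 bits, MPEG-4, layer 0, no CRC
        (profile << 6) | (freq_index << 2) | (channels >> 2),
        ((channels & 0x3) << 6) | (frame_len >> 11),
        (frame_len >> 3) & 0xFF,
        ((frame_len & 0x7) << 5) | 0x1F,        # low length bits + buffer fullness (high bits)
        0xFC,                                   # buffer fullness (low bits), 1 AAC frame
    ])


def to_adts(frames):
    """Prefix every raw AAC frame with its own ADTS header."""
    for frame in frames:
        yield adts_header(len(frame)) + frame


print(adts_header(100).hex())
```

Because every frame carries its own header, a player can start decoding from any frame boundary, which is what makes this format usable as a continuous stream.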
Start with AndroidManifest.xml, mention more than we saw on slides. For example, permissions.
Trial and error, based on what I was doing Android would throw an exception telling me which permission I was missing.
MainActivity - onCreate
Extract YouTube URL
If found, discover Sonos devices - discoverSonosDevices()
Explain withContext(Dispatchers.IO) { - run this on IO thread
Discover using multicast
Discover using broadcast
Look at code for those
If devices found, prompt using AlertDialog builder
When device selected, trySetupStreamingFor
Get YouTube metadata
Generate video URL on local IP address (wifi only)
Extract album art and all
device.playUri(....) using that SOAP request from earlier
onCreate also started a webserver, which runs as a foreground service
Needed to make sure our server keeps running even if we’re doing other stuff on our phone
RunEmbeddedWebServerService
get("/{videoId}.mp4")
Extract metadata again, get MP4 audio URL
Run MP4AacToAdtsAacConverter on the MP4 URL stream, using AacAdtsWriter
MP4AacToAdtsAacConverter is something I had to cook up. Lots of trial and error, and deadF00d helped with insights
Parse boxes, read header, then skip to samples and push those out with ADTS header each time
AacAdtsWriter - go through the byteshifting...
I recorded a demo back in January, but I think I look a bit tired there. So let’s do that again