Patrick McKenzie (patio11) at TwilioConf 2011. The topic is taking Twilio "beyond the quickstart guides": building applications that are secure, testable, maintainable, and fit to run a business on.
3. Twilio Has The Power To Make You…
Sob softly at 3 AM in a cold, wet, dark…
4. How could I have avoided that?
Process: Do not push new code to production at 5 PM on Friday night.
Process: Test on staging server first. Fail the deploy if core features do not work as expected.
Tech: Switch to idempotent queues.
Tech: How about we don't call the same person 50 times in five minutes?
Tech: Activity spike 500x historical max = Shut. Down. Everything.
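The two "Tech" fixes above can be sketched in Python as follows. This is a minimal in-memory sketch, and its names are hypothetical. It assumes each job carries an idempotency key, such as (recipient, reason, date), so enqueuing the same job twice is a no-op. It also assumes activity is measured in calls per minute and checked against 500x the historical per-minute maximum. A real system would persist the set of seen keys rather than keep it in memory.

```python
from collections import deque

SPIKE_MULTIPLIER = 500  # "Activity spike 500x historical max = Shut. Down. Everything."


class IdempotentCallQueue:
    """Enqueuing a job whose key was already accepted is a no-op, so
    retries and double-submits cannot become duplicate phone calls."""

    def __init__(self):
        self._pending = deque()
        self._seen = set()  # every key ever accepted, not only pending ones

    def enqueue(self, key, phone_number):
        if key in self._seen:
            return False  # duplicate: already queued or already called
        self._seen.add(key)
        self._pending.append((key, phone_number))
        return True

    def dequeue(self):
        return self._pending.popleft() if self._pending else None

    def __len__(self):
        return len(self._pending)


def should_shut_down(calls_last_minute, historical_max_per_minute):
    """True when current activity is at least 500x the historical maximum.
    A historical max of 0 is treated as 1 so the threshold is never 0."""
    threshold = SPIKE_MULTIPLIER * max(historical_max_per_minute, 1)
    return calls_last_minute >= threshold


q = IdempotentCallQueue()
q.enqueue(("alice", "appointment-reminder", "2011-10-14"), "+15555550100")
q.enqueue(("alice", "appointment-reminder", "2011-10-14"), "+15555550100")
print(len(q))                    # 1
print(should_shut_down(600, 1))  # True
print(should_shut_down(50, 1))   # False
```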
6. Testing Pitfalls With Twilio
Testing is dangerous
Testing trivial changes often requires manual work
Your view code (TwiML) will frequently blow up business logic
Poor separation of concerns between model, view, controller, Twilio libraries, and the Twilio API means many classes of bugs are not exercised by automated testing
8. What To Test
Business logic, business logic, business logic
Scheduling calls / SMSes per business rules
Call flow
Am I calling Twilio API the way Twilio expects?
TwiML looks OK?
Parameters for requests passed correctly?
Does stuff actually work?
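"Business logic, business logic, business logic" is easiest to test when it is a pure function with no Twilio in sight. Below is a minimal sketch of testing a scheduling rule. The rule itself is hypothetical: reminders go out 24 hours before an appointment and are clamped into a 9 AM to 7 PM calling window.

```python
from datetime import datetime, time, timedelta

# Hypothetical business rule: call 24 hours before the appointment,
# but never before 9 AM or after 7 PM on that day.
REMINDER_LEAD = timedelta(hours=24)
EARLIEST_CALL = time(9, 0)
LATEST_CALL = time(19, 0)


def reminder_call_time(appointment_at):
    """Pure business logic: no Twilio, no network, trivially unit-testable."""
    candidate = appointment_at - REMINDER_LEAD
    earliest = datetime.combine(candidate.date(), EARLIEST_CALL)
    latest = datetime.combine(candidate.date(), LATEST_CALL)
    # Clamp the candidate into the allowed calling window on the same day.
    return min(max(candidate, earliest), latest)


def test_reminder_call_time():
    # Normal case: exactly 24 hours earlier.
    assert reminder_call_time(datetime(2011, 10, 15, 14, 30)) == datetime(2011, 10, 14, 14, 30)
    # Too early in the morning: pushed to the start of the window.
    assert reminder_call_time(datetime(2011, 10, 15, 7, 0)) == datetime(2011, 10, 14, 9, 0)
    # Too late in the evening: pulled back to the end of the window.
    assert reminder_call_time(datetime(2011, 10, 15, 22, 0)) == datetime(2011, 10, 14, 19, 0)
```

A test runner such as pytest would discover `test_reminder_call_time` automatically. Because the rule lives in the model, this test never needs a phone.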
9. Don't Contact Twilio In Tests
Makes tests slow
Potentially dangerous
Bought phone numbers in a unit test. Twilio.revenue += 340
Hurts reproducibility
Instead, record and play back (VCR gem, etc.)
Not Ruby? Use the Twilio API explorer, then copy/paste the response into your mock.
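For the "copy/paste the response into your mock" route, here is a hand-rolled playback double. It is a sketch, not VCR and not the Twilio library. It assumes application code reaches Twilio only through an injected client with a hypothetical `create_call(to, from_, url)` method. The canned JSON is a placeholder trimmed to two fields of a call resource (`sid`, `status`). The double also records every request, so tests can check that parameters were passed correctly.

```python
import json

# Placeholder for a response copy/pasted from the Twilio API explorer.
CANNED_CALL_RESPONSE = json.loads("""
{"sid": "CA-placeholder-sid", "status": "queued"}
""")


class PlaybackTwilioClient:
    """Hypothetical test double: returns a recorded response and remembers
    every request, so tests never touch the network or buy anything."""

    def __init__(self, recorded_response):
        self.recorded_response = recorded_response
        self.requests = []

    def create_call(self, to, from_, url):
        self.requests.append({"to": to, "from": from_, "url": url})
        return dict(self.recorded_response)


def send_reminder(client, patient_phone):
    """Application code under test (hypothetical)."""
    response = client.create_call(
        to=patient_phone,
        from_="+15555550199",
        url="https://example.com/twiml/reminder",
    )
    return response["sid"]


client = PlaybackTwilioClient(CANNED_CALL_RESPONSE)
sid = send_reminder(client, "+15555550100")
print(sid)              # CA-placeholder-sid
print(client.requests)  # one recorded request with to/from/url
```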
10. Use localtunnel in development
Quicker than “FTP new version to site”
Won't break stuff for real customers
11. Staging Servers Are Required
Staging = Production – Customers
“Same” hardware, configurations, etc., different Twilio numbers
Ban the Internet (except Twilio) from staging servers
Strongly recommend no real data in the staging DB
Staging servers good for automated test calls
12. Staging Servers Protect Production
Prior to pushing to production, push to staging.
Run a script to automatically drive the website and telephone, verifying that stuff actually works.
Fail the deploy to production if anything goes wrong.
Adds ~5 minutes to a deploy; will save you outages, catastrophic blowups, and your sanity.
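The staging gate described on this slide reduces to "run every check; deploy only if all pass". Below is a minimal sketch. The check names and the `deploy` callable are hypothetical. Real checks would drive the staging site and place a test call to a staging Twilio number. A check passes only if it returns exactly `True`; exceptions, `False`, and `None` all fail the deploy.

```python
import sys


def run_smoke_checks(checks):
    """Run every (name, check) pair; return a list of (name, reason) failures."""
    failures = []
    for name, check in checks:
        try:
            result = check()
        except Exception as exc:  # a crashing check is a failed check
            failures.append((name, f"raised {exc!r}"))
            continue
        if result is not True:    # only an explicit True passes
            failures.append((name, f"returned {result!r}"))
    return failures


def gate_deploy(checks, deploy):
    """Call deploy() only if every staging check passes."""
    failures = run_smoke_checks(checks)
    for name, reason in failures:
        print(f"STAGING CHECK FAILED: {name}: {reason}", file=sys.stderr)
    if failures:
        return False
    deploy()
    return True


# Hypothetical placeholders for real staging checks.
checks = [
    ("signup page renders", lambda: True),
    ("test call reaches menu", lambda: True),
]
gate_deploy(checks, deploy=lambda: print("pushing to production"))
```

Requiring an explicit `True` is deliberate: a half-written check that forgets to return anything blocks the deploy instead of waving it through.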
16. Case Statements Considered Harmful
Easy to introduce subtle bugs
Very difficult to test
Requires manual testing (with a phone?!)
Tightly couples business logic w/ Twilio
Hard to maintain
Adding menu item => stuff breaks
Change a number => stuff breaks
Restructure flow => stuff breaks
19. What To Use State Machines For?
Call flows
Business logic testable (in model)
Forces similar organization on model, view, controller, and vocal assets
SMS flows
Whether to contact someone in the first place
Avoids the easiest catastrophic failure mode with Twilio
20. Specifics To Modeling Calls
Each call gets a DB/model object
Model tracks call state
Set state to “processing” prior to initiating the call (or at entrance to the Twilio script for inbound calls)
Then, transition based on input, using each transition to:
trigger side-effects (updating the DB, etc.)
present the user with a view state (voice, etc.)
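Below is a minimal sketch of the call-as-state-machine idea from these slides, replacing a case statement with a transition table. It rests on these assumptions: the states, digits, and view names describe a hypothetical appointment-confirmation flow; `history` stands in for DB side-effects; and views are plain names where a real app would render TwiML.

```python
# Hypothetical flow: each state maps to the view the caller hears in it.
STATE_VIEWS = {
    "processing": "connecting",
    "menu": "menu_prompt",
    "confirmed": "confirmed_message",
    "cancelled": "cancelled_message",
}

# (current_state, input) -> next_state. Adding a menu item means adding a
# row here, not editing a nested case statement.
TRANSITIONS = {
    ("processing", "answered"): "menu",
    ("menu", "1"): "confirmed",
    ("menu", "2"): "cancelled",
}


class Call:
    """One model object per call; state would be persisted in the DB."""

    def __init__(self, call_id):
        self.call_id = call_id
        self.state = "processing"  # set before the call is initiated
        self.history = []          # stands in for DB side-effects

    def handle(self, user_input):
        """Apply one input and return the view to present next."""
        next_state = TRANSITIONS.get((self.state, user_input))
        if next_state is None:
            # Unrecognized input: stay in place and replay the current view.
            return STATE_VIEWS[self.state]
        self.history.append((self.state, user_input, next_state))  # side-effect
        self.state = next_state
        return STATE_VIEWS[next_state]


call = Call("call-1")
call.handle("answered")  # -> "menu_prompt"
call.handle("9")         # -> "menu_prompt" (replayed)
call.handle("1")         # -> "confirmed_message"
```

Every transition can be unit-tested in the model without a phone. Changing a digit or restructuring the flow touches one table rather than scattered branches.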
22. Twilio's IfMachine=Continue
Wait until call recipient says something
If they don't say something, must be a machine.
If they do say something, maybe still a machine?
Error rates ~20% in my limited experience
23. Problems With IfMachine=Continue
“I tried a test call to myself and it never started talking. I'm concerned my customers would hang up before my message plays.”
If you don't detect the beep correctly, the first several seconds of the message do not get recorded.
“My customers hit 1 and nothing happens.”
24. Other Options (Not Answers)
Give machines/humans the same message.
Give machines/humans the same message, but force a keypress (“1”) prior to talking. This coerces most answering machines/voicemails into starting recording, even early.
“This is an automated message from Your Company Here. Press 1 to hear your message.” <Gather> their input. If input, play the human message. If none, play the answering machine message.
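The "force a keypress" option can be expressed as TwiML. The sketch below builds it with Python's standard `xml.etree.ElementTree`. `<Gather>`, its `numDigits`/`timeout`/`action` attributes, and falling through to the next verb when no digits arrive are standard TwiML behavior. The action URL, company name, and message text are placeholders. The handler at the action URL, which would play the human message, is assumed and not shown.

```python
import xml.etree.ElementTree as ET


def press_one_twiml(company, human_action_url, machine_message, timeout_seconds=5):
    """Return TwiML asking for a keypress, with an answering-machine fallback."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response, "Gather",
        numDigits="1", timeout=str(timeout_seconds), action=human_action_url,
    )
    prompt = ET.SubElement(gather, "Say")
    prompt.text = (f"This is an automated message from {company}. "
                   "Press 1 to hear your message.")
    # No digits before the timeout: Twilio falls through to the next verb,
    # so the answering machine records this version of the message.
    fallback = ET.SubElement(response, "Say")
    fallback.text = machine_message
    return ET.tostring(response, encoding="unicode")


print(press_one_twiml("Your Company Here", "/calls/human",
                      "Please call us back at your convenience."))
```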
25. Be Careful With Answering Machines
Hit 5 To Confirm Your Appointment
30. Check Your Application For…
Application security issues
Unintended information disclosure
Catastrophic degradation during failure conditions
The 4Chan Rule
31. Outgoing Call Security
Educate users regarding proper use.
This will require firing some of them.
Establish per-account, per-destination, and global rate caps. Review manually after triggers.
Have a global “Stop all outgoing calls” button.
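Here is a minimal sketch of the three rate caps plus the global stop button. It assumes each cap counts calls in a sliding window, with the cap values and 5-minute window chosen by the caller. State lives in memory, where a real deployment would need a shared store. Every tripped cap is appended to `tripped` for the manual review the slide calls for.

```python
import time
from collections import defaultdict, deque


class OutgoingCallGuard:
    """Per-account, per-destination, and global caps over a sliding window."""

    def __init__(self, per_account, per_destination, global_cap, window_seconds=300):
        self.caps = {"account": per_account, "destination": per_destination,
                     "global": global_cap}
        self.window = window_seconds
        self.history = defaultdict(deque)  # (scope, key) -> call timestamps
        self.stopped = False               # the "Stop all outgoing calls" button
        self.tripped = []                  # (scope, key, time) for manual review

    def stop_all(self):
        self.stopped = True

    def allow(self, account_id, destination, now=None):
        """Return True and record the call only if no cap would be exceeded."""
        if self.stopped:
            return False
        now = time.time() if now is None else now
        buckets = [("account", account_id), ("destination", destination),
                   ("global", "*")]
        for scope, key in buckets:
            stamps = self.history[(scope, key)]
            while stamps and stamps[0] <= now - self.window:
                stamps.popleft()  # forget calls older than the window
            if len(stamps) >= self.caps[scope]:
                self.tripped.append((scope, key, now))
                return False
        for bucket in buckets:
            self.history[bucket].append(now)
        return True


# e.g. at most 2 calls to one person per 5 minutes, never 50.
guard = OutgoingCallGuard(per_account=100, per_destination=2, global_cap=1000)
```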
33. Incoming Call Security
Caller IDs can be spoofed. Do not gate important stuff on them.
“Thanks for calling our automated system. Put in your task code to continue.”
Task code: 4 to 6 digit random ID. Expires in 1 hour. If possible, flush codes after more than 3 failures in a row.
Per-account call-in numbers when feasible. Increases security and cuts down on support costs.
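Below is a minimal sketch of the task-code scheme. It assumes one store per account call-in number, consistent with the per-account numbers suggested above. It also interprets "flush" as invalidating every outstanding code once consecutive failures exceed 3. Codes come from Python's `secrets` module rather than `random`, so they are not predictable.

```python
import secrets
import time

CODE_LIFETIME_SECONDS = 3600   # "Expires in 1 hour"
MAX_CONSECUTIVE_FAILURES = 3   # flush once failures exceed this


class TaskCodeStore:
    """Task codes for one account's call-in number (in-memory sketch)."""

    def __init__(self, digits=6):
        if not 4 <= digits <= 6:
            raise ValueError("task codes are 4 to 6 digits")
        self.digits = digits
        self.codes = {}    # code -> (task_id, expires_at)
        self.failures = 0  # consecutive failed attempts

    def issue(self, task_id, now=None):
        now = time.time() if now is None else now
        while True:
            # Zero-padded random number, e.g. "004217" for 6 digits.
            code = f"{secrets.randbelow(10 ** self.digits):0{self.digits}d}"
            if code not in self.codes:
                break
        self.codes[code] = (task_id, now + CODE_LIFETIME_SECONDS)
        return code

    def redeem(self, code, now=None):
        """Return the task_id for a valid, unexpired code; otherwise None."""
        now = time.time() if now is None else now
        entry = self.codes.get(code)
        if entry is not None and entry[1] > now:
            self.failures = 0
            return entry[0]
        self.codes.pop(code, None)  # forget the code if it had expired
        self.failures += 1
        if self.failures > MAX_CONSECUTIVE_FAILURES:
            self.codes.clear()      # flush: every code must be reissued
            self.failures = 0
        return None
```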
35. One Commodity Server Has…
6 hours per working day
× 3,600 seconds per hour
× ~25 requests per second
÷ ~3 requests per 2-minute phone call
= ~180,000 phone calls per working day
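The slide's capacity estimate, spelled out step by step (assuming ~3 requests per call holds):

```latex
6\,\frac{\text{hours}}{\text{day}} \times 3600\,\frac{\text{s}}{\text{hour}} \times 25\,\frac{\text{requests}}{\text{s}}
  = 540{,}000\,\frac{\text{requests}}{\text{day}},
\qquad
\frac{540{,}000\ \text{requests/day}}{3\ \text{requests/call}} = 180{,}000\ \text{calls/day}
```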
37. Why Rate Limit Then?
Control costs to your business and customer.
Protect customers from crushing the offline processes that feed to/from the phones.
“Great that it scales. By the way, can we get an off button? To turn off calls for a few hours?”
“Why do you need an off button?”
“Our operators sometimes get called away from their desks, for meetings and whatnot.”
“Certainly. How many operators do you have?”
“Two.”
39. Random Grab Bag Of Advice
Never contact Twilio in the request/response cycle. Queue requests; use a worker process.
Fiverr.com for voice actresses. Find one you like, put her on retainer.
Record copious information about errors. Otherwise it is very hard to answer the individualized question “What did your customer do to hear that unspecified 'Something broke' message?”
Fail closed: default to not making the call.
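Three of the grab-bag items combine naturally: keep Twilio out of the request/response cycle, log errors with context, and fail closed. Below is a minimal sketch using the standard library's `queue` and `threading`. `is_call_allowed` and `place_call` are hypothetical stand-ins for your business rules and for whatever actually talks to Twilio. Only an explicit `True` from the eligibility check results in a call.

```python
import logging
import queue
import threading

log = logging.getLogger("outbound_calls")
jobs = queue.Queue()


def handle_web_request(account_id, phone_number):
    """Web handler: enqueue and return immediately; never call Twilio here."""
    jobs.put({"account_id": account_id, "to": phone_number})
    return "queued"


def process_job(job, is_call_allowed, place_call):
    """Return True only if a call was placed. Any doubt means no call."""
    try:
        allowed = is_call_allowed(job)
    except Exception:
        # Record the whole job so "what did this customer do?" is answerable.
        log.exception("eligibility check crashed; not calling. job=%r", job)
        return False
    if allowed is not True:  # fail closed: only an explicit True proceeds
        return False
    try:
        place_call(job)
    except Exception:
        log.exception("call failed. job=%r", job)
        return False
    return True


def worker(is_call_allowed, place_call):
    """Worker loop, run in its own thread or process; None is a stop signal."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        process_job(job, is_call_allowed, place_call)
        jobs.task_done()


placed = []
t = threading.Thread(target=worker, args=(lambda job: True, placed.append))
t.start()
handle_web_request("acct1", "+15555550100")
jobs.put(None)  # stop the worker after the queued job
t.join()
```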
40. Thanks For Listening
http://www.kalzumeus.com
patrick@kalzumeus.com
I'm patio11 on Twitter or HN.
I love talking about this. Feel free to get in touch.