Patrick McKenzie (patio11) at TwilioConf 2011. The talk is about taking Twilio "beyond the quickstart guides": building applications that are secure, testable, maintainable, and suitable for running a business on.
How could I have avoided that?
Process: Do not push new code to production at 5 PM on Friday night.
Process: Test on staging server first. Fail the deploy if core features do not work as expected.
Tech: Switch to idempotent queues.
Tech: How about we don't call the same person 50 times in five minutes?
Tech: Activity spike 500x historical max = Shut. Down. Everything.
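A minimal sketch of the idempotent-queue idea, assuming an in-memory queue keyed on a caller-chosen job key. The class name, method names, and key format are illustrative; a real system would enforce the same rule in whatever database or queue it already uses.

```python
class IdempotentQueue:
    """Hypothetical in-memory queue: enqueueing the same key twice is a no-op."""

    def __init__(self):
        self._seen = set()  # every key ever accepted
        self._jobs = []     # pending (key, payload) pairs, in arrival order

    def enqueue(self, key, payload):
        """Return True if the job was added, False if the key was already seen."""
        if key in self._seen:
            return False
        self._seen.add(key)
        self._jobs.append((key, payload))
        return True

    def drain(self):
        """Return the pending jobs and clear the queue."""
        jobs, self._jobs = self._jobs, []
        return jobs


call_queue = IdempotentQueue()
# Assumed key scheme: one reminder per appointment.
for _ in range(50):
    call_queue.enqueue("reminder:appointment-123", {"to": "+15555550100"})
pending = call_queue.drain()  # holds a single job despite 50 enqueue attempts
```

With a key like this, a buggy retry loop re-submits the same job harmlessly instead of phoning the same person again.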
Testing Pitfalls With Twilio
Testing is dangerous
Testing trivial changes often requires manual work
Your view code (TwiML) will frequently blow up business logic
Poor separation of concerns between model, view, controller, Twilio libraries, and Twilio API
Many classes of bugs not exercised by automated testing
What To Test
Business logic, business logic, business logic
Scheduling calls / SMSes per business rules
Call flow
Am I calling Twilio API the way Twilio expects?
TwiML looks OK?
Parameters for requests passed correctly?
Does stuff actually work?
Don't Contact Twilio In Tests
Makes tests slow
Potentially dangerous
Bought numbers in unit test. Twilio.revenue += 340
Hurts reproducibility
Instead, record and playback (VCR gem, etc)
Not Ruby? Use Twilio API explorer, copy/paste response to mock.
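For the "copy/paste response to mock" route, a rough sketch using Python's standard `unittest.mock`. The `client.send_sms` wrapper, the `send_reminder` function, and the fields in the canned response are all assumptions for illustration, not the real Twilio library interface.

```python
from unittest import mock

# Canned response, standing in for one pasted from a real API call.
# The fields are illustrative, not the full Twilio response.
CANNED_SMS_RESPONSE = {"sid": "SMxxxxxxxx", "status": "queued"}


def send_reminder(client, to, body):
    """Business logic under test; `client` is a hypothetical Twilio wrapper."""
    if not to.startswith("+"):
        raise ValueError("destination must be in E.164 format")
    response = client.send_sms(to=to, body=body)
    return response["status"] == "queued"


def test_send_reminder_passes_parameters():
    fake_client = mock.Mock()
    fake_client.send_sms.return_value = CANNED_SMS_RESPONSE
    assert send_reminder(fake_client, "+15555550100", "Appointment at 3 PM") is True
    # Verify the request parameters without touching the network.
    fake_client.send_sms.assert_called_once_with(
        to="+15555550100", body="Appointment at 3 PM"
    )


test_send_reminder_passes_parameters()
```

The test runs in milliseconds, cannot buy a phone number by accident, and gives the same result every time.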
Staging Servers Are Required
Staging = Production – Customers
“Same” hardware, configurations, etc, different Twilio numbers
Ban the Internet (except Twilio) from servers
Strongly recommend no real data in staging DB
Staging servers good for automated test calls
Staging Servers Protect Production
Prior to pushing to production, push to staging.
Run a script to automatically drive website and telephone, verifying that stuff actually works.
Fail deploy to production if anything goes wrong.
Adds ~5 minutes to a deploy, will save you outages, catastrophic blowups, and your sanity.
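One way the deploy gate could look, as a sketch only: the check names and the placeholder lambdas are made up, and real checks would drive the staging site and place test calls to staging numbers.

```python
def run_smoke_checks(checks):
    """Run each named check; return the names of those that failed or raised."""
    failures = []
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            ok = False  # a crashing check counts as a failure
        if not ok:
            failures.append(name)
    return failures


def gate_deploy(checks):
    """Return a process exit code: 0 allows the production push, 1 blocks it."""
    failures = run_smoke_checks(checks)
    for name in failures:
        print(f"staging check failed: {name}")
    return 1 if failures else 0


# Placeholder checks for illustration only.
example_checks = [
    ("homepage loads", lambda: True),
    ("test call connects", lambda: False),
]
exit_code = gate_deploy(example_checks)  # non-zero, so the deploy stops here
```

A deploy script would pass the result to `sys.exit` so that any failed check aborts the push to production.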
Case Statements Considered Harmful
Easy to introduce subtle bugs
Very difficult to test
Requires manual testing (with a phone!?)
Tightly couples business logic w/ Twilio
Hard to maintain
Adding menu item => stuff breaks
Change a number => stuff breaks
Restructure flow => stuff breaks
What To Use State Machines For?
Call flows
Business logic testable (in model)
Forces similar organization on model, view, controller, and vocal assets
SMS flows
Necessity for contact in the first place
Avoid easiest catastrophic failure mode with Twilio
Specifics To Modeling Calls
Each call gets a DB/model object
Model tracks call state
Set state to “processing” prior to initiating call (or at entrance to Twilio script for inbound)
Then, transition based on input, using each transition to:
trigger side-effects (updating DB, etc)
present user with view state (voice, etc)
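The call model described above could be sketched roughly as follows. The state names, digits, and prompts are invented for illustration, and a list stands in for the database writes a real model would perform.

```python
class Call:
    """Hypothetical call model; a real app would back this with a DB record."""

    # (current state, input) -> next state. All values are illustrative.
    TRANSITIONS = {
        ("processing", "answered"): "main_menu",
        ("main_menu", "1"): "confirmed",
        ("main_menu", "2"): "cancelled",
    }
    # What the view layer should say in each state.
    PROMPTS = {
        "processing": "Please hold.",
        "main_menu": "Press 1 to confirm, 2 to cancel.",
        "confirmed": "Your appointment is confirmed.",
        "cancelled": "Your appointment is cancelled.",
    }

    def __init__(self):
        self.state = "processing"  # set before the call is initiated
        self.history = []          # stand-in for side effects such as DB updates

    def advance(self, user_input):
        """Apply one input and return the prompt for the resulting state."""
        next_state = self.TRANSITIONS.get((self.state, user_input))
        if next_state is None:
            # Unrecognised input: stay put and repeat the current prompt.
            return self.PROMPTS[self.state]
        self.history.append((self.state, user_input, next_state))
        self.state = next_state
        return self.PROMPTS[next_state]


call = Call()
call.advance("answered")
call.advance("1")
```

Because the flow lives in the model, it can be unit tested with plain strings and no phone, and adding a menu item is one new row in the table.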
Twilio's IfMachine = Continue
Wait until call recipient says something
If they don't say something, must be a machine.
If they do say something, maybe still a machine?
Error rates ~20% in my limited experience
Problems With IfMachine=Continue
“I tried a test call to myself and it never started talking. I'm concerned my customers would hang up before my message plays.”
If you don't pick up the beep correctly, the first several seconds of the message do not get recorded.
“My customers hit 1 and nothing happens.”
Other Options (Not Answers)
Give machines/humans the same message.
Give machines/humans the same message, but force a keypress (“1”) prior to talking. This coerces most answering machines/voicemails into starting recording, even early.
“This is an automated message from Your Company Here. Press 1 to hear your message.” <Gather> their input. If input, play human message. If none, play answering machine message.
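A sketch of the TwiML for the keypress gate, built with Python's standard `xml.etree.ElementTree`. The function name and the `/calls/human-message` URL are assumptions; the handler at that URL would return the human message. The sketch relies on `<Gather>` falling through to the next verb when no digit is entered.

```python
import xml.etree.ElementTree as ET


def keypress_gate_twiml(action_url, machine_message):
    """Build TwiML that asks for a keypress, then falls back to a machine message."""
    response = ET.Element("Response")
    # When a digit is pressed, Twilio requests action_url with the digits entered.
    gather = ET.SubElement(response, "Gather", numDigits="1", action=action_url)
    prompt = ET.SubElement(gather, "Say")
    prompt.text = (
        "This is an automated message from Your Company Here. "
        "Press 1 to hear your message."
    )
    # With no input, execution continues to the verb after <Gather>.
    fallback = ET.SubElement(response, "Say")
    fallback.text = machine_message
    return ET.tostring(response, encoding="unicode")


twiml = keypress_gate_twiml(
    "/calls/human-message",  # hypothetical endpoint
    "Please call us back about your appointment.",
)
```

Generating the TwiML from a function also makes the "TwiML looks OK?" check a plain unit test on the returned string.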
Outgoing Call Security
Educate users regarding proper use. This will require firing some of them.
Establish per-account, per-destination, and global rate caps. Review manually after triggers.
Have a global “Stop all outgoing calls” button.
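A rough sketch of rate caps plus a global stop switch, using a sliding time window. The class name, the cap values, and the five-minute window are assumptions; a per-account cap would follow the same pattern as the per-destination one and is left out for brevity.

```python
import time
from collections import defaultdict, deque


class OutgoingCallGuard:
    """Hypothetical gate consulted before every outgoing call."""

    def __init__(self, per_destination_cap, global_cap, window_seconds):
        self.per_destination_cap = per_destination_cap
        self.global_cap = global_cap
        self.window = window_seconds
        self.stopped = False  # the "Stop all outgoing calls" button
        self._by_destination = defaultdict(deque)  # destination -> call times
        self._all = deque()                        # all recent call times

    def _prune(self, timestamps, now):
        # Drop entries that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()

    def allow(self, destination, now=None):
        """Return True and record the call if it is within every cap."""
        now = time.monotonic() if now is None else now
        if self.stopped:
            return False
        recent = self._by_destination[destination]
        self._prune(recent, now)
        self._prune(self._all, now)
        if len(recent) >= self.per_destination_cap:
            return False
        if len(self._all) >= self.global_cap:
            return False
        recent.append(now)
        self._all.append(now)
        return True


guard = OutgoingCallGuard(per_destination_cap=2, global_cap=100, window_seconds=300)
results = [guard.allow("+15555550100", now=t) for t in (0, 10, 20)]
# The third call inside the five-minute window is refused.
```

Setting `guard.stopped = True` refuses every call regardless of the caps, which is the off button the slide asks for.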
Incoming Call Security
Caller IDs can be spoofed. Do not gate important stuff on them.
“Thanks for calling our automated system. Put in your task code to continue.”
Task code: 4~6 digit random ID. Expires in 1 hour. If possible, flush codes if > 3 failures in a row.
Per-account call-in numbers when feasible. Increases security and cuts down on support costs.
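The task-code rules can be sketched as below, using Python's `secrets` module for the random digits. The class and method names are illustrative, storage is an in-memory dict, and counting failures across the whole store (rather than per account or per caller) is a simplifying assumption.

```python
import secrets
import time

CODE_TTL_SECONDS = 60 * 60  # codes expire in 1 hour
MAX_FAILURES = 3            # flush after more than 3 failures in a row


class TaskCodes:
    """Hypothetical in-memory store of short-lived call-in codes."""

    def __init__(self):
        self._codes = {}    # code -> (task_id, expires_at)
        self._failures = 0  # consecutive failed attempts

    def issue(self, task_id, now=None):
        now = time.time() if now is None else now
        code = f"{secrets.randbelow(10**6):06d}"  # 6 digits, leading zeros kept
        while code in self._codes:                # avoid reusing a live code
            code = f"{secrets.randbelow(10**6):06d}"
        self._codes[code] = (task_id, now + CODE_TTL_SECONDS)
        return code

    def redeem(self, code, now=None):
        """Return the task id for a valid code, or None on any failure."""
        now = time.time() if now is None else now
        entry = self._codes.get(code)
        if entry is None or now >= entry[1]:
            self._codes.pop(code, None)  # drop the code if it had expired
            self._failures += 1
            if self._failures > MAX_FAILURES:
                self._codes.clear()      # flush everything; callers need new codes
                self._failures = 0
            return None
        self._failures = 0
        return entry[0]


codes = TaskCodes()
code = codes.issue("task-1", now=0)
```

The caller's number is never consulted, so a spoofed caller ID gains nothing without the code.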
Why Rate Limit Then?
Control costs to your business and customer.
Protect customer from crushing their offline processes which are feeding to/from the phones.
“Great that it scales. By the way, can we get an off button? To turn off calls for a few hours?”
“Why do you need an off button?”
“Our operators sometimes get called away from their desks, for meetings and whatnot.”
“Certainly. How many operators do you have?”
“Two.”
Random Grabbag Of Advice
Never contact Twilio in request/response cycle. Queue requests, use worker process.
Fiverr.com for voice actresses. Find one you like, put her on retainer.
Record copious information about errors. Very hard to get individualized “What did your customer do to hear that unspecified 'Something broke' message?”
Fail closed: default to not making the call.
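A small sketch combining two of these points: the web request only enqueues a job, and a worker decides whether to call, failing closed. The `place_call` callable and the `opted_in` check are hypothetical; the standard-library `queue.Queue` stands in for whatever job queue the application uses.

```python
import queue


def should_place_call(job, checks):
    """Fail closed: any failed or crashing check means the call is not made."""
    try:
        return all(check(job) for check in checks)
    except Exception:
        return False


def worker(jobs, checks, place_call):
    """Drain queued jobs outside the web request; return the jobs actually called."""
    placed = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return placed
        if should_place_call(job, checks):
            place_call(job)  # hypothetical wrapper around the Twilio API
            placed.append(job)


jobs = queue.Queue()
jobs.put({"to": "+15555550100", "opted_in": True})
jobs.put({"to": "+15555550101"})  # missing field: the check raises, call skipped
checks = [lambda job: job["opted_in"]]
calls_made = []
placed = worker(jobs, checks, calls_made.append)
```

Only the first job results in a call; the malformed one is dropped rather than dialed, which is the safer default when something unexpected happens.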