
Developing Software at Scale: CS 394, May 2011

I gave a guest lecture in software engineering to Chris Riesbeck's CS394 class at Northwestern in spring 2011. See my related blog post at



1. Todd Warren – CS 394 Spring 2011
   Developing Software at Scale: Lessons from 20+ Years at Microsoft
2. Today
   Team structure at Microsoft
   Product complexity and scheduling
   Quality and software testing
   Knowing when to ship
3. Programs vs. Software Products
   Program → programming product: 3x effort
   Program → programming system: 3x effort
   Programming systems product: 9x effort
   Source: Fred Brooks Jr., The Mythical Man-Month, "The Tar Pit"
4. Software Products vs. Custom Software Development
   Source: Hoch, Roeding, Purkert, Lindner, Secrets of Software Success, 1999
5. Roles
   Product manager, user-interface designer, end-user liaison, project manager, architect, developers, toolsmith, QA/testers, build coordinator, risk officer, end-user documentation
   Microsoft's disciplines: Program Management, Software Development Engineers, Test and Quality Assurance, User Assistance / Education
   Source: McConnell
6. Resources
   Size matters! Different methodologies and approaches; the scope of features and quality matters; both affect the level of process needed and its overhead.
   5-person teams: moderate process, shared roles
   24-person teams (PMC): moderate process, lifecycle-oriented roles and specialization; good for "Extreme"-style process
   60-100 people (MS Project): moderate process, some loose functional specialization and lifecycle
   100-200 people (Windows CE): medium to heavy process, lifecycle roles and functional specialization
   1000+ people (Windows Mobile): heavy process, multiple methodologies, formal integration process
   Higher quality == more rigorous process
   True also for open-source and online projects; Apache is the best example of a very specified culture of contribution
7. Organization and Its Effect on Products
   A spectrum from casual to more formal to very formal
8. Project Interdependency Matters: "Star" or "Mesh"
   [Diagram: two topologies of core and edge projects, a "star" and a "mesh", illustrated with Office and Windows]
9. A "Typical" Product Group
   25% developers, 45% testers, 10% program management, 10% user education / localization, 7% marketing, 3% overhead
10. Small Product: Portable Media Center
    1 UI designer, 5 program managers, 8 developers, 10 testers
11. Microsoft Project
    30 developers (27%), 36 testers (33%), 15 program managers (14%), 20 UA/localization (18%), 6 marketing (5%), 3 overhead (3%)
12. Exchange Numbers (circa 2000)
    112 developers (25.9%), 247 testers (57.3%), 44 program managers (10.2%), 12 marketing (2.7%), 16 overhead (3.7%)
13. Windows CE (circa 2002)
14. Windows Mobile (circa 2009)
15. Amount of Time
    3 months is a good rule-of-thumb maximum for a stage/milestone
    It is hard for people to focus on anything longer than 3 months
    Never let things go unbuilt for longer than a week
16. Sample Staged Timeline (Project 2000)
17. How Long?
    216 days of development (truthfully, probably more like 260)
    284 days on "testing" in the example: component tests 188 days, system-wide tests ~97 days
    Roughly a 50/50 split between design/implement and test/fix
    Some projects (e.g., operating systems, servers) have a longer integration period (more like 2:1)
    Factors: how distributed the team is, and the number of "moving parts"
    This shows why some of the Extreme methodology is appealing
18. Fred Brooks's OS/360 Rules of Thumb
    1/3 planning
    1/6 coding
    1/4 component test and early system test
    1/4 system test, all components in hand
19. Office 2000 Schedule
20. A Few Projects Compared to Brooks
21. Quality and Testing
    Design in scenarios up front: what is necessary for the component?
    UI is different than API; server is different than client
    Set criteria and usage scenarios
    Understand (and control, if possible) the environment in which the software is developed and used
    "The last bug is found when the last customer dies" – Brian Valentine, SVP eCommerce, Amazon
22. Example of Complexity: Topology Coverage
    Exchange versions: 4.0 (latest SP), 5.0 (latest SP), and 5.5
    Windows NT versions: 3.51, 4.0 (latest SPs)
    Languages (Exchange and Windows NT): USA/USA, JPN/JPN, GER/GER, FRN/FRN, JPN/Chinese, JPN/Taiwan, JPN/Korean
    Platforms: Intel, Alpha (MIPS, PPC on 4.0 only)
    Connectors, X.400: over TCP, TP4, TP0/X.25
    Connectors, IMS: over LAN, RAS, ISDN
    Connectors, RAS: over NetBEUI, IPX, TCP
    Connector interop: MS Mail, Mac Mail, cc:Mail, Notes
    News: NNTP in/out
    Admin: daily operations
    Store: public store >16GB and private store >16GB
    Replication: 29 sites, 130 servers, 200,000 users, 10 AB views
    Client protocols: MAPI, LDAP, POP3, IMAP4, NNTP, HTTP
    Telecommunication: slow-link simulator, noise simulation
    Fault tolerance: Windows NT clustering
    Security: Exchange KMS server, MS Certificate Server
    Proxy firewall: server-to-server and client-to-server
23. Complexity 2: Windows CE
    5M lines of code
    4 processor architectures: ARM/XScale, MIPS, x86, SH
    20 board support packages
    Over 1,000 possible operating system components
    1,000s of peripherals
24. Complexity 3: Windows Mobile 6.x
    2 code instances ("standard" and "pro")
    4 ARM chip variants
    3 memory configuration variations
    8 screen sizes (QVGA, VGA, WVGA, square, ...)
    60 major interacting software components
    3 network technologies (CDMA, GSM, WiFi)
    Some distinct features for 7 major vendors
    100 dependent 3rd-party apps for a complete "phone"
25. Bugs over the Lifecycle
26. Bugs over the Lifecycle
27. Flow of Tests During the Cycle
    Feature is specified → test design is written → feature implemented → unit tests implemented → test release document → component testing → specialized testing → system test → bug fix → regression tests
28. Ways of Testing
    Types of tests: black box, white box, "gray" box
    By stage of cycle: unit test / verification test, component acceptance test, system test, performance test, stress test, external testing (alpha/beta/"dogfood"), regression testing
29. Four Rules of Testing
    Guard the process
    Catch bugs early
    Test with the customer in mind
    Make it measurable
    Milestones: M0, M1, M2, RTM (ship requirement)
    ProOnGo LLC – May 2009
30. Inside the Mind of a Tester
    M0 (Specs & Test Plans): What are we building, and why? What metrics and criteria summarize customer demands? Can we reliably measure these metrics and criteria?
    M1..Mn (Development & Test): How close are we to satisfying agreed-upon metrics/criteria? Are the criteria passing stably, every time we test? What do our bug trends say about our progress? Based on current trends, when will we pass all criteria? How risky is this last-minute code check-in?
    RTM (Confirm, Ship): Do we pass all criteria? If not: what, why, how?
31. M0 (Specs & Test Plans): What are we building, and why?
    Any problems with this?
      // An API that draws a line from x to y
      VOID LineTo(INT x, INT y);
    What metrics and criteria summarize customer demands?
    Can we reliably measure these metrics and criteria?
32. Balance: Fast vs. Thorough
    From most frequent / shallowest coverage to least frequent / most complete coverage:
    Canary → Build Verification Tests → Automated Test Pass → Manual Test Pass
33. Canary & Check-In Tests
    Fast tests that can automatically run at check-in time
    Static code analysis (like lint)
    Trial build, before the check-in is committed to SCM
    Form-field tests: does the check-in cite a bug number? Is the code-reviewer field filled out?
34. Build Verification Test
    Goal: find bugs so heinous that they could block the ability to dogfood, or derail a substantial portion of the test pass (5%?)
    Unwritten contract: you break the build, you fix it within an hour, day or night; a broken build holds up the productivity of the entire team
35. Automated Test Pass
    Example on a Microsoft product: a 6-digit number of test cases, a 7-digit number of test runs, 14 different target device flavors
    Runs 24/7, results available via the web; automatic handling of device resets / failsafe
    Requires creativity: how would you automate an image editor? A 3D graphics engine?
36. Manual Test Pass
    Cost of automating vs. cost of running manually: a rational, quantitative way to decide whether to automate
    Reality: few organizations maximize the benefits of automation, so manual testing lives on
    Tough "automated vs. manual" decisions: testing for audio glitches (does the audio crackle?), or whether the UI feels responsive enough
37. Tracking Bugs
    Who found it, when, what (and its severity), how to reproduce it, which part of the product
    Where it was fixed and by whom
    State: Open, Resolved, Closed
    Disposition: Fixed, Not Fixed, Postponed, "By Design"
38. Release Criteria
    What must be true for a release to be done or complete
    A mix of criteria: all features implemented and reviewed; documentation complete; all bugs closed (not necessarily fixed); performance criteria met
39. Bug "Triage"
    Late in the cycle, a process for determining what to fix
    Getting people together and prioritizing impact on release criteria and overall stability goals
    Even known crashing bugs are postponed, depending on criteria
40. Summary
    With software products, know what to build for the customer
    Have checkpoints for progress (milestones)
    Many types of testing and structure: right tool for the job
    Determine and measure ship criteria
