The document provides an overview of improvements made to the open-source advertising server OpenX. Key points include:
1) Many features were added such as reusable segmentation rules, zone groups, and revenue sharing capabilities.
2) Performance and scalability were significantly improved through changes such as removing database writes from the frontend, optimizing delivery scripts, and migrating to Nginx and PHP-FPM.
3) New ad formats and capabilities were supported including video, mobile ads, and automatic Flash fallback image generation.
4) The core algorithms for ad delivery, statistics processing, and forecasting were enhanced.
3. Introduction – Online Advertisement
Online advertising deals with delivering ads;
Placing ads on sites is a complex process:
Obtain all ads eligible for a placeholder;
Exclude ads with business limitations like capping;
Ensure that the ads are being presented to the target
audience;
Ensure the advertiser's goals are being met;
Account for the ads delivered;
Deliver the right ad format;
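The steps on this slide can be read as a chain of filters. The following is only a minimal sketch of that idea, not OpenX or OpenDisplay code: the Ad fields, the function names and the choice of country as the only audience criterion are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    """Hypothetical ad record; field names are illustrative only."""
    ad_id: str
    zones: set          # placeholders the ad is linked to
    countries: set      # target audience; an empty set means "everyone"
    cap: int = 0        # max views per user; 0 means no capping
    fmt: str = "image"  # ad format


def select_candidates(ads, zone, user_country, views_by_ad, supported_formats):
    """Return the ads that survive every filter, in their original order."""
    candidates = []
    for ad in ads:
        if zone not in ad.zones:
            continue  # not eligible for this placeholder
        if ad.cap and views_by_ad.get(ad.ad_id, 0) >= ad.cap:
            continue  # business limitation: capping reached
        if ad.countries and user_country not in ad.countries:
            continue  # user is outside the target audience
        if ad.fmt not in supported_formats:
            continue  # the client cannot display this format
        candidates.append(ad)
    return candidates


def account_delivery(views_by_ad, ad):
    """Account for a delivered ad by increasing the user's view counter."""
    views_by_ad[ad.ad_id] = views_by_ad.get(ad.ad_id, 0) + 1
```

Checking that advertiser goals are met is left out here; it needs delivery totals over time rather than a per-request filter.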
8. Introduction – OpenX
Open Source advertising server;
Licensed under GNU General Public License;
Project forked from phpAds developed by Tobias Ratschiller
in 1998;
Was called phpAdsNew, OpenAds and finally OpenX;
Features:
Has a web based GUI;
Extensible plugin architecture;
Serves ads mainly through JavaScript and iframe calls;
9. Introduction – OpenX
Supporting technologies:
PHP;
MySQL;
Web Server (Apache, Nginx);
Optional memcached usage;
Filesystem to serve ad content;
10. Introduction – OpenDisplay
The SAPO OpenDisplay project started from OpenX version
1.8.5;
In April 2010 a four-person team started analysing and
improving OpenX so that it could support SAPO's entire ad
serving network;
In August 2010, OpenDisplay started serving a major
website while development was still under way;
In February 2011, SAPO began migrating its ad serving
network, a process that took about 3 months to complete;
Today OpenDisplay serves SAPO's entire ad network;
11. Introduction – OpenDisplay
OpenDisplay serves ads for several media:
Internet;
Mobile Internet;
Mobile Applications;
TV set-top boxes;
Connected TVs;
In the near future we'll be serving bulk campaigns for other
media;
In this presentation I'll try to describe the steps and quirks
of this endeavour;
12. OpenDisplay Components – Frontend
This component is responsible for serving all ad formats;
For performance reasons, no data processing is done here
besides ad serving itself;
Ad serving is done using munged PHP scripts for
performance;
Plugins are included on demand;
Database queries are cached;
So it's all about ad serving decision making;
13. OpenDisplay Components – Backend
Comprises the data processing features;
Web based GUI for campaign and ad management;
Ad serving statistics;
Reporting;
Batch processing of ad delivery data for use by the
frontends;
14. OpenDisplay Components – Tasks
Maintenance Priority Engine (MPE)
Determines which campaigns to serve given their
priorities;
Calculates each campaign's ad serving probability and
corrects it when the campaign is underperforming or
overperforming;
Maintenance Statistics Engine (MSE)
Processes ad serving numbers;
Starts and stops campaigns;
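One simple way to picture the correction done by the MPE is to recompute a campaign's probability from what it still has to deliver. The sketch below is an assumption made for illustration, not the actual MPE algorithm: the function name, its parameters and the pacing formula (impressions still needed per hour divided by forecast traffic per hour) are all hypothetical.

```python
def delivery_probability(goal, delivered, hours_left, forecast_per_hour):
    """Probability of picking a campaign for one request (illustrative).

    goal              -- impressions booked for the campaign
    delivered         -- impressions served so far
    hours_left        -- hours until the campaign ends
    forecast_per_hour -- forecast requests per hour on its placeholders
    """
    remaining = max(goal - delivered, 0)
    if remaining == 0 or hours_left <= 0 or forecast_per_hour <= 0:
        return 0.0  # nothing left to deliver, or nowhere to deliver it
    needed_per_hour = remaining / hours_left
    # A campaign that is behind needs more impressions per hour, so its
    # probability rises; one that is ahead gets a lower probability.
    return min(needed_per_hour / forecast_per_hour, 1.0)
```

With a goal of 10,000 impressions, 10 hours left and 2,000 forecast requests per hour, a campaign that has delivered 4,000 gets a probability of 0.3, while one that has only delivered 1,000 gets 0.45.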
15. Improvements Introduced - General
Added reusable segmentation rules;
This way a rule can be reused in several campaigns;
Added compound segmentation rules;
The segmentation rules engine was rewritten because the
previous segmentation system was inadequate;
Added the concept of Orders;
Sometimes a customer has several goals for different sites;
An order allows several campaigns with different goals to
be placed in a single customer order;
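To illustrate the reusable and compound segmentation rules mentioned on this slide, here is a minimal sketch in which a rule is a named predicate over a request, and compound rules are built from other rules. The rule names, the request fields and the helper functions are hypothetical; the rewritten engine itself is not shown in this presentation.

```python
def country_is(*codes):
    """Rule: the request comes from one of the given countries."""
    return lambda request: request.get("country") in codes


def weekday_in(*days):
    """Rule: the request happens on one of the given weekdays."""
    return lambda request: request.get("weekday") in days


def all_of(*rules):
    """Compound rule: every sub-rule must match."""
    return lambda request: all(rule(request) for rule in rules)


def any_of(*rules):
    """Compound rule: at least one sub-rule must match."""
    return lambda request: any(rule(request) for rule in rules)


def matching_campaigns(campaigns, rules, request):
    """campaigns maps a campaign id to the NAME of a rule, so that one
    rule definition can be shared by several campaigns."""
    return sorted(cid for cid, name in campaigns.items()
                  if rules[name](request))


# One rule library, reused by three campaigns.
RULES = {
    "portugal": country_is("PT"),
    "weekend": weekday_in("Sat", "Sun"),
}
RULES["pt_weekend"] = all_of(RULES["portugal"], RULES["weekend"])
CAMPAIGNS = {"c1": "portugal", "c2": "pt_weekend", "c3": "portugal"}
```

Changing the "portugal" rule in one place changes the targeting of every campaign that refers to it, which is the point of making rules reusable.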
16. Improvements Introduced - General
Added Zone Groups;
Instead of selecting placeholders one at a time we can
associate several at once;
Without zone groups, a Run of Network (RON) campaign for
all MREC (300x250) placeholders would need to be
associated with every placeholder one by one;
Added revenue-share accounting;
For ads served on pages with third-party content;
This way, revenue can be shared with third-party content
providers;
17. Improvements Introduced - General
OpenDisplay went through a security audit by SAPO's
security team and several issues were solved;
Backoffice:
UI session cookies are now only delivered over SSL;
The session ID generation function was weak and IDs
could be easily guessed. Fixing it reduced the risk of
session hijacking;
New user profiles were added, and entity access was
reviewed;
Some user profiles were changed to read-only, like
advertisers and sites;
18. Improvements Introduced - General
Ads uploaded to the ad server are stored in a folder and
served from there;
At first glance there is no problem with this, but over time
on some systems this can cause inode exhaustion;
So to prevent this and speed up file retrieval, we
improved the upload component to distribute the files in a
two-level folder hierarchy;
OpenX can use a content farm to deliver ads, so we have
used this feature from the start;
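A two-level folder hierarchy like the one described on this slide is often derived from a hash of the file name. The sketch below assumes that approach; the slide does not say how OpenDisplay picks the folders, and the function name and the use of MD5 are illustrative choices.

```python
import hashlib
import os


def storage_path(root, filename):
    """Map a file name to root/xx/yy/filename, where xx and yy are the
    first two byte pairs of the name's MD5 digest (256 * 256 folders)."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], filename)


def store(root, filename, content):
    """Write an uploaded file into its hashed folder and return the path."""
    path = storage_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(content)
    return path
```

Because the folder is computed from the name alone, the delivery side can find a file without any lookup table, and no single folder has to hold every upload.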
19. Improvements Introduced - General
Traffic forecast:
OpenX doesn't have a traffic forecast engine; instead it
uses an average of ads served;
We developed two alternative forecast algorithms using
Python;
This forecast is critical for a couple of reasons:
Inventory selling;
Correct impression allocation for campaigns,
especially due to targeting rules;
21. Improvements Introduced - General
Added data logging and analysis:
We started to summarize delivery properties to allow us
to calculate precise segmentation delivery probabilities;
Using these numbers in combination with the traffic forecast
we can estimate the inventory for each campaign and its
overall probability of delivery;
This information is also useful for commercial purposes:
Knowing the market is very valuable information;
We are currently migrating some of this data to HBase,
which reduces the data, making it usable;
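Combining logged delivery properties with the traffic forecast can be pictured as a multiplication: the share of logged requests that match a campaign's targeting, times the forecast traffic. This is a minimal sketch under that assumption; the log format and function names are hypothetical, and the real summaries are surely more elaborate.

```python
def segment_probability(log, rule):
    """Share of logged requests that match a targeting rule."""
    if not log:
        return 0.0
    matches = sum(1 for request in log if rule(request))
    return matches / len(log)


def estimate_inventory(forecast_impressions, log, rule):
    """Forecast impressions expected to match the rule."""
    return forecast_impressions * segment_probability(log, rule)


# Example: a quarter of the logged traffic is mobile traffic from Portugal.
LOG = [
    {"country": "PT", "device": "mobile"},
    {"country": "PT", "device": "desktop"},
    {"country": "BR", "device": "mobile"},
    {"country": "PT", "device": "desktop"},
]


def pt_mobile(request):
    return request["country"] == "PT" and request["device"] == "mobile"
```

With 1,000,000 forecast impressions, estimate_inventory(1000000, LOG, pt_mobile) gives 250,000 impressions available to a campaign targeting mobile users in Portugal.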
22. Improvements Introduced - General
Restructured the VAST 1.0 system and upgraded it to 2.0;
Video Ad Serving Template (VAST) is a standard from the
Interactive Advertising Bureau;
Delivers video ads (pre-, mid- and post-rolls);
Delivers overlays;
We also added a new type of ad that allows us to serve
SAPO text ads as images;
This virtual ad type works as a proxy to a different ad
system, combining two different ad systems;
Probably the first time an ad system combined them;
24. Improvements Introduced - General
Flash ads are a major problem on systems that don't
support Flash;
iPhones and iPads for example;
To ensure these ads are visible at all times we added
automatic Flash ad image generation to ad uploads via the
Backend;
This way, even if a Flash ad doesn't have a fallback image,
we generate one automatically;
This was accomplished using GNU's gnash in combination
with xvfb-run, which provides a virtual X Window System
for gnash to run in;
25. Improvements Introduced - General
Future developments will include bulk campaigns;
These campaigns differ from regular campaigns because we
know the characteristics of the audience in advance;
By splitting the audience into sets with the same features we
can process an entire set within the LP solver at once,
minimizing the number of variables;
So we can optimize the revenue using linear programming
solutions;
We will use GLPK (GNU Linear Programming Kit) as a
solver to obtain an optimal solution;
This way we can provide a solution that maximizes a
campaign's revenue;
26. Improvements Introduced - General
GLPK sample problem:
# Giapetto's problem, maximizing Giapetto's profit
var x1 >= 0; /* soldiers, each worth 3€ */
var x2 >= 0; /* trains, each worth 2€ */
/* Objective function */
maximize z: 3*x1 + 2*x2; /* maximize Giapetto's profit */
/* Constraints */
s.t. Finishing : 2*x1 + x2 <= 100; /* only 100 hours per week */
s.t. Carpentry : x1 + x2 <= 80; /* only 80 hours per week */
s.t. Demand : x1 <= 40; /* demand of soldiers per week */
end;
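For a problem this small the optimum can be cross-checked without a solver, because the best solution of a linear program lies on a corner of the feasible region. The sketch below is not how GLPK works internally (GLPK uses the simplex and interior-point methods); it simply intersects every pair of constraint lines and keeps the best feasible corner. All names are illustrative.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x1 + b*x2 <= c.
CONSTRAINTS = [
    (2, 1, 100),  # Finishing
    (1, 1, 80),   # Carpentry
    (1, 0, 40),   # Demand
    (-1, 0, 0),   # x1 >= 0
    (0, -1, 0),   # x2 >= 0
]


def profit(x1, x2):
    """Objective function: 3 per soldier plus 2 per train."""
    return 3 * x1 + 2 * x2


def feasible(x1, x2, eps=1e-9):
    """True if the point satisfies every constraint."""
    return all(a * x1 + b * x2 <= c + eps for a, b, c in CONSTRAINTS)


def solve():
    """Return the feasible corner (x1, x2) with the highest profit."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines never cross
        # Cramer's rule for the intersection of the two lines.
        x1 = (c1 * b2 - c2 * b1) / det
        x2 = (a1 * c2 - a2 * c1) / det
        if feasible(x1, x2) and (best is None or profit(x1, x2) > profit(*best)):
            best = (x1, x2)
    return best
```

The best corner is where the Finishing and Carpentry constraints meet: 20 soldiers and 60 trains, for a profit of 180.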
27. Improvements Introduced - Frontend
Database write operations were removed. Database access
is now read-only;
Delivery scripts were analysed using xdebug, and the major
performance issues were fixed:
User agent regexps used by PHPSniff were taking 25% of
the entire request time. Using memcache as a user agent
cache we saved 97% of this time!
All ad serving counters are kept in memcache and
persisted every minute; soon we'll migrate this to
broker queues;
Improved the ad caching system to store and retrieve
EVERYTHING in a single operation;
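The user agent cache mentioned on this slide works because the same user agent strings come back over and over, so the regular expressions only need to run once per distinct string. Here is a minimal sketch of the idea. A plain dict stands in for memcache, and the patterns, names and call counter are made up for the example; PHPSniff's real rules are far more extensive.

```python
import hashlib
import re

# Illustrative patterns only, checked in order.
UA_PATTERNS = [
    ("iPad", re.compile(r"iPad")),
    ("iPhone", re.compile(r"iPhone")),
    ("Firefox", re.compile(r"Firefox/\d+")),
    ("MSIE", re.compile(r"MSIE \d+")),
]

STATS = {"sniffs": 0}  # counts how often the slow path runs


def sniff(user_agent):
    """Slow path: try every pattern against the user agent string."""
    STATS["sniffs"] += 1
    for name, pattern in UA_PATTERNS:
        if pattern.search(user_agent):
            return name
    return "other"


def cached_sniff(cache, user_agent):
    """Fast path: look the answer up first, sniff only on a miss."""
    # The key is a hash because memcached keys are limited to 250 bytes
    # and cannot contain spaces, unlike user agent strings.
    key = "ua:" + hashlib.md5(user_agent.encode("utf-8")).hexdigest()
    result = cache.get(key)
    if result is None:
        result = sniff(user_agent)
        cache[key] = result
    return result
```

Calling cached_sniff twice with the same string runs the regular expressions only the first time.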
28. Improvements Introduced - Frontend
Using xdebug output as input to KCachegrind it is very
easy to analyse any PHP script: just run it!
Files generated by xdebug are read and analysed by
KCachegrind, which shows for instance:
How many times a function has been called;
Total time each function used;
Where request time is spent;
This makes it very easy to detect and improve any
long-running script;
30. Improvements Introduced - Frontend
Instead of using an Apache web server we decided to use
Nginx with PHP-FPM:
Nginx scales almost linearly;
PHP-FPM was very fast in our tests;
PHP-FPM is a FastCGI implementation, now bundled with
PHP 5.3.3;
Instead of using PHP output compression, we used Nginx
compression, which is faster;
Of course, we used a PHP accelerator: eAccelerator with
shared memory, which suits PHP-FPM's multi-process
architecture;
31. Improvements Introduced - Frontend
Even while adding new features, we were still able to reduce
delivery times:
32. Improvements Introduced - Frontend
Introduced a cookie abstraction API to allow storing all
cookie and session information server-side:
By default OpenX stores session information in cookies,
which was insufficient to keep an entire ad network
running due to the cookie size limit (~4 KB);
This was a critical issue for long-running campaigns that
used capping or conversion data;
Fewer cookies means less bandwidth usage and faster
responses;
33. Improvements Introduced - Frontend
The new session storage mechanism introduced new issues;
Requests for a session had to be handled sequentially to
allow correct session retrieval and storage;
This required a lock mechanism to obtain session info in
an ordered fashion;
This was accomplished using memcache atomic
increments to lock session access;
All sessions are stored in memcache and the complete
process of locking, retrieving, storing and unlocking
the session takes a few ms (<3 ms), from remote
servers!
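A lock built on atomic increments can be sketched as follows: whoever increments the counter from 0 to 1 owns the lock, and everyone else undoes their increment and retries. This is an assumption about how such a lock may look, not the OpenDisplay code. FakeMemcache is a hypothetical in-process stand-in for a memcache client, and the key names and timings are illustrative.

```python
import threading
import time


class FakeMemcache:
    """Hypothetical in-process stand-in for a memcache client."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()  # makes each operation atomic

    def add(self, key, value):
        """Store value only if the key is absent."""
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def incr(self, key):
        with self._mutex:
            self._data[key] += 1
            return self._data[key]

    def decr(self, key):
        with self._mutex:
            self._data[key] = max(0, self._data[key] - 1)
            return self._data[key]

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def set(self, key, value):
        with self._mutex:
            self._data[key] = value


def acquire(cache, session_id, timeout=5.0, pause=0.001):
    """Try to lock a session; return False if the timeout expires."""
    key = "lock:" + session_id
    cache.add(key, 0)  # make sure the counter exists
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if cache.incr(key) == 1:
            return True      # we moved the counter from 0 to 1
        cache.decr(key)      # someone else holds it: undo and retry
        time.sleep(pause)
    return False


def release(cache, session_id):
    cache.decr("lock:" + session_id)


def update_session(cache, session_id, change):
    """Lock, retrieve, store and unlock, as listed on the slide."""
    if not acquire(cache, session_id):
        raise TimeoutError("could not lock session " + session_id)
    try:
        session = cache.get("session:" + session_id) or {}
        cache.set("session:" + session_id, change(session))
    finally:
        release(cache, session_id)
```

A production version would also give the lock key an expiry time, so that a crashed request cannot hold a session locked forever.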
34. Improvements Introduced - Frontend
In this chart we can see that outbound traffic dropped
significantly:
35. Improvements Introduced - Frontend
We introduced zone capping, a feature that wasn't available
in OpenX;
This feature is very useful with video ads, to avoid
flooding users with video ads;
Using zone capping we can say that a user will see one or
more ads and then will not see any more ads during a
given period of time;
This feature is managed per placeholder, independently of
the campaign settings;
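Zone capping as described on this slide can be pictured as a counter per user and placeholder that resets after a period. The sketch below makes one assumption the slide does not settle: the period is counted from the first ad the user saw in the zone. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ZoneCap:
    limit: int      # ads a user may see in the zone per period
    period: float   # length of the period in seconds


class ZoneCapper:
    def __init__(self, caps):
        self.caps = caps   # zone id -> ZoneCap
        self.state = {}    # (user id, zone id) -> (period start, count)

    def allow(self, user_id, zone_id, now):
        """Return True and count the view if the user may see an ad in
        this zone; the campaign's own capping is not consulted at all."""
        cap = self.caps.get(zone_id)
        if cap is None:
            return True    # zone has no capping configured
        start, count = self.state.get((user_id, zone_id), (now, 0))
        if now - start >= cap.period:
            start, count = now, 0   # period is over: start a new one
        if count >= cap.limit:
            return False
        self.state[(user_id, zone_id)] = (start, count + 1)
        return True
```

For a video zone capped at 2 ads per 10 minutes, a user's third request inside the 10 minutes is refused whatever campaign would have served it.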
36. Improvements Introduced - Frontend
Added new delivery endpoints to accommodate new formats:
Mobile:
JSON
XML
iPhonePlist
TV
VAST
We also developed an SDK to help with mobile ad integration:
Mobile ads are placed server-side, so client information
has to be passed to the ad server (client IP, session id,
user agent);
37. Improvements Introduced - Frontend
Frontend delivery algorithm was changed to support:
New segmentation rules system;
Changed election algorithm;
Zone capping;
Server-side storage of information instead of cookies;
Increased performance;
New endpoints to provide new types of ads;
No write operations into database;
Gather user properties for analysis;
38. Improvements Introduced - Frontend
Some eye-opening numbers:
More than 4,000,000,000 web requests per month;
9 frontend servers using 36 Mbit/s outbound and 25 Mbit/s
inbound, for a total throughput of 61 Mbit/s!
Approximately 2,200 ad requests per second and twice
as many web requests (4,400/s);
95% of the web requests are answered in under 18 ms;
PHP power at work... :-)
39. Improvements Introduced - Backend
The statistics component was changed to read information from
a database replica due to the large number of accesses;
The backoffice was changed to support some filters and result
paging;
All user-generated delete operations were removed. Why?
Removing a user could, due to table relations, delete all
campaigns and statistics, and compromise forecast
results;
Deleting a campaign could lose all campaign data,
which is required for billing;
So all delete operations are done in maintenance tasks;
40. Improvements Introduced - Backend
We also added new targeting rules and improved others:
Geographical: country, district;
Mobile device: model, OS, version;
Browser Family;
Internet Service Provider;
Organization;
Day of week;
41. Improvements Introduced - Backend
MPE was changed for a couple of reasons:
Become faster;
Decrease memory usage;
Changes in algorithm;
Optimizations;
MPE was reading ALL campaigns from the database, even
finished ones, so memory consumption was increasing
linearly;
All services are now redundant;
44. Report Server
OpenX only generates CSV reports;
A more reliable product required more reliable,
commercial-style reports;
This need led us to try out JasperReports, an open-source
Java report generator;
Thanks to iReport for Jasper, a Crystal Reports-style report
designer, the reports can be easily edited and tested;
46. Report Server
So, starting with JasperReports, we built a cloud-style
report generation farm. How?
By combining it with SAPO Broker, a message passing system,
and a flexible layered architecture;
Given this, a report request is a simple message delivered to
a SAPO Broker queue;
Every server generating reports can consume a report
request, allowing this architecture to scale almost linearly;
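The farm described on this slide follows the competing-consumers pattern: requests go into one queue and any free worker takes the next one. The sketch below shows the pattern with Python's standard queue module and threads standing in for SAPO Broker and the report servers. The request fields and the render_report placeholder are hypothetical.

```python
import queue
import threading


def render_report(request):
    """Placeholder for real report generation: returns a file name."""
    return "{0}.{1}".format(request["report"], request["format"])


def worker(requests, results):
    """Consume report requests until the stop marker (None) arrives."""
    while True:
        request = requests.get()
        if request is None:
            break
        results.put(render_report(request))


def run_farm(report_requests, n_workers=3):
    """Send the requests to a pool of workers and collect their output."""
    requests = queue.Queue()
    results = queue.Queue()
    workers = [threading.Thread(target=worker, args=(requests, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for request in report_requests:
        requests.put(request)      # a report request is just a message
    for _ in workers:
        requests.put(None)         # one stop marker per worker
    for w in workers:
        w.join()
    generated = []
    while not results.empty():
        generated.append(results.get())
    return sorted(generated)
```

Adding capacity means starting more consumers on the same queue; the producers do not need to know how many there are.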
47. Report Server
We developed this report server in a layered style:
What report to generate;
Report parameters;
Datasource to use;
Output formats (HTML, XLS, Word, PDF, ...);
Delivery channels (Email, FTP, SSH, …);
Report completion notification (HTTP, DB);
This layered architecture allows us to extend any of the
layers with new features;
It will become available as open source soon...
48. Report Server
Layer 1: what to generate
Report & parameters
Layer 2: data source
Data to use in the report
Layer 3: output formats
XLS, PDF, DOC, RTF...
Layer 4: delivery channels
HTTP, DB, email
Layer 5: completion notification
URL, DB
49. Problems Found
Unable to scale;
Some queries would read an entire database table if
there were long-running campaigns;
We changed this and accumulated totals in each banner, which
are easier to sum;
Some internal data is still passed using temporary
tables, but not for long...
Not fast enough; of course OpenX is good enough for small
site advertising, but not for an entire ad network;
Some entities required by the business were not working
properly or were missing;
50. Problems Found
But in retrospect OpenX gave us a good starting point...
Tweaking open-source code allowed us to:
Obtain, from an existing open-source solution, a good base
to develop a better solution;
Save some of the costs of starting from scratch;
Gain knowledge about advertising concepts;
Customize new features according to specific needs;
So tweaking open source is a great idea as a base to create
good solutions!!!