Optimising digital content delivery




          Tamas Jambor
     University College London
      EPSRC Industrial CASE
Structure of the talk

•   Problem description
•   Features of the data
•   Baseline algorithms
•   Modified algorithms for content delivery
    – Time-aware models
• Evaluating efficient content delivery
• Future work
Background




• Video traffic increasing over the internet
Increased video traffic

• Peak-time traffic slows connection speed
• Delivering videos beforehand
  –   Cheaper to deliver
  –   Reduce peak time traffic
  –   User can watch content instantly (slow connection)
  –   HD content can be delivered (slow connection)
Features of the data

• Film Data (views and previews)
   – 1 July 2009 – 31 January 2010
   – 2.3 million entries, 64 000 users, 1300 assets
• Removing inconsistencies
   – Unknown entries
   – Assets whose end date is earlier than their start date
• After filtering
   – 1.9 million entries, 64 000 users, 1267 items
Training and test sets

• Requirements
  – Every user must have at least one preview or view in the
    training set and one view in the test set
  – No previews in the test set
• Training
  – 1 July 2009 – 31 December 2009
  – 1.2 million entries, 26 000 users, 1267 items
• Test
  – 1 January 2010 – 31 January 2010
  – 72 000 entries, 26 000 users, 1267 items
Unique features of the dataset

• Implicit feedback carries less information
  – Feedback is expressed before an opinion could be
    formed
     • User might not like the item
  – Implicit feedback recommender systems make
    assumptions on missing rating scores
     • User is not interested
     • User does not know the item
Unique features of the dataset

• Preview information
   – Weak indication of interest
[Figure: bar chart (vertical axis 0 to 0.16) comparing "Purchased within one day" and "Purchased after one day", per item and per user]
Baseline algorithm

• Implicit SVD
  $$ \min_{q,d} \sum_{u,i} w_{u,i} \left( r_{u,i} - q_i^T d_u \right)^2 + \lambda \left( \sum_i \lVert q_i \rVert^2 + \sum_u \lVert d_u \rVert^2 \right) $$

• Fix item or user
  $$ d_u = \left( Y^T C^u Y + \lambda I \right)^{-1} Y^T C^u r(u) $$
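A minimal sketch of the closed-form update for one user, with the item factors held fixed. It assumes NumPy, a dense item-factor matrix `Y`, and that the diagonal of C^u holds the weights w_{u,i}; the function name, the toy data and the value of the regularisation parameter are illustrative, not taken from the talk.

```python
import numpy as np

def update_user_factor(Y, c_u, r_u, lam):
    """Solve d_u = (Y^T C^u Y + lam * I)^-1 Y^T C^u r(u) for one user.

    Y   : (n_items, k) item-factor matrix, held fixed
    c_u : (n_items,) weights w_{u,i} for this user (diagonal of C^u)
    r_u : (n_items,) feedback values r_{u,i} for this user
    lam : regularisation strength (illustrative value below)
    """
    k = Y.shape[1]
    YtC = Y.T * c_u                    # equals Y.T @ diag(c_u) without building the diagonal
    A = YtC @ Y + lam * np.eye(k)
    b = YtC @ r_u
    return np.linalg.solve(A, b)       # solve the linear system instead of inverting A

# Toy data: 6 items, 2 latent factors (all values are made up)
rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 2))
c_u = np.array([1.0, 5.0, 1.0, 1.0, 3.0, 1.0])
r_u = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 0.0])
d_u = update_user_factor(Y, c_u, r_u, lam=0.1)
```

Each user's system depends only on that user's row of weights and feedback, so the updates for different users can be computed independently.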
Baseline algorithm

• Advantage of this approach
  – Task can be divided into independent chunks (user/item)
  – Scalable solution
  – It can be computed in a parallel fashion
• Weights
  – Additional information / assumptions about the data
Weights

• A weight can be assigned to each user-item pair
  – Previews
     $$ w_{u,i} = \alpha P(t \mid p, u) + (1 - \alpha) P(t \mid p, i) $$
     • Items that have been previewed are more likely to be watched
  – Confidence decay in time
     $$ w_{u,i} = e^{-(t - t_r)} $$
Popular items

                                                       Frequency   Avg(days)   SD(days)   Available (days)
I Now Pronounce You Chuck & Larry (PictureBox)         4469        8.30        8.29       28.00
Curious George: A Very Monkey Christmas (PictureBox)   3753        8.73        7.21       31.00
Kingdom                                                3709        8.96        8.05       28.00
Santa Claus (PictureBox)                               3654        3.37        2.72       18.00
Munster's Scary Little Christmas (PictureBox)          3654        8.38        8.09       28.00
Inside Man (PictureBox)                                3530        9.31        8.35       28.00
Step Up (PictureBox)                                   3326        9.05        8.40       28.00
Wiz                                                    3291        14.29       12.04      41.46
Smokin' Aces (PictureBox)                              3253        7.68        7.64       28.00
Break-Up                                               3203        9.32        7.84       27.96
Jarhead (PictureBox)                                   3041        8.84        7.90       28.00
Stealing Christmas (PictureBox)                        3026        3.69        3.03       18.00
Hangover                                               3006        11.10       6.88       26.56
Viewing habits
[Figure: number of views (0 to 70) by date for Patch Adams and Elizabeth - The Golden Age]
Viewing habits


• Viewing behaviour
  – During the day
     • Differentiate who is watching
  – During the week
     • Weekends/weekdays
  – Categories
     • Some content is likely to be watched at specific times
Viewing habits
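• Gaussian CDF
  $$ \Phi(t, \mu, \sigma) = \frac{1}{2} \left[ 1 + \operatorname{erf}\left( \frac{t - \mu}{\sqrt{2 \sigma^2}} \right) \right] $$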
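The Gaussian CDF can be evaluated directly with the error function from Python's standard library. This is a small sketch; the function name and the example values (a mean viewing time of 20:00 with a spread of two hours) are hypothetical and only illustrate the formula.

```python
import math

def gaussian_cdf(t, mu, sigma):
    """Gaussian CDF: 0.5 * (1 + erf((t - mu) / sqrt(2 * sigma^2)))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0 * sigma ** 2)))

# Hypothetical category whose viewing times centre on 20:00 with sigma = 2 hours
p_at_mean = gaussian_cdf(20.0, mu=20.0, sigma=2.0)      # value at the mean
p_one_sigma = gaussian_cdf(22.0, mu=20.0, sigma=2.0)    # value one sigma above the mean
```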
Prediction

• For known items
  $$ r_{c,t} = r_b + \Phi(t_d, \mu_{c,d}, \sigma_{c,d}) + \Phi(t_w, \mu_{c,w}, \sigma_{c,w}) $$
  – Baseline prediction
  – Daily Gaussian distribution for category
  – Weekly Gaussian distribution for category
• For new items
  $$ r_{c,t} = r_c + \Phi(t_d, \mu_{c,d}, \sigma_{c,d}) + \Phi(t_w, \mu_{c,w}, \sigma_{c,w}) $$
  – Prediction for the category
  – Daily Gaussian distribution for category
  – Weekly Gaussian distribution for category
Evaluation method
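• Top-N hit rate
  $$ l_u = \frac{h_u}{v_u} $$
  – h_u = number of assets the user watched that were also in the top-N recommendations
  – v_u = total number of assets the user watched
• Overall performance
  $$ l = \frac{1}{M} \sum_{i=1}^{M} l_i $$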
  – Average performance across all users (M)
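The hit-rate evaluation can be sketched as follows. The sketch assumes that each user's recommendations arrive as a list already cut to the top N and that the watched assets are counted as distinct items; the function names and the toy data are illustrative.

```python
def hit_rate(recommended, watched):
    """Per-user hit rate l_u = h_u / v_u.

    h_u: watched assets that also appear in the top-N recommendations
    v_u: number of distinct assets the user watched
    """
    watched = set(watched)
    if not watched:
        raise ValueError("hit rate is undefined for a user with no views")
    hits = len(watched & set(recommended))
    return hits / len(watched)

def overall_hit_rate(recs_by_user, watched_by_user):
    """Average of the per-user hit rates over the M users with test views."""
    rates = [
        hit_rate(recs_by_user.get(user, []), watched)
        for user, watched in watched_by_user.items()
    ]
    return sum(rates) / len(rates)

# Toy example with two users and top-3 lists (asset ids are made up)
recs = {"a": [1, 2, 3], "b": [4, 5, 6]}
views = {"a": [2, 9], "b": [4]}
overall = overall_hit_rate(recs, views)   # (0.5 + 1.0) / 2
```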
Results: Top-15 Performance
                                           Top-15 Hit Rate       Number of users
[Figure: Top-15 hit rate (left axis, 0 to 0.25) and number of users (right axis, 0 to 9000) for the groups 500 and above, 200-500, 100-200, 50-100, 20-50, 10-20, 5-10, 1-5 and All]
Efficient caching
[Diagram: Content Provider, WCC, STB]

• Pre-cache items that are predicted to be relevant
  –   Cheaper to deliver
  –   Reduce peak time traffic
  –   User can watch content instantly (slow connection)
  –   HD content can be delivered (slow connection)
Predictive caching

• Content
  1. Assets
  2. Size
  3. Schedule (window start/end)
  4. Category
• Customers
  1. View history (time)
• Models
  1. Personalised Top-N
  2. Popular items
  3. Marketing suggestions
• Cache list
  – Cost per customer
  – Overall cost
Cost function


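  $$ c_{all} = c_{be} \, n_{be} + c_{af} \, n_{af} $$
• c_be: cost of delivering an item best effort (BE); n_be: number of items delivered that way
• c_af: cost of delivering an item in real time (AF); n_af: number of items delivered that way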
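The cost function is a weighted sum of the two delivery counts. A minimal sketch, with an illustrative function name and made-up prices and counts:

```python
def total_cost(n_be, n_af, c_be, c_af):
    """Total delivery cost: best-effort items at c_be each, real-time items at c_af each."""
    if min(n_be, n_af) < 0:
        raise ValueError("item counts must be non-negative")
    return c_be * n_be + c_af * n_af

# Illustrative values: 10 items pre-delivered at 0.25 each, 4 items streamed at 0.50 each
cost = total_cost(n_be=10, n_af=4, c_be=0.25, c_af=0.50)   # 2.5 + 2.0
```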
Assumptions of the model

• Two (or more) different prices for different
  delivery methods
• Fixed line speed
• Simplified markets
• Ignore network infrastructure
Preliminary Evaluation

• Hit rate
   – Not sensitive to sparsity
   – Good to measure performance
• Precision
   – Sensitive to sparsity and relevant items
Results: Hit rate
[Figure: hit rate (0 to 0.3) against number of retrieved items (1 to 46)]
Results: Average precision
[Figure: average precision (0 to 0.0018) against number of retrieved items (1 to 46)]
Sparse data
[Figure: average views in January 2010 (0 to 0.3) against profile size (0 to 225)]
Sparse data – how many items to upload

• Non-personalised
  – Varies from uploading once a day to uploading once
    a month
• Personalised
  – How many items the user watched recently
Predictive caching

• Error I:
   – Predict the number of items the user will watch
      • Control the maximum number of items cached
• Error II:
   – Prediction accuracy
      • Only predict for less risky users
Maximum number of items cached


    $$ n_{u,be} \le \frac{c_{af} \, v_u}{c_{be}} $$
• Example
  – User will watch 5 items in the coming month (predicted)
  – Deliver in real time (AF): £0.70
  – Deliver beforehand (BE): £0.30
    $$ n_{u,be} \le \frac{0.70 \times 5}{0.30} \approx 11.67 $$
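The bound on the number of cached items can be computed as below. The sketch assumes the bound is an upper limit, so a whole number of items is obtained by dropping the fractional part; the function name is illustrative and the prices are the ones from the example.

```python
def max_cached_items(predicted_views, c_be, c_af):
    """Upper bound on items to pre-deliver for one user: c_af * v_u / c_be."""
    if c_be <= 0:
        raise ValueError("best-effort cost must be positive")
    return c_af * predicted_views / c_be

# Example from the slide: 5 predicted views, AF at 0.70 and BE at 0.30
bound = max_cached_items(predicted_views=5, c_be=0.30, c_af=0.70)
whole_items = int(bound)   # only whole items can be cached, so drop the fraction
```

With these prices the bound is a little under 12, so at most 11 whole items would be cached for this user.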
Performance


    $$ l_u = \frac{h_{u,be}}{n_{u,be}} $$
  – h_{u,be}: hits on cached items
  – n_{u,be}: number of items cached
• Overall performance
    $$ l = \frac{\sum_{i=1}^{N} h_{i,be}}{\sum_{j=1}^{M} n_{j,be}} $$
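The overall figure pools hits and cached items across users before dividing, which differs from averaging the per-user ratios. A small sketch, assuming both sums run over the same set of users; the function name and the counts are made up for illustration.

```python
def pooled_cache_performance(hits_by_user, cached_by_user):
    """Total hits on cached items divided by total items cached."""
    total_cached = sum(cached_by_user.values())
    if total_cached == 0:
        raise ValueError("no items were cached")
    return sum(hits_by_user.values()) / total_cached

# Two hypothetical users: a has 3 hits on 5 cached items, b has 0 hits on 15
hits = {"a": 3, "b": 0}
cached = {"a": 5, "b": 15}
pooled = pooled_cache_performance(hits, cached)                    # 3 / 20
mean_of_ratios = sum(hits[u] / cached[u] for u in hits) / len(hits)  # (0.6 + 0.0) / 2
```

The pooled value weights each user by the number of items cached for them, so a user with a large, poorly targeted cache pulls the overall figure down more than in a simple average.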
Performance of the system

  $$ l \ge \frac{c_{be}}{c_{af}} $$

• To save on cost, compare
  – The performance of the system
  – Ratio between the two delivery methods
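The comparison between the performance l and the ratio of the two delivery costs can be derived in a few steps. The derivation assumes that every watched item not found in the cache is streamed in real time, so that n_af = v - h_be, and that n_be and c_af are positive. The symbol c_stream for the streaming-only cost is introduced here for illustration.

```latex
\begin{align*}
c_{\text{stream}} &= c_{af}\, v \\
c_{all} &= c_{be}\, n_{be} + c_{af}\,(v - h_{be}) \\
c_{\text{stream}} - c_{all} &= c_{af}\, h_{be} - c_{be}\, n_{be} \\
c_{\text{stream}} - c_{all} \ge 0 &\iff \frac{h_{be}}{n_{be}} \ge \frac{c_{be}}{c_{af}}
\end{align*}
```

Under these assumptions caching is never more expensive than streaming only when the hit ratio reaches the cost ratio, and it is strictly cheaper when the hit ratio exceeds it.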
Example

  – Performance
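     • 3 hits on 5 delivered items, 2 items streamed
       $$ l_u = \frac{h_{u,be}}{n_{be}} = \frac{3}{5} = 0.6 $$
     • Deliver in real time (AF): £0.70
     • Deliver beforehand (BE): £0.30
       $$ l \ge \frac{c_{be}}{c_{af}} = \frac{0.3}{0.7} \approx 0.43 $$
  – Cost
       $$ c_{all} = c_{be} \, n_{be} + c_{af} \, n_{af} = 0.3 \times 5 + 0.7 \times 2 = 2.9 $$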
     • (expected to be less than streaming only)
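The arithmetic of this example can be checked with a short script. It assumes the user watched five items in total (the three cache hits plus the two streamed items), which the slide implies but does not state; the function name is illustrative.

```python
def delivery_costs(hits, cached, watched, c_be, c_af):
    """Return (cost with caching, cost with streaming only) for one user."""
    if not 0 <= hits <= min(cached, watched):
        raise ValueError("hits cannot exceed cached or watched items")
    streamed = watched - hits                  # watched items that were not in the cache
    with_caching = c_be * cached + c_af * streamed
    streaming_only = c_af * watched
    return with_caching, streaming_only

with_caching, streaming_only = delivery_costs(
    hits=3, cached=5, watched=5, c_be=0.30, c_af=0.70
)
saving = streaming_only - with_caching
```

Here the cached delivery costs about £2.90 against about £3.50 for streaming all five items, a saving of roughly £0.60.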
Evaluation II

• Upload ratio
     $$ \frac{n_{be}}{v} \le \frac{c_{af}}{c_{be}} $$
     • Number of items cached
     • Example (c_af = £0.70, c_be = £0.30): for every watched item we can
       cache at most 2.3 items
• Upload hits
     $$ \frac{h_{be}}{n_{be}} \ge \frac{c_{be}}{c_{af}} $$
     • Performance of the model
     • Example (c_af = £0.70, c_be = £0.30): for every cached item we need at
       least 0.43 hits
• If both are satisfied, a cost saving is guaranteed
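Both checks can be expressed in a few lines. This sketch treats the two conditions as non-strict bounds ("at most" and "at least"), which is an interpretation of the slide; the function name and the second set of counts are made up for illustration.

```python
def caching_conditions(n_be, h_be, v, c_be, c_af):
    """Return (upload_ratio_ok, upload_hits_ok) for the two evaluation checks."""
    if n_be <= 0 or v <= 0:
        raise ValueError("need at least one cached item and one watched item")
    upload_ratio_ok = n_be / v <= c_af / c_be     # not too many items cached per view
    upload_hits_ok = h_be / n_be >= c_be / c_af   # enough hits per cached item
    return upload_ratio_ok, upload_hits_ok

# 5 cached, 3 hits, 5 watched: 1.0 <= 2.33 and 0.6 >= 0.43
good = caching_conditions(n_be=5, h_be=3, v=5, c_be=0.30, c_af=0.70)
# 15 cached, 3 hits, 5 watched: 3.0 > 2.33 and 0.2 < 0.43
bad = caching_conditions(n_be=15, h_be=3, v=5, c_be=0.30, c_af=0.70)
```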
Results – Combining personalised and non-personalised recommenders
[Figure: upload hits (0 to 0.02) against the personalised vs popular mix (0 to 1)]
Unique characteristics of the system

• Recommender algorithm
  – Low risk approach
  – No prediction is made when it is unlikely to be right
• Caching strategy
  – Only for users who will use the system
  – Predict the number of items to be uploaded
Future work

•   Test the system on other datasets
•   Redefine baseline algorithm
•   Availability might influence choice
•   Adaptive temporal approach
    – Controlling the update of the system
       • How much data is flowing in
       • How much performance loss the system expects
