Time series analysis of collaborative activities-CRIWG2012
1. Time series analysis of collaborative activities
Irene-Angelica Chounta, Nikolaos Avouris
HCI Group, University of Patras
{houren, avouris}@upatras.gr
2. Outline
• Objective
• Time series and collaborative activities
• Methodology of Analysis
• Results
• Conclusions and future work
3. Objective
• Use of time series as a tool of analysis
• Real time assessment of activity
• Classification of collaborative sessions
4. Time series and collaborative activities
• Time: important aspect of collaboration
• Analysis regarding time can
describe/reveal underlying group
dynamics
• Phenomena that may affect the quality of
collaboration can be captured in this way
(Vasileiadou, E., 2009)
5. Methodology of Analysis (1)
Memory-based learning model
• Collaborative session X is compared with every stored session:
– tsA / CQA_A: Distance X-A
– tsB / CQA_B: Distance X-B
– …
– tsN / CQA_N: Distance X-N
• IF (Distance X-Y is minimum) then { CQA_X ≈ CQA_Y }
where CQA: Collaboration Quality Assessment
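The nearest-match rule on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `predict_cqa`, `toy_distance` and the sample sessions are hypothetical, and the placeholder distance stands in for the DTW distance the slides actually use.

```python
from typing import Callable, Sequence

# One row of feature values per time interval (hypothetical representation).
Series = Sequence[Sequence[float]]


def predict_cqa(query: Series,
                memory: Sequence[tuple[Series, float]],
                distance: Callable[[Series, Series], float]) -> float:
    """Return the CQA of the stored session that is closest to `query`."""
    if not memory:
        raise ValueError("the memory of rated sessions is empty")
    # Pick the (series, cqa) pair with the smallest distance to the query.
    _, nearest_cqa = min(memory, key=lambda item: distance(query, item[0]))
    return nearest_cqa


def toy_distance(a: Series, b: Series) -> float:
    """Placeholder distance: summed absolute difference over aligned intervals."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Made-up sessions, each paired with a made-up CQA rating.
memory = [
    ([[3, 1], [4, 0], [2, 2]], 1.5),    # hypothetical session A
    ([[0, 5], [1, 6], [0, 4]], -1.0),   # hypothetical session B
]
query = [[3, 0], [4, 1], [1, 2]]        # hypothetical session X
predicted = predict_cqa(query, memory, toy_distance)
```

Here session X is closer to session A than to session B, so it inherits the CQA value of A.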
6. Methodology of Analysis (2)
• a data pool of 212 collaborative sessions
(collaboration quality assessed with a rating
scheme) (Kahrimanis, G., et al., 2009)
• Groupware application: shared workspace +
chat tool - Task: Dyads constructing flow
charts – Duration: 1h30’
• same conditions applied for all
clients/collaborators
7. Methodology of Analysis (3)
• time series (multivariate) of aggregated
sequences of events of collaborative activities per
time interval
– Number of Chat Messages and Workspace actions,
– Roles’ Alternations in Chat and Workspace activity
– Their differences between consecutive time intervals
• Various time intervals (1, 5, 8 and 10 minutes)
• distance measure: Dynamic Time Warping (DTW)
distance (Giorgino, T., 2009)
• two dissimilarity functions (Euclidean and
Manhattan)
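As a rough sketch of how such a multivariate series might be built from a session log, the Python below counts events and role alternations per time interval. The event format, the function name `build_series` and the reading of "roles' alternations" as a change of the acting user within a channel are all assumptions made for illustration, not details confirmed by the slides.

```python
def build_series(events, interval_s, duration_s):
    """Aggregate (timestamp_s, user, channel) events into one row per interval.

    Each row is [chat messages, workspace actions,
                 chat alternations, workspace alternations].
    """
    n_intervals = -(-duration_s // interval_s)  # ceiling division
    channels = ("chat", "workspace")
    counts = {c: [0] * n_intervals for c in channels}
    alternations = {c: [0] * n_intervals for c in channels}
    last_user = {c: None for c in channels}

    for t, user, channel in sorted(events):
        idx = int(t // interval_s)
        if channel not in counts or not 0 <= idx < n_intervals:
            continue  # ignore unknown channels and out-of-range timestamps
        counts[channel][idx] += 1
        # Assumption: an alternation is a change of the acting user in a channel.
        if last_user[channel] is not None and last_user[channel] != user:
            alternations[channel][idx] += 1
        last_user[channel] = user

    return [
        [counts["chat"][i], counts["workspace"][i],
         alternations["chat"][i], alternations["workspace"][i]]
        for i in range(n_intervals)
    ]


# Made-up log of a two-minute session, aggregated per 1-minute interval.
events = [
    (5, "A", "chat"), (20, "B", "chat"), (30, "A", "workspace"),
    (70, "A", "workspace"), (80, "B", "workspace"), (95, "B", "chat"),
]
series = build_series(events, interval_s=60, duration_s=120)
```

Changing `interval_s` to 300, 480 or 600 would give the 5, 8 and 10 minute aggregations listed above.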
8. Results (1)
Model evaluation:
• the correlation matrix of CQA (predicted vs.
true value)
• the root mean squared error (RMSE)
• the mean absolute error (MAE)
9. Results (2)
• The two variables (predicted vs. real CQA
value) are significantly and positively
correlated (p<0.05, Rho>0) for all time
intervals
                      Manhattan                  Euclidean
Time interval (min)   p value  Spearman’s Rho    p value  Spearman’s Rho
1                     0.000    0.296             0.029    0.150
5                     0.002    0.202             0.021    0.154
8                     0.000    0.235             0.005    0.187
10                    0.011    0.168             0.010    0.170
10. Results (3)
• MAE and RMSE for CQA ∈ [-2, 2]
                      MAE                     RMSE
Time interval (min)   Manhattan  Euclidean    Manhattan  Euclidean
1                     0.89*      0.97         1.14       1.21
5                     1.19       1.21         1.48       1.5
8                     1.18       1.16         1.5        1.48
10                    1.17       1.19         1.44       1.47
11. Results (4)
For time interval = 1 minute and Manhattan
distance (CQA ∈ [-2, 2]):
|CQA_eval - CQA_pred|   % cases
< 0.5                   41
< 1                     68.4
< 2                     92
12. Conclusions & Future Work
• Significant positive correlations between the
evaluated and predicted values
(CQA_evaluative, CQA_prediction)
• Best results occur for 1 minute time interval
and Manhattan distance
(Rho: 0.3, MAE: 0.89, RMSE: 1.1, CQA ∈ [-2, 2])
• Advanced classification techniques (k-nearest
neighbor) are expected to improve the results
• Further explore real time assessment and the
way feedback affects collaboration’s unfolding
14. Euclidean vs. Manhattan
• The best distance measure depends heavily on
the nature of the data
• Euclidean distance performs poorly on
high-dimensional data
Euclidean: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
Manhattan: $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
15. Dynamic Time Warping
• Popular technique for comparing time series
• The series are "warped" non-linearly in the
time dimension in order to find best match
• Provides a distance measure that can be further
used for classification
• Applies to both univariate and multivariate
time series
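A minimal DTW sketch in Python may help make the warping idea concrete. It uses the textbook dynamic-programming recurrence with a pluggable local dissimilarity (Manhattan or Euclidean); the function names are hypothetical, and the step pattern is an assumption that may differ from the configuration of the DTW implementation cited in the slides (Giorgino, T., 2009).

```python
import math


def manhattan(p, q):
    """Local dissimilarity: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """Local dissimilarity: straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dtw_distance(x, y, local=manhattan):
    """Cost of the cheapest monotonic alignment of series x and y.

    x and y are lists of equal-width rows (one row per time interval),
    so the same code covers univariate and multivariate series.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances alone
                                 cost[i][j - 1],      # y advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


x = [[0, 0], [1, 1], [2, 2]]
stretched = [[0, 0], [0, 0], [1, 1], [2, 2]]  # same shape, slower start
shifted = [[0, 0], [2, 2], [2, 2]]

same_shape = dtw_distance(x, stretched)           # warping absorbs the delay
d_manhattan = dtw_distance(x, shifted, manhattan)
d_euclidean = dtw_distance(x, shifted, euclidean)
```

The stretched series has a different length but the same shape, so the warped distance is zero, while the choice of local dissimilarity changes the distance to the shifted series.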
16. Rating Scheme
• provides quantitative judgments of the quality
of collaboration
• proposes the rating of seven collaborative
dimensions on a 5-point scale
• Collaboration Quality Average (CQA) is defined
as the average value of six of these dimensions
(Collaboration Flow, Sustaining Mutual Understanding,
Knowledge Exchange, Argumentation, Structuring
Problem Solving Process, Cooperative Orientation)
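The CQA computation can be illustrated with a short Python sketch. The ratings below are invented, the helper name is hypothetical, and the integer scale from -2 to 2 is an assumption inferred from the 5-point scale and the CQA value range reported in the results.

```python
# The six dimensions that enter the average, as listed on the slide.
CQA_DIMENSIONS = (
    "Collaboration Flow",
    "Sustaining Mutual Understanding",
    "Knowledge Exchange",
    "Argumentation",
    "Structuring Problem Solving Process",
    "Cooperative Orientation",
)


def collaboration_quality_average(ratings):
    """Average the six CQA dimensions; any other rated dimension is ignored."""
    return sum(ratings[d] for d in CQA_DIMENSIONS) / len(CQA_DIMENSIONS)


# Made-up ratings for one session (assumed scale: -2 .. 2).
ratings = {
    "Collaboration Flow": 1,
    "Sustaining Mutual Understanding": 2,
    "Knowledge Exchange": 0,
    "Argumentation": -1,
    "Structuring Problem Solving Process": 1,
    "Cooperative Orientation": 0,
    "Individual Task Orientation": 2,   # seventh dimension, left out of CQA
}
cqa = collaboration_quality_average(ratings)
```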
17. Time series
• Time series:
any sequence of observations recorded at
successive time intervals
(univariate, multivariate)
• Examples of use:
– Network traffic monitored by a web server per hour
– Share prices on a stock market per week
– Gene activity in biological processes
18. RMSE, MAE
• MAE: all the individual differences are
weighted equally in the average.
• RMSE: the RMSE gives a relatively high weight
to large errors.
• The MAE and the RMSE can be used together
to diagnose the variation in the errors in a set
of forecasts.
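Both error measures are easy to compute by hand, as in the Python sketch below. The values are invented for illustration and are not the study's data; the function names are hypothetical.

```python
import math


def mae(true_values, predicted_values):
    """Mean absolute error: every difference carries the same weight."""
    errors = [abs(t - p) for t, p in zip(true_values, predicted_values)]
    return sum(errors) / len(errors)


def rmse(true_values, predicted_values):
    """Root mean squared error: squaring gives large errors more weight."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, predicted_values)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up true and predicted CQA values.
true_cqa = [1.0, -0.5, 2.0, 0.0]
pred_cqa = [0.5, 0.5, 2.0, -2.0]

error_mae = mae(true_cqa, pred_cqa)    # 0.875
error_rmse = rmse(true_cqa, pred_cqa)  # about 1.146, pulled up by the error of 2.0
```

The gap between the two values reflects the single large error in the sample; with errors of identical magnitude the two measures would coincide.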
19. Model evaluation
Best MAE = 0.89, where:
– previous post-assessment machine learning
techniques scored an MAE of 0.74
– MAE < 1 is acceptable for similar applications
(Kahrimanis, 2010)
– Simplicity of the model
– Real time results
20. Differences
Chat messages: a_1, a_2, a_3, …, a_(N-1), a_N
Differences of Chat messages: a_2 - a_1, a_3 - a_2, …, a_N - a_(N-1)
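The differencing step amounts to subtracting each interval's count from the next one, as in this small Python sketch (the helper name and the counts are made up for illustration):

```python
def first_differences(values):
    """Return [a2 - a1, a3 - a2, ..., aN - aN-1] for a sequence of counts."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


chat_messages = [4, 7, 7, 2]                   # made-up counts per interval
chat_diffs = first_differences(chat_messages)  # one value fewer than the input
```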
Editor's Notes
to empower existing machine learning techniques and minimize the workload of human evaluators
regarding time series characteristics vs. qualitative assessments
for providing further feedback to collaborative partners
Time is a fundamental aspect of collaboration and further analysis regarding time can reveal the underlying group dynamics
a data pool of 212 collaborative sessions associated with quantitative assessments of collaboration quality
Time series constructed by the aggregated events of Number of Chat Messages and Workspace actions, Roles’ Alternations in chat and workspace activity
distance measure: Dynamic Time Warping (DTW) distance (Giorgino, T., 2009)
for most of the time interval/dissimilarity method combinations
Best results considering correlation coefficient and MAE/RMSE occur for 1 minute time interval and Manhattan distance (0.3, 0.89, 1.1 respectively, for a value range [-2, 2]).
For the classification, one optimal match was used for each query time series. Results could be improved if we used more advanced techniques (k-nearest neighbor).
measuring similarity between two sequences which may vary in time or speed
DTW is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) with certain restrictions. The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension.
that stand for the five fundamental aspects of collaboration: communication, joint information processing, coordination, interpersonal relationship and motivation
Collaboration Quality Average (CQA) is defined as the average value of six out of seven dimensions (leaving out the motivational/Individual task orientation aspect)
Mean absolute error (MAE)
It measures accuracy for continuous variables. Expressed in words, the MAE is the average over the verification sample of the absolute values of the differences between each forecast and the corresponding observation. The MAE is a linear score, which means that all the individual differences are weighted equally in the average.
Root mean squared error (RMSE)
The RMSE is a quadratic scoring rule which measures the average magnitude of the error. The equation for the RMSE is given in both of the references. Expressing the formula in words, the differences between forecast and corresponding observed values are each squared and then averaged over the sample. Finally, the square root of the average is taken. Since the errors are squared before they are averaged, the RMSE gives a relatively high weight to large errors. This means the RMSE is most useful when large errors are particularly undesirable.
The MAE and the RMSE can be used together to diagnose the variation in the errors in a set of forecasts. The RMSE will always be larger than or equal to the MAE; the greater the difference between them, the greater the variance in the individual errors in the sample. If RMSE = MAE, then all the errors are of the same magnitude.