Interaction Lab. Seoul National University of Science and Technology
Neural Networks for Semantic Gaze
Analysis in XR Settings
Jeong Jae-Yeop
ETRA2021, ACM Symposium on Eye Tracking Research and Applications
Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Lena Stubbemann, Dominik Dürrschnabel, Robert Refflinghaus (2021)
■Intro
■Approach
■Evaluation
■Conclusion and future work
Agenda
Intro
Approach
■Semantic gaze analysis
 The process of identifying the objects or features that receive visual and cognitive attention
• Well controlled settings
• Visual patterns and oculometric parameters
• What users are looking at
Intro(1/6)
■Semantic gaze analysis in XR settings
 ROI (Region of Interest)
• Two-dimensional depiction of an object
 VOI (Volumes of Interest)
• Three-dimensional counterpart of an ROI: the object itself rather than a two-dimensional depiction of it
Intro(2/6)
■Annotating VOI data(1/2)
 VOI data for gaze analysis
• User-specific gaze videos with constantly changing perspectives on the target object
• Objects move, vanish, reappear, and change shape, size, or illumination …
• Annotation is a time-consuming process
• Manual annotation is thus still considered the standard procedure
Intro(3/6)
■Annotating VOI data(2/2)
 VOI annotation problem → image classification
• CAD (Computer Aided Design) model
• CNN (Convolutional Neural Network)
• Three-dimensional problem → two-dimensional problem: simplified
• CNN can also recognize different perspectives on the same three-dimensional body
Intro(4/6)
■Data augmentation
 GAN (Generative Adversarial Network)
• Image augmentation technique to adapt the training data to real environmental factors
• Overcomes the need for challenging photorealistic simulations
• Enables VOI annotation not only at the object level but also at the product-feature level
Intro(5/6)
■Overview
Intro(6/6)
Approach
Evaluation
■Addressing the annotation problem with object recognition
 Methodological details
• Use a CAD model to prepare training data for Cycle-GAN
• Use Cycle-GAN to create a realistic-looking synthetic data set
• Use synthetic data set to train CNN (Convolutional Neural Network)
• Predict VOIs of experimental data with trained CNN model
Approach(1/10)
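A minimal sketch of how the four steps above compose, with each trained stage injected as a callable; all names and signatures here are hypothetical, since the authors' code is not published:

```python
from typing import Callable, Dict, Iterable, List

def annotate_gaze_data(
    cad_renderings: Dict[str, str],                 # rendered CAD view -> VOI label (step 1)
    translate: Callable[[str], str],                # trained Cycle-GAN generator (step 2)
    train_cnn: Callable[[Dict[str, str]], object],  # CNN training routine (step 3)
    gaze_crops: Iterable[str],                      # gaze-centered crops of the videos
) -> List[str]:
    """Translate labeled renderings into realistic-looking images, train a
    classifier on them, then predict a VOI for every gaze crop (step 4)."""
    synthetic_set = {translate(img): label for img, label in cad_renderings.items()}
    cnn = train_cnn(synthetic_set)
    return [cnn.predict(crop) for crop in gaze_crops]
```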
■Use a CAD model to prepare training data for Cycle-GAN(1/2)
 The essential resource for using object recognition algorithms is a suitable database
 Feature level annotation
• CAD model or virtual prototype
Approach(2/10)
■Use a CAD model to prepare training data for Cycle-GAN(2/2)
 Training data
Approach(3/10)
■Experimental data
 Egocentric videos, which are split into frames
 Only fixation marker, not scan path
• Only one fixation marker is contained in each frame
 Gaze coordinates (x, y)
Approach(4/10)
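Since each frame carries exactly one fixation with known gaze coordinates, a thumbnail around the gaze point can be cut directly from the frame. A minimal sketch assuming NumPy frames; the clamping at frame borders is my assumption, not stated in the slides:

```python
import numpy as np

def crop_around_fixation(frame: np.ndarray, x: int, y: int,
                         size: int = 224) -> np.ndarray:
    """Cut a size x size thumbnail centered on the gaze point (x, y).

    The window is clamped to the frame borders so that fixations near an
    edge still yield a full-size patch (assumes the frame is >= size x size).
    """
    h, w = frame.shape[:2]
    half = size // 2
    left = min(max(x - half, 0), w - size)
    top = min(max(y - half, 0), h - size)
    return frame[top:top + size, left:left + size]
```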
■Use Cycle-GAN to create a realistic-looking synthetic data set
Approach(5/10)
■GAN (Generative Adversarial Network)
Approach(6/10)
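For reference, the figure on this slide illustrates the standard GAN objective (Goodfellow et al., 2014): a generator G and a discriminator D playing a minimax game:

```latex
\min_G \max_D V(D, G) =
    \mathbb{E}_{x \sim p_\mathrm{data}(x)}[\log D(x)]
  + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```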
■Cycle-GAN (Cycle Generative Adversarial Network)
Approach(7/10)
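Cycle-GAN (Zhu et al., 2017) trains two generators G: X → Y and F: Y → X on unpaired images by adding a cycle-consistency term to the two adversarial losses; this is what makes the simulation-to-reality translation possible without paired training examples:

```latex
\mathcal{L}_\mathrm{cyc}(G, F) =
    \mathbb{E}_{x \sim p_\mathrm{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_\mathrm{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]

\mathcal{L}(G, F, D_X, D_Y) =
    \mathcal{L}_\mathrm{GAN}(G, D_Y, X, Y)
  + \mathcal{L}_\mathrm{GAN}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_\mathrm{cyc}(G, F)
```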
■Use synthetic data set to train CNN (Convolutional Neural Network)
Approach(8/10)
■Object recognition
 Object localization combined with image classification
• Pixels are grouped into instances by means of adjacent pixels that share textures, colors, or intensities
• Feature level recognition
 Eye tracking data
• Semantic or instance segmentation can be dispensed with
• Provides the exact coordinates of the fixation relative to the gaze replay
Approach(9/10)
■Predict VOIs of experimental data with trained CNN model
 ResNet50v2
Approach(10/10)
Evaluation
Conclusion and future work
■Experimental setup
 Real-world and virtual-reality settings
 Fully automated coffee machine
 VOI annotation on feature level
Evaluation(1/7)
■Conditions/baseline
 Comparison with another method
• EyeSee3D (https://eyesee3d.eyemovementresearch.com/)
 Ground truth: manual annotation
 Performance metrics
• Weighted precision and recall, weighted F1-score (computed as sketched below)
Evaluation(2/7)
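One way to compute these metrics, using scikit-learn (my tool choice; the slide does not name one). The labels below are made-up examples:

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical labels: manual annotations (ground truth) vs. CNN predictions.
y_true = ["display", "brew_group", "display", "no_voi", "water_tank"]
y_pred = ["display", "display",    "display", "no_voi", "water_tank"]

# average="weighted" weights each class's score by its support, matching
# the weighted precision, recall, and F1-score named above.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"weighted P={precision:.2f}, R={recall:.2f}, F1={f1:.2f}")
```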
■User study design
 Participants
• 24 (6 female, 18 male)
• 3-point calibration of the eye-tracking system
• Interact with the product in both the virtual and the real setting
 First phase of the experiment
• Freely explore the object for 60 seconds
• Free movement around the machine
 Second phase of the experiment
• Subjects are asked about their perceptual impressions
• Guided to certain product features by solving tasks such as brewing coffee
Evaluation(3/7)
■Apparatus
 Unity3D
• Two projectors with a resolution of 1920 x 1200 pixels each
 SMI mobile eye-tracking glasses + SMI 3D-6D head tracking
 Outside-in motion tracking: OptiTrack PrimeX 13W
 Fixation detection with BeGaze 3.7
 Desktop
• Nvidia GeForce RTX 2060 SUPER GPU
• 8GB RAM
Evaluation(4/7)
■Network trainings
 Thumbnail size: 224 x 224 px
 Image augmentation using Cycle-GAN
• Simulation image : 1,000
• Virtual image : 1,000
• Real image : 1,000
• Default Cycle-GAN settings except for 50 epochs
 Total training data after augmentation
• Simulation image : 100,000
• Virtual image : 100,000
• Real image : 100,000
Evaluation(5/7)
■Data preparation
Evaluation(6/7)
■Network trainings
 CNN classification
• ResNet50v2 architecture
• Output layer with 12 neurons (10 VOIs + “Coffee machine but no VOI” and “No coffee machine”)
• Input size: 224 x 224
• Adam optimizer, learning rate 0.001, 20 epochs, sparse categorical cross-entropy loss (sketched below)
Evaluation(7/7)
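A minimal Keras sketch matching the hyperparameters above; the authors' exact implementation is not published, so the architecture assembly here is an assumption:

```python
import tensorflow as tf

# ResNet50V2 backbone with a fresh 12-way softmax head:
# 10 VOIs + "coffee machine but no VOI" + "no coffee machine".
backbone = tf.keras.applications.ResNet50V2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3)
)
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(12, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",  # integer labels 0..11
    metrics=["accuracy"],
)
# model.fit(train_images, train_labels, epochs=20)  # 224 x 224 x 3 thumbnails
```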
Conclusion and future work
■Result
 The CNN approach performs slightly better in virtual reality than in the real world
 Human annotation
• About 30,000 images, roughly 25 hours at 20 images per minute
Conclusion and future work(1/7)
■Discussion(1/3)
 The fixation marker can be ambiguously located between four different VOIs and the default classes
• Some of these are adjacent, while others are simultaneously hidden due to depth effects
Conclusion and future work(2/7)
■Discussion(2/3)
 Some VOIs are well recognized and some are not
• Well classified : Display
 Standard classification problem
Conclusion and future work(3/7)
■Discussion(3/3)
 Cycle-GAN can also degrade image quality
• Use the gaze coordinates instead of a rendered fixation marker
Conclusion and future work(4/7)
■Limitation
 The study gave a proof of concept for two different domains
• Only a coffee machine was tested
Conclusion and future work(5/7)
■Conclusion
 Proposes a method for semantic gaze analysis using machine learning, eliminating the resource-intensive process of human annotation
 Neither markers nor motion tracking systems are required
 Contains no personal bias and is thus not prone to evaluator effects
 The same methodical evaluation can be used across platforms
Conclusion and future work(6/7)
■Future work
 Our work is to be seen as a proof of concept.
• Potential future work to further increase the accuracy of predictions
 Chances for improving our approach
• Advanced image classification methods or further improved image augmentation techniques
Conclusion and future work(7/7)
Q&A