Towards advanced data retrieval from learning objects repositories

  1. TOWARDS ADVANCED DATA RETRIEVAL FROM LEARNING OBJECTS REPOSITORIES
     Valentina Paunovic, Belgrade Metropolitan University
     Slobodan Jovanovic, Belgrade Metropolitan University
     This work was supported by Ministry of Education, Science and Technology (Project III44006).
  2. What problem do we solve?
     [Diagram: Popularity of personalized distance-based learning → (demands) → Effective creation of learning materials ← (enables) ← REUSABILITY ← (enables) ← SEARCH]
  3. Textual search
     [Diagram: Learning Object (type: text, image, video, ...); Search (type: metadata, TEXTUAL)]
     Effective textual search in large LOR is important
  4. Our system - contributions
     • Search engine
       – Steiner-trees approach
       – Algorithm for graph representation of LOR
     • Query language
       – Extension based on formal logic
       – Algorithm for parsing extended language
  5. Steiner trees search
     • Traditional search (for example, text processing applications)
     • Alternative: Steiner trees
  6. Steiner trees approach
     • Query
       – word1, word2, word3
     • Possible interpretation
       – Find all objects such that each object contains all words from the query
       – Issue: what if there is no such object?
     • Alternative interpretation
       – Find all groups of related objects such that each group contains all words from the query
  7. Example – possible alternatives
  8. Ranking
     • Group with a smaller number of LO:
       – Stronger relationships among terms from the query
       – Conclusion: advantage in rankings
       – Example: the best solutions consist of only one LO
     • Group which contains more similar LO (from the same area or subject):
       – Stronger relationships among terms from the query
       – Conclusion: advantage in rankings
       – Example: the best solutions are groups of LO from the same area
  9. Main advantages
     • Situation: there is no object which satisfies all terms from the query
       – Traditional search: no results
       – Steiner trees search: returns results
     • Possible to detect implicit relationships among learning objects
  10. Vector space model from text mining
      • How to determine which LO are related?
      • LO is represented as an m-dimensional TF-IDF vector: $r(d) = (\mathit{tfidf}_1, \mathit{tfidf}_2, \ldots, \mathit{tfidf}_m)$
      • Each component is calculated as $\mathit{tfidf}_i = \mathit{tf}_i \cdot \mathit{idf}_i$
      • Term frequency: $\mathit{tf}_i = \sum_j h_j \, n(i, j)$
        – $n(i, j)$: number of occurrences of the i-th term in the j-th slot of LO d
        – $h_j$: weight associated with the j-th slot
  11. Vector space model II
      • Weights:
        – Terms from the metadata title, keywords and description have the highest impact (weight).
        – Terms from the content (if there is textual content) have medium impact.
        – Terms from the rest of the searchable metadata have low impact.
      • Inverse document frequency reduces the impact of common words: $\mathit{idf}_i = \log \frac{|LOR|}{|\{d \in LOR : w_i \in d\}|}$
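The slot-weighted TF-IDF described on the two slides above can be sketched in a few lines of Python. This is only an illustration: the slot names, the numeric weights in `SLOT_WEIGHTS`, the whitespace tokenizer and the dict-based learning objects are all assumptions, since the slides give only the ordering high / medium / low.

```python
import math
from collections import Counter

# Hypothetical slot weights h_j; the slides only state high / medium / low impact.
SLOT_WEIGHTS = {"title": 3.0, "keywords": 3.0, "description": 3.0,
                "content": 2.0, "other": 1.0}


def weighted_tf(lo):
    """tf_i = sum_j h_j * n(i, j) for one learning object (dict: slot -> text)."""
    tf = Counter()
    for slot, text in lo.items():
        h = SLOT_WEIGHTS.get(slot, SLOT_WEIGHTS["other"])
        for term in text.lower().split():  # naive tokenizer, an assumption
            tf[term] += h
    return tf


def tfidf_vectors(lor):
    """Return one sparse {term: tfidf} dict per learning object in the repository."""
    tfs = [weighted_tf(lo) for lo in lor]
    df = Counter()  # number of learning objects containing each term
    for tf in tfs:
        df.update(tf.keys())
    n = len(lor)
    return [{t: w * math.log(n / df[t]) for t, w in tf.items()} for tf in tfs]


lor = [
    {"title": "exponential function", "content": "the exponential function grows"},
    {"title": "logarithmic function", "content": "inverse of the exponential"},
]
vectors = tfidf_vectors(lor)
```

Terms that occur in every learning object (here "function") get an idf of zero, which is the intended damping of common words.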
  12. LO similarity measure
      • Now we can introduce a similarity measure
      • One possibility is cosine similarity: $sim(d_1, d_2) = \frac{r(d_1) \cdot r(d_2)}{\|r(d_1)\| \, \|r(d_2)\|}$
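A minimal sketch of cosine similarity, assuming each vector $r(d)$ is stored sparsely as a `{term: weight}` dict (a representation chosen here for illustration, not taken from the slides):

```python
import math


def cosine_similarity(r1, r2):
    """Cosine of the angle between two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * r2.get(term, 0.0) for term, w in r1.items())
    norm1 = math.sqrt(sum(w * w for w in r1.values()))
    norm2 = math.sqrt(sum(w * w for w in r2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # convention for empty vectors, an assumption
    return dot / (norm1 * norm2)
```

Two learning objects with no terms in common get similarity 0; vectors pointing in the same direction get similarity 1 regardless of their length.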
  13. Search algorithm
      • Issue: finding top k minimum cost Steiner trees (MCST-k) is NP-complete
      • DBPF-k, developed for keyword search on DB:
        – Has a polynomial solution
        – First returned result is optimal
        – The remaining (k-1) solutions are approximate
      • Efficiency of the DBPF-k algorithm depends on graph sparseness.
  14. Graph representation of LOR
      • Steiner-trees search requires a sparse graph
      • Graph representation of LOR:
        – Nodes: LO
        – Weighted edges: defined by similarity measure between any two nodes
      • Issue: dense graph, number of edges: $O((\text{number of LO})^2)$
      • Result: slow search
  15. Graph sparsification - rules
      • No node should be removed from the graph.
      • Low similarity edges should be removed from the graph.
      • Edge removal should not violate graph connectivity.
      • Targeted number of edges is specified by parameter T. The graph obtained by the sparsification process should have fewer than T edges, unless that violates the connectivity constraint.
      • No priority among edges of equal weight.
      • If two learning objects are in a relationship specified by the metadata relation, the edge between them should be preserved in the graph regardless of the similarity degree between these two learning objects.
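One way to satisfy the rules above is a Kruskal-style pass over the edges sorted by decreasing similarity. The sketch below is an assumption about how such a procedure could look, not the authors' algorithm: the function name `sparsify`, the `(similarity, u, v)` edge tuples and the `required` set of metadata-related pairs are all hypothetical, and the input graph is assumed to be connected.

```python
def sparsify(num_nodes, edges, target, required=frozenset()):
    """Keep required edges and a maximum spanning tree, then fill up below `target`.

    edges: list of (similarity, u, v); required: set of (u, v) pairs to always keep.
    """
    parent = list(range(num_nodes))  # union-find forest over the nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
        return True

    def is_required(edge):
        return (edge[1], edge[2]) in required or (edge[2], edge[1]) in required

    # Sorting dominates the cost: O(|E| log |E|).
    ordered = sorted(edges, key=lambda e: e[0], reverse=True)

    kept = [e for e in ordered if is_required(e)]  # metadata relations always stay
    for _, u, v in kept:
        union(u, v)

    spare = []
    for edge in ordered:
        if is_required(edge):
            continue
        if union(edge[1], edge[2]):
            kept.append(edge)   # needed to keep the graph connected
        else:
            spare.append(edge)  # optional, still ordered by similarity

    for edge in spare:
        if len(kept) + 1 >= target:  # result must stay below T edges
            break
        kept.append(edge)
    return kept


edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 2, 3), (0.6, 0, 2), (0.5, 1, 3), (0.1, 0, 3)]
sparse = sparsify(4, edges, target=5, required={(0, 3)})
```

Because connectivity and required edges take priority, the result can exceed the target when T is smaller than the number of edges those two rules force into the graph.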
  16. Sparsification
      • Complexity of the algorithm is $O(|E| \log |E|)$, where $|E| = O((\text{number of LO})^2)$
  17. Query language
      • Example query: exponential function
      • Issue 1: What if there is a term exp instead of exponential?
        – Possible solution: dictionary of synonyms + dictionary of acronyms and abbreviations
        – Problem: can be complicated to implement
      • Issue 2: Find all exponential or logarithmic functions
        – Possible solution: submit two different queries
        – Problem: can be inconvenient for a user
  18. Query language - extension
      1. Operator and, marked by reserved word %AND.
      2. Operator or, marked by reserved word %OR.
      • Both operators have the same precedence.
      • Expressions are evaluated from left to right.
      • If there is no operator between two terms, the %AND operation is assumed implicitly. For example, “math function” is evaluated as “math %AND function”.
      • The associativity rule from formal logic is preserved.
  19. Query language
      • How to evaluate a complex expression like (a %OR b) %AND ((c %OR d) %AND e)?
      • We cannot submit such a query directly to the search algorithm
      • We need a query parsing algorithm
  20. Query language - terminology
      • Term (t) – word used in a query
      • Simple Query (Q) – set of terms: $Q = \{t_1, t_2, \ldots, t_{|Q|}\}$
      • Expression (E) – set of simple queries: $E = \{Q_1, Q_2, \ldots, Q_{|E|}\}$
      • Operation $\wedge$ corresponds to operator %AND: $E_1 \wedge E_2 = \{Q_i \cup Q_j \mid Q_i \in E_1, Q_j \in E_2\}$
      • Operation $\vee$ corresponds to operator %OR: $E_1 \vee E_2 = E_1 \cup E_2$
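The two operations map directly onto Python sets. In this sketch a simple query is a `frozenset` of terms and an expression is a `set` of simple queries; the names `and_op` and `or_op` are illustrative choices, not from the slides.

```python
def and_op(e1, e2):
    """%AND: merge every simple query of e1 with every simple query of e2."""
    return {q1 | q2 for q1 in e1 for q2 in e2}


def or_op(e1, e2):
    """%OR: the union of the two sets of simple queries."""
    return e1 | e2


a, b, c = ({frozenset({t})} for t in "abc")
a_or_b_and_c = and_op(or_op(a, b), c)  # (a %OR b) %AND c
```

Here `a_or_b_and_c` contains the two simple queries {a, c} and {b, c}, each of which can be submitted to the search algorithm on its own.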
  21. Parsing algorithm

      initialize S as empty stack of expressions;
      initialize empty set of search results R;
      foreach token w of query
          switch(w):
              case “(”, “%AND”, “%OR”:
                  push w to S;
              case “)”:
                  E <- evaluateTopExpression(S);
                  push E to S;
              default:
                  if(previous token is term) push “%AND” to S;
                  Q = {w}; E = {Q};
                  push E to S;
          end switch;
      E <- evaluateTopExpression(S);
      foreach simple query Q from E
          result = DBPF-k(Q);
          add result to R;

      evaluateTopExpression(S) {
          initialize SH as empty stack;
          while (S not empty)
              wh <- pop from S;
              if(wh = “(”) break;
              push wh to SH;
          while (true)
              first <- pop from SH;
              if (SH is empty) return first;
              operator <- pop from SH;
              second <- pop from SH;
              switch(operator)
                  case “%AND”: result = first ^ second;
                  case “%OR”: result = first v second;
              end switch;
              push result to SH;
      }
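The pseudocode above can be turned into a small runnable Python sketch. Several details are assumptions made for illustration: the whitespace-and-parenthesis tokenizer, the function names, and inserting the implicit %AND also after a closing parenthesis and before an opening one (the slide only mentions the case of two adjacent terms). The call to DBPF-k is left out; the function returns the set of simple queries that would be passed to it.

```python
def and_op(e1, e2):
    """%AND: pairwise union of the simple queries of both expressions."""
    return {q1 | q2 for q1 in e1 for q2 in e2}


def or_op(e1, e2):
    """%OR: union of the two expressions."""
    return e1 | e2


def evaluate_top_expression(stack):
    """Evaluate everything above the nearest "(" on the stack, left to right."""
    helper = []
    while stack:
        item = stack.pop()
        if item == "(":
            break
        helper.append(item)
    # helper holds the sub-expression reversed, so pop() yields its leftmost item.
    while True:
        first = helper.pop()
        if not helper:
            return first
        operator = helper.pop()
        second = helper.pop()
        combine = and_op if operator == "%AND" else or_op
        helper.append(combine(first, second))


def parse_query(query):
    """Return the expression (set of frozenset simple queries) for a query string."""
    tokens = query.replace("(", " ( ").replace(")", " ) ").split()
    stack = []
    prev_is_operand = False  # True after a term or a closing parenthesis
    for w in tokens:
        if w in ("%AND", "%OR"):
            stack.append(w)
            prev_is_operand = False
        elif w == "(":
            if prev_is_operand:
                stack.append("%AND")  # implicit %AND (extension, see lead-in)
            stack.append(w)
            prev_is_operand = False
        elif w == ")":
            stack.append(evaluate_top_expression(stack))
            prev_is_operand = True
        else:
            if prev_is_operand:
                stack.append("%AND")  # implicit %AND between adjacent operands
            stack.append({frozenset({w})})
            prev_is_operand = True
    return evaluate_top_expression(stack)


simple_queries = parse_query("(a %OR b) %AND ((c %OR d) %AND e)")
```

For the example query this yields four simple queries: {a, c, e}, {a, d, e}, {b, c, e} and {b, d, e}. Malformed input such as unbalanced operators is not handled in this sketch.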
  22. Architecture of search system
  23. Conclusion
      • Proposed architectural solution for advanced search through repositories of learning objects
      • Search based on finding top-k min-cost Steiner trees
      • Proposed algorithm for sparse weighted graph representation of a LO repository
      • Proposed extension of query language based on formal logic and designed an algorithm for parsing it
