Hierarchy Requirements
Charles Severance
Sakai Project

Overall Concepts
• Sakai must handle a number of elements in a
hierarchical fashion:
– Site Hierarchy
• Sites are organized logically by path
– /2004/fall/eng/ee123
• There can be sub-sites within sites
– e.g. sub-sites below /2004/fall/eng/ee123
– Qualifier Hierarchy
• When checking permissions on objects, permissions can be
inherited from something higher up the hierarchy
– A departmental admin with full privileges at /2004/fall/eng
implicitly has full privileges at /2004/fall/eng/ee123
– Group Hierarchy
• Groups can be members of other groups

General Navigation
• One must be able to start at the top of the
hierarchy and navigate up and down. At any
level, the user can see the children and navigate to
a child, or navigate upward to parents.
• The hierarchy will not be part of the navigation,
but it will be shown in the URL as the user
navigates through sites.
• Users will be able to type in the URL
– http://stellar2.mit.edu/site/2004/fall/eng/eng123/
– This navigates the portal to that site, so the URL acts as a bookmark
for the site.
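Mapping a typed URL back to a site is mostly string handling. The sketch below assumes that every site URL has the form `<host>/site/<hierarchy path>/`, as in the example above; the function names `site_path_from_url` and `ancestor_paths` are hypothetical. The second function lists every ancestor path, which is what upward navigation needs.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Assumption: site URLs look like <host>/site/<hierarchy path>/, as in the example.
SITE_PREFIX = "/site/"


def site_path_from_url(url: str) -> Optional[str]:
    """Return the hierarchy path encoded in a portal URL, or None if there is none."""
    path = urlparse(url).path
    if not path.startswith(SITE_PREFIX):
        return None
    # Keep the leading "/" of the hierarchy path and drop any trailing "/".
    site = path[len(SITE_PREFIX) - 1:].rstrip("/")
    return site or None


def ancestor_paths(site_path: str) -> List[str]:
    """Every path from the top of the hierarchy down to site_path, for upward navigation."""
    segments = site_path.strip("/").split("/")
    return ["/" + "/".join(segments[:i]) for i in range(1, len(segments) + 1)]
```

For the example URL, `site_path_from_url` yields `/2004/fall/eng/eng123`. The portal would still need to confirm that the site exists (see resolvePath).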

Sub Groups
• There are a number of places where tools need to know about sub-sites.
For example, a course site may have several small groups
associated with it. When the instructor wants to make an announcement and
“push” it to some or all of the sub-sites, the tool must show a set of
checkboxes indicating which subgroups will receive the pushed
content. The tool therefore needs code that returns the list of sub-sites
so that it can display them.
• The list must contain only the sub-sites on which the user can perform the
desired operation, so the complete question is:
– Give me the list of sites below this site for which the logged-in person has
the “announce.create” function.
• A generalization of this is: show me all of the sites for which the
current logged-in person has “read” permission.
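Here is a minimal in-memory sketch of the "sites below this site for which the user has function F" question. It assumes a single-parent tree and grants that inherit downward, as in the Qualifier Hierarchy bullet. The sub-site names `group-a`/`group-b`, the `GRANTS` table, and the user names are hypothetical. Only `announce.create` and `read` come from the slide.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical tree: each site maps to its immediate children.
CHILDREN: Dict[str, List[str]] = {
    "/2004": ["/2004/fall"],
    "/2004/fall": ["/2004/fall/eng"],
    "/2004/fall/eng": ["/2004/fall/eng/ee123"],
    "/2004/fall/eng/ee123": ["/2004/fall/eng/ee123/group-a", "/2004/fall/eng/ee123/group-b"],
}
PARENT: Dict[str, str] = {kid: p for p, kids in CHILDREN.items() for kid in kids}

# Hypothetical explicit grants: (user, site) -> functions granted at that site.
GRANTS: Dict[Tuple[str, str], Set[str]] = {
    ("prof", "/2004/fall/eng/ee123"): {"announce.create"},
    ("ta", "/2004/fall/eng/ee123/group-a"): {"announce.create"},
    ("admin", "/2004/fall/eng"): {"announce.create", "read"},
}


def has_function(user: str, site: str, function: str) -> bool:
    """True if function is granted at site or inherited from any ancestor."""
    node: Optional[str] = site
    while node is not None:
        if function in GRANTS.get((user, node), set()):
            return True
        node = PARENT.get(node)
    return False


def sites_below_with_function(user: str, site: str, function: str) -> List[str]:
    """Sites strictly below site (pre-order) on which user holds function."""
    result: List[str] = []
    stack = list(reversed(CHILDREN.get(site, [])))
    while stack:
        node = stack.pop()
        if has_function(user, node, function):
            result.append(node)
        stack.extend(reversed(CHILDREN.get(node, [])))
    return result
```

The instructor's checkbox list would come from `sites_below_with_function("prof", "/2004/fall/eng/ee123", "announce.create")`. This version does one upward walk per descendant. The sub-select slide explains why a real implementation would push the same question into a single database query.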

Needed Methods
• getImmediateParents, getImmediateChildren
– These are used for upward and downward navigation. They are
simple because they only need the adjacency matrix. These methods
exist in the Hierarchy OSID.
• getAllImportantParents
– This is what makes inheritance work, so it is a performance-critical
operation (see the sub-select discussion). The Hierarchy
OSID does not explicitly define this because it has no notion of
“special” parents. If this is a requirement, API methods
will be needed to check, set, and update the “important parent”.
• resolvePath
– Quickly look up a path within the hierarchy. The Hierarchy OSID
does not specify this, but it is a very common operation and must
perform well.
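The four needed methods can be sketched over plain adjacency maps. This sketch assumes that each node has at most one "important" parent, which is the one that carries inheritance. The snake_case names mirror the slide's method names but are hypothetical and do not belong to any Sakai or OSID API. The path dictionary stands in for an indexed path column.

```python
from typing import Dict, List, Optional, Set


class Hierarchy:
    """In-memory sketch of the needed methods (hypothetical names and storage)."""

    def __init__(self) -> None:
        self.parents: Dict[str, Set[str]] = {}
        self.children: Dict[str, Set[str]] = {}
        self.important_parent: Dict[str, str] = {}  # assumption: at most one per node
        self.by_path: Dict[str, str] = {}  # stands in for an indexed path column

    def add_node(self, node_id: str, path: str) -> None:
        self.parents.setdefault(node_id, set())
        self.children.setdefault(node_id, set())
        self.by_path[path] = node_id

    def add_parent(self, node_id: str, parent_id: str, important: bool = False) -> None:
        self.parents[node_id].add(parent_id)
        self.children[parent_id].add(node_id)
        if important:
            self.important_parent[node_id] = parent_id

    def get_immediate_parents(self, node_id: str) -> List[str]:
        return sorted(self.parents[node_id])

    def get_immediate_children(self, node_id: str) -> List[str]:
        return sorted(self.children[node_id])

    def get_all_important_parents(self, node_id: str) -> List[str]:
        """Walk only the important-parent links upward, nearest ancestor first."""
        chain: List[str] = []
        current = self.important_parent.get(node_id)
        while current is not None:
            if current in chain or current == node_id:
                raise ValueError("cycle in important-parent links")
            chain.append(current)
            current = self.important_parent.get(current)
        return chain

    def resolve_path(self, path: str) -> Optional[str]:
        return self.by_path.get(path)


# Hypothetical data: ee123 inherits through eng; sci is a navigation-only parent.
h = Hierarchy()
for node_id, path in [("y2004", "/2004"), ("fall", "/2004/fall"), ("eng", "/2004/fall/eng"),
                      ("sci", "/2004/fall/sci"), ("ee123", "/2004/fall/eng/ee123")]:
    h.add_node(node_id, path)
h.add_parent("fall", "y2004", important=True)
h.add_parent("eng", "fall", important=True)
h.add_parent("sci", "fall", important=True)
h.add_parent("ee123", "eng", important=True)
h.add_parent("ee123", "sci")
```

Keeping the important-parent chain separate from the full parent set means inheritance checks never have to branch, even when navigation does.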

… Methods
• getAllChildrenRecursively
– This is likely needed, but it is not clear what the requirement
really is here. Perhaps it is better viewed as the following.
• getAllChildrenRecursivelyResolvingLinks
– This would follow any soft links (assuming that we support soft links).
This concept is present in the Hierarchy OSID if we choose to
interpret it in a particular way.
• addSoftParent, addHardParent
– If we choose to support soft links, we need to differentiate adding
soft and hard parents and define the semantics of each. This will
also require getSoftParents, getHardChildren, and similar methods, in
both recursive and non-recursive variants.
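The following sketch contrasts getAllChildrenRecursively with getAllChildrenRecursivelyResolvingLinks. It assumes that hard and soft child links are stored in separate maps. The function name, the `resolve_links` flag, and the sample data are hypothetical. A visited set ensures that a soft-link loop cannot recurse forever.

```python
from typing import Dict, List

Links = Dict[str, List[str]]


def all_children_recursively(node: str, hard: Links, soft: Links,
                             resolve_links: bool = False) -> List[str]:
    """Pre-order list of everything below node. With resolve_links=True, soft
    links are followed too; the seen set stops loops and duplicate visits."""
    result: List[str] = []
    seen = {node}

    def visit(current: str) -> None:
        kids = list(hard.get(current, []))
        if resolve_links:
            kids += soft.get(current, [])
        for kid in kids:
            if kid not in seen:
                seen.add(kid)
                result.append(kid)
                visit(kid)

    visit(node)
    return result


# Hypothetical data: ee123 lives under eng and is soft-linked from sci;
# group-a has a (bad) soft link back up to sci, forming a loop.
HARD: Links = {"eng": ["ee123"], "sci": ["phys1"], "ee123": ["group-a"]}
SOFT: Links = {"sci": ["ee123"], "group-a": ["sci"]}
```

Without link resolution, `sci` has only `phys1` below it. With resolution, the cross-listed `ee123` and its group appear as well, and the loop back to `sci` is ignored.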

Multiple Parents
• There has been a lot of discussion of the need for multiple parents;
the common use case is cross-listed courses. There
are important issues here, though:
– Does a cross-listed course inherit permissions equally from all
parents? If a course is listed in both Engineering and Science, do both
departmental administrators have access? If not, some parents
are more “parental” than others.
• Multiple parents have performance implications and
limit the SQL optimization that can be done, because
code must walk the graph in loops that can become infinite. A single
lookup or inheritance question can then require many
SQL transactions.
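The loop problem is easier to see in code. This sketch walks every parent link breadth-first and counts one round trip per node. Each iteration stands in for a query like `SELECT parent FROM links WHERE child = ?`, where the table is hypothetical. The seen set is what keeps a cycle from looping forever. The sketch collects ancestors from all parents, so it assumes the "inherit equally" answer to the question above.

```python
from collections import deque
from typing import Dict, List, Tuple


def all_ancestors(node: str, parents: Dict[str, List[str]]) -> Tuple[List[str], int]:
    """Breadth-first walk over every parent link; returns (ancestors, round_trips).
    Each dequeued node stands in for one SQL round trip (hypothetical links table)."""
    seen = {node}
    ancestors: List[str] = []
    queue = deque([node])
    round_trips = 0
    while queue:
        current = queue.popleft()
        round_trips += 1
        for parent in parents.get(current, []):
            if parent not in seen:  # without this check a cycle would loop forever
                seen.add(parent)
                ancestors.append(parent)
                queue.append(parent)
    return ancestors, round_trips


# Hypothetical cross-listed course: ee123 has both eng and sci as parents.
PARENTS = {"ee123": ["eng", "sci"], "eng": ["fall"], "sci": ["fall"], "fall": ["y2004"]}
```

Even this small graph costs five round trips for one inheritance question. That is the per-lookup cost the slide warns about.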

Multi-Parent Link Type
– One possible approach is to take the UNIX file system approach
and treat the second and subsequent parent links as soft links. This
raises its own set of issues.
• Soft links are only a navigation mechanism; permissions inherit
across the hard link only, as in UNIX.
• What if the hard link is severed and the node is deleted? Do the soft
links remain, pointing to nothing, or are they cleaned up? (Probably
cleaned up.)
• Do soft links point to an “iNode” or to a path? UNIX chose path.
• How do we handle loops in soft links? UNIX limits the
number of soft links traversed, usually to 64 or fewer.
– Hard links can work (iNode only).
• Is there one primary parent, or several? Is there a priority order?
• If there is one primary link that controls inheritance and it is
removed, which of the secondary links is “promoted” to take over
control of the inheritance? It would be nice for a human, not code,
to make this decision.
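Here is a minimal sketch of the UNIX-like option: one hard parent that carries inheritance, any number of soft parents used only for navigation, and soft links cleaned up when their target is deleted. The `Node` structure, the function names, and the grant names are hypothetical. The sketch links soft parents by node id ("iNode") rather than by path. Promotion of a secondary link is deliberately left out, following the slide's preference that a human make that decision.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    hard_parent: Optional[str]  # the single link that carries inheritance
    soft_parents: List[str] = field(default_factory=list)  # navigation only


def navigation_parents(nodes: Dict[str, Node], node_id: str) -> List[str]:
    """Parents shown for upward navigation: the hard parent plus any soft parents."""
    node = nodes[node_id]
    hard = [node.hard_parent] if node.hard_parent is not None else []
    return hard + node.soft_parents


def inherited_grants(nodes: Dict[str, Node], grants: Dict[str, Set[str]],
                     node_id: str) -> Set[str]:
    """Grants at node_id plus those of its hard-link ancestors; soft links are ignored."""
    result: Set[str] = set()
    current: Optional[str] = node_id
    while current is not None:
        result |= grants.get(current, set())
        current = nodes[current].hard_parent
    return result


def delete_node(nodes: Dict[str, Node], node_id: str) -> None:
    """Delete a node and clean up soft links that pointed at it.
    Refuses if the node still has hard children (promotion is a human decision)."""
    if any(n.hard_parent == node_id for n in nodes.values()):
        raise ValueError(f"{node_id} still has hard children")
    del nodes[node_id]
    for n in nodes.values():
        if node_id in n.soft_parents:
            n.soft_parents.remove(node_id)


# Hypothetical data: ee123 is hard-linked under eng and soft-linked under sci.
NODES = {
    "fall": Node(None),
    "eng": Node("fall"),
    "sci": Node("fall"),
    "ee123": Node("eng", ["sci"]),
}
GRANTS = {"eng": {"eng.admin"}, "sci": {"sci.admin"}}
```

With this design, the Science administrator can navigate to the cross-listed course but gets no inherited access to it. That is one concrete answer to the "more parental" question from the previous slide.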

Performance
• Performance is a critical requirement; these
hierarchy lookups are often a large part of the database load
of an application.
– It is not practical to cache the structure: there will be
Sakai installations with 200K sites, and each site may
have several thousand files in a hierarchy.
– It may not even be practical to have a single hierarchy;
there may need to be a federated hierarchy of
hierarchies.
– Many of these algorithms are “n-squared” on insert
because an insert triggers some reorganization.

Need for SubSelect
• For the best performance, the lookup operations must produce a
simple list of hierarchy IDs so that this list can be
used in a sub-select against another table (such as a join between a
resource hierarchy, a group hierarchy, and a role hierarchy).
• The alternatives are to retrieve large collections from all three of
these separate places and iterate through them in n-cubed fashion
(especially costly for users with a lot of permission
power), or to do a single join that reduces the data within the database
to what is to be displayed.
• So if this work is done in stored procedures, they must accept
parameters and return ID lists like a standard SELECT.
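To illustrate the sub-select idea, this sketch uses SQLite from Python's standard library. The schema is hypothetical: a `closure` table holding one row per (ancestor, descendant) pair, including each site paired with itself, plus `grants` and `resources` tables. Because the hierarchy lookup returns a plain list of IDs, the whole permission question becomes one statement instead of nested loops over three collections.

```python
import sqlite3
from typing import List, Optional


def build_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE closure (ancestor TEXT NOT NULL, descendant TEXT NOT NULL,
                              PRIMARY KEY (ancestor, descendant));
        CREATE TABLE grants (user_id TEXT, site TEXT, permission TEXT);
        CREATE TABLE resources (title TEXT, site TEXT);
    """)
    return db


def add_site(db: sqlite3.Connection, site: str, parent: Optional[str] = None) -> None:
    """Transitive-closure insert: a self row, plus one row per ancestor of the parent."""
    db.execute("INSERT INTO closure VALUES (?, ?)", (site, site))
    if parent is not None:
        db.execute("INSERT INTO closure (ancestor, descendant) "
                   "SELECT ancestor, ? FROM closure WHERE descendant = ?",
                   (site, parent))


# One statement: permitted sites -> their descendants -> resources in those sites.
READABLE_TITLES = """
    SELECT r.title FROM resources r
    WHERE r.site IN (
        SELECT c.descendant FROM closure c
        WHERE c.ancestor IN (
            SELECT g.site FROM grants g
            WHERE g.user_id = ? AND g.permission = ?))
    ORDER BY r.title
"""


def readable_titles(db: sqlite3.Connection, user: str, permission: str) -> List[str]:
    return [row[0] for row in db.execute(READABLE_TITLES, (user, permission))]


# Hypothetical sample data.
db = build_db()
add_site(db, "/2004/fall")
add_site(db, "/2004/fall/eng", "/2004/fall")
add_site(db, "/2004/fall/eng/ee123", "/2004/fall/eng")
add_site(db, "/2004/fall/sci", "/2004/fall")
db.executemany("INSERT INTO resources VALUES (?, ?)", [
    ("Syllabus", "/2004/fall/eng/ee123"),
    ("Lab 1", "/2004/fall/eng/ee123"),
    ("Seminar notes", "/2004/fall/sci"),
])
db.execute("INSERT INTO grants VALUES (?, ?, ?)", ("deptadmin", "/2004/fall/eng", "read"))
```

`readable_titles(db, "deptadmin", "read")` returns only the ee123 resources, which the admin inherits from `/2004/fall/eng`. A stored procedure that returns the inner ID list could be plugged into the same position.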

Common Implementation Approaches
• Straight parent - child structure
– Insert is quite fast; lookup is loop-based, with an SQL transaction to
descend each level
• Transitive closure of parent-child
– On insert, traverse upwards and add a direct link to the child from
every ancestor “above” it.
– This ends up with N log N data, but insert is log N and lookup is log N
(assuming a roughly balanced hierarchy of depth log N).
– Often done with stored procedures or looping SQL
• Specific database extensions
– Oracle has such extensions (CONNECT BY); do they handle multiple roots?
• Overlapping sets
– Based on maintaining pre-order numbering of nodes
– N-squared on insert, log N on lookup. Data is order N.
– Insert performance suffers on large hierarchies.
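The overlapping-sets approach is also known as the nested-set model. This sketch shows why lookup is cheap and insert is expensive. Each node gets a pre-order (left, right) pair, and a node lies below another exactly when its interval falls inside the other's, so a subtree is one range test. Inserting a leaf shifts every number at or beyond the insertion point, which can touch O(N) rows per insert. The function names and sample tree are hypothetical. The in-memory scan in `descendants` stands in for an indexed range query.

```python
from typing import Dict, List, Tuple

Bounds = Dict[str, Tuple[int, int]]


def number_nested_sets(children: Dict[str, List[str]], root: str) -> Bounds:
    """Assign pre-order (left, right) numbers. B is below A exactly when
    A.left < B.left and B.right < A.right."""
    bounds: Bounds = {}
    counter = 0

    def walk(node: str) -> None:
        nonlocal counter
        counter += 1
        left = counter
        for kid in children.get(node, []):
            walk(kid)
        counter += 1
        bounds[node] = (left, counter)

    walk(root)
    return bounds


def descendants(bounds: Bounds, node: str) -> List[str]:
    """Linear scan here; in SQL this is WHERE lft > ? AND rgt < ? on an index."""
    lo, hi = bounds[node]
    return sorted(n for n, (lft, rgt) in bounds.items() if lo < lft and rgt < hi)


def insert_leaf(bounds: Bounds, parent: str, new_node: str) -> None:
    """Add new_node as the last child of parent. Every number >= the parent's
    right bound shifts by 2, so one insert can touch O(N) rows."""
    _, right = bounds[parent]
    for n, (lft, rgt) in list(bounds.items()):
        bounds[n] = (lft + 2 if lft >= right else lft, rgt + 2 if rgt >= right else rgt)
    bounds[new_node] = (right, right + 1)


# Hypothetical tree.
BOUNDS = number_nested_sets({"fall": ["eng", "sci"], "eng": ["ee123"]}, "fall")
```

Because each insert can renumber most of the table, building a large hierarchy one node at a time costs roughly N squared updates in total.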

Implementation Questions
• How do we get lookup to take fewer than
log N SQL transactions?
– Generally the answer is to have an auxiliary table with
the fully realized path to each node, so that a single WHERE
lookup can be done.
– This gets tricky when there are multiple roots, because it
means that there are multiple paths to an iNode.
Like the de-normalized parent relationship, this is an
N log N data issue.
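The sketch below builds such an auxiliary path table with SQLite. It assumes that node ids double as path segments, and the table and function names are hypothetical. A node with several parents gets one row per path, so resolving any path is a single `WHERE path = ?` lookup, and a subtree is a single prefix query. `substr` is used instead of `LIKE` so that `%` or `_` in site names cannot act as wildcards.

```python
import sqlite3
from typing import Dict, List, Optional

Parents = Dict[str, List[str]]


def all_paths(node: str, parents: Parents) -> List[str]:
    """Every root-to-node path; a node with k parents gets at least k paths."""
    if not parents.get(node):
        return ["/" + node]
    return [p + "/" + node for parent in parents[node] for p in all_paths(parent, parents)]


def build_path_table(parents: Parents) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node_path (path TEXT PRIMARY KEY, node TEXT NOT NULL)")
    for node in parents:
        db.executemany("INSERT INTO node_path VALUES (?, ?)",
                       [(path, node) for path in all_paths(node, parents)])
    return db


def resolve_path(db: sqlite3.Connection, path: str) -> Optional[str]:
    row = db.execute("SELECT node FROM node_path WHERE path = ?", (path,)).fetchone()
    return row[0] if row else None


def subtree(db: sqlite3.Connection, path: str) -> List[str]:
    """Distinct nodes strictly below path, via one prefix comparison."""
    prefix = path + "/"
    rows = db.execute("SELECT DISTINCT node FROM node_path "
                      "WHERE substr(path, 1, ?) = ? ORDER BY node",
                      (len(prefix), prefix))
    return [r[0] for r in rows]


# Hypothetical data: ee123 is cross-listed under eng and sci.
PARENTS = {"2004": [], "fall": ["2004"], "eng": ["fall"], "sci": ["fall"],
           "ee123": ["eng", "sci"]}
db = build_path_table(PARENTS)
```

The cross-listed `ee123` occupies two rows, and the row count grows with the number of distinct paths rather than the number of nodes. That is the extra data cost the slide points out.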
