Tag Archives: Data Dictionary

Managing and using custom classifications

Classifications in Alfresco allow content elements to be associated with specific categories from a hierarchical structure. The standard product provides a generic out-of-the-box tree of categories relating to languages, regions and types of software documentation. Lucene queries may be formulated that select or aggregate content based on the associated categories and even sub-trees of categories, e.g. selecting all documents associated with the English language independent of the actual dialect, i.e. American, British or any other English variant.
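To illustrate the sub-tree selection mentioned above, here is a minimal sketch that builds a Lucene PATH query string for all members of a category and its sub-categories. The class and method names are hypothetical, the category names "Languages" and "English" are assumed to match the out-of-the-box tree, and the name encoding is a simplified stand-in for the ISO 9075 encoding Alfresco applies (only spaces are handled here).

```java
class CategoryPathQuery {

    // Simplified stand-in for ISO 9075 name encoding: only spaces are escaped.
    static String encode(String name) {
        return name.replace(" ", "_x0020_");
    }

    // Builds a PATH query matching every item classified with the given
    // category or any of its sub-categories ("//member").
    static String membersOfSubtree(String classificationAspect, String... categoryNames) {
        StringBuilder path = new StringBuilder("/").append(classificationAspect);
        for (String name : categoryNames) {
            path.append("/cm:").append(encode(name));
        }
        return "PATH:\"" + path + "//member\"";
    }

    public static void main(String[] args) {
        // Assumed category names from the default tree.
        System.out.println(membersOfSubtree("cm:generalclassifiable", "Languages", "English"));
        // prints PATH:"/cm:generalclassifiable/cm:Languages/cm:English//member"
    }
}
```

Replacing "//member" with "/member" would restrict the result to content assigned to exactly that category, without the dialect sub-categories.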

The Alfresco wiki has pretty good documentation of classifications and categories. Unfortunately, the documentation does not reflect how classifications are used in an apparent majority of projects. Instead of defining custom classifications similar to the example provided in the wiki, I have seen numerous instances where the out-of-the-box hierarchy was simply extended. This is understandable considering the amount of functionality provided for the out-of-the-box hierarchy and the effort saved by reusing it. But this approach means that categories do not serve their intended function of providing semantically separate classifications – any arbitrary category may be assigned to the cm:categories property instead of choosing from business-oriented, separate value sets with the odd connection between individual categories.

My colleagues and I have all participated in several projects that use custom classifications to organize content and – in some instances – provide virtual navigation structures based on content classification. Apart from the technical architecture, Alfresco does not provide much in the way of support for using custom classifications. The category manager included in the Web Client only handles the out-of-the-box hierarchy, as does its Share counterpart introduced in Alfresco 4.0 (by Jan Pfitzner). In order to save us – and other developers of the community – the trouble of having to reinvent the wheel any more than necessary, I recently set out to enhance Jan's component and submit it as a contribution to Alfresco.

In short, I have modified the following four aspects, based on Alfresco 4.0c:

  • Added the ability to manage multiple classifications
  • Added the ability to create / modify categories that use a business-specific subtype
  • Patched the Forms API to allow creating new content objects using a child-association other than cm:contains
  • Patched the object-finder form component to support usage of business-specific category subtypes

Category Manager - Multiple Classifications

The Share category manager only allows for managing the out-of-the-box hierarchy of cm:generalclassifiable – as does its Web Client predecessor. In order to simplify usage of custom classifications, they need to be manageable without requiring additional development effort. Minimal adaptations made to the tree construction code and the introduction of a new data web script on the repository tier now allow any classification to be administered. In order to hide certain technical classifications that are used and managed differently (e.g. cm:taggable and cm:classifiable), I have introduced a configuration option to ignore these classification aspects.
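The effect of such an ignore option can be sketched as a simple filter over the list of classification aspects. This is only an illustration under stated assumptions: the class and method names are hypothetical, acme:productClassification is an invented custom classification, and the actual component reads its ignore list from configuration rather than from code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ClassificationFilter {

    // Returns the classification aspects that should be offered for management,
    // i.e. all known aspects except those listed as ignored.
    static List<String> manageable(List<String> classificationAspects, Set<String> ignored) {
        List<String> result = new ArrayList<>();
        for (String aspect : classificationAspects) {
            if (!ignored.contains(aspect)) {
                result.add(aspect);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList(
                "cm:classifiable",
                "cm:generalclassifiable",
                "cm:taggable",
                "acme:productClassification"); // hypothetical custom classification
        Set<String> ignored = new HashSet<>(Arrays.asList("cm:taggable", "cm:classifiable"));
        System.out.println(manageable(all, ignored));
    }
}
```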

Category Manager - Form-based Management

Categories are standard nodes – much like almost anything else in Alfresco – that are defined by the type cm:category. This type may suffice for most uses, but sometimes there is a requirement to associate business metadata with categories. Alfresco's data dictionary and modelling allow subtypes of cm:category to be defined, or aspects to be used, to enhance categories with the additional data. The Share category manager only supported simple categories with a name as the sole property. I have based the component on forms to provide the necessary flexibility in managing category types and metadata. New categories will always be created using a form dialog, while existing categories may be edited using either the in-situ editor or a form dialog, depending on another configuration option I have introduced. The in-situ editor is the default for simple categories of type cm:category.

The form-based management of categories required extensions of the Forms API. In order to create root categories in the correct location, it was necessary to provide a form filter that resolves the virtual node reference alfresco://category/root to the correct reference for the specific classification aspect. The creation of sub-categories via forms additionally required the ability to specify the correct child-association to use (cm:subcategories) instead of the default cm:contains – a feature marked with a TODO in the code but not implemented since at least Alfresco 3.2.

Assigning values from a custom classification

Using a custom classification for editing the metadata of a content item only worked if the categories used were of the type cm:category. For any subtype, navigating into a sub-level of the hierarchy was not possible. Supporting subtypes required an adaptation to the object finder providing the selection dialog within forms and to the supporting data web script on the repository tier. Type-specific checks were replaced by proper type hierarchy evaluations. A small Spring Surf extension may be used to associate any business category type with either the generic or a custom icon.
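The difference between a type-specific check and a type hierarchy evaluation can be sketched with a small stand-alone model. This is not the code of the object finder or its web script; the class, the parent map and the acme:productCategory subtype are assumptions made for illustration. Inside the repository, DictionaryService.isSubClass performs the equivalent evaluation against the real data dictionary.

```java
import java.util.HashMap;
import java.util.Map;

class TypeHierarchyCheck {

    // Toy data dictionary: maps each type to its parent type.
    static final Map<String, String> PARENT_OF = new HashMap<>();
    static {
        PARENT_OF.put("cm:cmobject", "sys:base");
        PARENT_OF.put("cm:category", "cm:cmobject");
        PARENT_OF.put("acme:productCategory", "cm:category"); // hypothetical business subtype
    }

    // Type-specific check: rejects every subtype of cm:category.
    static boolean isExactlyCategory(String type) {
        return "cm:category".equals(type);
    }

    // Hierarchy evaluation: walks up the parent chain until the ancestor is found.
    static boolean isSubClass(String type, String ancestor) {
        for (String current = type; current != null; current = PARENT_OF.get(current)) {
            if (current.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isExactlyCategory("acme:productCategory"));         // false
        System.out.println(isSubClass("acme:productCategory", "cm:category")); // true
    }
}
```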

cm:name – An enforced property

In the last post I described a performance problem which could be traced back to the usage of cm:name (cm:cmobject as parent type) in modelling / instantiating 500,000+ record sets in the default content store. Using one of the listed concepts to work around this issue, I have been setting up a small migration aiming to remove the redundant property cm:name by switching to the parent type sys:base. I have since come to realize that cm:name is – independently of my model type definition in the data dictionary – enforced on all public interfaces and always indexed. Only the integrity checks for mandatory and constrained properties respect the actual type definition.

This of course negates the purpose of my entire approach to combatting our performance problems. If it is impossible to have a node in the database which is not indexed with a cm:name property, side effects on the performance of navigation scripts for Share that sort on that same property are unavoidable.

How does this behaviour manifest itself?

  • When a node is created without a cm:name value, no value for that property is persisted to the database. During reads on the node's properties, the UUID of the NodeRef is transparently returned as a fake cm:name value (see e.g. DBNodeServiceImpl.getProperty(NodeRef, QName) or ReferenceablePropertiesEntity.addReferenceableProperties(Node, Map<QName,Serializable>)).
  • Only the properties defined in the type definition are validated during node creation / modification. Since cm:name is only defined for cm:cmobject and its subtypes, it is only validated for these types. Any evaluation of the mandatory constraint is suppressed as the property is being faked to have the value of the UUID if not set explicitly.
  • During indexing, the type definition of the node being indexed is not respected as far as properties are concerned. All properties present on the node are indexed according to their property definition, regardless of whether they should even be present on the node or not. This means that even if nodes do not inherit from cm:cmobject, a cm:name value is being indexed because a) the property is transparently set to the UUID if not present and b) a property definition for cm:name exists which specifies that it must be indexed.
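The combined effect of the points above can be reproduced in a small stand-alone model. This is a simplified sketch, not Alfresco code: the class, its methods and the acme:record type are hypothetical and only mimic the described fallback of cm:name to the UUID on read, and an indexer that works on the properties as read without consulting the node's type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NameFallbackModel {

    static final String PROP_NAME = "cm:name";

    // Minimal node: a UUID, a type and the properties actually persisted.
    static class Node {
        final String uuid;
        final String type;
        final Map<String, String> persisted = new LinkedHashMap<>();

        Node(String uuid, String type) {
            this.uuid = uuid;
            this.type = type;
        }
    }

    // Read access: a missing cm:name is transparently filled with the UUID.
    static Map<String, String> readProperties(Node node) {
        Map<String, String> props = new LinkedHashMap<>(node.persisted);
        if (!props.containsKey(PROP_NAME)) {
            props.put(PROP_NAME, node.uuid);
        }
        return props;
    }

    // Indexing: every property returned by the read is indexed;
    // node.type is deliberately never consulted.
    static Map<String, String> indexedFields(Node node) {
        return new LinkedHashMap<>(readProperties(node));
    }

    public static void main(String[] args) {
        Node record = new Node("1234-abcd", "acme:record"); // hypothetical type derived from sys:base
        System.out.println(record.persisted);     // nothing persisted for cm:name
        System.out.println(indexedFields(record)); // cm:name still ends up in the index
    }
}
```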

This behaviour has essentially remained unchanged since 3.2, based on my investigations into the Alfresco SVN, and remains in place in the current 4.0 trunk. I was unable to identify which Alfresco feature might require this enforcement of the property, overriding the configuration of my data model. Regarding the question “bug or feature” I am currently leaning towards “bug”. Since this was discovered in a project of an Enterprise customer, I have referred this question to Alfresco Support. In case this is a deliberately implemented behaviour, it would be better / more appropriate to model cm:name as a property of sys:base, similar to how sys:referenceable defines the other common set of properties (store protocol, identifier and UUID).