Alfresco and Arquillian #1 – “Managed Tomcat”

Content repositories are complex and central components of an IT infrastructure / content management solution. Consequently, the attention paid to stability and quality management is considerable. Alfresco development projects primarily use JUnit for their developer and integration tests. The Alfresco repository can be instantiated in-process within a JUnit test case of a business module by using the Alfresco-provided class ApplicationContextHelper, or it can be run in an embedded container (e.g. Jetty as part of the Maven build). The usual focus is on covering the individual business components in an isolated environment, which differs from a realistic setup through custom component wiring or extensive mocking. Even if the test coverage and success rate of test cases is close to a hundred percent, these kinds of tests only offer a limited amount of certainty when it comes to the quality of a project. A significant number of bad surprises may be lurking beneath the surface, ready to hit when the project is tested on real integration / quality environments.
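
For comparison, the conventional in-process approach looks roughly like this (a minimal sketch, assuming the Alfresco repository artifacts and their configuration are available on the test classpath):

import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.util.ApplicationContextHelper;
import org.springframework.context.ApplicationContext;

public class InProcessExample {

    public static void main(String[] args) {
        // bootstraps the complete repository Spring context inside the test JVM
        final ApplicationContext ctx = ApplicationContextHelper.getApplicationContext();
        // beans are retrieved directly from the in-process context
        final NodeService nodeService = (NodeService) ctx.getBean("NodeService");
        System.out.println("NodeService available: " + (nodeService != null));
    }
}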

As part of our activities as an Alfresco partner, I have been spending some of my spare time outside of projects and sales support on getting development and integration tests working on Arquillian – a test framework based on JUnit and aiming to support real tests on real environments. I will try to document the various steps, problems, workarounds and solutions I have performed, encountered and come up with in a series of specific blog posts.

Problems with an “Embedded Tomcat” setup

In a lot of development projects, an embedded setup is the simplest and most efficient configuration when it comes to both the turnaround in getting new developers set up and reducing the dependency on local environments. My colleagues and I have tried to set up an embedded Tomcat with Arquillian, but ultimately failed to resolve the various classloading issues that came along with executing Tomcat in the same process. Specifically, the XML APIs that are provided by both Tomcat and the JDK, and are referenced by Alfresco, created various incompatibilities between the Arquillian / Maven / boot classloader and the web application classloader.

Preparation of Tomcat instance / global Alfresco configuration

Running a “managed Tomcat” setup requires the preparation of a local Tomcat instance, which Arquillian will use to deploy the Alfresco Repository and execute tests on. Depending on the Alfresco version to be tested, this will either be a Tomcat 6 or Tomcat 7 installation. It is advisable to use a dedicated Tomcat instance for Arquillian instead of the instance you may have installed via the Alfresco installer.

For Arquillian to deploy Alfresco after the start of Tomcat, it is necessary to provide the “manager” web application and to configure the necessary access privileges in tomcat-users.xml. All other web applications bundled with Tomcat (ROOT / host-manager) may be safely removed. A simple configuration of tomcat-users.xml might look like this:

<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="manager"/>
  <user username="arquillian" password="arquillian" roles="manager"/>
</tomcat-users>

Providing a working alfresco-global.properties as basic configuration along with the necessary JDBC drivers in <tomcat>/shared/classes helps keep test cases as free from environment-specific configuration as possible. Setting up the database connection, the directory for content storage and other supporting components is of primary concern, but subsystems may also be (pre-)configured to optimize the test environment. It is likely that 80% of test cases will be served by the following default configuration:

  • Subsystem “fileServers”: deactivation of CIFS, NFS and FTP (not testable via JUnit)
  • Subsystem “email”: deactivation of the IMAP / SMTP servers (not testable via JUnit)

If the subsystem configuration is done the proper way, i.e. placing configuration files into <tomcat>/shared/classes/alfresco/extension/subsystems/<Subsystem>/…, individual test cases may override the default configuration by using the Java classloader lookup mechanism. Under no circumstances should subsystem configuration be placed in alfresco-global.properties (always considered “bad practice”).
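
A minimal sketch of such a setup might look as follows (the connection values are placeholders for the local environment; the subsystem folder names follow the <category>/<type>/<id> convention):

# <tomcat>/shared/classes/alfresco-global.properties
dir.root=D:/Applications/Arquillian/alf_data
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://localhost:5432/alfresco
db.username=alfresco
db.password=alfresco

# <tomcat>/shared/classes/alfresco/extension/subsystems/fileServers/default/default/custom-fileservers.properties
cifs.enabled=false
ftp.enabled=false
nfs.enabled=false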

Project setup

Including Arquillian in a specific development project can require quite specific adaptations. A simple Maven Java project will serve as the base for the following demonstration / code snippets. Integrating the example with the Maven SDK is an exercise left to the reader.
A simple project can be created by running mvn archetype:generate -DgroupId={project-packaging} -DartifactId={project-name} -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false or using the equivalent wizard in an IDE. Arquillian is included in the Maven lifecycle by adding the following artifact repository and dependencies to the project POM:

    <repositories>
        <repository>
            <id>jboss-public-repository</id>
            <name>JBoss Public Repository</name>
            <url>https://repository.jboss.org/nexus/content/groups/public</url>
        </repository>
    </repositories>
 
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.jboss.arquillian</groupId>
                <artifactId>arquillian-bom</artifactId>
                <version>1.0.4.Final</version>
                <scope>import</scope>
                <type>pom</type>
            </dependency>
        </dependencies>
    </dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.jboss.arquillian.junit</groupId>
            <artifactId>arquillian-junit-container</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.jboss.arquillian.extension</groupId>
            <artifactId>arquillian-service-integration-spring-inject</artifactId>
            <version>1.0.0.Beta1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.jboss.arquillian.extension</groupId>
            <artifactId>arquillian-service-deployer-spring-3</artifactId>
            <version>1.0.0.Beta1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.jboss.shrinkwrap.resolver</groupId>
            <artifactId>shrinkwrap-resolver-impl-maven</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

These dependencies provide the basic framework for the dynamic creation of Alfresco deployments via the ShrinkWrap API as well as access to the Spring beans of the Alfresco Repository for in-container JUnit tests. If the intended use of Arquillian is limited to running remote tests, the Spring-related dependencies may be dropped.

Connecting Arquillian with a Tomcat instance requires the setup of a container adapter, which is usually done in a specific, selectable profile:

    <profiles>
        <profile>
            <id>tomcat-managed</id>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
            <dependencyManagement>
                <dependencies>
                    <!-- Lock the version, since additional dependencies (i.e. from Alfresco) often clash -->
                    <dependency>
                        <groupId>commons-codec</groupId>
                        <artifactId>commons-codec</artifactId>
                        <version>1.5</version>
                    </dependency>
                </dependencies>
            </dependencyManagement>
            <dependencies>
                <dependency>
                    <groupId>org.jboss.arquillian.container</groupId>
                    <artifactId>arquillian-tomcat-managed-6</artifactId>
                    <version>1.0.0.CR4</version>
                    <scope>test</scope>
                </dependency>
            </dependencies>
        </profile>
    </profiles>

Running tests against Alfresco 4.2 on Tomcat 7 requires the corresponding dependency arquillian-tomcat-managed-7 instead.
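
For example (assuming the Tomcat 7 adapter is available in the same version as its Tomcat 6 counterpart):

    <dependency>
        <groupId>org.jboss.arquillian.container</groupId>
        <artifactId>arquillian-tomcat-managed-7</artifactId>
        <version>1.0.0.CR4</version>
        <scope>test</scope>
    </dependency>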

The second part of connecting Arquillian with the Tomcat instance is the configuration of the container in arquillian.xml on the classpath of the project. This file can be placed in src/test/resources to provide a global configuration, or within a profile-specific path to allow for individual configuration per developer. A profile-specific location of the file is usually not necessary, since the individual configuration segments within the file can be selected via the Surefire plugin configuration. A basic example of arquillian.xml might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<arquillian xmlns="http://jboss.org/schema/arquillian" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://jboss.org/schema/arquillian http://jboss.org/schema/arquillian/arquillian_1_0.xsd" xmlns:spring="urn:arq:org.jboss.arquillian.container.spring.embedded_3">
    <container qualifier="tomcat-managed-6" default="true">
        <configuration>
            <!-- Must match HTTP port from Tomcat server configuration file -->
            <property name="bindHttpPort">8680</property>
            <property name="bindAddress">localhost</property>
            <!-- The prepared Tomcat instance -->
            <property name="catalinaHome">D:/Applications/Arquillian/tomcat-repo</property>
            <property name="javaHome">C:/Program Files/Java/jdk1.6.0_30</property>
            <!-- Allow generous Heap and PermGen since we may deploy + start Alfresco multiple times -->
            <property name="javaVmArguments">-Xmx2G -Xms2G -XX:MaxPermSize=1G -Dnet.sf.ehcache.skipUpdateCheck=true -Dorg.terracotta.quartz.skipUpdateCheck=true</property>
            <!-- Must match configured manager from tomcat-users.xml -->
            <property name="user">arquillian</property>
            <property name="pass">arquillian</property>
            <property name="urlCharset">UTF-8</property>
            <property name="startupTimeoutInSeconds">120</property>
            <!-- Local copy of Tomcat server configuration file -->
            <property name="serverConfig">server.xml</property>
        </configuration>
    </container>
 
    <extension qualifier="spring">
        <!-- Deactivate automatic inclusion of Spring artifacts in deployments, as Alfresco already contains them -->
        <property name="auto-package">false</property>
    </extension>
</arquillian>

The relevant server.xml can be copied from the Tomcat instance configured for Arquillian to the project classpath (src/test/resources). Unfortunately, it does not seem possible to simply refer to the file within the Tomcat instance by using a file path.

The value of the qualifier attribute of the container element can be used to select the container in a specific profile of the project POM:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <configuration>
                <systemPropertyVariables>
                    <arquillian.launch>${containerProfile}</arquillian.launch>
                </systemPropertyVariables>
            </configuration>
        </plugin>
    </plugins>
</build>
<profiles>
    <profile>
        <id>Dev XY</id>
        <properties>
            <containerProfile>xy-tomcat-managed-6</containerProfile>
        </properties>
    </profile>
</profiles>

If the JUnit integration of the IDE is used, Arquillian will also look up the arquillian.xml in the classpath, but the container profile needs to be selected with the -Darquillian.launch system property (e.g. -Darquillian.launch=xy-tomcat-managed-6 in the run configuration).

A simple REST API test

The simplest test cases that can be run with Arquillian without much further configuration / adaptation are tests of the Alfresco REST API. The following test case runs with the configuration of all previous snippets (only adding dependencies on org.jboss.resteasy:resteasy-jaxrs and org.json:json):

@RunWith(Arquillian.class)
public class SimpleLoginRemoteTest {
    @ArquillianResource // HTTP base URL specific for our test deployment
    private URL baseURL;
 
    @Deployment // build the Repository WAR we want to test
    public static WebArchive createDeployment() throws Exception {
        // initialize Maven resolver from our project POM (specifically: repository-configuration to retrieve artifacts)
        final MavenDependencyResolver resolver = DependencyResolvers.use(MavenDependencyResolver.class).loadMetadataFromPom("pom.xml");
 
        // we want a standard Alfresco WAR for our tests - no modifications
        final File[] files = resolver.artifact("org.alfresco.enterprise:alfresco:war:4.1.4").exclusion("*:*").resolveAsFiles();
 
        // there is a simpler "createFromZipFile" method, but we want to provide a custom webapp-name to avoid deployment conflicts
        // files[0] is the resolved alfresco.war
        final WebArchive webArchive = ShrinkWrap.create(WebArchive.class, "SimpleLoginRemoteTest.war").as(ZipImporter.class).importFrom(files[0]).as(WebArchive.class);
        return webArchive;
    }
 
    @RunAsClient @Test
    public void testAdminRESTLogin() throws Exception {
        // use JBoss resteasy-library + org.json (add to POM) to perform a login
        final ClientRequest loginRequest = new ClientRequest(this.baseURL.toURI() + "s/api/login");
        loginRequest.accept("application/json");
 
        final JSONObject loginReqObj = new JSONObject();
        loginReqObj.put("username", "admin").put("password", "admin");
 
        loginRequest.body("application/json", loginReqObj.toString());
 
        final ClientResponse<String> loginResponse = loginRequest.post(String.class);
        Assert.assertEquals("Login failed", 200, loginResponse.getStatus());
    }
}

Some elaboration on the example:

  • Arquillian specific test case classes need to be executed with the corresponding Arquillian runner, which controls the lifecycle a bit differently than the JUnit standard.
  • Each test case comes with a static method with a @Deployment annotation, which builds the artifact to test using ShrinkWrap. This allows for a specific combination of the components under scrutiny and the inclusion of custom configuration.
  • Test methods with the @RunAsClient annotation are run by Arquillian within the JUnit process / context and don’t have access to beans of the Alfresco Repository. If this annotation is missing, Arquillian will execute the test method within the Alfresco Repository web application via the ArquillianServletRunner servlet automatically added to the WebArchive.
  • Test methods with @RunAsClient that need to access the Alfresco Repository remotely can use a URL instance field annotated with @ArquillianResource to get the HTTP base URL for the context of the web application.
  • When the MavenDependencyResolver is used to build the deployment based on the project POM, the necessary artifact repositories for Alfresco need to be added to the POM (i.e. http://artifacts.alfresco.com/…).

When running the tests from an IDE or via mvn test, Arquillian will start the Tomcat instance, build the test artifacts via the callback methods and transfer / deploy them to Tomcat via the manager web application. As long as the Tomcat instance has been properly set up, a green bar or the BUILD SUCCESS message should be the result.

A simple (service) bean test

In order to have Arquillian tests cover (service) beans, a test class needs direct access to the beans defined in the Spring application context of the Alfresco Repository. In contrast to the regular Alfresco JUnit tests (e.g. NodeServiceTest), Arquillian tests cannot make use of the ApplicationContextHelper, as this would start a new application context within the already running web application. Since the various test methods (with @RunAsClient and without) are mixed in one class which will be executed both in the JUnit process and in the managed Alfresco Repository, it is also not possible to simply use @Before/@BeforeClass methods and retrieve the Spring application context manually using ContextLoader.getCurrentWebApplicationContext(). A simple customization during the creation of the deployment enables the use of @Autowired and @Qualifier annotations to have Arquillian inject the necessary beans automatically via the ArquillianServletRunner.

@RunWith(Arquillian.class) @SpringWebConfiguration
public class SimpleNodeServiceLocalTest {
    @Deployment
    public static WebArchive createDeployment() {
        final MavenDependencyResolver resolver = DependencyResolvers.use(MavenDependencyResolver.class).loadMetadataFromPom("pom.xml");
 
        final File[] files = resolver.artifact("org.alfresco.enterprise:alfresco:war:4.1.4").exclusion("*:*").resolveAsFiles();
        final WebArchive webArchive = ShrinkWrap.create(WebArchive.class, "SimpleNodeServiceLocalTest.war").as(ZipImporter.class).importFrom(files[0]).as(WebArchive.class);
        webArchive.addAsResource("arquillian-alfresco-context.xml", "alfresco/extension/arquillian-alfresco-context.xml");
        return webArchive;
    }
 
    @Autowired @Qualifier("NodeService")
    protected NodeService nodeService;
 
    @Autowired @Qualifier("TransactionService")
    protected TransactionService transactionService;
 
    @Test
    public void testNodeService() throws Exception {
        Assert.assertNotNull("NodeService not injected", this.nodeService);
        Assert.assertNotNull("TransactionService not injected", this.transactionService);
 
        AuthenticationUtil.setFullyAuthenticatedUser("admin");
        try {
            this.transactionService.getRetryingTransactionHelper().doInTransaction(new RetryingTransactionCallback<Void>() {
                public Void execute() throws Throwable {
                    final List<StoreRef> stores = SimpleNodeServiceLocalTest.this.nodeService.getStores();
 
                    Assert.assertFalse("List of stores is empty", stores.isEmpty());
 
                    // check default / standard stores
                    Assert.assertTrue("Store workspace://SpacesStore not contained in list of stores", stores.contains(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE));
                    Assert.assertTrue("Store archive://SpacesStore not contained in list of stores", stores.contains(StoreRef.STORE_REF_ARCHIVE_SPACESSTORE));
                    return null;
                }
            }, true);
        } finally {
            AuthenticationUtil.clearCurrentSecurityContext();
        }
    }
}

The corresponding arquillian-alfresco-context.xml looks like this:

<?xml version='1.0' encoding='UTF-8'?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="
            http://www.springframework.org/schema/beans
            http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
            http://www.springframework.org/schema/context
            http://www.springframework.org/schema/context/spring-context-3.0.xsd">
    <context:annotation-config />
</beans>

Some elaboration on the example:

  • The @SpringWebConfiguration annotation declares that beans from the Spring context of the tested web application should be injected when running the test. In case servlets define specific contexts, it is possible to target these via an optional parameter.
  • Using @Autowired and @Qualifier injects specific beans via the Spring auto-wiring capability into the test class when it is executed in the container. Auto-wiring is not enabled by default in Alfresco and needs to be enabled by adding a custom XML context configuration file to the WAR. This file does not alter anything substantial in the Spring configuration of Alfresco but implicitly activates the auto-wiring capability.
  • Test methods run without any authentication or transactional context. If a test requires either of these, it is the responsibility of the test to properly initiate the context. When public service beans (e.g. “NodeService”) are tested and the test covers an atomic operation, it may be possible to leave out the initialization of the transaction context, since public beans (correctly configured) automatically take care of transaction handling.
  • The project POM depends on the Alfresco Repository JAR of the corresponding Alfresco version to compile the test case.

Open issues / pain points

The provided examples allow universal testing of Alfresco in a “managed Tomcat” scenario. As is the case with all testing approaches, there are (still) some issues / problems limiting its usability / effectiveness. The following are the most critical issues / pain points from my point of view:

  • Duration of deployment – Each deployment of a test-case-specific Alfresco Repository WAR takes an enormous amount of time for the transfer, bootstrapping and shutdown of the various application components. Running the tests on my Lenovo T520 with a current SSD and 16 GiB RAM, it takes about three minutes for only two test cases (complexity slightly higher than the examples). This might be bearable for individual component tests and even provide a welcome coffee break, but larger and frequent integration tests in a continuous integration context could be problematic.
  • Memory usage – Multiple deployments into Tomcat require a huge amount of memory, especially for the permanent generation. Depending on the equipment of the development environment, it might not be possible to accommodate this requirement. In that case, only a single or a few test cases may be executed in one run.
  • Deployment errors – Some of the test execution runs in my environment resulted in errors during the (un)deployment of the web application which could only be resolved through manual intervention. This needs to be resolved for a continuous integration use case.
  • Minor inconsistencies across Tomcat versions – During a test of Alfresco 4.2 on Tomcat 7, Arquillian did not automatically inject the ArquillianServletRunner servlet into the web.xml. This led to service beans not being testable. A workaround for this problem is to manually include an adapted web.xml with an explicit configuration of the ArquillianServletRunner servlet during construction of the WebArchive (see the sketch after this list).
  • Provisioning of a consistent test data set – Individual tests may require a very specific state of the content repository data (database, file content, index). Management of the data is handled externally in a “managed Tomcat” scenario outside of the test case logic and may not be influenced without environment-specific code.
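
The following fragment sketches this workaround as part of a createDeployment method like the ones shown above (the adapted web.xml is assumed to reside in src/test/resources under the hypothetical name arquillian-web.xml):

// build the WAR as before, then replace its web.xml with an adapted copy that
// explicitly declares the ArquillianServletRunner servlet
final WebArchive webArchive = ShrinkWrap.create(WebArchive.class, "SimpleLoginRemoteTest.war").as(ZipImporter.class).importFrom(files[0]).as(WebArchive.class);
webArchive.setWebXML("arquillian-web.xml");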

Future blog posts are meant to address some of these issues and provide / document possible solutions.

Script imports with a cleaner API

Update: The patch was transferred to the Alfresco JIRA and can be tracked as ALF-13631.

In my attempts to remote debug Alfresco JavaScript using Eclipse JSDT, the way script imports are handled in Alfresco proved to be one of the key obstacles. The current approach merges scripts in a pre-processor style stage just before they are actually executed. The directive used in this mechanism in both the Repository and Share can be used as follows:

<import resource="classpath:/alfresco/templates/org/alfresco/import/alfresco-util.js">
/**
 * Main entrypoint
 */
function main()
{
   var activityFeed = getActivities();
   var activities = [], activity, item, summary, fullName, date, sites = {}, siteTitles = {};
   var dateFilter = args.dateFilter, oldestDate = getOldestDate(dateFilter);
   ...
}
main()

Prior to execution, a script is scanned for import tags starting at the very first line, collecting dependencies – even transitive ones – for the merging step. The scan process stops at the first non-whitespace character that cannot be matched to an import directive. This approach leads to some restrictions that apply to scripts:

  1. Imports are only possible in the head segment of a script before any actual processing logic.
  2. Scripts cannot be dynamically imported, as the import tag's resource information cannot be altered by JavaScript processing logic.
  3. Syntax checks performed in IDEs rightfully complain about the syntactically invalid import directive.
  4. Rhino exceptions show a line number that does not match the source code – developers are often forced to manually calculate the line offsets of imports to find the affected line in the original source file.
  5. Breakpoints for debugging cannot be reliably set prior to the first execution and thus merging of a script. This affects my preferred choice of the Eclipse JSDT more than the embedded Rhino Debugger UI, since the former currently does not allow interaction with the merged scripts unlike the latter.

I had planned for a while to find an alternative solution for script imports – not only to allow for remote debugging using JSDT, but also to be able to make use of a more flexible means of using / reusing scripts within Alfresco. This weekend I finally had the time to work on this. My goal was to provide a small extension to the JavaScript API that allows importing of scripts in arbitrary places within a script. Additionally, I wanted to allow for extensibility of the lookup mechanism involved without requiring extenders to dive into the actual JavaScript API.

Alfresco itself already provides a means to add additional root objects to the JavaScript API through its javaScriptExtension beans. This feature was not sufficient for what I had in mind – the Java-based services that could be provided that way do not have the necessary level of access to the execution context of the Rhino engine. A patch of the RhinoScriptProcessor on the other hand easily allowed for adding a native JavaScript function in the global context, which – as a side effect of implementing it in the script processor – also has access to important processor internals like the script cache. The final import function can be used in JavaScript in the following manner:

importScript("legacy", "classpath:/alfresco/templates/webscripts/org/alfresco/repository/forms/pickerresults.lib.js", true);

The three parameters of the function are defined as follows:

  1. The unique identifier of the lookup component to be used for the import. Using “legacy” reuses the current lookup concept of the pre-processor merging and allows developers to easily adapt existing code by running a simple RegEx search & replace. Additional lookup components I’ve implemented provide dedicated “classpath” and “xpath” based resolution.
  2. A text-based reference to a script that can be resolved by the chosen lookup component.
  3. A boolean parameter that specifies whether the import should fail with a ScriptException if the script reference cannot be resolved.

The function resolves and executes the imported script in the same context and scope as the caller. The imported script has access to variables and functions defined by the importing script and can interact with them. The boolean return value of the function can be used to check for a successful resolution of the import in use cases where a failed resolution does not automatically raise an exception. As a JavaScript function it can be called at any time during the execution of a script and be used in conjunction with variables as parameters for dynamically importing arbitrary scripts.
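
For illustration, a dynamic import driven by script logic might look like this (the script path and the use of the “classpath” locator are hypothetical examples):

// build the script reference at runtime - impossible with the old import directive
var scriptRef = "/alfresco/extension/scripts/" + args.mode + ".js";
// pass false so that a failed resolution does not raise a ScriptException
if (!importScript("classpath", scriptRef, false))
{
   logger.log("Optional script " + scriptRef + " could not be resolved");
}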

Additional lookup components can be added by implementing a trivial Java interface and linking it with the RhinoScriptProcessor via Spring. The interface defines a single method for the resolution of the text-based script reference. A context parameter with the resolved script location of the script performing the import is passed in – if available – to allow for resolution of relative references.

public interface ScriptLocator {
 
	/**
	 * Resolves a string-based script location to a wrapper instance of the
	 * {@link ScriptLocation} interface usable by the repository's script
	 * processor. Implementations may support relative script resolution - a
	 * reference location is provided in instances an already running script
	 * attempts to import another.
	 *
	 * @param referenceLocation
	 *            a reference script location if a script currently in execution
	 *            attempts to import another, or {@code null} if either no
	 *            script is currently being executed or the script being
	 *            executed is not associated with a script location (e.g. a
	 *            simple script string)
	 * @param locationValue
	 *            the simple location to be resolved to a proper script location
	 * @return the resolved script location or {@code null} if it could not be resolved
	 *
	 */
	ScriptLocation resolveLocation(ScriptLocation referenceLocation,
			String locationValue);
}
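
The individual locators are then registered with the RhinoScriptProcessor via its Spring bean definition:
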
    <bean id="javaScriptProcessor" class="org.alfresco.repo.jscript.RhinoScriptProcessor" init-method="register">
        <!--...-->
        <property name="scriptLocators">
            <map>
                <entry key="classpath">
                    <ref bean="javaScriptProcessor.classpathScriptLocator"/>
                </entry>
                <entry key="xpath">
                    <ref bean="javaScriptProcessor.xPathScriptLocator"/>
                </entry>
                <entry key="legacy">
                    <ref bean="javaScriptProcessor.legacyScriptLocator"/>
                </entry>
            </map>
        </property>
    </bean>
 
    <bean id="javaScriptProcessor.classpathScriptLocator" class="org.alfresco.repo.jscript.ClasspathScriptLocator" />
 
    <bean id="javaScriptProcessor.xPathScriptLocator" class="org.alfresco.repo.jscript.XPathScriptLocator">
        <property name="serviceRegistry" ref="ServiceRegistry"/>
    </bean>
 
    <bean id="javaScriptProcessor.legacyScriptLocator" class="org.alfresco.repo.jscript.LegacyScriptLocator">
        <property name="services" ref="ServiceRegistry"/>
        <property name="storeUrl">
            <value>${spaces.store}</value>
        </property>
        <property name="storePath">
            <value>${spaces.company_home.childname}</value>
        </property>
    </bean>

So far we have dealt with improving script imports on the Repository tier. I originally planned to use the same concept for Share / Spring Surf, but soon realized that Spring Surf / Web Scripts already provide a utility for loading dependencies from different sources. This utility is already being used in the pre-processor stage when scripts are merged. Using a so-called “Store” allows for the resolution of abstract document paths within the classpath of an application or a remote store, such as the Alfresco Repository. This mechanism is sufficiently extensible that it would make no sense to add another one.

I have provided a reduced JavaScript API within Share / Spring Surf which makes use of the existing mechanism – the “legacy” mode of the Repository can be seen as always / implicitly enforced.

importScript("classpath:/alfresco/templates/org/alfresco/import/alfresco-util.js", true);

The function is available for web scripts and template controllers, and supports both explicit classpath resolution – as in the example above – as well as abstract document and relative paths. Relative resolution is only supported when the importing script is loaded from the classpath. In such a case relative resolution is attempted first, falling back to resolving an abstract path via the Store concept (relative paths cannot in all instances be distinguished from abstract paths).

To test the new API I have applied search & replace to my local Alfresco 4.0 Enterprise installation and replaced ALL instances of the old import directive with the new function. The migration worked without any issues so far. The Rhino Debugger UI shows all formerly merged scripts as individual scripts, and breakpoints can now be set prior to the execution of any script. Line numbers in JavaScript exceptions are finally correct in all cases I was able to test.

Debugging Alfresco #1 – Eclipse JavaScript Debugger and Alfresco Repository

Debugging Alfresco is not always a simple undertaking. The remote debugger features of common IDEs allow remote debugging of Java-based components, but for JavaScript and FreeMarker templates things are not as simple as they could be.

While the Rhino engine embedded in Alfresco comes with its own integrated debugger, there is no such thing for FreeMarker as far as I know. But even the embedded Rhino Debugger is anything but feature complete. On the one hand it represents another break in the already extensive tool chain, and on the other it can only be used on servers that come with a graphical user interface. Debugging of development or test environments on headless servers or VMs is not possible. I’ve recently taken the time to check out the new JavaScript debugger features of the Eclipse JavaScript Development Tools (JSDT) project.

Starting with version 3.7 of the Eclipse IDE, the essential components of the JSDT are part of every distribution that includes the Web Standard Tools sub-project. After some initial problems with the not yet fully matured debugger component, I switched to milestone 4 of the upcoming Juno release for my tests. The project’s wiki has a rather useful guide for using the Rhino Debugger support as well as for our special use case of integrating with an embedded Rhino engine. A small FAQ for the most common problems is available as well.

In order to remote debug the JavaScript code of web scripts and the like using Eclipse, a special debug component has to run within the Alfresco server and listen on a TCP port for incoming debug communication (see the Java Platform Debugger Architecture). The JSDT provides the necessary JARs as part of its plugins, so we only have to copy them into <tomcat>/webapps/alfresco/WEB-INF/lib (due to a class dependency on the Rhino engine, <tomcat>/shared/lib is not an option). Those libraries are:

  • org.eclipse.wst.jsdt.debug.rhino.debugger_<version>.jar
  • org.eclipse.wst.jsdt.debug.transport_<version>.jar

Based on the guide for debugging an embedded script engine, the server component has to be bound to a Rhino runtime context and activated. This requires implementing a simple bootstrap bean and including it in the web application startup via Spring.

package com.prodyna.debug.rhino;
 
import java.text.MessageFormat;
 
import org.eclipse.wst.jsdt.debug.rhino.debugger.RhinoDebugger;
import org.mozilla.javascript.ContextFactory;
import org.springframework.beans.factory.InitializingBean;
 
public class RemoteJSDebugInitiator implements InitializingBean {
 
	private static final int DEFAULT_PORT = 9000;
	private static final String DEFAULT_TRANSPORT = "socket";
 
	private boolean suspend = false; // suspend until debugger attaches itself
	private boolean trace = false; // trace-log the debug agent
	private int port = DEFAULT_PORT;
	private String transport = DEFAULT_TRANSPORT;
 
	// the global context factory used by Alfresco
	private ContextFactory contextFactory = ContextFactory.getGlobal();
 
	public void afterPropertiesSet() throws Exception {
		// setup debugger based on configuration
		final String configString = MessageFormat.format(
			"transport={0},suspend={1},address={2},trace={3}",
			new Object[] { this.transport, this.suspend ? "y" : "n",
				String.valueOf(this.port), this.trace ? "y" : "n" });
		final RhinoDebugger debugger = new RhinoDebugger(configString);
		this.contextFactory.addListener(debugger);
		debugger.start();
	}
 
	public void setSuspend(boolean suspend) { this.suspend = suspend; }
	public void setTrace(boolean trace) { this.trace = trace; }
	public void setPort(int port) { this.port = port; }
	public void setTransport(String transport) { this.transport = transport; }
	public void setContextFactory(ContextFactory contextFactory) { this.contextFactory = contextFactory; }
}

The following bean declaration in <tomcat>/shared/classes/alfresco/extension/dev-context.xml activates the debug component.

<bean id="pd.jsRemoteDebugger" class="com.prodyna.debug.rhino.RemoteJSDebugInitiator">
	<property name="port"><value>8000</value></property>
	<property name="trace"><value>true</value></property>
</bean>

After restarting the Alfresco Repository server, Eclipse can connect to the Rhino engine using the JavaScript debugger. The parameters used in the activation bean – whether default or customized – need to be provided in the debug configuration, using the Mozilla Rhino Attaching Connector.

Unfortunately, that is not yet enough to successfully debug server-side JavaScript from within Eclipse. Similar to the classpath for Java source code, JavaScript files need to reside in a specific structure for Eclipse to be able to associate them with the scripts being executed by the server engine. Only if this association can be made are breakpoints set in JavaScript source code actually transmitted to and evaluated by the server-side debugger component.

The expected source code structure for remote-debuggable scripts depends on the source name used when executing scripts with the Rhino engine. Alfresco refers to the file URI of the main script, i.e. in a repository server set up under “D:\Applications\Swift\tomcat” the URI for the web script controller sites.get.js is “file://D:/Applications/Swift/tomcat/webapps/alfresco/WEB-INF/classes/alfresco/templates/webscripts/org/alfresco/repository/sites/sites.get.js”. According to the FAQ of the JSDT, such a URI is mapped without the “file://D:/” prefix to an automatically created source project “External JavaScript Source”. That did not work for me, and after studying the source code of the JSDT plugin I found a working alternative: with the first path fragment referring to a specific source code project, the remainder of the path is used for code lookup relative to that project. In order to debug a JavaScript web script controller of my Swift repository server, those web scripts had to be made available in a project called “Applications” and a subfolder structure “Swift/tomcat/webapps/alfresco/WEB-INF/classes/alfresco/templates/webscripts/”. The simplest way to do this is linking the source code of the Remote API project into such a structure instead of duplicating it.
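
To illustrate the mapping with the example above:

file://D:/Applications/Swift/tomcat/webapps/alfresco/WEB-INF/classes/alfresco/templates/webscripts/org/alfresco/repository/sites/sites.get.js
=> Eclipse project: "Applications"
=> path within the project: Swift/tomcat/webapps/alfresco/WEB-INF/classes/alfresco/templates/webscripts/org/alfresco/repository/sites/sites.get.js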

Having completed this last piece of configuration, breakpoints set in Alfresco web scripts like sites.get.js will now be properly transmitted to and activated on the server. On the next execution of a site search from within Alfresco Share, the debugger will pause at the specified code line. Standard features like step over / into, variables and expression views are available to investigate the behavior of the selected script. Especially the expressions view is currently of utmost importance, as the debugger is not yet able to handle Java objects as variable values unless they are transformed into native JavaScript instances via an expression.

Mid-term review: The Eclipse JSDT allows debugging of JavaScript scripts that are part of the Alfresco application – i.e. lying in its classpath – from within the familiar IDE used by a majority of Alfresco developers. This eliminates the previous restrictions imposed by the Rhino Debugger, which only allowed debugging on servers that were either local or sported a graphical user interface. Setting up the JSDT remote debugger takes some getting used to, but should be easy to handle with the tools provided by the IDE, such as source linking. Currently there are some functional limitations and peculiarities due to the not yet matured debugger and the way the Rhino engine is embedded within Alfresco. I will address some of these issues in upcoming posts of this new blog series and provide solutions where possible.

Managing and using custom classifications

Classifications in Alfresco allow for associating content elements with specific categories from a hierarchical structure. The standard product provides a generic out-of-the-box tree of categories relating to languages, regions and types of software documentation. Lucene queries may be formulated that select or aggregate content based on the associated categories and even sub-trees of categories, e.g. selecting all documents associated with the English language independent of the actual dialect, i.e. American, British or any other English variant.
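
Such a query might look as follows in Lucene syntax (a sketch based on the standard category PATH queries; the exact category names depend on the installation):

+TYPE:"cm:content" +PATH:"/cm:generalclassifiable/cm:Languages/cm:English//member"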

The Alfresco wiki has a pretty good documentation of classifications and categories. Unfortunately, the documentation does not reflect how classifications are used in an apparent majority of projects. Instead of defining custom classifications similar to the example provided in the wiki, I have seen numerous instances where the out-of-the-box hierarchy was simply extended. This is understandable considering the amount of functionality provided for the out-of-the-box hierarchy and the effort saved by reusing it. But this approach means that categories do not serve their intended function of providing semantically separate classifications – any arbitrary category may be assigned to the cm:categories property instead of choosing from business-oriented, separate value sets with the odd connection between individual categories.

My colleagues and I have all participated in several projects that use custom classifications to organize content and – in some instances – provide virtual navigation structures based on content classification. Apart from the technical architecture, Alfresco does not provide much in the way of supporting custom classifications. The category manager included in the Web Client only handles the out-of-the-box hierarchy, as does its Share counterpart introduced in Alfresco 4.0 (by Jan Pfitzner). In order to save us – and other developers of the community – the trouble of having to reinvent the wheel any more than necessary, I recently set out to enhance Jan’s component and submit it as a contribution to Alfresco.

In short I have modified the following four aspects based on Alfresco 4.0c:

  • Added the ability to manage multiple classifications
  • Added the ability to create / modify categories that use a business-specific subtype
  • Patched the Forms API to allow creating new content objects using a child-association other than cm:contains
  • Patched the object-finder form component to support usage of business-specific category subtypes

Category Manager - Multiple Classifications

The Share category manager only allows for managing the out-of-the-box hierarchy of cm:generalclassifiable – as does its Web Client predecessor. In order to simplify the usage of custom classifications, they need to be manageable without requiring additional development effort. Minimal adaptations to the tree construction code and the introduction of a new data web script on the repository tier now allow any classification to be administered. In order to hide certain technical classifications that are used and managed differently (e.g. cm:taggable and cm:classifiable), I have introduced a configuration option to ignore these classification aspects.

Category Manager - Form-based Management

Categories are standard nodes – much like almost anything else in Alfresco – that are defined by the type cm:category. This type may suffice for most usages, but sometimes there is the requirement of associating business metadata with categories. The data dictionary and modelling of Alfresco allow for defining subtypes of cm:category or using aspects to enhance categories with additional data. The Share category manager only supported simple categories with a name as the sole property. I have based the component on forms to provide the necessary flexibility in managing category types and metadata. New categories will always be created using a form dialog, while existing categories may be edited using the in-situ editor or a form dialog, based on another configuration option I have introduced. The former is the default for simple categories of type cm:category.

The form-based management of categories required extensions to the Forms API. In order to create root categories in the correct location, it was necessary to provide a form filter that resolves the virtual node reference alfresco://category/root to the correct reference for the specific classification aspect. The creation of sub-categories via forms additionally required the ability to specify the correct child-association to use (cm:subcategories) instead of the default cm:contains – a feature marked with a TODO in the code but not implemented since at least Alfresco 3.2.

Assigning values from a custom classification

Using a custom classification for editing the metadata of a content item only worked if the categories used were of the type cm:category. Otherwise, navigating into a sub-level of the hierarchy was not possible for any subtype. Supporting subtypes required an adaptation of the object finder providing the selection dialog within forms, as well as of the supporting data web script on the repository tier. Type-specific checks were replaced by proper type hierarchy evaluations. A small Spring Surf extension may be used to associate any business category type with either the generic or a custom icon.

cm:name – An enforced property

In the last post I described a performance problem which could be traced back to the usage of cm:name (cm:cmobject as parent type) in modelling / instantiating 500.000+ record sets in the default content store. Using one of the listed concepts to work around this issue, I have been setting up a small migration aiming to remove the redundant property cm:name by switching to the parent type sys:base. I have since come to realize that cm:name is – independently from my model type definition in the data dictionary – enforced on all public interfaces and always indexed. Only the integrity checks for mandatory and constrained properties respect the actual type definition.

This of course negates the purpose of my entire approach to combating our performance problems. If it is impossible to have a node in the database which is not indexed with a cm:name property, side effects on the performance of sorting navigation scripts for Share using that same property are unavoidable.

How does this behaviour manifest itself?

  • When a node is created without a cm:name value, no value for that property is persisted to the database. During reads on the node's properties, the UUID of the NodeRef is transparently returned as a fake cm:name value (see e.g. DBNodeServiceImpl.getProperty(NodeRef, QName) or ReferenceablePropertiesEntity.addReferenceableProperties(Node, Map<QName,Serializable>)).
  • Only the properties defined in the type definition are validated during node creation / modification. Since cm:name is only defined for cm:cmobject and its subtypes, it is only validated for these types. Any evaluation of the mandatory constraint is suppressed as the property is being faked to have the value of the UUID if not set explicitly.
  • During indexing, the type definition of the node being indexed is not respected as far as properties are concerned. All properties present on the node are indexed according to their property definition, regardless of whether they should even be present on the node or not. This means that even if nodes do not inherit from cm:cmobject, a cm:name value is being indexed, because a) the property is transparently set to the UUID if not present and b) a property definition for cm:name exists which specifies that it must be indexed.

This behaviour has essentially remained unchanged since 3.2 based on my investigations into the Alfresco SVN, and remains in place in the current 4.0 trunk. I was unable to identify which Alfresco feature might require this enforcement of the property, overriding the configuration of my data model. Regarding the question “bug or feature” I am currently leaning towards “bug”. Since this was discovered in a project of an Enterprise customer, I have referred this question to Alfresco Support. In case this is a consciously implemented behaviour, it would be better / more appropriate to model cm:name as a property of sys:base, similar to how sys:referenceable defines the other common set of properties (store protocol, identifier and UUID).

cm:name – Limits of sorting

We have implemented a compliance management system on the basis of Alfresco Share 3.2.2 for one of our customers. In addition to contract and document templates, organisational structures and complex workflows to comply with review, approval and documentation processes, this system also manages more than 500.000 base data record sets. The latter are modelled as an abstract content type with aspects grouping subsets of properties, and are regularly imported from / synchronized with an external data source. The records reside in the default ContentStore “workspace://SpacesStore”, as 500.000 objects with about 20 properties each are too few to expect a noticeable impact on the performance of the platform as a whole.

Contrary to our expectations, a problem based on the Alfresco specifics of sorting Lucene searches was observed. Since the content type uses cm:cmobject as the parent type, every object inherits the property cm:name, which we map to the unique key of the associated record. The first import added more than 500.000 entries to the Lucene index with fully distinct values for the field @cm:name, causing a noticeable drop in the performance of the Share document library navigation. We observed a base overhead of 3-5 s for every search sorting on @cm:name, even before the actual Lucene search started processing. As every navigation within the Share document library executes a sorting search in doclist.get.js, the entire application is affected beyond the users' tolerance levels.
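
For illustration, such a sorting search corresponds roughly to the following JavaScript API call (a simplified sketch, not the actual doclist.get.js code; the site path is a placeholder):

var nodes = search.query(
{
   query: "+PATH:\"/app:company_home/st:sites/cm:mysite/cm:documentLibrary/*\"",
   language: "lucene",
   sort: [ { column: "@cm:name", ascending: true } ],
   page: { maxItems: 50, skipCount: 0 }
});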

What causes Alfresco to perform this badly considering a (presumably) small data set? From a technical point of view two main reasons can be identified:

  1. All values of the field to be sorted will be loaded from Lucene into memory for pre-sorting (for us this means more than 800.000 distinct values for usually fewer than 50 results to sort).
  2. The internal Lucene FieldCache cannot be used to optimise repeated queries. Each search makes use of a unique IndexReader wrapper instance due to multi-layered faceting – the FieldCache on the other hand is contractually obliged to only return previously loaded field values for the identical instance. This means that field values are always loaded directly from the index. (Those with time and curiosity at hand may inspect the cache using a Java debugger and will notice that the necessary data would be available several times over but cannot be accessed.)

The magnitude of the performance impact scales with the I/O performance of the data volume used for the index. My personal development laptop, which includes a solid state drive, usually offers better performance than customers are willing to pay for in their productive machines. Thus I only have to suffer a 1 – 2 s degradation, but intensive use of the navigation will swiftly lead to a bad impression on users.

What solutions / concepts are there to address these performance problems with sorting searches?

  • Large amounts of base data should be stored in separate ContentStores, which automatically use a separate index. This is possible only if there are either no or just simple hierarchical relationships with other data sets to consider.
  • Metadata for sorting should be mapped to individual, business-specific properties if at all possible. When standard properties are used, sorting performance side effects may be incurred involuntarily when large record sets reuse the same property.
  • Searching over smaller subsets may in extreme cases be faster with sorting (and paging) implemented in JavaScript or Java instead of relying on Lucene. (In our case this would be possible for the navigation within the document library, since only 5 to 15 elements are managed on any one hierarchy level.)
  • Migration to Alfresco 4.0 which uses SOLR / canned queries.

This was a rather unexpected realisation for me, as it means that only a few hundred thousand documents can be managed in Alfresco Share before the document library as its core component reacts noticeably slower. Previous experiences with managing millions of objects in a single Alfresco instance stand in rather strong contrast to this …

The problems relating to sorting have been known to Alfresco for a time. Combined with similar problems with PATH-based queries and permission checking of large result sets, this was the reason for / a reinforcement of the switch to SOLR and moving core queries to the datbase layer in Alfresco 4.0 Expecially canned queries guarantee that sorting queries are affected only by the properties of the objects in the hierarchy being queried.