Learning Apache Apex

Creating a new Maven project

Apex applications are packaged in a special ZIP file format that contains everything needed to launch an application on a cluster (dependency JARs, configuration files, and so on). It is roughly comparable to the uber JAR approach that some other frameworks employ, with the difference that dependencies in the Apex package remain individual JAR files rather than being flattened into a single JAR.

More information about Apex application packages can be found at http://apex.apache.org/docs/apex/application_packages/#apache-apex-packages.
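Because the application package is a ZIP archive, its contents can be inspected with ordinary ZIP tooling. The following is a minimal sketch, using only the Java standard library, that lists the JAR files inside such an archive. The class name PackageLister is hypothetical and is not part of Apex. The example path in the usage note is also an assumption about where the build places the package.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; not part of the Apex distribution.
public class PackageLister {

    /** Returns the names of all .jar entries in the given ZIP archive, sorted by name. */
    public static List<String> listJars(Path archive) throws IOException {
        List<String> jars = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Each dependency stays a separate entry instead of being merged into one JAR.
                if (!entry.isDirectory() && entry.getName().endsWith(".jar")) {
                    jars.add(entry.getName());
                }
            }
        }
        Collections.sort(jars);
        return jars;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PackageLister <path-to-package>");
            return;
        }
        for (String jar : listJars(Paths.get(args[0]))) {
            System.out.println(jar);
        }
    }
}
```

Assuming the build writes the package to a path such as target/myapexapp-1.0-SNAPSHOT.apa, running java PackageLister with that path should print one line per JAR, which makes the contrast with a flattened uber JAR easy to see.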

Setting up a new Maven project from scratch would be a rather involved task. The Apex application archetype simplifies the process by creating an application skeleton with the expected artifact structure. Here is an example of the Maven command that generates a new project from the Apex application archetype:

mvn archetype:generate \
-DarchetypeGroupId=org.apache.apex \
-DarchetypeArtifactId=apex-app-archetype -DarchetypeVersion=RELEASE \
-DgroupId=com.example -Dpackage=com.example.myapexapp -DartifactId=myapexapp \
-Dversion=1.0-SNAPSHOT

In this case, we are using RELEASE as the archetype version to refer to the latest available Apex release (http://apex.apache.org/downloads.html). You can replace it with a specific version you want to use instead, which will then also become the Apex engine dependency version in the generated pom.xml.

The newly generated project does not depend on or inherit anything from a parent POM. So, if your organization requires a common parent POM, then you will be able to use that without extensive changes to this project.

If, instead of creating a brand new project, you would rather start from a similar existing project and adapt it to your use case, have a look at the examples that come with the Apex library at https://github.com/apache/apex-malhar/tree/master/examples. They also cover many operators that are frequently needed in projects.

The generated project has the following typical Maven project structure:

$ tree myapexapp/ 
myapexapp/ 
├── pom.xml 
├── src 
│   ├── assemble 
│   │   └── appPackage.xml 
│   ├── main 
│   │   ├── java 
│   │   │   └── com 
│   │   │       └── example 
│   │   │           └── myapexapp 
│   │   │               ├── Application.java 
│   │   │               └── RandomNumberGenerator.java 
│   │   └── resources 
│   │       └── META-INF 
│   │           └── properties.xml 
│   ├── site 
│   │   └── conf 
│   │       └── my-app-conf1.xml 
│   └── test 
│       ├── java 
│       │   └── com 
│       │       └── example 
│       │           └── myapexapp 
│       │               └── ApplicationTest.java 
│       └── resources 
│           └── log4j.properties 
└── XmlJavadocCommentsExtractor.xsl 

In addition to the usual Java main and test directories, the project contains an assembly specification that defines the structure of the resulting application package. There is also an optional site/conf directory that can be used for additional configuration files that the user can select when launching the application. To try out the project, run the simple placeholder application with the JUnit test:

cd myapexapp 
mvn test 
... 
hello world: 0.25805982800750105
hello world: 0.8945864455634059
... Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.78 sec

The test runs for about 10 seconds and prints generated numbers. Successful execution validates the project and environment setup.

The test is actually an integration test for the entire placeholder application, not just a unit test for a single class or a method within a class. It uses the embedded execution mode, which runs the application DAG within the unit test JVM. This is the preferred way to functionally test your application, and it does not require a cluster.

We will now examine various aspects of the application and the process of making changes to its logic and configuration.