Custom metadata in Elvis Server 4
Elvis extracts and offers many standard metadata fields out of the box. For further customization, you can add custom metadata fields to suit your needs. You can also modify the settings of certain metadata groups. Modifying metadata settings requires some basic knowledge of XML.
About updating metadata fields
Updating metadata fields or adding custom fields is done in the custom-assetinfo.xml file (found in <Elvis Server>/Config).
Note: After changing the custom-assetinfo.xml file, restart the Elvis service to have the changes take effect.
Tip: To make creating custom metadata fields a little easier, there are several templates available in the knowledge base. These are the templates we have used ourselves for the default fields.
If there are any problems in the assetinfo configuration, Elvis falls back on the standard metadata only. Certain errors in custom-assetinfo.xml also cause Elvis to rebuild the index of all items stored in Elvis.
Before you start
Before you start modifying the metadata settings, make sure that you have a plan. Once custom fields are active, they are difficult to modify without causing data loss. Renaming or removing a field removes all data already stored in that field, and other changes, such as changing the data type, can corrupt your index.
Take the following points into consideration:
- Fields must have a unique name in the system
- Make sure you know what type of data will be collected in the field; the data type cannot be changed afterwards
- Can a field have multiple values or just one?
- Should the field be searchable through the search engine?
- Should users be able to sort on the data in the field?
- Should the data in the field be able to be displayed in the search results?
- What group should the field belong to in the interface?
- For what type of content should the metadata be used?
Metadata in Elvis is organized in one XML structure that describes different kinds of settings in one file. The code block below shows the basic structure.
<assetsInfoExt>
  <fieldGroups>
    <fieldGroup/>
  </fieldGroups>
  <assets>
    <assetTypeBaseExt>
      <fields>
        <field>
          <storage/>
          <compass/>
          <data>
            <taxonomy/>
          </data>
          <userInterface/>
          <description/>
        </field>
      </fields>
    </assetTypeBaseExt>
  </assets>
</assetsInfoExt>
Everything to do with metadata on content is defined in this structure. You can add field groups, which organize the way metadata is grouped in your view. You can add metadata fields to your assets, each with its own settings for the search index and for the way the data is stored.
The rest of this guide shows samples of field settings for certain scenarios. You can use them to see what custom fields look like in the XML file and as a starting point for your own customizations.
Adding field groups
You can add your own custom field groups by adding a field group to the custom-assetinfo.xml file. These field groups are the categories that show up in your metadata panel; they group fields for display in the user's client.
<fieldGroups>
  <fieldGroup name="CompanyData" />
  <fieldGroup name="WebsiteInformation" />
</fieldGroups>
The example shows how to create two field groups, CompanyData and WebsiteInformation. You can add as many groups as you like, but you cannot remove any existing groups. Make sure that the name of the group does not contain any spaces or special characters.
Adding metadata fields
Metadata fields have several settings:
- for the search index
- the storage engine
- the type of data stored
To build up your own fields, you need to define all of the settings, to ensure proper processing and storage. Fields are part of asset information, a group of fields related to each other for a type of asset.
The first step in creating custom fields is to define your own assetTypeBase, which alters the chain of assetTypeBases, or to extend an existing assetTypeBase.
<assetTypeBaseExt name="CommonFields">
  <fields>
    <field>
      <storage/>
      <compass/>
      <data/>
      <userInterface/>
      <description/>
    </field>
  </fields>
</assetTypeBaseExt>
Your assetTypeBase should extend from the current assetTypeBases. Usually if you want a field to apply to all kinds of items, you should extend GeneralFields. If your field is very specific to images, you might choose to extend ImageFields. Underneath is an image showing all the current assetTypeBases and the file types that are mapped to them.
The basic structure for a field is:
<field name="cf_CompanyName" group="CompanyData">
  <storage/>
  <compass/>
  <data/>
  <userInterface/>
  <description/>
</field>
A field must have a name and must be assigned to a group. The name of a custom field:
- has to start with cf_
- cannot contain any spaces or special characters
- has to be unique in the system
To provide labels for the custom fields, you can put them in properties files under Config/messages. See the examples in the file to define a name in the UI. The group should be one of the existing groups or a custom group you've created yourself.
Tip: Remember that field groups are used to sort and display the metadata in the user's client. You can define your own field group and sort your fields underneath that group.
Possible existing values of field groups:
- File - typical file information, such as file size
- General - general file information, such as description or tags
- Publication - general publication information, such as edition or channel
- Rights - information about the usage rights of the content
- Creator - information about the creator of the content
- Licensor - information about the licensor of the content
- Licensee - information about the licensee of the content
- Subject - information describing the subject of the content
- Location - information about the location of the content, either subject location or creation location
- OpenCalais - information extracted by OpenCalais
- EXIF - EXIF information
- XMP - XMP information
- GPS - GPS information
- ImageInfo - image specific information, such as width and height
- VideoInfo - video specific information, such as duration and scene
- AudioInfo - audio specific information, such as track number and artist
- DocumentInfo - document specific information, such as number of pages
- PdfInfo - PDF information, such as PDF version or software
- Enterprise - Enterprise metadata
- System - internal Elvis information, like item ID
Below are examples for each of the settings, followed by more general examples of typical settings for certain types of fields.
The storage setting is the easiest to determine: for custom fields, storeInMetadata should always be true, so that the entered value of the field is always stored in the metadata. Only specific system fields can be stored outside of the metadata.
<compass index="un_tokenized" store="yes" excludeFromAll="false" />
The settings of the search index can be found in the <compass> tag. The index setting is the way the value of the metadata field should be stored in the search index.
Possible values for index:
- un_tokenized - the whole value of the field is indexed as a single term, without splitting it into separate words; typically used for fields with exact or non-string values
- tokenized - the value of the field will be chopped up and saved to the index with all the separate words, typical for description, and other long text fields
- no - the value will not be stored in the search index and users will not be able to search on this field
Note: To be able to sort on a field, it has to be either un_tokenized or tokenized with a pureLowerCase analyzer and it cannot be multivalue.
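As a minimal sketch of the sorting rule in the note above, the following field combines a tokenized index with the pureLowerCase analyzer and a single-value data setting, which should make it sortable regardless of case. The field name cf_ProjectName is hypothetical; the attributes are the ones described in this guide.

```xml
<!-- Hypothetical field name. Intended to be sortable because it is
     tokenized with pureLowerCase and is not multivalue. -->
<field name="cf_ProjectName" group="General">
  <storage storeInMetadata="true" />
  <compass index="tokenized" analyzer="pureLowerCase" store="yes" excludeFromAll="false" />
  <data editable="true" multivalue="false" datatype="text" />
</field>
```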
The store setting determines whether the field can be displayed in the search results. Enter yes if the value should be available for display and no if it does not need to be displayed.
The excludeFromAll setting determines whether a user can find the value through a general search. If it is set to true, users cannot search for the value of the field as a general search term. The values can still be searched or filtered on specifically by using field-specific searches, like field:....
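To illustrate, here is a sketch of a field that is kept out of general searches but can still be found with a field-specific search. The field name cf_InternalCode and the sample value are hypothetical.

```xml
<!-- Hypothetical field: not matched by general search terms, but still
     searchable with a field-specific query such as cf_InternalCode:A1234 -->
<field name="cf_InternalCode" group="General">
  <storage storeInMetadata="true" />
  <compass index="un_tokenized" store="yes" excludeFromAll="true" />
  <data editable="true" datatype="text" />
</field>
```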
The analyzer setting is used to specify a specific analyzer that should be used on the value of a field. Only tokenized fields use analyzers. The following analyzers are available:
- default - the standard analyzer used by Lucene to produce search tokens
- pureLowerCase - produces one token of the entire value in lower case
- alphaNumeric - only splits words at whitespace to produce tokens
analyzerForAll is a new parameter added in Elvis 2.6. When a field is set to be excludeFromAll="false" you can specify an analyzer for the value of the field as it is added to the "all" field. The "all" field is the field that is searched when entering a query into the search box.
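A possible use, sketched under two assumptions that this guide does not confirm: that analyzerForAll is written as an attribute of the compass tag, and that it accepts the analyzer names listed above. The intent is a field that is matched exactly when searched by name, while its value is added in lower case to the "all" field.

```xml
<!-- Assumption: analyzerForAll is an attribute of <compass> and accepts
     the analyzer names listed above. Verify against your Elvis version. -->
<compass index="un_tokenized" store="yes" excludeFromAll="false" analyzerForAll="pureLowerCase" />
```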
For more information about analyzers, please read the Lucene documentation.
In general you should use the following settings for the following situations:
- Search for exact value, case sensitive:
<compass index="un_tokenized" />
- Search for exact value, case insensitive
<compass index="tokenized" analyzer="pureLowerCase" />
- Search for individual words or stems of words, case insensitive
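A likely configuration for this last case, assuming the default analyzer is applied when no analyzer is specified, is a tokenized index without an explicit analyzer:

```xml
<!-- Assumption: omitting the analyzer attribute selects the default analyzer -->
<compass index="tokenized" />
```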
The data setting determines if and how the data of the field is stored. editable determines whether users can edit the field value. datatype determines the type of data the field contains.
<data editable="true" multivalue="false" datatype="text">
  <taxonomy source="keyword-list.txt" onlyFromList="false" sort="false" />
</data>
Possible values for the data type:
The multivalue setting is used for a field that can have several values. This does not mean a long text or several words, but a set of separate values, such as a set of texts. By default (if not specified) this is set to false. You cannot sort results on the value of the field if it is set to true.
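As a sketch, a multivalue field differs from a single-value field only in its data setting. The field name cf_ProductCodes is hypothetical, and the example assumes that text is among the datatypes supported for multivalue fields.

```xml
<!-- Hypothetical multivalue field; assumes "text" is a supported
     datatype for multivalue fields. Results cannot be sorted on it. -->
<field name="cf_ProductCodes" group="General">
  <storage storeInMetadata="true" />
  <compass index="un_tokenized" store="yes" excludeFromAll="false" />
  <data editable="true" multivalue="true" datatype="text" />
</field>
```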
Supported datatype values for a multivalue field:
Unsupported datatype values for a multivalue field:
For taxonomy configuration, see Taxonomy.
User interface settings
The user interface settings are used to configure aspects of user interaction, to configure flags and to determine which filter settings to use.
<userInterface filterUI="forDataType|checkBoxes|tagCloud" filterValuesSource="usedTerms" flagPosition="unique order number" flagIconWhenNotEmpty="icon name" />
The following filter settings can be configured for a field:
- filterUI - which filter interface to use, with the following possible values:
  - forDataType - Show standard filter UI for field data type. This is the default if left empty.
  - checkBoxes - Show filter UI with checkboxes. This works as an OR filter on the current results.
  - tagCloud - Use in combination with filterValuesSource="usedTerms" for a tag cloud filter.
The filterValuesSource can be set to usedTerms. Since the implementation of a new faceting engine in Elvis version 2.5, allTerms and predefinedValues have been deprecated.
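For example, a tag cloud filter could be configured as in the sketch below. The field name cf_Campaign is hypothetical; the flag attributes are left out because they are optional for filtering.

```xml
<!-- Hypothetical field with a tag cloud filter built from the terms in use -->
<field name="cf_Campaign" group="General">
  <storage storeInMetadata="true" />
  <compass index="un_tokenized" store="yes" excludeFromAll="false" />
  <data editable="true" datatype="text" />
  <userInterface filterUI="tagCloud" filterValuesSource="usedTerms" />
</field>
```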
Every field can have a field description where you can add an explanation on the purpose of the custom field. The description uses markdown for formatting.
<description>
Width in millimeter. For PDF this is taken from the TrimBox if available or otherwise from the MediaBox.

**Example value:**

* 210
</description>
In the example below you see the simplest way to add a custom field to Elvis that can be searched and can be used to sort. The field is editable and contains text data.
<assets>
  <assetTypeBaseExt name="CommonFields">
    <fields>
      <field name="cf_CompanyTarget" group="General">
        <storage storeInMetadata="true" />
        <compass index="un_tokenized" store="yes" excludeFromAll="false" />
        <data editable="true" datatype="text" />
      </field>
    </fields>
  </assetTypeBaseExt>
</assets>
The example below shows a similar field, but with predefined values. Because onlyFromList is set to false, users can also enter their own values.
<assets>
  <assetTypeBaseExt name="CommonFields">
    <fields>
      <field name="cf_CompanyStatus" group="General">
        <storage storeInMetadata="true" />
        <compass index="un_tokenized" store="yes" excludeFromAll="false" />
        <data editable="true" datatype="text">
          <predefinedValues onlyFromList="false">
            <value>Idle</value>
            <value>New</value>
          </predefinedValues>
        </data>
      </field>
    </fields>
  </assetTypeBaseExt>
</assets>
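Putting the pieces together, a complete custom-assetinfo.xml might look like the sketch below. It reuses the CompanyData group and the cf_CompanyName field from the earlier examples and follows the basic structure shown at the start of this guide. The description text is illustrative, and the userInterface element is left out, as in the simple example above.

```xml
<assetsInfoExt>
  <!-- Custom group shown in the metadata panel -->
  <fieldGroups>
    <fieldGroup name="CompanyData" />
  </fieldGroups>
  <assets>
    <assetTypeBaseExt name="CommonFields">
      <fields>
        <!-- Exact-value search, case insensitive, and sortable -->
        <field name="cf_CompanyName" group="CompanyData">
          <storage storeInMetadata="true" />
          <compass index="tokenized" analyzer="pureLowerCase" store="yes" excludeFromAll="false" />
          <data editable="true" multivalue="false" datatype="text" />
          <description>
Name of the company that the asset belongs to.
          </description>
        </field>
      </fields>
    </assetTypeBaseExt>
  </assets>
</assetsInfoExt>
```

After saving a file like this in the Config folder, restart the Elvis service for the changes to take effect.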