Info: Performing the steps described in this article requires direct server access. Depending upon how your system is hosted and the level of access you have to that system, coordination may be required with your Partner or WoodWing Support team. For a full overview of the steps that need to be done by WoodWing and how to request them, see WoodWing Cloud - Change management.
The search mechanism of Solr is powerful but complex, and it can take some time to configure it so that users can easily find the files they are looking for.
In this article we will highlight some of the main aspects to take note of and provide examples and links for further reading.
Tokens
As explained in Understanding the Solr Search functionality in Studio Server, Solr handles data by using 'tokens'.
Example: The following sentence: "Please, email john.doe@foo.com by 03-09, re: m37-xq." is split into the following tokens: "Please", "email", "john.doe@foo.com", "by", "03-09", "re", "m37-xq".
The default token length varies between 4 and 15 characters.
Also, Solr by default removes characters such as underscores or dashes from the words that are indexed.
Example: "wi-fi" is indexed as "wi" and "fi".
From this, we can conclude that:
- The more tokens exist, the more search results can be returned (potentially too many to be practical).
- Terms shorter than 4 characters will be ignored. For terms that are longer than 15 characters, only the first 15 characters are included.
- When a user enters a search phrase that contains underscores or dashes, no results are displayed.
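The effect of these defaults can be approximated with a short Python sketch. It is a simplified model for illustration only, not Solr's actual implementation: the function names split_on_delimiters and ngrams are hypothetical, and the splitting rule (break on every non-alphanumeric character) is an assumption that ignores most analyzer options.

```python
import re


def split_on_delimiters(token):
    """Roughly mimic subword splitting: break a token on non-alphanumeric characters."""
    return [part for part in re.split(r"[^A-Za-z0-9]+", token) if part]


def ngrams(token, min_size=4, max_size=15):
    """Return every substring of the token whose length lies in [min_size, max_size]."""
    grams = []
    for size in range(min_size, min(max_size, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


print(split_on_delimiters("wi-fi"))  # ['wi', 'fi']
print(ngrams("wi"))                  # []
print(ngrams("layout"))              # ['layo', 'ayou', 'yout', 'layou', 'ayout', 'layout']
```

In this model, "wi" and "fi" are both shorter than the minimum size of 4, so neither part produces a searchable token. This matches the observation that a search phrase containing a dash can return no results.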
To resolve these issues, we can:
- Stop Solr from tokenizing on subwords
- Customize the character range of the search token
- Configure Solr search to find objects with underscores or dashes
Each is explained in the following sections.
Stopping Solr from tokenizing on subwords
Do this by disabling (commenting out) the WordDelimiterFilterFactory class in the schema.xml file by wrapping the filter elements between <!-- and --> markers as follows:
Step 1. Open the file <Solr installation directory>/schema.xml.
Step 2. Disable (comment-out) the WordDelimiterFilterFactory class.
<fieldType name="textNGram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!--
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" splitOnNumerics="0"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!--
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
Step 3. Save and close the file.
Step 4. Re-index Solr from the Search Server page in Studio Server.
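Before re-indexing, it can be useful to confirm that the filter is really disabled. The following Python sketch is one possible way to do that, assuming the schema content is well-formed XML. The function names active_filters and word_delimiter_is_active and the trimmed SAMPLE fragment are hypothetical and only serve as illustration. The check relies on the standard library XML parser discarding comments by default, so a commented-out filter does not show up.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical fragment of a schema.xml with the filter commented out.
SAMPLE = """
<fieldType name="textNGram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!--
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""


def active_filters(schema_xml, field_type="textNGram"):
    """Map each analyzer type of the field type to its active (uncommented) filter classes."""
    root = ET.fromstring(schema_xml)
    result = {}
    for ft in root.iter("fieldType"):
        if ft.get("name") != field_type:
            continue
        for analyzer in ft.findall("analyzer"):
            result[analyzer.get("type")] = [f.get("class") for f in analyzer.findall("filter")]
    return result


def word_delimiter_is_active(schema_xml):
    """True when any analyzer of textNGram still uses WordDelimiterFilterFactory."""
    return any(
        "solr.WordDelimiterFilterFactory" in classes
        for classes in active_filters(schema_xml).values()
    )


print(active_filters(SAMPLE))            # {'index': ['solr.LowerCaseFilterFactory']}
print(word_delimiter_is_active(SAMPLE))  # False
```

To check a real installation, read the content of the schema.xml file into a string and pass it to word_delimiter_is_active.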
Customizing the character range of the search token
By default, Solr's N-Gram Tokenizer is enabled and generates n-gram tokens of sizes in the default range of 4 – 15 characters. This is configured in the schema.xml file.
If you wish to adjust the default range or to disable the tokenizer, please make sure that these changes are also reflected in the config_solr.php file by following the steps below.
Step 1. Open the file <Solr installation directory>/schema.xml.
Step 2. Locate any reference to 'solr.NGramFilterFactory':
<filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15"/>
Step 3. To adjust the default range simply change the minGramSize and/or maxGramSize attributes of the filter, for example:
<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="18"/>
Or, to disable the n-gram tokenizer, comment-out the filter option as follows:
<!-- <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15"/> -->
Step 4. Save and close the file.
Step 5. Open the config_solr.php file (recommended: config_overrule.php file).
Step 6. Locate the SOLR_NGRAM_SIZE option:
//Defines the range used for NGRAM size
define ('SOLR_NGRAM_SIZE', serialize(array(
4, // MinGramSize
15, // MaxGramSize
)));
Step 7. To adjust the default range, simply change the values configured for the MinGramSize and/or MaxGramSize options, for example:
//Defines the range used for NGRAM size
define ('SOLR_NGRAM_SIZE', serialize(array(
3, // MinGramSize
18 // MaxGramSize
)));
Or, to disable the n-gram tokenizer, comment-out the option as follows:
// Defines the range used for NGRAM size
//define ('SOLR_NGRAM_SIZE', serialize(array(
// 4, // MinGramSize
// 15 // MaxGramSize
//)));
Step 8. Save and close the file.
Step 9. Restart Solr.
Step 10. Re-index Solr from the Search Server page in Studio Server.
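Because the range has to be kept identical in two files, a small consistency check can help catch mismatches. The Python sketch below is an illustration under stated assumptions: the function names ngram_ranges and ranges_match and the SAMPLE fragment are hypothetical, the values from SOLR_NGRAM_SIZE are passed in by hand rather than parsed from PHP, and every active filter is assumed to carry both the minGramSize and maxGramSize attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema.xml fragment using the adjusted range from the example above.
SAMPLE = """
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="18"/>
</analyzer>
"""


def ngram_ranges(schema_xml):
    """Collect the (minGramSize, maxGramSize) pair of every active NGramFilterFactory."""
    root = ET.fromstring(schema_xml)
    return {
        (int(f.get("minGramSize")), int(f.get("maxGramSize")))
        for f in root.iter("filter")
        if f.get("class") == "solr.NGramFilterFactory"
    }


def ranges_match(schema_xml, php_range):
    """Compare the schema with the SOLR_NGRAM_SIZE values.

    Pass php_range=None when SOLR_NGRAM_SIZE is commented out; the schema
    should then contain no active n-gram filter either.
    """
    found = ngram_ranges(schema_xml)
    if php_range is None:
        return not found
    return found == {tuple(php_range)}


print(ngram_ranges(SAMPLE))           # {(3, 18)}
print(ranges_match(SAMPLE, (3, 18)))  # True
print(ranges_match(SAMPLE, (4, 15)))  # False
```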
Configuring Solr search to find objects with underscores or dashes
Step 1. Open the file <Solr installation directory>/schema.xml.
Step 2. Look up the fieldType block starting with <fieldType name="textNGram".
This block contains 2 definitions for 'solr.WordDelimiterFilterFactory' (in analyzer type="index" and "query").
Step 3. Have the original token indexed without modifications by setting preserveOriginal="1".
<filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"/>
Step 4. Save and close the file.
The changes made to the schema.xml file now need to be applied to the database objects by re-indexing them.
Step 5. Access the Search Server Maintenance page in Studio Server by doing the following:
Step 5a. Click Integrations in the Maintenance menu or on the Home page.
Step 5b. Click Search Server.
Step 6. In the Indexing section, click Clear, followed by Start.
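The effect of preserveOriginal can be pictured with the following Python sketch. It is a rough model and not Solr's implementation: the function word_delimiter and its parameters are hypothetical, it only covers tokens made of alphabetic parts, and it ignores the number-related and case-related options of the real filter.

```python
import re


def word_delimiter(token, generate_word_parts=False, catenate_words=True,
                   preserve_original=False):
    """Simplified model of the tokens emitted for one input token."""
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    # A token without delimiters passes through unchanged.
    if len(parts) == 1 and parts[0] == token:
        return [token]
    emitted = []
    if preserve_original:
        emitted.append(token)            # keep "wi-fi" exactly as typed
    if generate_word_parts:
        emitted.extend(parts)            # "wi", "fi"
    if catenate_words and len(parts) > 1:
        emitted.append("".join(parts))   # "wifi"
    # Remove duplicates while keeping the order of first appearance.
    return list(dict.fromkeys(emitted))


print(word_delimiter("wi-fi"))                          # ['wifi']
print(word_delimiter("wi-fi", preserve_original=True))  # ['wi-fi', 'wifi']
```

In this model, only the second call keeps the token with its dash, so a search phrase that contains the dash has something to match against.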
Adding additional properties to the index
To improve search results and sorting, the following properties can be added to the Solr index:
- Flag*
- FlagMsg*
- LockedBy
- LockForOffline*
- PageRange
- PlacedOn
- PlacedOnPage
- PlannedPageRange
* Adding these properties requires Studio Server 10.47.0 or higher.
Adding these properties provides more consistent results when searching for these properties or sorting them when Solr is used. It ensures that the behavior of Studio Server with the Solr Search plug-in enabled is more in line with a Studio Server setup where the plug-in is not enabled.
Example: Prior to Studio Server version 10.47.0, when search results were sorted on one of the three properties marked with an asterisk (*) above, the results could differ from those shown before sorting. This was caused by Studio Server performing a database-only search (similar to when the Solr Search plug-in is disabled) because the property was not supported by Solr.
Step 1. Open the …/config/config_solr.php file (or the config_overrule.php file) and uncomment the following section:
// 'Flag',
// 'FlagMsg',
// 'LockedBy',
// 'LockForOffline',
// 'PageRange',
// 'PlacedOn',
// 'PlacedOnPage',
// 'PlannedPageRange',
Result:
'Flag',
'FlagMsg',
'LockedBy',
'LockForOffline',
'PageRange',
'PlacedOn',
'PlacedOnPage',
'PlannedPageRange',
Step 2. Save the file.
Step 3. Open the schema.xml file in your $SOLR_HOME/config folder and uncomment the following section:
<!--field name="Flag" type="pint" indexed="true" stored="true"/-->
<!--field name="FlagMsg" type="onlySort" indexed="true" stored="true"/-->
<!--field name="LockedBy" type="onlySort" indexed="true" stored="true"/-->
<!--field name="LockForOffline" type="boolean" indexed="true" stored="true"/-->
<!--field name="PageRange" type="onlySort" indexed="true" stored="true"/-->
<!--field name="PlacedOn" type="onlySort" indexed="true" stored="true"/-->
<!--field name="PlacedOnPage" type="onlySort" indexed="true" stored="true"/-->
<!--field name="PlannedPageRange" type="onlySort" indexed="true" stored="true"/-->
Result:
<field name="Flag" type="pint" indexed="true" stored="true"/>
<field name="FlagMsg" type="onlySort" indexed="true" stored="true"/>
<field name="LockedBy" type="onlySort" indexed="true" stored="true"/>
<field name="LockForOffline" type="boolean" indexed="true" stored="true"/>
<field name="PageRange" type="onlySort" indexed="true" stored="true"/>
<field name="PlacedOn" type="onlySort" indexed="true" stored="true"/>
<field name="PlacedOnPage" type="onlySort" indexed="true" stored="true"/>
<field name="PlannedPageRange" type="onlySort" indexed="true" stored="true"/>
Step 4. Save the file.
Step 5. Re-index Solr by doing one of the following:
Note: Re-indexing Solr can take a significant amount of time depending on the size of the database. Consider performing this task outside production hours.
- Restart your Solr application.
- Reload the configuration in Studio Server:
  - Access the Search Server Maintenance page (Home » Integrations » Search Server).
  - Under Indexing, click Clear followed by Start.
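Uncommenting the eight field definitions by hand is error-prone, so the schema.xml edit from Step 3 could also be scripted. The Python sketch below shows one possible approach. The function name uncomment_fields is hypothetical, and the regular expression assumes that each field is commented out on its own line in exactly the form shown in Step 3. Work on a backup copy of the file before applying anything like this.

```python
import re

# Property names taken from the list at the start of this section.
PROPERTIES = {
    "Flag", "FlagMsg", "LockedBy", "LockForOffline",
    "PageRange", "PlacedOn", "PlacedOnPage", "PlannedPageRange",
}

# Matches e.g. <!--field name="Flag" type="pint" indexed="true" stored="true"/-->
COMMENTED_FIELD = re.compile(r'<!--\s*(field\s+name="(?P<name>[^"]+)"[^>]*?/)\s*-->')


def uncomment_fields(schema_text, names=PROPERTIES):
    """Turn <!--field .../--> into <field .../> for the given property names only."""
    def restore(match):
        if match.group("name") in names:
            return "<" + match.group(1) + ">"
        return match.group(0)  # leave other commented-out fields untouched
    return COMMENTED_FIELD.sub(restore, schema_text)


before = '<!--field name="Flag" type="pint" indexed="true" stored="true"/-->'
print(uncomment_fields(before))
# <field name="Flag" type="pint" indexed="true" stored="true"/>
```

Fields that are not in the list stay commented out, so the script does not enable anything beyond the properties named in this section.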
Additional configuration settings
The examples provided above are just a few of the many possible solutions for getting better search results. Which of these solutions you need for your scenario depends on many factors (such as the type of characters used in file names and the length of file names, both typically controlled by file naming conventions).
Correctly configuring Solr for your environment requires a good understanding of the concepts used by Solr and an awareness of the available settings that can be configured.
We therefore advise going through the Solr documentation, such as the pages on Analyzers, Tokenizers, and Token Filters.
Revisions
- 12 December 2024: Added section 'Adding additional properties to the index'.