This article describes how to integrate the Hunspell spelling engine in Enterprise Server for use in the Multi-Channel Text Editor of Content Station.
The integration of Hunspell consists of the following steps:
- Hunspell installation
- Enterprise Server configuration
Hunspell installation
Step 1. Installing Hunspell
Mac OS
Step 1. Check the version of MacPorts by entering the following command in the Terminal:
sudo port version
Step 2. (Optional, only when the installed version is version 1.8 or older) Update MacPorts by entering the following commands:
sudo port -d selfupdate
sudo port upgrade outdated
Step 3. Install the Hunspell spelling engine by entering the following command:
sudo port install hunspell
Step 4. Check the version of the installed engine by entering the following command:
hunspell -vv
If the installation was successful, the indicated version number should be 1.3 or higher.
Note: Version 1.3 is required so that Enterprise Server can make use of the “hunspell -D” command.
Windows
Step 1. Create a temporary folder, such as C:\Temp\.
Step 2. Download Hunspell from the following location (applicable to both 32-bit and 64-bit operating systems) and save it to the temporary folder:
http://sourceforge.net/projects/hunspell/files/Hunspell/1.2.8/hunspell-1.2.8-win32.zip/download
Step 3. Extract the downloaded ZIP file.
The extracted files should now be in their own folder: C:\Temp\hunspell-1.2.8-win32\.
Step 4. Rename the extracted folder to Hunspell: C:\Temp\Hunspell.
Step 5. Move this folder to the Program Files folder: C:\Program Files\Hunspell.
Step 6. Using the Command Prompt, navigate to the Hunspell directory by entering the following command:
cd C:\Program Files\Hunspell
Step 7. Check the installed engine version by entering the following command:
hunspell.exe -vv
The displayed version number should be 1.2.8 (when Hunspell was downloaded from the link above).
The following steps add the Hunspell application folder (C:\Program Files\Hunspell) to the PATH environment variable, so that ‘hunspell.exe’ is recognized in the Command Prompt without first navigating to that folder.
Step 8. Go to Start > Control Panel > System > Advanced System Settings.
Step 8a. In the System Properties dialog box, click Environment Variables.
Step 8b. Do one of the following:
For a new entry:
- For "Variable Name", enter Path.
- For "Variable Value", enter the Hunspell application path.
Note: End the path with a semicolon (;).
Example: C:\Program Files\Hunspell;
For an existing entry:
- For "Variable Value", add the Hunspell application path to the end of the existing value.
Note: Values are separated by a semicolon (;).
Example: <other settings>;C:\Program Files\Hunspell;
Step 8c. Exit the Command Prompt.
Step 8d. Open the Command Prompt again.
Step 8e. Test Hunspell by entering the following command:
hunspell.exe -vv
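The PATH edit in Step 8b can also be checked from a script. The following minimal Python sketch assumes the default folder C:\Program Files\Hunspell (the constant HUNSPELL_DIR and both helper functions are hypothetical names). It tests whether a PATH value already contains the Hunspell folder and builds the appended value with a semicolon separator. It only manipulates strings and does not change any system setting.

```python
import os

# Assumed install folder from Steps 5 and 6; adjust if Hunspell lives elsewhere.
HUNSPELL_DIR = r"C:\Program Files\Hunspell"


def _normalize(entry: str) -> str:
    """Normalize a PATH entry: ignore case, surrounding spaces and a trailing backslash."""
    return entry.strip().rstrip("\\").lower()


def path_contains(path_value: str, folder: str) -> bool:
    """Return True if `folder` is one of the semicolon-separated PATH entries."""
    target = _normalize(folder)
    return any(_normalize(entry) == target for entry in path_value.split(";"))


def append_to_path(path_value: str, folder: str) -> str:
    """Return the PATH value with `folder` appended (unchanged if already present)."""
    if path_contains(path_value, folder):
        return path_value
    if path_value and not path_value.endswith(";"):
        path_value += ";"
    return path_value + folder + ";"


if __name__ == "__main__":
    # Reads the PATH of the current process only; nothing is written back.
    current = os.environ.get("PATH", "")
    print("Hunspell folder in PATH:", path_contains(current, HUNSPELL_DIR))
```

The comparison ignores case because Windows file paths are normally case-insensitive. A trailing backslash is ignored as well, so C:\Program Files\Hunspell\ and C:\Program Files\Hunspell count as the same entry.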
Linux
Step 1. Check whether Hunspell is already installed by entering the following command:
hunspell -vv
One of the following will appear:
- When Hunspell is not installed:
-bash: hunspell: command not found
Continue with Step 2.
- When Hunspell is installed:
@(#) International Ispell Version 3.2.06 (but really Hunspell 1.2.8)
The installed version should be 1.2.8 or higher. If it is not, continue with Step 3; otherwise, the installation is complete.
Step 2. (Optional, only when Hunspell is not installed) Install Hunspell as follows:
Step 2a. Enter the following command:
yum -y install hunspell
Step 2b. Repeat step 1 and verify the outcome.
Step 3. (Optional, only when the installed version is not version 1.2.8 or higher) Update Hunspell by entering the following command:
yum -y update hunspell
Step 3a. Repeat step 1 and verify the outcome.
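Each platform's check comes down to comparing the reported version with a minimum: 1.2.8 here, or 1.3 on Mac OS for the “hunspell -D” command. The following minimal Python sketch parses the sample output shown in Step 1. The function names are hypothetical, and `installed_version()` assumes that `hunspell` can be found on the PATH.

```python
import re
import subprocess

# Matches the version in `hunspell -vv` output, e.g.
# "@(#) International Ispell Version 3.2.06 (but really Hunspell 1.2.8)"
_VERSION_RE = re.compile(r"Hunspell\s+(\d+(?:\.\d+)*)")


def parse_hunspell_version(output: str) -> tuple:
    """Extract the Hunspell version from `hunspell -vv` output as a tuple of ints."""
    match = _VERSION_RE.search(output)
    if match is None:
        raise ValueError("no Hunspell version found in output")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version: tuple, minimum: tuple) -> bool:
    """Compare versions numerically, so (1, 10) counts as newer than (1, 9)."""
    return version >= minimum


def installed_version(executable: str = "hunspell") -> tuple:
    """Run `<executable> -vv` and parse the reported version (needs Hunspell installed)."""
    result = subprocess.run([executable, "-vv"], input="", capture_output=True,
                            text=True, timeout=10)
    return parse_hunspell_version(result.stdout + result.stderr)
```

The version is compared as a tuple of integers rather than as a string. A plain string comparison would wrongly rank "1.10" below "1.9".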
Step 2. Installing a dictionary
Mac OS
Step 1. Download the dictionary by doing one of the following:
For non-Russian dictionaries:
Step 1a. Navigate to www.macports.org.
Step 1b. From the Getting Started menu on the left, click Available Ports.
Step 1c. Search for hunspell to find the required dictionaries.
Step 1d. Note down the name of the dictionary file.
Step 1e. Enter the following command in the Terminal:
sudo port install hunspell [file name]
For Russian dictionaries:
Step 1a. Navigate to http://wiki.services.openoffice.org/wiki/Dictionaries.
Step 1b. Search for Русский (Россия), that is, Russian (Russia), and follow the link.
Step 1c. Download the dictionary.
Step 1d. Unpack the downloaded file and, depending on the type of package, install it either by running the installer or by entering the following command in the Terminal:
sudo port install [file name]
Step 2. Check the installed dictionaries by running the following command in the Terminal:
hunspell -D
The following (or something similar) should appear:
/opt/local/share/hunspell/en_US
Step 3. Press Ctrl+C to quit the spell checking mode.
Note: When an outdated version of Hunspell is used, the following message appears: "Can’t open affix or dictionary files".
Windows
Step 1. Download the preferred dictionary from http://wiki.services.openoffice.org/wiki/Dictionaries.
Note: In the following steps, it is assumed that the English (United States) dictionary has been downloaded.
Step 2. Unzip the downloaded dictionary file.
The download should contain the following two main files:
- en_US.aff
- en_US.dic
Step 3. In the directory in which Hunspell is installed, create a directory for storing all dictionary files.
Note: In the following steps, this path is referred to as <hunspell_dicts_dir>.
Step 4. Place the .aff and .dic files into this directory.
Step 5. (Only required when installing a dictionary for the first time; skip Steps 5 to 14 when installing subsequent dictionaries.) Go to Start > Control Panel > System > Advanced System Settings.
Step 6. In the System Properties dialog box, click Environment Variables.
Step 7. Under "System Variables", click New.
Step 8. For "Variable Name", enter DICTIONARY.
Step 9. For "Variable Value", enter en_US.
Note: When multiple dictionaries are installed, enter the name of any one of them; it does not matter which.
Step 10. Click OK.
Step 11. Under "System Variables", click New.
Step 12. For "Variable Name", enter DICPATH.
Step 13. For "Variable Value", enter the <hunspell_dicts_dir>.
Example: C:\Program Files\Hunspell\hunspell
Step 14. Click OK.
Step 15. Restart the system. (Restarting IIS by itself is not sufficient.)
Step 16. Access the Health Check page in Enterprise Server:
Step 16a. In the Maintenance menu or on the Home page, click Advanced. A page showing links to advanced Enterprise options appears.
Step 16b. Click Health Check. The Health Check page appears.
Step 17. Run the Hunspell Spelling test to verify that all dictionaries are UTF-8 encoded.
Step 18. Open the Command Prompt.
Step 19. Check the installed dictionaries by entering the following command:
hunspell -D
The following (or something similar) should appear:
C:\Program Files\Hunspell\hunspell\en_US.aff
C:\Program Files\Hunspell\hunspell\en_US.dic
Hunspell 1.2.8
...
Step 20. Press Ctrl+C to quit the spell checking mode.
Tip: To add more dictionaries later, place the relevant .aff and .dic files in <hunspell_dicts_dir>; no environment variables need to be set.
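Two checks before running the Health Check can be scripted: whether every dictionary in <hunspell_dicts_dir> has both its .aff and .dic file, and which encoding each .aff file declares. The following minimal Python sketch relies on the SET directive of the Hunspell affix file format, which names the encoding of the dictionary files. It is not necessarily how the Hunspell Spelling Health Check test works. The sketch also assumes that DICPATH holds a single folder. The function names are hypothetical.

```python
import os


def pair_dictionaries(filenames) -> list:
    """Return the names that have both a .aff and a .dic file, e.g. 'en_US'."""
    aff = {name[:-4] for name in filenames if name.endswith(".aff")}
    dic = {name[:-4] for name in filenames if name.endswith(".dic")}
    return sorted(aff & dic)


def parse_set_encoding(lines):
    """Return the encoding named by the first 'SET <encoding>' line, or None if absent."""
    for line in lines:
        # Drop a UTF-8 byte order mark (as decoded with latin-1) before splitting
        parts = line.lstrip("\xef\xbb\xbf").split()
        if len(parts) >= 2 and parts[0] == "SET":
            return parts[1]
    return None


def scan_dictionaries(dicts_dir: str) -> dict:
    """Map each complete dictionary in `dicts_dir` to its declared encoding."""
    result = {}
    for name in pair_dictionaries(os.listdir(dicts_dir)):
        # latin-1 can decode any byte, which is enough to read the ASCII SET line
        with open(os.path.join(dicts_dir, name + ".aff"), encoding="latin-1") as handle:
            result[name] = parse_set_encoding(handle)
    return result


if __name__ == "__main__":
    # DICPATH is the variable created above; the fallback is the article's example path
    dicts_dir = os.environ.get("DICPATH", r"C:\Program Files\Hunspell\hunspell")
    if os.path.isdir(dicts_dir):
        for name, encoding in scan_dictionaries(dicts_dir).items():
            note = "" if (encoding or "").upper() == "UTF-8" else "  (not declared as UTF-8)"
            print(f"{name}: {encoding}{note}")
```

A dictionary that is reported without a UTF-8 declaration may be worth re-downloading in a UTF-8 version before investigating other causes of a failing spelling test.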
Linux
Step 1. Enter the following command to bring up a list of all available Hunspell dictionaries:
yum search hunspell
Step 2. Install the required dictionary by entering the following command (English is used as an example):
yum install hunspell-en
Step 3. Enter the following command to verify the installed dictionaries:
hunspell -D
A list similar to this one should appear:
/usr/share/myspell/en_DK
/usr/share/myspell/en_JM
/usr/share/myspell/en_NA
...
LOADED DICTIONARY:
/usr/share/myspell/en_US.aff
/usr/share/myspell/en_US.dic
Hunspell 1.2.8
...
Step 4. Press Ctrl+C to quit the spell checking mode.
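The dictionary names printed by `hunspell -D` are the names to use later in the dictionaries option of configserver.php. The following minimal Python sketch extracts those names from listings like the samples above. Section headers such as "LOADED DICTIONARY:" are handled when present but are not required. The function names are hypothetical. `list_dictionaries()` assumes that Hunspell is installed, and it reads both output streams because the listing may not be written to standard output.

```python
import subprocess


def parse_dictionary_listing(output: str) -> dict:
    """Split `hunspell -D` output into available and loaded dictionary names."""
    available, loaded = [], []
    in_search_path = False
    for raw in output.splitlines():
        line = raw.strip()
        upper = line.upper()
        if not line or line == "...":
            continue
        if upper.startswith("SEARCH PATH"):
            in_search_path = True  # following lines list directories, not dictionaries
        elif upper.startswith(("AVAILABLE DICTIONARIES", "LOADED DICTIONARY")):
            in_search_path = False
        elif line.startswith("Hunspell "):
            break  # version banner: the interactive checker starts after this
        elif not in_search_path:
            name = line.replace("\\", "/").rsplit("/", 1)[-1]
            stem, dot, ext = name.rpartition(".")
            if dot and ext.lower() in ("aff", "dic"):
                if stem not in loaded:  # the .aff and .dic belong to one dictionary
                    loaded.append(stem)
            elif name not in available:
                available.append(name)
    return {"available": available, "loaded": loaded}


def list_dictionaries(executable: str = "hunspell") -> dict:
    """Run `<executable> -D` with empty input so it exits instead of waiting for text."""
    result = subprocess.run([executable, "-D"], input="", capture_output=True,
                            text=True, timeout=10)
    return parse_dictionary_listing(result.stderr + "\n" + result.stdout)
```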
Step 3. Checking the spell checking functionality
Note: In the steps below it is assumed that the American English dictionary is used.
Mac OS
Step 1. Enter the following command in the Terminal:
hunspell -i utf-8 -d en_US
Step 2. Enter some regular text and see if the spelling is checked correctly.
Windows
Step 1. (Optional; only required if you did not add the Hunspell directory to the PATH environment variable.) In the Command Prompt, navigate to the Hunspell directory by entering the following command:
cd <hunspell_dir>
Example: cd C:\Program Files\Hunspell
Step 2. Enter the following command:
hunspell.exe -i utf-8 -d en_US
Step 3. Enter some regular text and see if the spelling is checked correctly.
User rights
The user ‘IUSR’ must have the proper rights on cmd.exe, because Enterprise calls the spelling engine executables via a separate shell. Check the rights as follows:
Step 1. Open Windows Explorer and locate cmd.exe (for example C:\Windows\system32\cmd.exe).
Step 2. Right-click the file and choose Properties.
Step 3. Open the Security tab.
Step 4. Check whether the Internet user IUSR has Read & Execute rights. If not, add the user IUSR and assign this access right.
On Windows, the users ‘IUSR’ and ‘SERVICE’ must also have the proper access rights on the system temporary directory. The reason is that the shell command mentioned above is run by the user ‘SERVICE’: the output of the spelling engine is written to a temporary file, which is then read by Enterprise. Enterprise is run by the Internet user ‘IUSR’.
Step 1. Open Windows Explorer and locate the system temporary directory (for example C:\Windows\Temp).
Step 2. Right-click the folder and choose Properties.
Linux
Step 1. Enter the following command in the Terminal:
hunspell -i utf-8 -d en_US
Step 2. Enter some regular text and see if the spelling is checked correctly.
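The checks above are interactive. For a scripted check, Hunspell also supports an Ispell-compatible pipe mode (`-a`). In that mode, each input line produces one result line per word, followed by a blank line: "*" means the word is correct, "& word count offset: suggestions" means it is misspelled with suggestions, and "# word offset" means it is misspelled without suggestions. This article does not state whether the HunspellShellSpelling plug-in uses this mode. The following minimal Python sketch is an independent illustration. The function names are hypothetical, and the `limit` parameter only mirrors the suggestions option described later in this article.

```python
import subprocess


def parse_pipe_output(output: str) -> dict:
    """Parse Ispell-compatible `hunspell -a` output into {word: [suggestions]}.

    Only misspelled words are returned; result lines for correct words are skipped.
    """
    misspelled = {}
    for line in output.splitlines()[1:]:  # the first line is the version banner
        if line.startswith("&"):
            # "& <word> <count> <offset>: <suggestion>, <suggestion>, ..."
            head, _, tail = line.partition(": ")
            word = head.split()[1]
            misspelled[word] = [s.strip() for s in tail.split(",") if s.strip()]
        elif line.startswith("#"):
            # "# <word> <offset>": misspelled, no suggestions available
            misspelled[line.split()[1]] = []
    return misspelled


def check_text(text, dictionary="en_US", executable="hunspell", limit=10):
    """Spell-check `text` with a locally installed Hunspell in pipe mode."""
    # Ispell convention: a leading '^' makes the rest of the line plain text, not a command
    lines = "".join("^" + line + "\n" for line in text.splitlines())
    result = subprocess.run([executable, "-a", "-i", "utf-8", "-d", dictionary],
                            input=lines, capture_output=True, encoding="utf-8",
                            timeout=30)
    return {word: suggestions[:limit]
            for word, suggestions in parse_pipe_output(result.stdout).items()}
```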
Enterprise Server configuration
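The Enterprise Server configuration consists of the following steps:
- Adding the dictionaries to the configserver.php file
- Checking the Hunspell Server plug-in
- Testing the spell checking functionality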
The available dictionaries need to be added to the configserver.php file by performing the following steps:
Step 1. In the configserver.php file, locate the ENTERPRISE_SPELLING option.
The structure is as follows:
define('ENTERPRISE_SPELLING', serialize( array(
    0 => array(
        <list of dictionaries>
    ),
)));
Tip: The list of dictionaries can be as long as needed and may contain dictionaries from different spelling engines.
The structure for a dictionary looks as follows:
'American English' => array( // Hunspell shell
    'language' => 'enUS',
    'wordchars' => '/(['.WORDCHARS_LATIN.']+)/u',
    'serverplugin' => 'HunspellShellSpelling',
    'location' => '/opt/local/bin/hunspell',
    'dictionaries' => array( 'en_US' ),
    'suggestions' => 10,
),
language
The language code in llCC format (ll = language code, CC = country code), for example enUS for English (United States).
- Language code: the ISO 639 standard is used (see http://en.wikipedia.org/wiki/ISO_639). Codes can be looked up here: http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes.
- Country code: the ISO 3166 standard is used (see http://en.wikipedia.org/wiki/ISO_3166). Codes can be looked up here: http://en.wikipedia.org/wiki/ISO_3166-1.
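The llCC value combines a lowercase two-letter language code with an uppercase two-letter country code. The following minimal Python sketch builds and validates such a value. The function names are hypothetical. `language_from_dictionary()` assumes dictionary files named in the ll_CC style, such as en_US, which is common but not guaranteed.

```python
import re


def make_language_code(language: str, country: str) -> str:
    """Combine an ISO 639-1 language code and an ISO 3166-1 country code into llCC."""
    code = language.lower() + country.upper()
    if not re.fullmatch(r"[a-z]{2}[A-Z]{2}", code):
        raise ValueError(f"expected two 2-letter codes, got {language!r}, {country!r}")
    return code


def language_from_dictionary(name: str) -> str:
    """Derive llCC from a dictionary name such as 'en_US' (assumes ll_CC naming)."""
    language, _, rest = name.partition("_")
    return make_language_code(language, rest[:2])
```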
wordchars
A definition of the type of characters used by the spell checker.
Use this option to specify which characters the spell checker should check and which it should ignore. This makes it possible to ignore numbers, symbols, or characters from other languages, even when they are attached to a valid word.
The characters to include are specified as ranges of Unicode (UTF-16) character codes, written so that they can be used directly in a regular expression. For example, ‘A-Z’ means all uppercase letters of the basic Latin alphabet.
Note: For additional information and a list of Latin, Russian, and Japanese ranges, see the WORDCHARS_ section in the configserver.php file.
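The following minimal Python sketch shows how a wordchars-style pattern splits text into the words that are sent to the engine. Every character outside the ranges acts as a separator. The range string is a hypothetical stand-in for WORDCHARS_LATIN, whose real value is defined in configserver.php. The PHP pattern's delimiters and /u flag are dropped because Python string patterns are Unicode-aware by default.

```python
import re

# Hypothetical stand-in for WORDCHARS_LATIN: ASCII letters plus the Latin-1 letters
# À-Ö, Ø-ö and ø-ÿ (the multiplication and division signs are left out).
WORDCHARS_LATIN = "a-zA-Z\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff"

# Same shape as the 'wordchars' option: one capture group of allowed characters
WORDCHARS = re.compile("([" + WORDCHARS_LATIN + "]+)")


def split_words(text: str) -> list:
    """Return the runs of allowed characters; everything else acts as a separator."""
    return WORDCHARS.findall(text)
```

With these ranges, digits and punctuation are dropped, and "don't" is split into "don" and "t" because the apostrophe is not in any range. This is the kind of split that the Checked Words column of the spelling workbench makes visible.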
serverplugin
The internal name of the server plug-in that integrates the spelling engine.
This name can be found in the Enterprise/server/plugins folder for standard shipped plug-ins (such as Hunspell) or in the Enterprise/config/plugins folder for custom plug-ins (such as those downloaded from Labs). Use the name of the spelling plug-in’s folder, respecting its camel case:
- Hunspell: HunspellShellSpelling
- Google: GoogleWebSpelling
- Aspell: AspellShellSpelling
- Enchant: EnchantPhpSpelling
location
Full file path to the engine’s executable file, or web URL to the engine’s Web service entry point.
Always use forward slashes (/) to separate folders, even on Windows.
Check the installation path by doing the following:
- Mac OS / Linux: Enter one of the following commands:
which hunspell
which aspell
which enchant
- Windows: There is no direct way to find the location of installed spelling engines. Generally, spelling engines are installed in the Program Files folder.
Example: C:/Program Files/Hunspell/hunspell.exe
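The following minimal Python sketch produces a value for the location option. It looks up the engine on the PATH in the same way as `which`, then converts backslashes to forward slashes. On Windows, it only finds the engine if its folder has been added to the PATH. The function names are hypothetical.

```python
import shutil
from typing import Optional


def to_config_path(path: str) -> str:
    """Convert a file path to the forward-slash form expected by the 'location' option."""
    return path.replace("\\", "/")


def find_engine(name: str = "hunspell") -> Optional[str]:
    """Locate an engine executable on the PATH; return None when it is not found."""
    found = shutil.which(name)  # on Windows this also tries extensions such as .exe
    return to_config_path(found) if found else None


if __name__ == "__main__":
    print("'location' =>", repr(find_engine()))
```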
dictionaries
The names of dictionaries installed for the spelling engine.
This must always be an array, even when only one dictionary is listed.
Example: 'dictionaries' => array( 'en_US' ),
Multiple dictionaries can be combined, as long as they are all of the same language.
Example: 'dictionaries' => array( 'en', 'en-medical', 'en-legal' ),
suggestions
The maximum number of suggestions for a misspelled word to display to end-users.
Enter a value between 1 and 10.
doclanguage
(Optional) Document’s language code. Used in InDesign/InCopy to pre-select the dictionary for a certain article text fragment for spell checking.
When doclanguage is not specified (the default behavior), Enterprise Server derives this value from the language option (see above).
Note: When editing an article in Content Station and subsequently editing it in InDesign/InCopy, it is important that the same dictionaries are used in InDesign/InCopy as well. If the incorrect dictionary is used in InDesign/InCopy, the doclanguage option can be used to point it to the correct dictionary.
Example: A Dutch dictionary ‘nlNL’ has been configured in the language option. Assume that InDesign/InCopy subsequently uses the ‘Dutch: Old Rules’ dictionary, whereas we want it to use the ‘Dutch: 2005 Reform’ dictionary instead. To resolve this, follow these steps:
Step 1. Look up the getLanguageCodesTable() function in the configlang.php file and verify the entry for nlNL.
The ‘Dutch’ token in this entry stands for the ‘Dutch: Old Rules’ dictionary.
Step 2. Open a new InCopy CS5 article.
Step 3. In the Paragraph Styles panel, edit the Basic Paragraph Style as follows (or create a new paragraph style and edit that one):
Step 3a. Select Advanced Character Formats.
Step 3b. From the Language list, choose the required dictionary, in our example Dutch: 2005 Reform.
Step 3c. Close the dialog box.
Step 4. With the paragraph style applied, enter some text.
Step 5. Save the article to the Desktop.
Step 6. Open the article in a plain-text editor or Web browser.
Step 7. Search for the language tag named AppliedLanguage. It should show the value used internally, in our example $ID/nl_NL_2005.
Step 8. Use the value following ‘$ID/’ to add the doclanguage option to the dictionary configuration structure in the configserver.php file. In our example:
'doclanguage' => 'nl_NL_2005',
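Steps 6 to 8 can also be scripted. The following minimal Python sketch assumes that the saved article stores the language as an attribute value of the form AppliedLanguage="$ID/nl_NL_2005", as described in Step 7. This article does not describe the exact markup of the InCopy file, so the pattern may need adjusting. The function names are hypothetical.

```python
import re

# Assumption: the language appears as an attribute such as AppliedLanguage="$ID/nl_NL_2005";
# any characters between the opening quote and '$ID/' are tolerated.
_APPLIED_LANGUAGE = re.compile(r'AppliedLanguage="[^"]*?\$ID/([^"]+)"')


def doclanguage_values(article_text: str) -> list:
    """Return the distinct values that follow '$ID/' in AppliedLanguage attributes."""
    values = []
    for value in _APPLIED_LANGUAGE.findall(article_text):
        if value not in values:
            values.append(value)
    return values


def doclanguage_option(value: str) -> str:
    """Format the doclanguage line for the dictionary structure in configserver.php."""
    return f"'doclanguage' => '{value}',"
```

To use it, read the saved article file as text and pass its contents to `doclanguage_values()`, then format the value you need with `doclanguage_option()`.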
Step 2. Enter as many dictionaries as required.
Step 3. Save the file.
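Before saving, a dictionary entry can be checked against the rules described for each option. The following minimal Python sketch assumes that the PHP entry has been re-typed as a Python dict; it does not parse PHP. The function name is hypothetical. The plug-in set covers only the names listed above, so a custom plug-in would have to be added to it. The same-language check is a heuristic based on the dictionary file names.

```python
import re

# Plug-in names listed above; add the folder name of any custom plug-in.
KNOWN_PLUGINS = {"HunspellShellSpelling", "GoogleWebSpelling",
                 "AspellShellSpelling", "EnchantPhpSpelling"}


def validate_dictionary_config(label: str, entry: dict) -> list:
    """Return the problems found in one dictionary entry; an empty list means none."""
    problems = []
    if not re.fullmatch(r"[a-z]{2}[A-Z]{2}", entry.get("language", "")):
        problems.append(f"{label}: 'language' must be in llCC format, e.g. enUS")
    if entry.get("serverplugin") not in KNOWN_PLUGINS:
        problems.append(f"{label}: unknown 'serverplugin' {entry.get('serverplugin')!r}")
    location = entry.get("location", "")
    if not location:
        problems.append(f"{label}: 'location' is missing")
    elif "\\" in location:
        problems.append(f"{label}: 'location' must use forward slashes")
    dictionaries = entry.get("dictionaries")
    if not isinstance(dictionaries, list) or not dictionaries:
        problems.append(f"{label}: 'dictionaries' must be a non-empty array")
    else:
        # Heuristic: the language is the part of the name before the first '_' or '-'
        languages = {re.split(r"[_-]", name)[0].lower() for name in dictionaries}
        if len(languages) > 1:
            problems.append(f"{label}: 'dictionaries' mixes languages {sorted(languages)}")
    suggestions = entry.get("suggestions")
    if not isinstance(suggestions, int) or not 1 <= suggestions <= 10:
        problems.append(f"{label}: 'suggestions' must be between 1 and 10")
    return problems
```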
The Hunspell Server plug-in is shipped with Enterprise. Verify that it is installed correctly and running properly by checking its status on the Server Plug-ins page in Enterprise Server.
In Enterprise Server, click Server Plug-ins in the Maintenance menu or on the Home page.
To disable the spelling feature entirely, disable all installed spell checking plug-ins on the Server Plug-ins page. When doing this during production (for instance for maintenance reasons), make sure that all Content Station users are logged out, or ask them to log in again.
To verify if the spell checking functionality has been integrated successfully, follow these steps:
Step 1. Run the following page in a Web browser:
http://<Server URL>/server/wwtest/spelling/workbench.php
Step 2. From the Dictionary list, choose the dictionary to test.
Step 3. Enter some text in the main window.
Step 4. Click Check Spelling.
The following columns appear:
- Checked Words. Shows how text is split up into words. This is based on the regular expression specified in the wordchars option in configserver.php. When words are split up incorrectly, verify the wordchars option.
- Misspelled Words. Shows the spell checking results of the spelling engine. When incorrect suggestions are made, check the installed dictionaries for your spelling engine.
Using multiple servers
When Enterprise Server is replicated over multiple Server machines, repeat the installation steps in this article (Integrating the Hunspell spelling engine in Enterprise Server 9) on each machine.
It is recommended to keep an exact copy of the configserver.php file on all Server machines. In that case, make sure that the spelling engine installation paths, including the installation paths of the dictionaries, are exactly the same on every machine.
The Enterprise Health Check does not check for installation or configuration differences between the Server machines. The system administrator is responsible for keeping the machines in sync, including the installed engine versions and dictionaries. The same applies to words that are added to the dictionaries (this is not supported by Enterprise, but can be done manually through the shell or Command Prompt).
When the Health Check is run through the global entry point URL to index.php (as used by client applications and configured in the WWSettings.xml file), the request is routed to one available machine, and only the configuration paths on that machine are checked. Even running it many times does not guarantee that every machine is checked. Instead, run the Health Check through a local URL on each machine.
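Because the Health Check does not compare machines, the administrator has to do that comparison separately. The following minimal Python sketch reports the differences once the dictionary names and engine versions have been collected per machine, for example by running `hunspell -D` and `hunspell -vv` on each one. How that data is gathered is left open. The function names are hypothetical.

```python
def missing_dictionaries(dictionaries_by_server: dict) -> dict:
    """Per server, list the dictionaries that another server has but this one lacks."""
    all_names = set().union(*dictionaries_by_server.values())
    report = {}
    for server, names in dictionaries_by_server.items():
        missing = sorted(all_names - set(names))
        if missing:
            report[server] = missing
    return report


def differs(values_by_server: dict) -> bool:
    """True when servers report different values, e.g. engine versions or paths."""
    return len(set(values_by_server.values())) > 1
```

An empty report from `missing_dictionaries()` and `False` from `differs()` for the versions and installation paths indicate that the machines match on those points. Words added manually to the dictionaries are not covered by this comparison.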