A bulk import of files to Assets Server is typically done when the system has just been set up and files need to be moved from your existing file store locations to Assets Server.
This is also a good moment to analyze the files that have accumulated in your file store and to exclude unwanted files from the import, so that the target Assets Server system is as clean as possible.
Assets Server provides various methods for importing files in bulk:
- The Bulk Import tool. This tool imports large numbers of files from an existing file store, including from Amazon S3. It moves or copies files and retains the folder structure of the original file store. For more information, see Importing files in bulk using the Bulk Import tool in Assets Server.
- A scripted import through an API. This makes it possible to perform additional tasks on the imported files such as:
  - Adding or enriching metadata.
  - Including metadata from sidecar files (such as .xmp files that accompany image files).
  - Re-defining the folder structure.
  - Adding files to Collections.
  For more information, see Assets Server REST API - create.
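The preparation step of a scripted import can be sketched as follows. This is a minimal sketch, not the Assets Server API itself: the function name plan_import, the target folder /Imported, and the assumption that a sidecar shares the name of its asset (photo.jpg with photo.xmp) are all hypothetical. It only builds a list of what to upload and where; the actual upload call is left out on purpose and should follow the REST API documentation.

```python
from pathlib import Path, PurePosixPath


def plan_import(source_root, target_root, sidecar_suffix=".xmp"):
    """Return one planned upload per asset found under source_root.

    Hypothetical helper: it pairs each asset with a same-named sidecar
    file (if present) and maps the source path to a new target path.
    """
    source_root = Path(source_root)
    plan = []
    for path in sorted(source_root.rglob("*")):
        # Sidecar files are attached to their asset, not imported on their own.
        if not path.is_file() or path.suffix.lower() == sidecar_suffix:
            continue
        # Assumption: photo.jpg is accompanied by photo.xmp in the same folder.
        sidecar = path.with_suffix(sidecar_suffix)
        relative = path.relative_to(source_root)
        plan.append({
            "source": path,
            # Re-define the folder structure by placing assets under target_root.
            "target": str(PurePosixPath(target_root) / relative.as_posix()),
            "sidecar": sidecar if sidecar.is_file() else None,
        })
    return plan
```

Each entry in the returned list can then be passed to whatever upload routine the script uses, with the sidecar content read and added as metadata where one was found.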
How long does a bulk import take?
Even if importing 1 file takes only 1 second, importing 50,000 assets takes about 14 hours, while a typical file store of anywhere from 500,000 to 10 million assets takes from about 6 days to almost 4 months to import.
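The arithmetic behind such an estimate can be sketched as follows. The function name and the default of 1 second per asset are illustrative assumptions; the real time per asset depends on file type, file size, and system capacity.

```python
def estimate_import_hours(asset_count, seconds_per_asset=1.0):
    """Rough import duration in hours, assuming assets are processed one by one."""
    return asset_count * seconds_per_asset / 3600


for count in (50_000, 500_000, 10_000_000):
    hours = estimate_import_hours(count)
    print(f"{count:>10,} assets: {hours:8.1f} hours ({hours / 24:6.1f} days)")
```

At 1 second per asset this gives just under 14 hours for 50,000 assets, almost 6 days for 500,000 assets, and about 116 days for 10 million assets.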
Improving the import time
To overcome this time problem, consider doing the following:
- Split your bulk import into stages:
  - First import only the business-critical files: these are needed straight away.
  - Then import all archived files; it is not very likely that these files are needed immediately.
- Increase system performance by:
  - Adding Processing nodes to spread the load of extracting metadata, generating and transcoding previews and thumbnails, and embedding the Assets Server ID.
  - Adding Search nodes to speed up the process of indexing.
  - Using systems with fast disks such as SSDs.
  - Making as much network bandwidth available as possible.
Note: Remember that these steps are only required for the initial bulk import. Once Assets Server is up and running, the system can be scaled down again.
Best practices
Performing a bulk import is a major step in the process of getting a system up and running and it is important that this is completed as smoothly as possible.
The following sections describe some best practices to follow for a successful import.
System impact
While a bulk import is often the quickest and easiest way to fill the system without much intervention, performing such an import can have a big impact on the system in terms of performance and storage use.
Start off by first uploading small groups of files (see also the next section 'Import size: how much is too much?'). Monitor the system's CPU, memory, disk space, and network usage for any irregularities and when necessary, increase memory, disk space and so on to successfully handle large groups of files.
Note: When the system is struggling to keep up with the demand, ERRORS and WARNINGS appear in the log files. While this is going on, the system and its console cannot be accessed for troubleshooting because they are too busy.
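One of these checks, available disk space, can be automated before each group of files is uploaded. The following is a minimal sketch using only the Python standard library. The function name and the safety factor of 2 are assumptions for illustration; the factor stands in for the extra space taken by previews, renditions, and temporary files, and should be adjusted to what is observed on the actual system.

```python
import shutil


def has_enough_disk_space(target_path, batch_bytes, safety_factor=2.0):
    """Return True when the volume holding target_path has enough free space.

    safety_factor is an assumed multiplier, not a documented value.
    """
    free = shutil.disk_usage(target_path).free
    return free >= batch_bytes * safety_factor
```

A script could call this with the total size of the next batch and stop before uploading when it returns False.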
Import size: how much is too much?
The section 'How long does a bulk import take?' earlier in this article recommends splitting the bulk import into stages because of the time it takes to process large numbers of files.
How many files, then, are too many, taking into account file type and size?
This depends heavily on many variables such as the available number of search nodes and processing nodes, memory, and disk space. These are too diverse to describe in detail, but see Hardware requirements and recommendations for Assets Server for some general guidelines.
Best practice:
- Where possible, (temporarily) break up large folder structures into smaller structures and import them separately so that import batches are small enough to process in a more predictable time frame.
Example: Instead of importing a root folder with sub folders containing a million assets (of all sizes and types), upload 10 sub folder structures (within that root folder) that each contain, for example, only 100,000 assets to a folder location in Assets Server. While this does require 10 individual bulk import processes, each batch is small enough to process in a more predictable time frame.
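To decide how to break up a folder structure, it helps to know how many files each sub folder holds. The following is a minimal sketch that counts files per top-level sub folder and lists the sub folders that exceed a chosen batch size. The function names and the limit of 100,000 assets (taken from the example above) are illustrative, not fixed recommendations.

```python
import os


def count_files_per_subfolder(root):
    """Map each top-level sub folder of root to its total number of files."""
    counts = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Count files at every depth below this sub folder.
                counts[entry.name] = sum(
                    len(files) for _, _, files in os.walk(entry.path)
                )
    return counts


def oversized_batches(counts, max_assets=100_000):
    """Return the sub folders that hold more files than max_assets."""
    return sorted(name for name, total in counts.items() if total > max_assets)
```

Sub folders returned by oversized_batches are candidates for being split further before they are imported.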
Files that can corrupt or slow down the system
When a file is imported, it is not only added to the system but also processed in some way:
- Metadata is added.
- Previews are generated for images.
- Text is extracted from files that contain text.
- PDFs are created for text-based documents.
- The index is updated.
Of course, sufficient disk space should also be available: a small number of video files can easily take up much more disk space than a far larger number of images.
Best practice:
- To prevent the system from slowing down or crashing when files of a certain type are imported, verify that the system is capable of handling the import load:
  - The number of assets that can be indexed and / or processed at the same time is determined by the number of nodes and the available CPUs with multiple threads on these nodes.
  - Sufficient memory should be available for in-memory index updates and / or use by processing tools.
  - Sufficient disk space should be available for:
    - The system, server app, processing tools, and temporary files.
    - The index (Elvis Hot Data).
    - The originals, renditions, temp files, and versions (Elvis Shared Data).
Performance of the source system
High processing speed on the target system is only one side of the import process: the system from which the assets are imported also needs to be fast and reliable.
Best practice:
- Improve the connection to the source volume by making sure that it is not also heavily in use by other systems.
- Import from a source volume that is stable and fast, preferably an internal SSD volume.
- When the source system is located on an internal network, make sure that:
  - Enough network speed is available.
  - The system is not throttled by switches or routers.
  - The system is not cache-managed.
Determining the size of the import
When everything is in place to start an import (an optimized system and small structures to import), it is good to get a general idea of what is imported, how long the process will roughly take, and how much disk space is required for storing the originals, all before the actual process is started.
Best practice:
- Scan the bulk import candidates using a volume or folder scan tool such as JDiskReport to get information on the (hidden) file types and their sizes.
- Use the Analyze option in the Bulk Import tool and compare the result.
- When needed, clean up and / or re-structure the assets to make sure that they can be imported as expected.
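As a scripted alternative to a folder scan tool, the following minimal sketch summarizes a folder by file extension, giving the number of files and the total size in bytes per type. The function name is hypothetical. Hidden files are included, and files without an extension (including names such as .DS_Store) are grouped under "(none)".

```python
import os
from collections import defaultdict


def summarize_by_type(root):
    """Return {extension: (file_count, total_bytes)} for all files under root."""
    summary = defaultdict(lambda: [0, 0])
    for folder, _, files in os.walk(root):
        for name in files:
            # Lower-case the extension so .JPG and .jpg are counted together.
            extension = os.path.splitext(name)[1].lower() or "(none)"
            entry = summary[extension]
            entry[0] += 1
            entry[1] += os.path.getsize(os.path.join(folder, name))
    return {extension: tuple(values) for extension, values in summary.items()}
```

The totals give a rough figure for the disk space needed for the originals, and unexpected extensions point to files that may need to be excluded before the import.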
Recovering from a failed import
When an import fails for some reason, do the following:
Step 1. Check the log files for indications of what went wrong and resolve the issue.
Step 2. Run the bulk import again. When using the Bulk Import tool, make sure that the option 'Skip assets that are already imported' is enabled.
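Step 1 can be supported by a small script that counts error and warning lines per log file, to show where to start reading. This is a minimal sketch under assumptions: the *.log file pattern and the ERROR and WARN markers are guesses about the log format, not documented values, so adjust them to the log files of the actual installation.

```python
from collections import Counter
from pathlib import Path


def count_log_levels(log_dir, levels=("ERROR", "WARN")):
    """Count lines per (log file name, level) for all *.log files in log_dir."""
    counts = Counter()
    for log_file in sorted(Path(log_dir).glob("*.log")):
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                for level in levels:
                    # Each line is counted once, under the first level it matches.
                    if level in line:
                        counts[(log_file.name, level)] += 1
                        break
    return counts
```

Because "WARN" is matched as a substring, lines marked WARNING are counted as well.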