Searching with a Discovery Team

A Discovery Team is composed of managed endpoints that have been grouped together for the purpose of automatically assigning the work of a search. A Discovery Team can be used in a many-to-many scenario where a designated group of endpoints is tasked with searching managed and unmanaged endpoints and remote locations. A team can also be used in a many-to-one scenario where a group of endpoints can divide the work of searching a large data store. Depending on the configuration of the settings, team members can work individually or cooperatively, as detailed below. Please refer to the Discovery Teams topic in this user guide for the creation and configuration of Discovery Teams.

Discovery Team searches are available when using Scheduled Task Policies. To configure a search using a Discovery Team, select the Policies tab, then select a Scheduled Task Policy from the Policy list and click on Scheduled Tasks. Add a new scheduled task or edit an existing one, select Search using this Discovery Team and select a Discovery Team from the drop-down.

Clicking the Perform distributed searching with load balancing Manage button opens the Discovery Team Search Settings dialog:

The following options are available for searching with a Discovery Team:
- Perform distributed searching using all available team members
- Load balancing
- Override default minimum (2 GB) and maximum (20000 GB) Loads
- Override default minimum (20000) and maximum (100000) number of items
- Reassign load if Team Member is unresponsive for x minutes

Depending on the configuration of the Perform distributed searching using all available team members and Load balancing settings and whether there is a single target or multiple targets to search, the search performs differently, as described below.

Single target
Multiple targets

Single target

When searching a single target, Discovery Teams provide the ability to distribute the load across multiple team members and therefore significantly reduce the total time needed to search the target.

When a single target is specified and Search using this Discovery Team is selected:

When neither Perform distributed searching using all available team members nor Load balancing is selected, the first available team member is used to perform the entire search of the target.
When Perform distributed searching using all available team members is enabled and Load balancing is disabled:
- An analysis of the target is run to determine the total amount of data to be searched. Once the analysis is complete, the total data from the target is calculated and split into data sets utilizing the value of the Maximum load when using distributed searching setting.
  - If the total amount of data does not exceed the value in Maximum load when using distributed searching, then the entire search is performed by the first available team member.
  - If there is only a single folder and it is greater than the value of the Maximum load when using distributed searching setting, it is assigned to the first available team member.
  - If the amount of data across multiple folders exceeds the value in Maximum load when using distributed searching, the total data to be searched is split into data sets with each data set containing no more data than the value of that setting. Each available team member is then assigned a data set to search. This is repeated until there are no more data sets to be searched. If there are more data sets to be searched but there are no more team members available, the remaining data sets are assigned and searched when a team member becomes available. The searches may be run concurrently.
    For example:
    - Max load is set to 1,024GB.
    - There is one target containing 2,248GB of data across multiple folders.
    - Four team members are available to perform a search.
      - Team member 1 is assigned 1,024GB of data to search.
      - Team member 2 is assigned 1,024GB of data to search.
      - Team member 3 is assigned 200GB of data to search.
      - Team member 4 is unused.
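The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function names `split_by_max_load` and `assign_first_round` are hypothetical, and the sketch works on total sizes, whereas the actual search splits data at folder boundaries.

```python
def split_by_max_load(total_gb, max_load_gb):
    """Split a total amount of data into data-set sizes of at most max_load_gb.

    Assumption: folder boundaries are ignored; only sizes are modeled.
    """
    if max_load_gb <= 0:
        raise ValueError("max_load_gb must be positive")
    data_sets = []
    remaining = total_gb
    while remaining > 0:
        size = min(remaining, max_load_gb)
        data_sets.append(size)
        remaining -= size
    return data_sets


def assign_first_round(data_sets, members):
    """Pair data sets with team members; extra data sets wait in a queue."""
    assigned = dict(zip(members, data_sets))
    queued = data_sets[len(members):]
    return assigned, queued


members = ["member 1", "member 2", "member 3", "member 4"]
data_sets = split_by_max_load(2248, 1024)
assigned, queued = assign_first_round(data_sets, members)
print(data_sets)  # [1024, 1024, 200]
print(assigned)   # member 4 does not appear because only three data sets exist
```

With fewer team members than data sets, the `queued` list holds the data sets that would be assigned as team members become available.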

When both Perform distributed searching using all available team members and Load balancing are enabled:

An analysis of the target is run to determine the total amount of data to be searched. Once the analysis is complete, the total amount of data from the target is calculated and divided into data sets according to the values specified in the Minimum load when using load balancing and Maximum load when using distributed searching settings. Each team member is then assigned a data set to search. The searches may be run concurrently.

If there are data sets that are smaller than the value in the Minimum load when using load balancing setting, those data sets are grouped together until they reach the value in the Minimum load when using load balancing setting and are then assigned to team members until all data sets are assigned. If there are more data sets to be searched but there are no more team members available, the remaining data sets are assigned and searched when a team member becomes available.

If a data set is smaller than the value in the Minimum load when using load balancing setting and cannot be grouped, or is greater than the value of the Maximum load when using distributed searching setting and cannot be split, it is still assigned to a single available team member to be searched.

For example:

Max load is set to 1,024GB.
Minimum load is set to 10GB.
Three team members are available to perform a search.
There is one target containing a total of 1,800GB of data across multiple folders.
The amount of data in each folder is calculated and split into 3 data sets.
- Data set 1 contains 600GB of data and is assigned to team member 1.
- Data set 2 contains 600GB of data and is assigned to team member 2.
- Data set 3 contains 600GB of data and is assigned to team member 3.
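One way to read this example is that the total is shared evenly among the available team members, with each share kept between the minimum and maximum load. The Python sketch below assumes exactly that rule. The function name `balanced_data_sets` is hypothetical, and the real calculation works per folder and may differ.

```python
def balanced_data_sets(total_gb, member_count, min_load_gb, max_load_gb):
    """Return data-set sizes for an even split, bounded by min and max load.

    Assumption: the target size is an even share per team member, clamped
    to the range [min_load_gb, max_load_gb]. Folder boundaries are ignored.
    """
    if member_count < 1:
        raise ValueError("at least one team member is required")
    even_share = total_gb / member_count
    target_size = max(min_load_gb, min(even_share, max_load_gb))
    data_sets = []
    remaining = total_gb
    while remaining > 0:
        size = min(remaining, target_size)
        data_sets.append(size)
        remaining -= size
    return data_sets


# 1,800GB shared by three team members, min 10GB, max 1,024GB.
print(balanced_data_sets(1800, 3, 10, 1024))  # [600.0, 600.0, 600.0]
```

If the even share were larger than the maximum load, the sketch would cap each data set at the maximum and produce more data sets than team members, leaving the extra ones to be assigned as members become available.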

Multiple targets

When searching multiple targets, Discovery Teams provide the ability to distribute the load across multiple team members and therefore significantly reduce the total time needed to search each target.

When multiple targets are being searched and Search using this Discovery Team is selected:

When neither Perform distributed searching using all available team members nor Load balancing is enabled, one target is searched at a time. The first target is searched by the first available team member of the selected Discovery Team. Once that search is complete, the next target is assigned to the next available team member, and this process repeats until all targets have been searched.

When Perform distributed searching using all available team members is enabled and Load balancing is disabled:

An analysis of the targets is run to determine the total amount of data to be searched. Once the analysis is complete, the total data from the targets is calculated and split into data sets utilizing the value of the Maximum load when using distributed searching setting.
If the total amount of data on a target does not exceed the value of Maximum load when using distributed searching, that data set is assigned to a single team member and that team member is not assigned additional data from other targets.
If there is only a single folder on a target and it is greater than the value of Maximum load when using distributed searching, it is assigned to the first available team member.

If the total amount of data on a target exceeds the value of Maximum load when using distributed searching, then the search is split into data sets with each data set containing no more data than the value of that setting (unless a single folder exceeds the specified size). Each data set contains data from one target only and each available team member is assigned a data set to search. The searches may be run concurrently.

For example:

Max load is set to 1,024GB.
Three team members are available to perform a search.
There are three targets:
- Target A has 800GB across multiple folders.
- Target B has 1,400GB across multiple folders.
- Target C has 1,200GB in one folder.
The amount of data in each folder is calculated and split into 4 data sets:
- Data set 1 contains 800GB of target A and is assigned to team member 1.
- Data set 2 contains 1,024GB of target B and is assigned to team member 2.
- Data set 3 contains 376GB of target B and is assigned to team member 3.
- Data set 4 contains 1,200GB of target C, which is a single folder and cannot be split, and is assigned to whichever of team members 1, 2, or 3 becomes available first.
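The per-target rules in this example can be sketched as follows. This is a simplified illustration rather than the product's logic: the function name `split_targets` and the tuple layout are hypothetical, and a target is modeled only by its total size and a flag saying whether it consists of a single folder.

```python
def split_targets(targets, max_load_gb):
    """Split each target into data sets that never mix targets.

    targets: list of (name, total_gb, single_folder) tuples.
    Assumption: a single folder is never split, even above max_load_gb.
    """
    data_sets = []
    for name, total_gb, single_folder in targets:
        if single_folder or total_gb <= max_load_gb:
            # The whole target becomes one data set.
            data_sets.append((name, total_gb))
            continue
        remaining = total_gb
        while remaining > 0:
            size = min(remaining, max_load_gb)
            data_sets.append((name, size))
            remaining -= size
    return data_sets


targets = [("A", 800, False), ("B", 1400, False), ("C", 1200, True)]
for number, (name, size) in enumerate(split_targets(targets, 1024), start=1):
    print(f"Data set {number}: {size}GB of target {name}")
```

The sketch yields four data sets for three team members, so the fourth waits until a team member becomes available.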

When both Perform distributed searching using all available team members and Load balancing are enabled:

An analysis of the targets is run to determine the total amount of data to be searched. Once the analysis is complete, the total amount of data from the targets is calculated and divided into data sets according to the values specified in the Minimum load when using load balancing and Maximum load when using distributed searching settings. Each team member is then assigned a data set to search.

If there are data sets that are smaller than the value of the Minimum load when using load balancing setting, those data sets are grouped together until they reach the value of the Minimum load when using load balancing setting and are then assigned to team members until all data sets are assigned. If there are more data sets to be searched but there are no more team members available, the remaining data sets are assigned and searched when a team member becomes available.
If a data set is smaller than the value of the Minimum load when using load balancing setting and cannot be grouped, or is greater than the value of the Maximum load when using distributed searching setting and cannot be split, it is still treated as a data set and assigned to an available team member.

For example:
Max load is set to 1,024GB.
Minimum load is set to 10GB.
Two team members are available to perform a search.
There are three targets:
- Target A has 500GB of data across multiple folders.
- Target B has 1,024GB of data across multiple folders.
- Target C has 1,400GB of data across multiple folders.
The amount of data in each folder is calculated and split into 3 data sets from largest to smallest:
- Data set 1 contains 1,024GB of data from target C and is assigned to team member 1.
- Data set 2 contains 1,024GB of data from target B and is assigned to team member 2.
- Data set 3 contains 376GB of data from target C plus 500GB of data from target A. If team member 1 completes its search earlier than team member 2, team member 1 is assigned to search data set 3; otherwise, team member 2 is assigned to search it.

Note: Folders are grouped into data sets by size, not by target, so it is highly likely that each team member has folders from multiple targets and that each target is searched by multiple team members.
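The way data from several targets can end up in one data set can be sketched as below. This is one possible reading that reproduces the example above, not a description of the actual algorithm: the function name `build_data_sets` is hypothetical, sizes stand in for folders, and the Minimum load setting is left out because it does not affect this example.

```python
def build_data_sets(targets, max_load_gb):
    """Group data from several targets into data sets by size.

    targets: dict mapping target name to total GB.
    Assumption: full-size pieces are carved off the largest targets first,
    then the leftover pieces are grouped together up to max_load_gb.
    """
    full_sets = []
    leftovers = []
    by_size = sorted(targets.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        while size >= max_load_gb:
            full_sets.append([(name, max_load_gb)])
            size -= max_load_gb
        if size > 0:
            leftovers.append((name, size))

    grouped = []
    current, current_total = [], 0
    for name, size in leftovers:
        if current and current_total + size > max_load_gb:
            grouped.append(current)
            current, current_total = [], 0
        current.append((name, size))
        current_total += size
    if current:
        grouped.append(current)
    return full_sets + grouped


for data_set in build_data_sets({"A": 500, "B": 1024, "C": 1400}, 1024):
    print(data_set)
```

The last data set printed combines the 376GB remainder of target C with the 500GB of target A, which is why one team member can hold data from more than one target.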

The text on the Scheduled Task page reflects the selections you have made in the Discovery Team Search Settings dialog as follows:

When both Perform distributed searching using all available team members and Load balancing are enabled:

When Perform distributed searching using all available team members is enabled and Load balancing is disabled:
When both Perform distributed searching using all available team members and Load balancing are disabled:

The Reassign load if Team Member is unresponsive for x minutes setting determines when to reassign a data set from an unresponsive team member.

When the Console does not receive an update from a Discovery Team member after the specified period of time, the data set that was assigned to that team member is reassigned to the next available team member. The default value is 1,800 minutes. Valid values are 1-999,999.
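The timeout check can be pictured with a short Python sketch. It is illustrative only: the function name `needs_reassignment` is hypothetical, and whether the Console treats an update arriving exactly at the limit as late is an assumption made here.

```python
from datetime import datetime, timedelta


def needs_reassignment(last_update, now, timeout_minutes=1800):
    """Return True when a team member has been silent longer than the timeout.

    Assumption: a member is unresponsive only once the elapsed time is
    strictly greater than timeout_minutes.
    """
    if not 1 <= timeout_minutes <= 999_999:
        raise ValueError("timeout_minutes must be between 1 and 999,999")
    return now - last_update > timedelta(minutes=timeout_minutes)


last_update = datetime(2024, 1, 1, 8, 0)
print(needs_reassignment(last_update, last_update + timedelta(minutes=1801)))  # True
print(needs_reassignment(last_update, last_update + timedelta(minutes=30)))    # False
```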

Note: If the Scheduled Task policy is configured to search using a Discovery Team, specify the target system(s) to search by checking the box next to each desired tag or endpoint on the Endpoints page of the policy. If no endpoints are selected, target path(s) must be specified as Custom Folders or as Remote Machines.

Note: Once the analysis of the targets to be searched has completed, the actual searching of the targets may not begin immediately. The analysis must first be processed and team members must be assigned a data set to search, so there could be a short delay before searching begins.

Note: Load balancing is not utilized for the searching of Websites because there is no way to determine the size of each location of that type. Instead, a single team member is selected to search an entire Website. For example, if you have three websites defined in your policy, each website to be searched is assigned to a single machine and the load is not distributed across multiple team members.

Note: Load balancing can be utilized for the searching of Exchange Servers and Databases, provided the Console is at version 10.0.2 or higher and the Endpoints are at version 10.0 or higher. Databases cannot be split if the total data in the Database is smaller than the Minimum load when using load balancing setting on the Applications Setting page. Databases are split by table. Exchange Servers are split by folder.

Note: Search History should not be relied upon for Discovery Team searches because a different team member may perform the search of a specific data set on subsequent searches. For Search History to work properly, the same endpoint must perform the search of the same data on each subsequent search.

Note: Live mode, which checks the previous results for existence on the next search, is forcibly disabled in Spirion version 9 for Discovery Team searches. It is not forcibly disabled in version 10; however, it should not be relied upon for Discovery Team searches because a different team member may perform the search of a specific data set on subsequent searches. For Live mode to work properly, the same endpoint must perform the search of a data set on each subsequent search.

Additional Information

Please refer to the Scheduled Tasks user guide to configure a search using a Discovery Team.