About Keyword Lists

Keywords are special words or phrases. You can add related keywords to a keyword list to identify specific types of data. For example, "prognosis," "blood type," "vaccination," and "physician" are keywords that may appear in a medical certificate. If you want to prevent the transmission of medical certificate files, you can use these keywords in a DLP policy and then configure IWSVA to block files containing these keywords.
Commonly used words can be combined to form meaningful keywords. For example, "end," "read," "if," and "at" can be combined to form keywords found in source code, such as "END-IF," "END-READ," and "AT END."

Predefined Keyword Lists

IWSVA comes with a set of predefined keyword lists. These keyword lists cannot be modified or deleted. Each list has its own built-in conditions that determine if the template should trigger a policy violation.

Customized Keyword Lists

Create customized keyword lists if none of the predefined keyword lists meet your requirements.
There are several criteria that you can choose from when configuring a keyword list. A file must satisfy the chosen criteria of a keyword list before IWSVA subjects it to a DLP policy. Choose one of the following criteria for each keyword list:
  • Any keyword
  • All keywords
  • All keywords within <x> characters
  • Combined score for keywords exceeds threshold
The following table lists the criteria options for keyword lists:
Criteria for Keyword Lists
Criteria Rule
Any keyword A file must contain at least one keyword in the keyword list.
All keywords A file must contain all the keywords in the keyword list.
All keywords within <x> characters A file must contain all the keywords in the keyword list. In addition, each keyword pair must be within <x> characters of each other.
For example, suppose your three keywords are WEB, DISK, and USB, and the number of characters you specified is 20.
If IWSVA detects all keywords in the order DISK, WEB, and USB, the number of characters from the "D" (in DISK) to the "W" (in WEB) and from the "W" to the "U" (in USB) must each be 20 characters or fewer.
The following data matches the criteria:
DISK####WEB############USB
The following data does not match the criteria:
DISK*******************WEB****USB (23 characters between "D" and "W")
When deciding on the number of characters, remember that a small number, such as 10, usually results in faster scanning but covers a relatively small area. This may reduce the likelihood of detecting sensitive data, especially in large files. As the number increases, the area covered also increases, but scanning might be slower.
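The distance rule in the WEB, DISK, and USB example can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not IWSVA's actual matching engine. The function names find_starts and all_within are hypothetical, matching is assumed to be case-sensitive, and distance is measured between the first characters of neighboring keywords, as in the example.

```python
import itertools
import re

def find_starts(text, keyword):
    """Return the start offset of every occurrence of keyword in text."""
    return [m.start() for m in re.finditer(re.escape(keyword), text)]

def all_within(text, keywords, max_chars):
    """True if one occurrence of every keyword can be chosen so that the
    first characters of neighboring keywords are at most max_chars apart."""
    occurrences = [find_starts(text, kw) for kw in keywords]
    if any(not starts for starts in occurrences):
        return False  # at least one keyword does not appear at all
    # Brute force over every choice of one occurrence per keyword.
    for combo in itertools.product(*occurrences):
        ordered = sorted(combo)
        if all(b - a <= max_chars for a, b in zip(ordered, ordered[1:])):
            return True
    return False

KEYWORDS = ["WEB", "DISK", "USB"]
match = "DISK####WEB############USB"
no_match = "DISK" + "*" * 19 + "WEB****USB"  # 23 characters from "D" to "W"

print(all_within(match, KEYWORDS, 20))     # True
print(all_within(no_match, KEYWORDS, 20))  # False
```

The brute-force search keeps the sketch short. It also hints at why a larger character count can slow scanning: more candidate keyword positions fall within range and have to be compared.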
Combined score for keywords exceeds threshold
A file must contain one or more keywords in the keyword list. If only one keyword was detected, its score must be higher than the threshold. If there are several keywords, their combined score must be higher than the threshold.
Assign each keyword a score of 1 to 10. A highly confidential word or phrase, such as "salary increase" for the Human Resources department, should have a relatively high score. Words or phrases that, by themselves, do not carry much weight can have lower scores.
Consider the scores that you assigned to the keywords when configuring the threshold. For example, if you have five keywords and three of those keywords are high priority, the threshold can be equal to or lower than the combined score of the three high-priority keywords. This means that the detection of these three keywords is enough to treat the file as sensitive.
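The scoring criterion can be illustrated with a minimal Python sketch. It is not IWSVA's implementation. The keyword scores other than "salary increase" are hypothetical, matching is assumed to be case-sensitive, each keyword is assumed to count once no matter how often it appears, and the comparison is strict, following the "higher than the threshold" wording.

```python
# Hypothetical keyword scores (1 to 10); only "salary increase" comes from the text.
SCORES = {
    "salary increase": 9,
    "termination": 8,
    "bonus": 7,
    "meeting": 2,
    "schedule": 1,
}

def combined_score(text, scores):
    """Add up the scores of the keywords found in text, counting each keyword once."""
    for keyword, score in scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"score for {keyword!r} must be between 1 and 10")
    return sum(score for keyword, score in scores.items() if keyword in text)

def exceeds_threshold(text, scores, threshold):
    """True if the combined score is strictly higher than the threshold."""
    return combined_score(text, scores) > threshold

# The three high-priority keywords total 9 + 8 + 7 = 24, so a threshold of 23
# lets them trigger a match on their own under a strict comparison.
THRESHOLD = 23
sensitive = "bonus and salary increase withheld pending termination"
routine = "meeting schedule for next week"

print(exceeds_threshold(sensitive, SCORES, THRESHOLD))  # True
print(exceeds_threshold(routine, SCORES, THRESHOLD))    # False
```

With a strict comparison, a threshold set exactly equal to the combined score of the three high-priority keywords (24 here) would not be exceeded by those keywords alone, so the sketch uses a value just below it.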