Advanced Scripting

This tutorial describes advanced scripting capabilities that greatly increase the scalability and speed of queries made using Traffic Sentinel's scripting mechanism. Before proceeding, study the Scripting Queries tutorial, which introduces the scripting capabilities of Traffic Sentinel.

Note: This tutorial describes the query functions first(), last(), count(), min(), max(), sum(), mean(), sdev() and rate(), as well as features that were added in Traffic Sentinel 4.0 (Multi-Query and Result Streaming) and 6.0 (Mask, Domain, Prefix, Suffix and Extract). If you are running an older version of Traffic Sentinel you will need to upgrade before you can use these features.

Query Functions

Query functions are used to modify key or value attributes in the select string of a query. For example, the following query counts the number of destination addresses associated with each source address:

var q = Query.topN("traffic",
                   "sourceaddress,count(destinationaddress)",
                   null,
                   "last24hours",
                   "count(destinationaddress)",
                   5);
var t = q.run();
t.printCSV(false);

Running the query gives the following results:

10.0.0.114,1029
10.0.0.150,119
10.0.0.1,97
10.0.0.72,53
10.0.0.62,32
,161
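
To make the grouping behaviour concrete, the following plain JavaScript sketch mimics what count(destinationaddress) reports for each source address. It is illustrative only: the function countDistinct and the sample records are hypothetical and are not part of the Traffic Sentinel API, and it assumes the input is a simple list of source/destination pairs.

```javascript
// Illustrative only: emulates "sourceaddress,count(destinationaddress)"
// on an in-memory list of [source, destination] pairs.
function countDistinct(records) {
  var seen = {};                          // source -> set of destinations
  records.forEach(function (rec) {
    var src = rec[0], dst = rec[1];
    if (!seen[src]) seen[src] = {};
    seen[src][dst] = true;                // duplicates collapse into one key
  });
  return Object.keys(seen).map(function (src) {
    return [src, Object.keys(seen[src]).length];
  }).sort(function (a, b) { return b[1] - a[1]; });  // largest count first
}

var sample = [
  ["10.0.0.1", "10.0.0.50"],
  ["10.0.0.1", "10.0.0.51"],
  ["10.0.0.1", "10.0.0.50"],              // repeated destination, counted once
  ["10.0.0.2", "10.0.0.50"]
];
var counts = countDistinct(sample);
```

In this sample, 10.0.0.1 talks to two distinct destinations and 10.0.0.2 to one, so the rows are combined per source rather than reported per source/destination pair.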

The following functions are defined:

  • first() applies to a key column. Instead of creating a new result row for each unique value, rows are combined and only the first value is reported.
  • last() applies to a key column. Instead of creating a new result row for each unique value, rows are combined and only the last value is reported.
  • count() applies to a key column. Instead of creating a new result row for each unique value, rows are combined and the count of unique values is reported.
  • min() applies to a value column. The value is calculated for each minute in the query interval and the minimum value is reported.
  • max() applies to a value column. The value is calculated for each minute in the query interval and the maximum value is reported.
  • sum() applies to two or more value columns. The values for each record are summed and the total is returned. For example, "sum(ifinerrors ifouterrors)" returns the total errors.
  • mean() applies to a value column. The value is calculated for each minute in the query interval and the average value is reported.
  • sdev() applies to a value column. The value is calculated for each minute in the query interval and the standard deviation is reported.
  • rate() applies to a value column. The total value is divided by the number of seconds in the interval and the rate is reported.
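
The value functions above can be sketched in plain JavaScript. This is not how Traffic Sentinel computes them internally; it is a minimal illustration under stated assumptions: the input array holds one total per minute of the interval, sdev uses the population formula, and the name summarize is hypothetical.

```javascript
// Illustrative only: perMinute holds one value per minute of the interval.
function summarize(perMinute) {
  var n = perMinute.length;
  var total = perMinute.reduce(function (a, b) { return a + b; }, 0);
  var mean = total / n;
  var variance = perMinute.reduce(function (acc, v) {
    return acc + (v - mean) * (v - mean);
  }, 0) / n;                               // assumption: population variance
  return {
    min: Math.min.apply(null, perMinute),  // smallest per-minute value
    max: Math.max.apply(null, perMinute),  // largest per-minute value
    mean: mean,                            // average per-minute value
    sdev: Math.sqrt(variance),
    rate: total / (n * 60)                 // total divided by seconds in interval
  };
}

var stats = summarize([60, 120, 180, 120]);
```

For this four-minute sample the total is 480 over 240 seconds, so the rate is 2 per second, while min, max and mean are 60, 180 and 120.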

Mask

The mask() function is a way to flexibly roll up IP addresses into groups based on a list of CIDR masks, applying the narrowest matching mask to each address.

Masks have two forms:

  • <address>/<mask to match>/<mask to apply>, uses the first mask to match an address and the second to group addresses that match.
  • <address>/<mask to match>=<name>, uses the mask to match an address and then assigns the name to the group of addresses that match.

For example, the following query applies a mask to ipsource addresses, labeling addresses in the CIDR 10.1.4.0/24 as servers and grouping the remaining addresses in the 10.0.0.0/8 CIDR into /24s:

var q = Query.topN("traffic",
                   "mask(ipsource $mask),bytes",
                   null,
                   "yesterday",
                   "bytes",
                   20);
q.$mask = "10.0.0.0/8/24,10.1.4.0/24=servers";
var t = q.run();
t.printCSV(false);

Running the query produces the following results:

10.0.0.0/24,69074526167
10.1.1.0/24,74124814
servers,48776185
10.1.2.0/24,9160802
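
The matching rule can be sketched in plain JavaScript. This is an illustration of "apply the narrowest matching mask" for IPv4 only, not the product's implementation. All function names are hypothetical, and two behaviours are assumptions: an address that matches no entry is returned unchanged, and an entry with neither a name nor a second mask is grouped by its match mask.

```javascript
// Convert dotted-quad IPv4 text to an unsigned 32-bit number.
function ipToInt(ip) {
  return ip.split(".").reduce(function (acc, part) {
    return acc * 256 + parseInt(part, 10);
  }, 0);
}

function intToIp(n) {
  return [n >>> 24, (n >>> 16) & 255, (n >>> 8) & 255, n & 255].join(".");
}

// Network mask for a prefix length, as an unsigned 32-bit number.
function maskBits(len) {
  return len === 0 ? 0 : (0xFFFFFFFF << (32 - len)) >>> 0;
}

// Parse "addr/match/apply" and "addr/match=name" entries.
function parseMasks(spec) {
  return spec.split(",").map(function (entry) {
    var named = entry.split("=");
    var parts = named[0].split("/");
    return {
      net: ipToInt(parts[0]),
      matchLen: parseInt(parts[1], 10),
      applyLen: parts.length > 2 ? parseInt(parts[2], 10) : null,
      name: named.length > 1 ? named[1] : null
    };
  });
}

function applyMask(ip, rules) {
  var addr = ipToInt(ip), best = null;
  rules.forEach(function (r) {
    var m = maskBits(r.matchLen);
    var hit = ((addr & m) >>> 0) === ((r.net & m) >>> 0);
    // Narrowest match = longest matching prefix.
    if (hit && (best === null || r.matchLen > best.matchLen)) best = r;
  });
  if (best === null) return ip;            // assumption: unmatched passes through
  if (best.name !== null) return best.name;
  var len = best.applyLen !== null ? best.applyLen : best.matchLen;
  return intToIp((addr & maskBits(len)) >>> 0) + "/" + len;
}

var rules = parseMasks("10.0.0.0/8/24,10.1.4.0/24=servers");
```

With these rules, 10.1.4.7 matches both entries, but the /24 entry is narrower, so it is reported as servers; 10.1.2.9 only matches the /8 entry and is grouped as 10.1.2.0/24.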

Domain

The domain() function is used to truncate domain names. A positive integer extracts tokens from the right of the domain name, and a negative integer extracts tokens from the left of the domain name.

To illustrate, consider the following query:

var query = Query.topN("traffic",
                       "sourcename,frames",
                       null,
                       "last60minutes",
                       "frames",
                       5);
var t = query.run();
t.printCSV(false);

Running the query produces the following results:

pz-in-f109.1e100.net,241061
pw-in-f108.1e100.net,108134
google-public-dns-a.google.com,650791
iy-in-f109.1e100.net,66317
174.35.40.12,29351
,503327

The following query uses the domain function to extract the rightmost token from the domain names:

var query = Query.topN("traffic",
                       "domain(sourcename 1),frames",
                       null,
                       "last60minutes",
                       "frames",
                       5);
var t = query.run();
t.printCSV(false);

Running the query gives the following results:

net,447401
com,527988
12,29351
edu,25912
br,7886
,45588

The next example uses the domain function to extract the leftmost token, i.e. the hostname:

var query = Query.topN("traffic",
                       "domain(sourcename -1),frames",
                       null,
                       "last60minutes",
                       "frames",
                       5);
var t = query.run();
t.printCSV(false);

Running the query gives the following results:

pz-in-f109,241061
pw-in-f108,77259
google-public-dns-a,630337
iy-in-f109,66317
174,32806
,491597
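
The token selection performed by domain() can be sketched in plain JavaScript. The helper truncateDomain is hypothetical and only mirrors the behaviour described above; joining multiple tokens with "." and leaving the name unchanged for 0 are assumptions.

```javascript
// Illustrative only: positive n keeps n tokens from the right,
// negative n keeps |n| tokens from the left.
function truncateDomain(name, n) {
  var tokens = name.split(".");
  if (n > 0) return tokens.slice(-n).join(".");
  if (n < 0) return tokens.slice(0, -n).join(".");
  return name;                             // assumption: 0 leaves the name as-is
}

var top   = truncateDomain("pz-in-f109.1e100.net", 1);
var host  = truncateDomain("pz-in-f109.1e100.net", -1);
var octet = truncateDomain("174.35.40.12", 1);
```

The last call shows why the results above contain entries such as 12 and 174: an unresolved address is split on "." in the same way as a name.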

Prefix/Suffix/Extract

The prefix(), suffix() and extract() functions are used to extract information from string attributes, for example URLs, Memcache keys, etc.

The following example shows a query to find the most frequently accessed keys in a Memcache cluster:

var query = Query.topN("application",
                       "memcachekey,op_count",
                       null,
                       "last60minutes",
                       "op_count",
                       5);
var t = query.run();
t.printCSV(false);

Running the query produces the following results:

brutis-236305,30
brutis-873101,20
brutis-934439,20
brutis-351412,20
brutis-281879,20
,2623316

In this case, none of the individual keys appear in a significant fraction of the Memcache operations. However, the following query uses the prefix function to extract the first token using "-" as a delimiter:

var query = Query.topN("application",
                       "prefix(memcachekey $delim 1),op_count",
                       null,
                       "last60minutes",
                       "op_count",
                       5);
query.$delim = '-';
var t = query.run();
t.printCSV(false);

Running the query produces the following results, clearly demonstrating that all the keys being accessed share the same "brutis" prefix:

brutis,2623051

The next query uses the suffix function to remove the shared prefix and focus on the numeric part of the key:

var query = Query.topN("application",
                       "suffix(memcachekey $delim 1),op_count",
                       null,
                       "last60minutes",
                       "op_count",
                       5);
query.$delim = '-';
var t = query.run();
t.printCSV(false);

Running the query gives the following results:

353893,40
265737,40
603423,40
186665,40
765830,40
,2622886
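
The token handling of prefix() and suffix() can be sketched in plain JavaScript. The helpers prefixTokens and suffixTokens are hypothetical. The tutorial only shows a count of 1 on two-token keys, so the behaviour for larger counts (keeping n tokens and rejoining them with the delimiter) is an assumption.

```javascript
// Illustrative only: keep the first n delimiter-separated tokens.
function prefixTokens(value, delim, n) {
  return value.split(delim).slice(0, n).join(delim);
}

// Illustrative only: keep the last n delimiter-separated tokens.
function suffixTokens(value, delim, n) {
  return value.split(delim).slice(-n).join(delim);
}

var head = prefixTokens("brutis-236305", "-", 1);
var tail = suffixTokens("brutis-236305", "-", 1);
```

Applied to the key brutis-236305, the first helper yields the shared prefix and the second yields the numeric part.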

The next query uses the extract function to pull out the first and second digits after the dash in the key, specified using the regular expression "-([0-9])([0-9])". The two captured tokens are then combined using ":" as a separator:

var query = Query.topN("application",
                       "extract(memcachekey $pat $sep),op_count",
                       null,
                       "last60minutes",
                       "op_count",
                       5);
query.$pat = '-([0-9])([0-9])';
query.$sep = ':';
var t = query.run();
t.printCSV(false);

Running the query gives the following result:

3:6,126710
3:7,126400
3:8,47240
3:5,46010
3:9,27560
,2121483
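
The combination of capture groups can be sketched with a standard JavaScript regular expression. The helper extractTokens is hypothetical, the sample key is made up, and returning an empty string when the pattern does not match is an assumption. The regular expression dialect used by Traffic Sentinel may also differ from JavaScript's.

```javascript
// Illustrative only: join the capture groups of the first match with sep.
function extractTokens(value, pattern, sep) {
  var m = new RegExp(pattern).exec(value);
  if (m === null) return "";               // assumption: no match gives an empty key
  return m.slice(1).join(sep);             // m[0] is the whole match; skip it
}

var key = extractTokens("brutis-367101", "-([0-9])([0-9])", ":");
```

For the sample key, the two captured digits are 3 and 6, so the helper returns the string 3:6.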

Multi-Query

The multi-query mechanism is a way to run multiple queries in parallel. Since disk access is typically the bottleneck on a multicore server, all the queries in a multi-query typically complete in almost the same time that it would take to complete just one of them.

For example, the following script takes an array of IP addresses and finds the top 5 servers and top 5 services accessed by each address. The script requires two queries for each address, for a total of 6 queries.

var hosts    = ["10.0.0.50","10.0.0.51","10.0.0.52"];
var interval = "today";
var n        = 5;

var report = Report.current();
var q, filter, t;
for each (var host in hosts) {
  report.heading("Client: " + host);
  filter = "ipclient = " + host;
  q = Query.topN("traffic",
                 "serveraddress,bytes",
                 filter,
                 interval,
                 "bytes",
                 n);
  t = q.run();
  report.table(t);

  q = Query.topN("traffic",
                 "serverport,bytes",
                 filter,
                 interval,
                 "bytes",
                 n);
  t = q.run();
  report.table(t);
}

The following script combines the six queries into a single multi-query, giving a roughly sixfold improvement in execution speed.

var hosts    = ["10.0.0.50","10.0.0.51","10.0.0.52"];
var interval = "lastweek";
var n        = 5;

var selects  = [];
var filters  = [];

var q = Query.topN("traffic",
                   selects,
                   filters,
                   interval,
                   "bytes",
                   n);
q.multiquery = true;

for each (var host in hosts) {
  var filter = "ipclient = " + host;

  selects.push("serveraddress,bytes");
  filters.push(filter);

  selects.push("serverport,bytes");
  filters.push(filter);
}

var tables = q.run();

var report = Report.current();
var i = 0;
for each (var host in hosts) {
  report.heading("Client: " + host);

  report.table(tables[i++]);
  report.table(tables[i++]);
}

A multi-query is specified by using arrays for one or more of the query properties. In addition, the multiquery property must be set to true. When a multi-query is run, it returns an array of results corresponding to the array(s) of parameter values. The only query parameters that cannot be specified as arrays are view, interval and multiquery.
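
Because the results are positional, it can help to build a parallel array of labels alongside the select and filter arrays, so that each returned table can be identified by index. The following plain JavaScript sketch shows one way to do this. The helper buildSpec and the labels array are hypothetical and not part of the Traffic Sentinel API.

```javascript
// Illustrative only: build parallel arrays for a multi-query,
// plus a label for each position in the result array.
function buildSpec(hosts) {
  var selects = [], filters = [], labels = [];
  hosts.forEach(function (host) {
    var filter = "ipclient = " + host;
    ["serveraddress,bytes", "serverport,bytes"].forEach(function (select) {
      selects.push(select);
      filters.push(filter);
      labels.push(host + " / " + select.split(",")[0]);
    });
  });
  return { selects: selects, filters: filters, labels: labels };
}

var spec = buildSpec(["10.0.0.50", "10.0.0.51", "10.0.0.52"]);
```

The selects and filters arrays would be passed to the query as in the script above, and labels[i] then describes the table at index i of the results.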

Result Streaming

Typically when you run a query you get a table of results. If the query generates a large table, the table will take up a lot of memory on the server. Result streaming is a way to process the query results row by row as they are generated, rather than requiring the entire table to be stored.

For example, suppose you wanted a query to return all IP addresses that sent more than 100MBytes of traffic. You don't know how many addresses exceed the threshold, so you can't set the query truncate parameter. Instead, you need to ask for traffic totals for every address and apply the threshold to identify the addresses you are interested in. The following script shows how result streaming can be used to avoid generating the large table of results.

var interval = "today";
var threshold = 100000000;

var q = Query.topN("traffic",
                   "ipsource,bytes",
                   null,
                   interval,
                   "bytes",
                   100000);

var t = q.run(
  function(row,table) {
    if(row[0] && row[1] >= threshold) table.addRow(row);
  }
);
t.printCSV(true);

Result streaming is invoked when a function is provided as an argument to the Query run() method. The function is then applied to each row of data. In this example, the function applies the threshold and only adds a row to the result table if its byte count meets or exceeds the threshold.

Note: The truncate value was set to 100000. A truncate value of -1 would have allowed any number of results to be returned. However, it is strongly recommended that a truncate value be set, even if it is very large, since it establishes a limit on the amount of memory that the query will need to use to store intermediate results.
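
Since the streaming callback is an ordinary function of a row and a table, it can be written as a named function and exercised outside a query. The sketch below does this in plain JavaScript. The StubTable object is a hypothetical stand-in that only emulates addRow(), and the sample rows are made up.

```javascript
// Build a callback that keeps rows whose byte count meets the threshold.
function makeThresholdFilter(threshold) {
  return function (row, table) {
    if (row[0] && row[1] >= threshold) table.addRow(row);
  };
}

// Hypothetical stand-in for a result table; only addRow() is emulated.
function StubTable() { this.rows = []; }
StubTable.prototype.addRow = function (row) { this.rows.push(row); };

var filter = makeThresholdFilter(100000000);
var stub = new StubTable();
[
  ["10.0.0.1", 250000000],                 // kept: above threshold
  ["10.0.0.2", 5000],                      // dropped: below threshold
  ["", 900000000]                          // dropped: empty key
].forEach(function (row) { filter(row, stub); });
```

Only the first sample row is added to the stub table. In a real script, the function returned by makeThresholdFilter would be passed to run() in place of the inline function.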

Result streaming can be applied to a multi-query. The following example calculates the total bytes sent or received by each address and applies the threshold to the total.

var interval = "today";
var threshold = 100000000;

var q = Query.topN("traffic",
                   ["ipsource,bytes","ipdestination,bytes"],
                   null,
                   interval,
                   "bytes",
                   100000);
q.multiquery = true;

var totals = {};
var t = q.run(
  function(row,table) {
    if(row[0]) {
      if(totals[row[0]]) totals[row[0]] += row[1];
      else totals[row[0]] = row[1];
    }
  }
);

var result = Table.create(["Address","Bytes"],["address","double"]);
for (var addr in totals) {
  if(totals[addr] >= threshold) result.addRow([addr,totals[addr]]);
}
result.sort(1);
result.printCSV(true);

Finally, when using a multi-query, you may provide an array of functions as an argument to the Query run() method in order to apply a different function to each query.
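
As a sketch of how per-query callbacks might be organized, the following plain JavaScript keeps sent and received totals in separate objects, one callback per query. It assumes that the function at index i is applied to the rows of the query at index i. The runStub helper is a hypothetical stand-in for running a multi-query with an array of functions, and the sample rows are made up.

```javascript
var sent = {}, received = {};

// Build a callback that adds each row's value to the given totals object.
function accumulate(totals) {
  return function (row, table) {
    if (row[0]) totals[row[0]] = (totals[row[0]] || 0) + row[1];
  };
}

// One callback per query: index 0 for sources, index 1 for destinations.
var callbacks = [accumulate(sent), accumulate(received)];

// Hypothetical stand-in for the query engine: feeds the rows of
// result set i to callback i (assumed pairing).
function runStub(resultSets, fns) {
  resultSets.forEach(function (rows, i) {
    rows.forEach(function (row) { fns[i](row, null); });
  });
}

runStub([
  [["10.0.0.1", 100], ["10.0.0.2", 50]],   // rows for "ipsource,bytes"
  [["10.0.0.1", 30]]                       // rows for "ipdestination,bytes"
], callbacks);
```

Keeping the two totals separate makes it possible to report bytes sent and bytes received per address, rather than only their sum.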
