Sentinel Performance Tuning
Tips for tuning the performance of Traffic Sentinel on your Linux server.
It is recommended that hsflowd be installed on the server and configured to send sFlow to 127.0.0.1. This makes it easier to see the effect of any tuning changes you make.
In most cases the biggest improvement comes from switching to SSD (flash) disk storage, so consider planning for that step even if it is not an option today.
The performance of Traffic Sentinel can be divided into two separate categories:
- The front-end - receiving, decoding, enriching, indexing and compressing data to disk.
- The back-end - making queries and sharing dashboards.
To tune the front-end performance, consider these options:
- Allow large input UDP socket buffering (e.g. sysctl net.core.rmem_max=26214400)
- Configure global.prefs so the inxsazd process will ask for more socket buffering:
SFlowSocketBufferSize = 16515072
NetFlowSocketBufferSize = 16515072
- Configure global.prefs to shard the inxsazd process into 4 worker threads:
SATHREADS = 4
- If the inxpoll process is consuming more than 50% CPU then configure global.prefs to shard inxpoll SNMP polling into 4 worker processes:
POLLERS = 4
- Tune the NIC driver (ring buffers, interrupt coalescing, ...). On a VM in particular the choice of driver can make a big difference.
- If the sFlow/NetFlow input rate is very high (e.g. in excess of 100K datagrams/sec) and Sentinel is configured to forward to many other receivers, the socket reading and forwarding thread can become a bottleneck. In that case, configure global.prefs to dedicate 2 threads to that task:
SAREADERS = 2
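The interaction between net.core.rmem_max and the socket buffer sizes requested in global.prefs can be checked with a short script. The following is a minimal sketch, assuming a Linux server: it asks for the same buffer size on a throwaway UDP socket and reports what the kernel granted. The function names are illustrative and are not part of Traffic Sentinel. On Linux the kernel reports double the accepted size, so a reported value below twice the request suggests the request was capped by rmem_max.

```python
import socket

RMEM_MAX_PATH = "/proc/sys/net/core/rmem_max"  # Linux-specific location


def read_rmem_max(path=RMEM_MAX_PATH):
    """Return the kernel receive-buffer ceiling in bytes, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def granted_rcvbuf(requested):
    """Request SO_RCVBUF on a throwaway UDP socket and return the size
    that the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def was_capped(requested, granted):
    """Linux reports double the accepted request, so anything below
    2 * requested means the request was cut down (assumes Linux)."""
    return granted < 2 * requested


if __name__ == "__main__":
    requested = 16515072  # same value as SFlowSocketBufferSize above
    granted = granted_rcvbuf(requested)
    print("rmem_max:", read_rmem_max())
    print("requested:", requested, "reported:", granted)
    if was_capped(requested, granted):
        print("request was capped; consider raising net.core.rmem_max")
```

Running this before and after changing net.core.rmem_max shows whether a request of this size would be honored in full. It does not inspect the inxsazd process itself.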
To tune the back-end query performance, consider these options:
- If your server has more than 4 CPU cores, configure global.prefs to use more worker threads per query (default is 4):
QueryReaderThreads = 16
- If there is plenty of disk-space, configure global.prefs to increase query cache and placement cache sizes:
QueryCacheMaxBytes = 4G
PlacementCacheMaxBytes = 4G
- Provided you are not monitoring a VMware VDS virtual switching system using the NetFlow export it offers, you can use this optimization to accelerate per-agent or per-link queries:
query.opt.samplingDirection = YES
- Add more RAM. Linux uses any spare RAM as disk-cache, so with enough RAM in the server most queries will run entirely from disk-cache. This eliminates the I/O bottleneck and allows all query worker threads to run at 100%. For example with QueryReaderThreads=16 you can see inxQuery processes consuming close to 1600% CPU, and completing correspondingly faster.
- Deploy a NUMA server with a high core-count. On NUMA servers with more cores you can set QueryReaderThreads even higher and still keep all threads busy. This is because the query engine detects NUMA hardware and causes blocks of data to be processed on the NUMA node that has them cached locally.
- Deploy SSD (flash) disks. Although the Traffic Sentinel database is architected to combine reads and benefit from read-ahead, some seeking is inevitably required to get to the next field or time-period. This seeking is typically the biggest bottleneck for queries, and shows up as CPU-WAIT time in the hsflowd graphs under Hosts>Trend. An SSD has effectively zero seek time, so nearly all of this time is recovered. Thus a query to an SSD can keep more query threads busy, though not as many as when the data is found in the disk cache. The script /usr/local/inmsf/scripts/history_archive can be deployed under cron(1) to move older data to secondary storage while still keeping it visible to the Sentinel query engine.
- Long-term scheduled reports, such as those with interval=last30Days,group=day, can usefully be scheduled to update every night. When a trend report updates every night, most of the results for previous time-intervals are found in the query-cache, so only the most recent 24 hours of new data have to be processed, and those are often found in disk-cache.
- Exposing commonly-used charts in scripted dashboards (under File>REST) allows them to be shared efficiently among multiple users. If two users look at the same dashboard the second will receive the page from the web-server cache. So the server only needs to compute the dashboard once, and the load for 100 users is the same as the load for 1.
- If you have significantly more RAM than you need, slow disk I/O, and a large number of devices, then you may see an acceleration if you use a tmpfs ramdisk to hold the more transient state and data files. Note that with this option you will lose a few extra minutes of data in the event of a power failure or reboot:
VCacheMB = 500
VCache.state=YES
VCache.nfp=YES
VCache.active=YES
VCache.mib=YES
KeepActiveMinutes=30
These options require a full restart of the daemon to take effect (e.g. "systemctl restart inmsfd"). They assume a regular tmpfs partition under /dev/shm. For details see /usr/local/inmsf/scripts/vcache.
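Before enabling the ramdisk options, it may be worth confirming that /dev/shm really is a tmpfs mount with enough free space. The following is a minimal sketch, assuming that VCacheMB is expressed in units of 1024*1024 bytes (an assumption, not confirmed by the text above). The helper names are illustrative and are not part of Traffic Sentinel.

```python
import os
import shutil


def find_fstype(mounts_text, mount_point):
    """Return the filesystem type of mount_point from /proc/mounts-style
    text, or None if it is not listed. A later entry overrides an earlier one."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]
    return fstype


def has_room(free_bytes, vcache_mb):
    """True if free_bytes can hold vcache_mb (assumed to mean MiB)."""
    return free_bytes >= vcache_mb * 1024 * 1024


if __name__ == "__main__":
    vcache_mb = 500  # same value as VCacheMB above
    if os.path.exists("/proc/mounts") and os.path.isdir("/dev/shm"):
        with open("/proc/mounts") as f:
            fstype = find_fstype(f.read(), "/dev/shm")
        free = shutil.disk_usage("/dev/shm").free
        print("fstype:", fstype, "free bytes:", free)
        print("room for vcache:", fstype == "tmpfs" and has_room(free, vcache_mb))
    else:
        print("/proc/mounts or /dev/shm not available on this system")
```

Space used in tmpfs counts against RAM, so any space given to the ramdisk is no longer available to the disk-cache that speeds up queries.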