NetFlow© Agent Configuration and Troubleshooting Guide
This tutorial provides a step-by-step guide to adding a new NetFlow/IPFIX agent to Traffic Sentinel, with details of the problems that you may encounter.
Adding a new NetFlow/IPFIX Agent
Recommended Steps - Router Side
- Configure NetFlow/IPFIX on the device. This task can be complex, often running to several pages of CLI commands, so allow time to learn the characteristics of each make, model and firmware revision. Start with the simplest possible configuration that supplies the minimal keys and values listed here. You can add more fields later, but it is better to follow this minimal configuration through to the server first:
- Flow Start Time
- Flow End Time
- ingress physical ifIndex
- egress physical ifIndex
- IP Source
- IP Destination
- IP Protocol
- L4 Source Port
- L4 Destination Port
- Apply this to ingress traffic on all ports so that everything that traverses the router is counted once.
- By default Traffic Sentinel will be listening on UDP ports 9985 (NetFlow) and 4739 (IPFIX), so configure the router to send to one of those.
- Confirm via "show netflow" or equivalent that the router is sending UDP datagrams to the Traffic Sentinel server IP.
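Before moving to the server side, it can help to check a planned template against the minimal field list above. The following is a minimal sketch only: the field names are hypothetical labels loosely modelled on IPFIX information element names, not identifiers taken from Traffic Sentinel or from any particular router CLI.

```python
# Hypothetical labels for the minimal keys and values listed above.
# Rename them to match whatever your exporter actually calls these fields.
MINIMAL_FIELDS = frozenset({
    "flowStart",
    "flowEnd",
    "ingressInterface",
    "egressInterface",
    "sourceIPAddress",
    "destinationIPAddress",
    "protocolIdentifier",
    "sourceTransportPort",
    "destinationTransportPort",
})


def missing_minimal_fields(template_fields):
    """Return the minimal fields that a template lacks, sorted by name."""
    return sorted(MINIMAL_FIELDS - set(template_fields))


if __name__ == "__main__":
    # A template that forgot the IP protocol field.
    planned = MINIMAL_FIELDS - {"protocolIdentifier"}
    print(missing_minimal_fields(planned))
```

An empty result means that every minimal field is present; anything else names the fields to add before testing further.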
Recommended Steps - Server Side
- On the Traffic Sentinel server, confirm that the packets are arriving with:
sudo /usr/sbin/tcpdump -i any udp port 9985 or udp port 4739
- Confirm that the software firewall is allowing packets through on 9985/udp and/or 4739/udp.
sudo iptables --list
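If several routers are being added at once, it may be easier to summarise the tcpdump output than to read it line by line. The sketch below counts datagrams per source IP. It assumes that tcpdump was run with -n so that addresses and ports are numeric, and that each line contains the usual "IP src.port > dst.port:" form; the function name and the sample line shape are illustrative assumptions, and other tcpdump versions may print lines differently.

```python
import re
from collections import Counter

# Assumed line shape: "... IP <src>.<sport> > <dst>.<dport>: UDP, length N"
_LINE = re.compile(
    r"\bIP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def count_sources(lines, ports=(9985, 4739)):
    """Count datagrams per source IP that are addressed to the given ports."""
    counts = Counter()
    for line in lines:
        match = _LINE.search(line)
        if match and int(match.group(4)) in ports:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Example: sudo tcpdump -n -i any udp port 9985 or udp port 4739 | python3 this_script.py
    for source, n in count_sources(sys.stdin).most_common():
        print(source, n)
```

The source IPs reported here are the addresses that Traffic Sentinel will adopt as agent addresses, so an unexpected address at this stage is worth correcting on the router.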
- In the Traffic Sentinel web UI, confirm that a new agent event appears under Events>List with type=Configuration.
- After a minute has passed, confirm that the new agent now appears under Traffic>Status in the right zone+group.
- Confirm that SNMP is working too (File>Configure>Status>select agent>Test SNMP).
- Confirm the "NetFlow engines" and "NetFlow templates" output from the agent debug page (File>Configure>Status>select agent>Test SNMP). In the case of NetFlow versions 1, 5 and 7 there will be no templates, but for versions 9 or 10 (IPFIX) you should see templates that match the configuration you entered on the router, sorted with the most frequently used template first.
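To see which export version a router is really sending, and therefore whether templates should be expected, the version can be read from a captured datagram. NetFlow and IPFIX export datagrams begin with a 16-bit version number in network byte order. The sketch below relies only on that fact; the function names are illustrative.

```python
import struct

# Versions that describe their records with templates: NetFlow v9 and IPFIX (10).
TEMPLATE_VERSIONS = frozenset({9, 10})


def export_version(datagram: bytes) -> int:
    """Read the 16-bit big-endian version field at the start of a datagram."""
    if len(datagram) < 2:
        raise ValueError("datagram too short to hold a version field")
    (version,) = struct.unpack("!H", datagram[:2])
    return version


def uses_templates(version: int) -> bool:
    """True if flow records of this export version are described by templates."""
    return version in TEMPLATE_VERSIONS
```

For example, a payload that starts with the bytes 00 05 is NetFlow version 5 and will never show templates, whereas 00 0a is IPFIX and should.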
- Confirm calibration by comparing the frames/sec counter trends (Traffic>Trend>select agent>show=Frames per Second) with the frames/sec top talker trends (Traffic>TopN>select agent>units=Frames/sec.). NetFlow typically counts only layer 3 packets, whereas the hardware counters count all packets, but in most cases the envelopes of the two graphs should approximately match.
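If the two trends are exported as numbers, the comparison can be made less subjective. The following sketch computes the ratio of flow-derived to counter-derived frames/sec for each sample. The 25% tolerance is an arbitrary assumption for illustration, not a Traffic Sentinel threshold.

```python
def calibration_ratios(counter_fps, flow_fps):
    """Ratio of flow-derived to counter-derived frames/sec, sample by sample."""
    if len(counter_fps) != len(flow_fps):
        raise ValueError("both series must cover the same samples")
    # Skip samples with a zero counter rate to avoid dividing by zero.
    return [f / c for c, f in zip(counter_fps, flow_fps) if c > 0]


def looks_calibrated(counter_fps, flow_fps, tolerance=0.25):
    """True if every ratio is within the (assumed) tolerance of 1.0."""
    ratios = calibration_ratios(counter_fps, flow_fps)
    return bool(ratios) and all(abs(r - 1.0) <= tolerance for r in ratios)
```

Ratios that sit consistently far below 1.0 suggest that traffic is being undercounted, for example because of the double sampling problem described in this guide.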
- Consider adjusting the packet sampling settings on the router, if applicable. If the router is sending 1:1 flow records (i.e. no packet sampling) then Traffic Sentinel will retrofit a packet sampling algorithm to the feed. In that case, consider adjusting the retrofitted sampling rate, which is controlled under File>Configure>Edit Sampling Settings.
- Now go back and explore which additional fields might usefully be added to the template. Suggestions are:
- IP TOS
- IP TTL
- Ingress VLAN
- Ingress MAC source address
- Ingress MAC destination address
- Equivalent template for IPv6
- Finally, check the list of common problems below to see if any further correction is required.
Common Problems
- agent address: For NetFlow feeds the source IP of the datagrams is always adopted as the agent address in Traffic Sentinel. Sometimes the agent address is set to an unexpected value and the agent appears in the wrong zone+group. A zone called "other" will appear if the agent address does not match any CIDR, agent-range or agent. This is an error condition because parameters for talking to the device will be undefined. It is recommended to set the source address to be a loopback address that Traffic Sentinel can always talk back to.
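The way an unexpected source address ends up in the "other" zone can be illustrated with a small lookup. This is a sketch under stated assumptions: it models CIDR matching only (not agent-range or agent entries), it assumes that the most specific CIDR wins, and the zone names and the zones mapping are hypothetical rather than Traffic Sentinel configuration.

```python
import ipaddress


def zone_for_agent(agent_ip, zones):
    """Return the zone whose CIDR best matches agent_ip, or "other" if none does.

    zones is a hypothetical mapping of zone name to CIDR string.
    """
    addr = ipaddress.ip_address(agent_ip)
    best = None
    for name, cidr in zones.items():
        net = ipaddress.ip_network(cidr)
        # Assumption: prefer the longest (most specific) matching prefix.
        if addr in net and (best is None or net.prefixlen > best[1].prefixlen):
            best = (name, net)
    return best[0] if best else "other"


if __name__ == "__main__":
    zones = {"core": "10.0.0.0/8", "edge": "10.1.0.0/16"}
    print(zone_for_agent("10.1.2.3", zones))
    print(zone_for_agent("192.168.1.1", zones))
```

A router that exports from an interface address outside every configured CIDR falls through to "other", which is why a stable loopback source address is recommended.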
- Double Sampling: [warning flag RetrofitSampling] If the router is applying packet sampling but not communicating those details as part of the feed, then Traffic Sentinel will assume that the feed is 1:1 (no sampling) and retrofit the sampling rate configured under File>Configure>Edit Sampling Settings. This double sampling usually makes the traffic all but disappear, so the agent may not even appear under Traffic>Status. The preferred solution is to adjust the router config so that the sampling information is included (e.g. in an options template). As a fallback you can tell Traffic Sentinel that the feed is preSampled under File>Configure>Edit Sampling Settings; however, this approach is brittle: any change to the sampling rate at either end will result in a corresponding undercount or overcount. This problem is particularly common with Cisco routers and the solution is usually to add this to the flow template:
collect flow sampler id
and to add this to the flow exporter:
option sampler-table timeout 60
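The arithmetic behind double sampling explains why the traffic almost vanishes. The sketch below is an idealised model that treats sampling as exact division; the function name and the example rates are assumptions for illustration.

```python
def double_sampling_estimate(true_packets, router_rate, retrofit_rate):
    """Estimate produced when the router samples 1:router_rate without saying so
    and the collector, assuming a 1:1 feed, retrofits 1:retrofit_rate."""
    seen_by_collector = true_packets / router_rate        # router-side sampling
    kept_by_collector = seen_by_collector / retrofit_rate  # retrofitted sampling
    # The collector scales up only by the rate it knows about.
    return kept_by_collector * retrofit_rate


if __name__ == "__main__":
    # 1,000,000 real packets, router sampling 1:1000, retrofit 1:100.
    print(double_sampling_estimate(1_000_000, 1000, 100))
```

In this model the estimate is low by exactly the router's sampling rate, and only 10 samples out of a million packets survive to support it, which is consistent with the agent barely registering under Traffic>Status.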
- Active Timeout: [warning flag CacheTimeout] The flow cache on the router should be configured to report regularly on long-running flows, so that you do not find out about a flow, say, 30 minutes after it started. The flow cache active timeout should be set to less than 5 minutes, and preferably to 1 minute. Traffic Sentinel will set a warning flag if it detects a long active timeout. All traffic older than 5 minutes may be collected into the same minute bin, so this error can result in large distortions in the per-minute trend charts.
- Missing Keys: [warning flag MissingKeys] Traffic Sentinel will reconcile and deduplicate traffic flows that cross multiple observation points across the network. To do this the flow records must at least contain certain fields, such as source and destination addresses. Traffic Sentinel will set a warning flag if it detects flow records that do not have this minimal set of keys.
- Missing Times: [warning flag MissingTimes] Traffic Sentinel will partition flow records into per minute bins depending on their start and end times. To do this the flow records must at least contain flow start and flow end timestamps. Traffic Sentinel will set a warning flag if it detects flow records that do not have these fields.
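To see why both timestamps are needed, consider one plausible way of partitioning a flow record into per-minute bins: spread its bytes across the minutes it overlaps, in proportion to the overlap. This is an illustrative sketch, not a description of Traffic Sentinel's actual algorithm; times are assumed to be in epoch seconds.

```python
from collections import defaultdict


def bin_by_minute(start, end, byte_count):
    """Spread byte_count over one-minute bins in proportion to time overlap.

    Returns a dict mapping minute index (epoch seconds // 60) to bytes.
    """
    if end < start:
        raise ValueError("flow end precedes flow start")
    bins = defaultdict(float)
    duration = end - start
    if duration == 0:
        # Instantaneous flow: everything lands in a single minute.
        bins[int(start // 60)] += byte_count
        return dict(bins)
    t = start
    while t < end:
        minute = int(t // 60)
        next_edge = min((minute + 1) * 60, end)
        bins[minute] += byte_count * (next_edge - t) / duration
        t = next_edge
    return dict(bins)
```

For example, a 1200-byte flow running from second 30 to second 150 is split 300, 600 and 300 bytes across minutes 0, 1 and 2. Without a start or end time there is nothing to base the split on.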
- Missing In/Out Ports: [warning flag MissingInOut] If the NetFlow feed does not fill in the in/out port numbers (they may be zero even if you include them in the template) then Traffic Sentinel will be unable to separate traffic out by link. The traffic on a link will appear to be missing, even though it is present when you scope the query to the whole agent. You can detect this happening using a Traffic>TopN>Custom chart with keys "I/F in" and "I/F out".
- Missing Values: [warning flag PktCtr] Traffic Sentinel expects to track both packet count and byte count quantities. If a flow record appears with only a byte count field then Traffic Sentinel will estimate the frame count by applying a typical estimate of the average packet size. This may distort the packet count estimates for the whole network.
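The distortion is easy to see with a worked example. The sketch below estimates a packet count from a byte count; the 500-byte average is an assumed value for illustration, not the figure that Traffic Sentinel uses.

```python
ASSUMED_AVG_PACKET_BYTES = 500  # hypothetical average, for illustration only


def estimate_packets(byte_count, avg_packet_bytes=ASSUMED_AVG_PACKET_BYTES):
    """Estimate a packet count from a byte count and an assumed average size."""
    if avg_packet_bytes <= 0:
        raise ValueError("average packet size must be positive")
    if byte_count <= 0:
        return 0
    # Any non-empty flow contains at least one packet.
    return max(1, round(byte_count / avg_packet_bytes))
```

A flow of one hundred 64-byte packets carries 6400 bytes, which this estimate would report as 13 packets, so flows dominated by small packets are undercounted and flows of large packets are overcounted.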
- IP:0 Traffic: If you see traffic with protocol (serverport) reported as IP:0 you should check that the IP Protocol field is being reported in the template. Without this field Traffic Sentinel may not be able to distinguish, say, TCP:53 from UDP:53 traffic.
- VMware VDS: When configuring NetFlow export on the VMware VDS, do not check the box that models the whole system as one large switch. It is better for Traffic Sentinel if each virtual switch on each hypervisor sends using its own IP address and therefore appears in the topology as a separate device and observation point.
- Layer2 NetFlow: When configuring NetFlow to monitor traffic that is being switched at layer 2, you may find that the only record that can be reported contains layer 2 keys only, such as:
- Ingress ifIndex
- Egress ifIndex
- Ethernet Protocol
- Source MAC address
- Destination MAC address
- Martians: If the packets arrive at the server on the wrong interface, i.e. not the one used to talk back to the router:
sudo ip route get <agentIP>
then they will be dropped by the OS as "Martians". You may have to force a different source IP or collector IP using the router CLI.
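The Martian check amounts to comparing the interface on which the datagrams arrive with the interface that the server would use to reach the agent. The sketch below extracts the interface name from the text printed by "ip route get", assuming the usual output in which the interface follows the word "dev"; the function names are illustrative.

```python
def route_interface(ip_route_output):
    """Return the interface named after "dev" in `ip route get` output, or None."""
    tokens = ip_route_output.split()
    if "dev" in tokens:
        i = tokens.index("dev")
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def arrives_on_return_path(arrival_interface, ip_route_output):
    """True if datagrams arrive on the interface used to route back to the agent."""
    return route_interface(ip_route_output) == arrival_interface
```

The arrival interface can be read from a packet capture taken on each interface in turn. If the two interfaces differ, the datagrams are candidates for being dropped as Martians.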
The agent status will be yellow (warning) if any of these conditions are detected. If you need to mask any of these warning flags you can set agent.warning.mask in global.prefs (File>Configure>Edit>Files>global.prefs). For example, to mask the CacheTimeout, RetrofitSampling and SkewedTimes warnings, you can set:
agent.warning.mask = "CacheTimeout|RetrofitSampling|SkewedTimes"
- Flow record active cache timeout too long (CacheTimeout)
If the active flow cache timeout setting on the router appears to be more than 5 minutes.
- Missing flow record keys (MissingKeys)
If a flow template is missing source and/or destination address keys.
- Missing flow record in/out ports (MissingInOut)
If a flow template is missing input and output interface ifIndex numbers.
- Missing flow record times (MissingTimes)
If a flow template is missing flow start/end timestamps.
- Missing flow record packet count (PktCtr)
If a flow template is missing the packet counter.
- Bad flow record values (BadValues)
If a flow template is missing the bytes counter or the bytes-per-packet ratio is impossible.
- Unsampled flow records (RetrofitSampling)
If the agent appears to be sending 1:1 flow records (no packet sampling) so that Traffic Sentinel is applying 1:N sampling retrospectively. This may be intentional, or it may be necessary to configure the router to indicate the sampling rate in the NetFlow feed.
- Skewed flow record times (SkewedTimes)
If there appears to be excessive clock drift between the agent timestamps and the system clock on the Traffic Sentinel server. The solution may be to resync the clock on one or the other.
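When masking several flags, a small helper can guard against misspelled flag names. The sketch below builds the agent.warning.mask line in the same form as the example above and validates each name against the flags listed in this guide. It only produces text; editing global.prefs is still done through the web UI.

```python
# Warning flag names as listed in this guide.
KNOWN_FLAGS = (
    "CacheTimeout", "MissingKeys", "MissingInOut", "MissingTimes",
    "PktCtr", "BadValues", "RetrofitSampling", "SkewedTimes",
)


def warning_mask_line(flags):
    """Build an agent.warning.mask line, rejecting unknown flag names."""
    unknown = [flag for flag in flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError("unknown warning flags: " + ", ".join(unknown))
    return 'agent.warning.mask = "{}"'.format("|".join(flags))


if __name__ == "__main__":
    print(warning_mask_line(["CacheTimeout", "RetrofitSampling", "SkewedTimes"]))
```

Masking a flag hides the warning but does not fix the underlying condition, so it is best reserved for cases that have been investigated and judged acceptable.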