Confused Bird Forum

Full Version: MikroTik script to log ping spikes, packet loss and internet outages
I have a MikroTik Chateau LTE12 router dedicated to running our VoIP based landline and an Internet radio.  We have this radio on throughout the day, so it doubles as an outage indicator.  We get the occasional glitch during phone calls, so to determine whether it's due to a latency spike or packet loss, I created a script to monitor the connection.

This script pings the host declared at the start of the script once a second, in this case Google's DNS server 8.8.8.8.  It counts ping responses slower than the declared threshold (300ms) as latency spikes and also counts the number of packets lost.  If 3 or more packets are lost in a row, it logs the time the connection went down and, once it gets a ping response again, the time the connection came back up.  At the end of every 10 minute interval, it logs the number of latency spikes (if any) along with the maximum ping response time, and the packet loss percentage if it is above about 1%.

The following shows an example of it logging an Internet outage, packet loss and a few latency spikes.  In this example, I disabled and re-enabled the LTE connection to simulate the outage and packet loss and ran a few speed tests to introduce some latency spikes:

[attachment=921]

The following is the script:
Code:
# Configure maximum response as a spike, the ping host and min packets lost per 10 minute interval to log
:local maxping 300
:local pinghost "8.8.8.8"
:local maxlost 5

# Get current time stamp
:local ctime
if ([:len [timestamp]] = 23) do={:set ctime [:pick [timestamp] 5 13]} else={:set ctime [:pick [timestamp] 7 15]}

# Internal variables to script
:local minute [:pick $ctime 3]
:local interval [:pick $ctime 0 5]
:local pingresult
:local pingtime
:local maxrtt 0
:local maxrtttime
:local rttspikes 0
:local downtime
:local pingfailseq 0
:local packetcount 0
:local packetslost 0
:local pings

:do {
  # Ping the host and get response time
  :set pingresult ([ping $pinghost count=1 as-value]->"time")
  :set packetcount (packetcount+1)

  # If the ping is a success
  if ($pingresult!=nul) do={
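    # The ping time is read as a string in HH:MM:SS.ffffff form, so join the seconds digit and the first 3 decimals to get milliseconds
    :set pingtime [:tonum ([:pick $pingresult 7] . [:pick $pingresult 9 12])]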

    # If the response time exceeds the max threshold, log and count the spike
    if (pingtime>maxping) do={
      if (pingtime>maxrtt) do={
        :set maxrtt (pingtime)
        :set maxrtttime [:pick $ctime 0 5]
      }
      :set rttspikes (rttspikes+1)
    }

    # If the connection was down, log the up time
    if (pingfailseq>2) do={
      :log info "Internet up at $ctime"
    }
    :set pingfailseq 0

  # If there are 3+ ping failures in a row, log the connection down time
  } else={
    if (pingfailseq=0) do={:set downtime $ctime}
    :set pingfailseq (pingfailseq+1)
    if (pingfailseq=3) do={:log info "Internet down at $downtime"}
    :set packetslost (packetslost+1)
  }

  # Wait a second and get current time stamp
  delay 1
  if ([:len [timestamp]] = 23) do={:set ctime [:pick [timestamp] 5 13]} else={:set ctime [:pick [timestamp] 7 15]}
  
  # Every 10 minutes, log the number of ping spikes (if any) and the maximum ping spike time
  if (minute!=[:pick $ctime 3]) do={
    if (rttspikes>0) do={
      if (rttspikes>1) do={:set pings "pings"} else={:set pings "ping"}
      :log info "Interval $interval - $rttspikes $pings above $maxping ms, highest ping of $maxrtt ms at $maxrtttime"
    }

    # If the packet loss is above the set threshold, log the packet loss percentage
    if (packetslost>=maxlost) do={
      :local packetloss (packetslost * 1000 / packetcount)
      :log info ("Interval $interval - " . $packetloss / 10 . "." . $packetloss % 10 . "% packet loss")
    }

    # Reset counts for next 10 minute interval
    :set rttspikes 0
    :set maxrtt 0
    :set minute [:pick $ctime 3]
    :set interval [:pick $ctime 0 5]
    :set packetslost 0
    :set packetcount 0
  }
} while=(true);

To set up the script in Winbox, go into System -> Scripts -> Scripts tab.  Click the '+' icon, give it a name and paste the script in the white box below, then click 'OK'. 

You can adjust the three variables at the start of the script:

maxping - Ping responses above this threshold in milliseconds are counted as latency spikes.
pinghost - The host to monitor.  This can be set to any other pingable host.
maxlost - The packet loss is logged once at least this many packets are lost in a 10 minute interval.  About 5 packets equal 1%.
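
As a rough illustration of the maxlost figure, the following sketch repeats the script's integer arithmetic for 5 lost packets. The packet count of 600 is an assumption (one ping a second for 10 minutes); the real count is usually a little lower.

Code:
# Assumed example values, not taken from a real log
:local packetslost 5
:local packetcount 600
# Integer division: 5 * 1000 / 600 = 8, i.e. 8 tenths of a percent
:local packetloss ($packetslost * 1000 / $packetcount)
:put (($packetloss / 10) . "." . ($packetloss % 10) . "% packet loss")

With these values it should print 0.8% packet loss, which is why a threshold of 5 corresponds to roughly 1%.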

When the script is run, it continues running in the background.  To stop the script, go into the Jobs tab, click the script name and click '-' to stop it.  To have it automatically run on Startup, create a scheduler (System -> Scheduler -> '+' icon), set the Start Time dropdown to "startup", enter the script name in the "On Event:" field and click OK.
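
The same steps can also be done from the terminal. This is only a sketch, and it assumes the script was saved under the hypothetical name pingmonitor; substitute your own script name.

Code:
# Run the (hypothetically named) script at every boot
/system scheduler add name=pingmonitor-startup start-time=startup on-event="/system script run pingmonitor"
# List the running script jobs
/system script job print
# Stop the monitor if it is running
/system script job remove [find script=pingmonitor]
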
I noticed that this script was occasionally recording a malformed timestamp which took me a while to debug. It turned out to be the way MikroTik stores the day of the week value in its timestamp, which is a value between "1d" (Friday) and "6d" (Wednesday).

On Thursday it omits the day of week value from the timestamp string, making the string 2 characters shorter than on every other day of the week. This caused the ":pick" command to pick characters 2 positions out on a Thursday, resulting in malformed log entries and the script not recording the ping spikes or packet loss in 10 minute intervals that day.

As a workaround, I added an if statement that checks the length of the timestamp. If the length is 23 characters, i.e. it is Thursday, it picks out the HH:MM:SS portion from the shifted position.
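
An alternative that avoids depending on the string length is to locate the first ':' and pick the time relative to it. The following is an untested sketch, assuming the timestamp converts to a string such as 2817w4d09:12:33.123456789 (a made-up value) and that the hours are always two digits:

Code:
# Sketch only: find the first colon, then take HH:MM:SS around it
:local ts [:tostr [:timestamp]]
:local colon [:find $ts ":"]
:local ctime [:pick $ts ($colon - 2) ($colon + 6)]
:put $ctime

Because the position is measured from the colon, it should not matter whether the day of week value is present.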

I updated the post above with the revised script code that should now work properly on Thursdays.
Hi Sean,
thanks so much for sharing the knowledge.
The script does not work with MikroTik RouterOS v6.
Do you know why?
Please confirm.
MikroTik made some changes to scripting and command output in RouterOS v7, which is the version the above code was written for. Unfortunately I don't have any MikroTik hardware running RouterOS v6 to debug or convert the code for v6, as the MikroTik LTE / 5G based hardware I mainly use requires RouterOS v7.

As MikroTik's scripting documentation states that each global command should start with a ':' prefix, you can try prefixing the global commands in the code in case RouterOS v6 requires the prefix, e.g. change 'if' to ':if', 'delay' to ':delay', etc.
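
For illustration, a small loop written with the prefixed form of each command might look like the following. It is a generic sketch rather than a v6 port of the script above, and it is untested on RouterOS v6:

Code:
# Generic example: every global command carries the ':' prefix
:local count 0
:do {
  :set count ($count + 1)
  :if ($count = 3) do={ :log info "Third pass" }
  :delay 1
} while=($count < 3)
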
Thanks so much for your reply.

I just added a fetch command to send the packet loss to Telegram, but the script does not execute it. The log message is:
(Download from api.telegram.org FAILED: Fetch failed with status 400)

Can you give me a hand, please?


# If the packet loss is above the set threshold, log the packet loss percentage
if (packetslost>=maxlost) do={
:local packetloss (packetslost * 1000 / packetcount)
:local server [/system identity get name]
:log info ("Interval $interval - " . $packetloss / 10 . "." . $packetloss % 10 . "% packet loss") ;

/tool fetch url=("https://api.telegram.org/botxxxxxxxxxxx:AAHa34oakKaRXc1B7_B3HG2-GlsDBZoB6nE/sendMessage?chat_id=xxxxxxxxxxxxxxx&text= $server $packetloss % 10 . % packet loss") keep-result=no;

}
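I am not sure whether the fetch tool automatically escapes special characters; a URL cannot contain raw spaces or a literal '%', so these need to be percent-encoded (a hyphen is technically allowed in a URL, so encoding it is optional). Likewise, the '?' needs to be escaped with a backslash in the script, and the math has to be done outside the quoted string.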

You can try the following. I don't use Telegram or have RouterOS v6, so this is untested.

Code:
# If the packet loss is above the set threshold, log the packet loss percentage
if (packetslost>=maxlost) do={
:local packetloss (packetslost * 1000 / packetcount)
:local server [/system identity get name]
:local plmessage ("$server%20Interval%20$interval%20%2D%20" . $packetloss / 10 . "." . $packetloss % 10 . "%25%20packet%20loss") ;
:log info ("Interval $interval - " . $packetloss / 10 . "." . $packetloss % 10 . "% packet loss") ;

/tool fetch url=("https://api.telegram.org/botxxxxxxxxxxx:AAHa34oakKaRXc1B7_B3HG2-GlsDBZoB6nE/sendMessage\?chat_id=xxxxxxxxxxxxxxx&text=$plmessage") keep-result=no;

}
Thanks so much for your feedback, Mr. Sean.
Allah bless you.