Planet OpenNMS

May 23, 2019

OpenNMS Horizon 24.1.0 Released

Release 24.1.0 is the latest stable release of OpenNMS. It contains a bunch of bug fixes and a few enhancements, including support for OpenTracing in the sink API and a rework of geocoding services.

For a high-level overview of what’s changed in OpenNMS 24, see What’s New in OpenNMS 24.

The codename for 24.1.0 is J.A.R.V.I.S.

  • Thresholds should work without restart when putting nodes into categories (Issue NMS-9811)
  • When editing a surveillance category from Admin flow, lists of nodes are not sorted by node label (Issue NMS-10654)
  • health:check command times out when a health check command gets stuck (Issue NMS-10667)
  • Vaadin bundles intermittently stuck in “Waiting” state (Issue NMS-10668)
  • Table in “Manage Minions” page fails to load (Issue NMS-10670)
  • Default load threshold contains calculation error (Issue NMS-10671)
  • Can not delete node due to database table constraint (Issue NMS-10674)
  • Missing Indication of Sync Needed for Requisitions (Issue NMS-10675)
  • .rpmnew, .rpmsave, and .dpkg-dist files not erroring out properly (Issue NMS-10676)
  • Some config files should be marked %config (rather than %config(noreplace)) in RPMs (Issue NMS-10677)
  • Memory Leak on Drools while reloading config (Issue NMS-10678)
  • Node detail page renders with no content when invalid node ID specified (Issue NMS-10679)
  • Installing the opennms package installs Oracle JDK 8 instead of OpenJDK 11 on Ubuntu (Issue NMS-10680)
  • Apparent memory leak in JMX collector, possibly restricted to “weird” JMX transports (Issue NMS-10684)
  • Install guide for Java versions is misleading (Issue NMS-10688)
  • Java configuration is ignored on Ubuntu/Debian (Issue NMS-10693)
  • CVE-2018-20433: XXE Vulnerability in c3p0 < (Issue NMS-10694)
  • Memory leak in WS-Man (Issue NMS-10696)
  • EditInRequisitionIT flapping (Issue NMS-10698)
  • Jetty HTTPS selectors can become unresponsive following CancelledKeyException (Issue NMS-10701)
  • Reflected XSS vulnerability in notification/detail.jsp and outage/detail.htm (Issue NMS-10707)
  • Enable extraction of match groups from regex matches in Event.Mask.Varbind.Vbvalue (Issue NMS-10626)
  • Tag “root cause” alarm when providing feedback (Issue HZN-1492)
  • Rework the GeocoderService-Implementations (Issue HZN-1520)
  • Meta-Data Documentation Format Wrong (Issue HZN-1557)
  • Add OpenTracing support for Sink API (Issue HZN-1558)
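
To illustrate the idea behind NMS-10626, here is a minimal sketch of pulling named match groups out of a varbind value so they can be used as extra event parameters. The pattern, the sample value, and the function name `extract_parameters` are made up for illustration; this is not OpenNMS's configuration syntax or its implementation.

```python
import re


def extract_parameters(pattern: str, varbind_value: str) -> dict:
    """Return the named match groups found in varbind_value, or {} if no match."""
    match = re.search(pattern, varbind_value)
    if match is None:
        return {}
    # groupdict() maps each named group to the text it captured
    return match.groupdict()


# Hypothetical varbind value and pattern, for illustration only
value = "Link down on interface ge-0/0/1 (ifIndex 512)"
pattern = r"interface (?P<ifName>\S+) \(ifIndex (?P<ifIndex>\d+)\)"

params = extract_parameters(pattern, value)
print(params)
```

Each named group becomes one key, so a single regular expression can both match the varbind and supply several parameters.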

by RangerRick at May 23, 2019 11:22 PM

OpenNMS Meridian 2018.1.8 Released

Release 2018.1.8 is an update to Meridian 2018.1.7. It contains a few UI fixes and security updates, as well as a fix for memory leaks in Drools config reloading, WS-Man monitoring, and the JMX collector.

The codename for 2018.1.8 is Gale.

  • Memory Leak on Drools while reloading config (Issue NMS-10678)
  • Node detail page renders with no content when invalid node ID specified (Issue NMS-10679)
  • Apparent memory leak in JMX collector, possibly restricted to “weird” JMX transports (Issue NMS-10684)
  • CVE-2018-20433: XXE Vulnerability in c3p0 < (Issue NMS-10694)
  • Memory leak in WS-Man (Issue NMS-10696)
  • Jetty HTTPS selectors can become unresponsive following CancelledKeyException (Issue NMS-10701)
  • Reflected XSS vulnerability in notification/detail.jsp and outage/detail.htm (Issue NMS-10707)

by RangerRick at May 23, 2019 08:11 PM

OpenNMS Meridian 2017.1.17 Released

Release 2017.1.17 is a small update to 2017.1.16 that has a few UI fixes and security updates, as well as a fix for memory leaks in WS-Man monitoring and the JMX collector.

The codename for 2017.1.17 is Old Naval Observatory meridian.

  • Node detail page renders with no content when invalid node ID specified (Issue NMS-10679)
  • Apparent memory leak in JMX collector, possibly restricted to “weird” JMX transports (Issue NMS-10684)
  • CVE-2018-20433: XXE Vulnerability in c3p0 < (Issue NMS-10694)
  • Memory leak in WS-Man (Issue NMS-10696)
  • Reflected XSS vulnerability in notification/detail.jsp and outage/detail.htm (Issue NMS-10707)

by RangerRick at May 23, 2019 08:09 PM

OpenNMS Meridian 2016.1.21 Released

Release 2016.1.21 is a small update to 2016.1.20 that has a few UI fixes and security updates, as well as a fix for a memory leak in WS-Man monitoring.

The codename for 2016.1.21 is North Pole Gnomonic.

  • Node detail page renders with no content when invalid node ID specified (Issue NMS-10679)
  • CVE-2018-20433: XXE Vulnerability in c3p0 < (Issue NMS-10694)
  • Memory leak in WS-Man (Issue NMS-10696)
  • Reflected XSS vulnerability in notification/detail.jsp and outage/detail.htm (Issue NMS-10707)

by RangerRick at May 23, 2019 08:08 PM

May 20, 2019

OpenNMS On the Horizon – May 20th, 2019 – Bug Fixes, CI Workflow, Grafana Integration, Helm, and More!

It’s time for OpenNMS On the Horizon!

In the last week we continued to do bug fixes ahead of the 24.1 Horizon release, did more work supporting flow enhancements and graph service improvements, worked on CI infrastructure, reporting from Grafana, and more!

GitHub Project Updates

  • Internals, APIs, and Documentation
    • David continued to work on refactoring threshd out of opennms-services.
    • I wrapped up my changes to OpenNMS startup scripts and exit-code handling.
    • Patrick continued his work making the graph service handle edges with vertexes from multiple namespaces.
    • Markus did more work on fixing up an issue that could cause health:check to get stuck.
    • Markus made more changes to the branch porting application topology to the new graph service.
    • Ronny did some work on improving Docker caching in the CircleCI workflow.
    • Chandra fixed some issues in the integration API that could cause service start failures.
    • I released an updated OpenNMS.js with fixed CLI rendering of alarms.
    • Christian worked on adding hostname resolution while processing flows.
    • Chandra worked on fixing a memory leak issue when reloading Drools.
    • Chandra made more improvements to tracing support in the sink API.
    • Jesse fixed a bug that kept nodes from being deleted.
  • Web, ReST, and UI
    • Matt worked on improvements to the flow ReST API to allow querying conversations.
    • Markus worked on a feature to create flow reports from Grafana dashboards.
    • Patrick worked on fixing the minion status page.
    • I worked on fixing layout in the alarm details page in Helm (word-wrapping, etc.).
    • I added color theme support to allow changing the severity color scheme.
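
The health:check fix mentioned above concerns one stuck check blocking the whole command. A minimal sketch of the general approach, assuming each check is a plain callable and that a per-check timeout is acceptable, might look like this. The names `run_checks`, `timeout_per_check`, and the sample checks are hypothetical, and this is not the OpenNMS implementation.

```python
import concurrent.futures as cf
import time


def run_checks(checks, timeout_per_check):
    """Run every check in its own thread and report a status string per check."""
    results = {}
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(checks)))
    try:
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        for name, future in futures.items():
            try:
                ok = future.result(timeout=timeout_per_check)
                results[name] = "ok" if ok else "failed"
            except cf.TimeoutError:
                # The check is still running; report it and move on
                results[name] = "timeout"
            except Exception as exc:
                results[name] = f"error: {exc}"
    finally:
        # Do not block the caller on checks that have not finished yet
        pool.shutdown(wait=False)
    return results


def slow_check():
    time.sleep(0.5)  # stands in for a check that hangs
    return True


results = run_checks({"database": lambda: True, "slow": slow_check}, 0.1)
print(results)
```

Python cannot kill a thread that is truly stuck, so this only keeps the caller responsive; the slow thread still runs to completion in the background.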

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future OOH, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last OOH

  • ALEC-66: Expand test coverage for UDL
  • ALEC-69: Memory leak in ClusterEngine caused by spatial distance caching
  • HELM-144: grafana-cli only installs 2.0.0 (or older), not 3.0.1
  • HZN-1470: Implement ReST Service for the new Graph Service
  • HZN-1490: Provide new GraphProvider implementation for the ApplicationTopologyProvider
  • HZN-1555: Design new header for PDF reports
  • HZN-1558: Add OpenTracing support for Sink API
  • HZN-1573: Unable to logon using default admin/admin account after fresh install
  • HZN-1574: Documentation broken
  • IPL-33: create function updated in PostgreSQL 11.
  • NMS-6920: provide “select all” button in scheduled outages menu
  • NMS-9811: Thresholds should work without restart when putting nodes into categories
  • NMS-10624: Upgrade Kafka components to 2.2.0
  • NMS-10626: Enable extraction of match groups from regex matches in Event.Mask.Varbind.Vbvalue
  • NMS-10667: health:check command times out when a health check command gets stuck
  • NMS-10668: Vaadin bundles intermittently stuck in “Waiting” state
  • NMS-10670: Table in “Manage Minions” page fails to load
  • NMS-10674: Can not delete node due to database table constraint
  • NMS-10676: .rpmnew, .rpmsave, and .dpkg-dist files not erroring out properly
  • NMS-10677: Some config files should be marked %config (rather than %config(noreplace)) in RPMs
  • NMS-10680: Installing the opennms package installs Oracle JDK 8 instead of OpenJDK 11 on Ubuntu
  • NMS-10689: Allow running integration tests without running unit tests
  • NMS-10693: Java configuration is ignored on Ubuntu/Debian
  • NMS-10694: CVE-2018-20433: XXE Vulnerability in c3p0 <
  • NMS-10696: Memory leak in WS-Man

by RangerRick at May 20, 2019 03:56 PM

May 13, 2019

OpenNMS On the Horizon – May 13th, 2019 – OpenTracing, Bug Fixes, Refactoring, Graph Service, UI Fixes, and More!

It’s time for OpenNMS On the Horizon!

In the last week we fixed a lot of bugs, did more work on OpenTracing, made improvements to the new graph service, cleaned up various UI annoyances, and more.

GitHub Project Updates

  • Internals, APIs, and Documentation
    • Patrick did more work on fixing an SNMP proxy address resolution issue.
    • Markus did more work on geocoder API and UI improvements.
    • Ronny did more work on Docker image improvements.
    • Patrick worked on enhancements to the graph APIs to allow edges to contain vertexes from multiple namespaces.
    • Dustin removed the alarm-change-notifier plugin.
    • David continued to work on refactoring threshd out of opennms-services.
    • Dustin worked on supporting wildcards in service names in the poller-configuration.xml.
    • Ron Roskens worked on modernizing and cleaning up some of our Maven stuff.
    • Markus worked on refactoring existing topology providers to the new graph service.
    • Chandra did more work on OpenTracing support for the Sink API.
    • I worked on making it so integration tests can be run without unit tests, to allow splitting up builds in Bamboo.
    • David continued to work on making thresholds reload when node categories change.
    • Markus worked on fixing a timeout issue that can happen when doing minion health:check.
    • I fixed the Debian install so it prefers OpenJDK 11 (headless) over OpenJDK 8.
    • Ronny worked on modernizing our JDK terminology in the documentation.
    • Naicisum contributed an update to modernize our Kafka client and Scala dependencies.
    • David worked on making it possible to extend the Threshd configuration through OIA.
    • I did more work fixing various shell startup issues in OpenNMS, Minion, and Sentinel.
  • Web, ReST, and UI
    • I worked on splitting out a single “source of truth” set of sass files to be shared between various OpenNMS UI bits.
    • Christian did more work on supporting searching for nodes with or without flow data.
    • Patrick worked on fixing the “Manage Minions” page.
    • Markus did more work on a ReST interface for the new graph service.
    • Markus fixed some color issues in the requisition UI as a result of the Bootstrap 4 upgrade.
    • Markus worked on fixing exception handling in the node page when you attempt to view a node that doesn’t exist.
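
As a rough illustration of the wildcard service names mentioned above, the sketch below looks up a service's settings by exact name first and then by pattern. Glob-style patterns, the `find_service_config` helper, and the sample service names and intervals are all assumptions made for the example; the post does not describe the syntax actually used in poller-configuration.xml.

```python
from fnmatch import fnmatchcase


def find_service_config(service_name, configured):
    """Prefer an exact name match, then fall back to the first matching pattern."""
    if service_name in configured:
        return configured[service_name]
    for pattern, config in configured.items():
        if fnmatchcase(service_name, pattern):
            return config
    return None


# Hypothetical service definitions keyed by name or pattern
configured = {
    "ICMP": {"interval": 300000},
    "HTTP-*": {"interval": 60000},
}

print(find_service_config("HTTP-8080", configured))
print(find_service_config("SSH", configured))
```

Checking exact names before patterns keeps a specific definition from being shadowed by a broader wildcard.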

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future OOH, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last OOH

  • HZN-1492: Tag “root cause” alarm when providing feedback
  • HZN-1520: Rework the GeocoderService-Implementations
  • HZN-1523: Make GraphRepository persist collections
  • HZN-1531: Support large buffer sizes in Kafka Sink Layer
  • HZN-1536: Sink Metrics
  • HZN-1539: Indicators for nodes with flow data
  • HZN-1540: Search for nodes that have flow data
  • JS-33: Alarms formatted as a table is gone with update to 1.4.0
  • NMS-9834: Wrong permissions for rrd files when using MultithreadedJniRrdStrategy
  • NMS-10675: Missing Indication of Sync Needed for Requisitions
  • NMS-10679: Node detail page renders with no content when invalid node ID specified
  • NMS-10684: Apparent memory leak in JMX collector, possibly restricted to “weird” JMX transports

by RangerRick at May 13, 2019 03:21 PM

May 06, 2019

OpenNMS On the Horizon – May 6th, 2019 – Thresholding, Docker, Sink and RPC, Flows, and More!

It’s time for OpenNMS On the Horizon!

In the last week we did more thresholding cleanups, lots of Docker work, graph service updates, sink and RPC improvements, startup and config checking fixes, flow UI and ReST updates, and a ton of bug fixing.

GitHub Project Updates

  • Internals, APIs, and Documentation
    • David worked on fixing reloading thresholds when node categories change.
    • Ronny did more work on OpenNMS/Minion/Sentinel Docker containers.
    • Patrick continued his work on updates to the new graph service.
    • Marcel did some work cleaning up threshold events to be more descriptive/useful.
    • Matt worked on refactoring API types to the OIA.
    • Chandra worked on supporting large buffer sizes in the Kafka sink, along with some other sink/RPC metric updates.
    • David started working on refactoring Threshd out of opennms-services.
    • I fixed a bug that could cause opennms start exit codes to indicate success when they should have failed.
    • I worked on cleaning up the various RPM packages so they would create more .rpmsave files rather than .rpmnew for configs that should be overwritten by default.
    • I fixed rpmnew/rpmsave/dpkg-dist checking in OpenNMS startup as well as adding it to Minion and Sentinel.
    • Chandra added tracing to the sink API.
    • Chandra fixed a memory leak in Drools when it reloads configuration.
    • Dino contributed some fixes to the WS-Man asset adapter.
    • Christian fixed a bug in threshold load calculation.
    • Jesse fixed a memory leak in the JMX connector.
    • Patrick fixed a bug where SNMP proxy hosts would be addressed incorrectly.
  • Web, ReST, and UI
    • Christian updated the node search page to include the ability to search for nodes with or without flow data, as well as showing flow data indicators on the node list, node details, and resource graph pages.
    • David did some more work on the Helm enhancements to support root cause feedback.
    • Matt started work on supporting graphing flows for a specific host.
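
The rpmnew/rpmsave/dpkg-dist check mentioned above guards against starting with configuration files that a package upgrade left unmerged. A minimal sketch of such a check, assuming it only needs to walk a configuration directory and return a non-zero status when leftovers exist, could look like this. The function names and the sample files are hypothetical; the real check lives in the OpenNMS startup scripts and may work differently.

```python
import sys
import tempfile
from pathlib import Path

# Suffixes left behind by RPM and dpkg when a config file needs manual merging
LEFTOVER_SUFFIXES = (".rpmnew", ".rpmsave", ".dpkg-dist")


def find_leftovers(config_dir):
    """Return every file under config_dir whose name ends in a leftover suffix."""
    return sorted(
        p for p in Path(config_dir).rglob("*")
        if p.is_file() and p.name.endswith(LEFTOVER_SUFFIXES)
    )


def check_config_dir(config_dir):
    """Print each leftover to stderr and return a shell-style exit status."""
    leftovers = find_leftovers(config_dir)
    for path in leftovers:
        print(f"unresolved packaging file: {path}", file=sys.stderr)
    return 1 if leftovers else 0


# Demonstration against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example.xml").write_text("<config/>")
    Path(tmp, "example.xml.rpmnew").write_text("<config/>")
    status = check_config_dir(tmp)

print(status)
```

Returning the status instead of exiting directly lets a startup script decide whether a leftover file is fatal or only a warning.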

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future OOH, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last OOH

  • HZN-1496: RPC Metrics
  • HZN-1557: Meta-Data Documentation Format Wrong
  • NMS-10658: Remove alarm-change-notifier plugin
  • NMS-10671: Default load threshold contains calculation error

by RangerRick at May 06, 2019 05:27 PM

April 30, 2019

OpenNMS Launches Horizon 24

Open Source AIOps Platform, “ALEC,” Streaming Telemetry, Network Traffic Analysis, and Java 11 Support

The OpenNMS Group announced the release of OpenNMS Horizon 24. Horizon 24 is the most comprehensive version of OpenNMS to date, making it one of the most powerful, scalable monitoring systems available anywhere. Horizon 24 introduces ALEC, the first open source AIOps platform for learning enabled correlation. ALEC (Architecture for Learning Enabled Correlation) enables users to quickly detect, visualize, and resolve situations across the entire IT infrastructure. Horizon 24 organizes faults into visible situations that enable users to quickly visualize network problems and prioritize their resolution efforts. Horizon 24 includes over 100 key updates, including streaming telemetry, network traffic analysis, and support for Java 11.

“Horizon 24 combines the best of Open Source community-development and AI technologies,” said David Hustace, President, The OpenNMS Group, Inc. “Horizon 24 features have been tested in mission-critical network environments by our customers over the last eighteen months and today are delivered to the community through our open source model with the Horizon 24 release.”

Health technology company, Cerner, has been using OpenNMS as a highly scalable fault and performance management system since 2012, collaborating with The OpenNMS Group to build a monitoring fabric enriching all business applications. Most recently, the introduction of ALEC represents a vision shared between Cerner and many other large corporations and the OpenNMS community.

“The AI correlation introduced in Horizon 24, along with its streaming integration APIs, allows us to move away from proprietary solutions and helps us provide a reduced ‘Meantime to Knowledge’ for Cerner’s operations teams,” said Jim Avazpour, Director Operations Center, Cerner Corporation.

Cool New Features:

Built to meet the needs of big data infrastructures, Horizon 24 uses machine learning technologies to achieve greater scale in IT operations, network monitoring and management.

AI Correlation Engine – Horizon 24’s ALEC uses two machine learning approaches: unsupervised (alarm clustering) and supervised (deep learning) built using TensorFlow.

Streaming Telemetry – OpenNMS has added support for Juniper and Cisco streaming telemetry protocols, which allow devices to autonomously send performance metrics, optimizing the performance of your network devices. Horizon 24 distributed streaming technologies integrate with Kafka, Elasticsearch, and Grafana.

Java 11 – Horizon 24 now supports running on Java 11, including OpenJDK, thereby removing the dependencies on commercial JDK solutions for Java.

About The OpenNMS Group

“The OpenNMS Group stewards the OpenNMS open source software project and provides OpenNMS Meridian, a long term support release of OpenNMS, as well as commercial support, consulting and training services.”

by jessi at April 30, 2019 03:50 PM

OpenNMS On the Horizon – April 30th, 2019 – Horizon 24, Graph Service, RPC and Sink API, Docker Containers, and UI Changes

It’s time for OpenNMS On the Horizon!

Sorry this is out a bit late, I’m super sick. I’ll probably write this and go right back to bed.

In the last week we did a lot of preparation for Horizon 24, plus continued work on the graph service, RPC and sink metrics, and some UI work.

GitHub Project Updates

  • Internals, APIs, and Documentation
    • Patrick did more work on enhancements to the graph service.
    • I cleaned up some startup script issues in prep for Horizon 24.
    • Jesse worked on extracting additional event parameters from varbind values based on regular expressions.
    • Jeff fixed upgrades to keep Karaf shell history.
    • Matt made some fixes to prefab graph support in OIA.
    • Chandra continued to work on enhancing metrics and tracing in the RPC and Sink APIs.
    • Lots of documentation work was done in preparation for Horizon 24’s release.
    • David started working on fixing thresholds handling node category changes.
    • Patrick worked on persisting graph repository collections.
    • Ronny did more work on improvements to our Docker containers.
  • Web & UI
    • David did more work on the Helm integration for root cause handling in situations.
    • Christian fixed an issue in the MIB parser that could cause exceptions.

April OpenNMS Horizon and Meridian Releases

OpenNMS Horizon (Rapid Release)

April marks a new major Horizon release: OpenNMS Horizon 24. The most notable improvements are big updates to alarm correlation (including support for TensorFlow-based AI correlation), a new developer API for creating OpenNMS plugins, a web UI refresh, and flow enhancements.

For a complete list of changes in Horizon 24, see the release notes.

OpenNMS Meridian (LTS)

We also updated Meridian all the way back to Meridian 2016, primarily to fix a bug in SNMP processing of certain buggy agent behaviors that could cause an out of memory exception.

For a complete list of changes, see the release notes.

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future OOH, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last OOH

  • ALEC-58: Expand test coverage for smoke tests
  • ALEC-61: Add health check for the main driver
  • ALEC-62: KEM: Reduce messages sent to stdout/stderr
  • ALEC-64: Inventory does not automatically refresh with the direct datasource
  • HELM-146: Rework alarm table severity colors
  • HZN-1078: Java 9 Support
  • HZN-1284: Project Sentinel
  • HZN-1317: Enhanced Alarm Life Cycle and Service Layer
  • HZN-1320: Topology and model enhancements for correlation
  • HZN-1513: circleci packaging strategy
  • HZN-1522: The GraphService interface should return GenericGraph instead of Graph<?, ?>
  • HZN-1535: Initial CircleCI pipeline
  • HZN-1542: PDF reports with Jasper and Grafana POC
  • NMS-9737: Cleanup default SNMP data collection
  • NMS-10420: Wrong data type for Cassandra Thread Pool performance metrics
  • NMS-10576: Documentation has not addressed refactoring of Single-Port flow listener
  • NMS-10608: Fix all bootstrap 4 related issues or UI fixes we want to address with that upgrade
  • NMS-10638: Allow Java 8-11 by default
  • NMS-10647: ArrayIndexOutOfBoundsException during error handling in SNMP MIB Compiler
  • NMS-10663: bin/runjava tries to evaluate an empty value as a candidate JVM pathname
  • OIA-18: Exposing prefabricated graphs is broken

by RangerRick at April 30, 2019 02:29 PM

April 25, 2019

OpenNMS Horizon 24 Released

Release 24.0.0 is the latest stable release of OpenNMS. It contains a large number of bug fixes and enhancements, most notably adding machine-learning-guided correlation of alarms, and many improvements to Netflow/IPFIX/sFlow support.

For a high-level overview of what’s changed in OpenNMS 24, see What’s New in OpenNMS 24.

For a detailed list of changes, see the changelog.

The codename for 24.0.0 is Hal 9000.

by RangerRick at April 25, 2019 08:53 PM

OpenNMS Meridian 2018.1.7 Released

Release 2018.1.7 is an update to Meridian 2018.1.6. It contains a few changes, including UI updates and a fix for an SNMP loop bug that could cause out-of-memory crashes.

The codename for 2018.1.7 is High wind.

  • Cannot run Minion as non-root (Issue LTS-231)
  • ROLE_PROVISION doesn’t work on the UI when the ACL feature is enabled. (Issue NMS-9786)
  • Search on KSC Reports page in WebUI does not work (Issue NMS-10416)
  • Incorrect date formatting in (Issue NMS-10602)
  • The MIB Compiler is unable to parse certain MIBs (Issue NMS-10609)
  • ArrayIndexOutOfBoundsException during error handling in SNMP MIB Compiler (Issue NMS-10647)
  • When editing a surveillance category from Admin flow, lists of nodes are not sorted by node label (Issue NMS-10654)
  • Karaf shell history thrown out with bathwater on upgrade (Issue NMS-10664)
  • Improve test coverage of SNMPv3 traps and informs (Issue NMS-10630)
  • Allow the “step” (or interval) to be referenced from a Measurement API expression (Issue NMS-10633)
  • “Event text contains” should search beyond eventlogmsg (Issue NMS-8444)

by RangerRick at April 25, 2019 04:56 PM

OpenNMS Meridian 2017.1.16 Released

Release 2017.1.16 is a small update to 2017.1.15 that has a few changes, including UI updates and a fix for an SNMP loop bug that could cause out-of-memory crashes.

The codename for 2017.1.16 is Florence meridian.

  • ROLE_PROVISION doesn’t work on the UI when the ACL feature is enabled. (Issue NMS-9786)
  • Search on KSC Reports page in WebUI does not work (Issue NMS-10416)
  • Backport SNMP successor validation (Issue NMS-10622)
  • ArrayIndexOutOfBoundsException during error handling in SNMP MIB Compiler (Issue NMS-10647)
  • When editing a surveillance category from Admin flow, lists of nodes are not sorted by node label (Issue NMS-10654)
  • Karaf shell history thrown out with bathwater on upgrade (Issue NMS-10664)
  • “Event text contains” should search beyond eventlogmsg (Issue NMS-8444)

by RangerRick at April 25, 2019 04:01 PM

OpenNMS Meridian 2016.1.20 Released

Release 2016.1.20 is a small update to 2016.1.19 that has a few changes, including UI updates and a fix for an SNMP loop bug that could cause out-of-memory crashes.

The codename for 2016.1.20 is Craster Parabolic.

  • doesn’t understand newer JDK output (Issue NMS-10401)
  • Backport SNMP successor validation (Issue NMS-10622)
  • When editing a surveillance category from Admin flow, lists of nodes are not sorted by node label (Issue NMS-10654)
  • Karaf shell history thrown out with bathwater on upgrade (Issue NMS-10664)
  • “Event text contains” should search beyond eventlogmsg (Issue NMS-8444)

by RangerRick at April 25, 2019 03:20 PM

April 22, 2019

OpenNMS On the Horizon – April 22nd, 2019 – APIs, Packaging, IFTTT, Protobuf, CircleCI, Helm, and More!

It’s time for OpenNMS On the Horizon!

In the last week we worked on the IFTTT integration, RPC, OIA and sink APIs, the new graph topology service, Helm, various UI tweaks, and more!

GitHub Project Updates

  • Internals, APIs, and Documentation
    • Christian updated the IFTTT integration to support reduction key filters.
    • Chandra worked on wrapping sink messages with protobuf so additional metadata can be associated with them.
    • I finished my updates to Minion and Sentinel packaging to fix issues with overriding default configuration at startup.
    • I finished my work updating the OpenNMS packaging to require OpenJDK 11 by default.
    • Matthew did more work converting OIA to use immutable objects.
    • Marcel did some cleanup on trap event messages.
    • Jesse did more work on the CircleCI build proof-of-concept.
    • Chandra continued his work on adding tracing to the RPC code.
    • Patrick did more work on the graph topology provider.
    • Jeff worked on a proof-of-concept to pull matching data out into varbinds when processing events.
    • Christian fixed an issue in the MIB parser that could make it difficult to diagnose failures.
  • Web & UI
    • David continued his work on the Helm integration for root cause handling in situations.
    • Christian updated webapp session handling to not time out for the browser notification integration.
    • Alejandro’s fixes for search on the KSC report page were merged.
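
The sink message wrapping mentioned above is an instance of the envelope pattern: the original payload is carried unchanged inside an outer message that has room for metadata. The sketch below shows the shape of that idea using JSON and base64 from the standard library in place of protobuf. The field names, the `wrap` and `unwrap` helpers, and the sample metadata are assumptions for illustration and do not reflect the actual OpenNMS message schema.

```python
import base64
import json


def wrap(payload: bytes, message_id: str, metadata: dict) -> bytes:
    """Put an opaque payload into an envelope that also carries metadata."""
    envelope = {
        "message_id": message_id,
        "metadata": metadata,
        # base64 keeps arbitrary bytes safe inside a JSON string
        "content": base64.b64encode(payload).decode("ascii"),
    }
    return json.dumps(envelope).encode("utf-8")


def unwrap(data: bytes):
    """Return (payload, metadata) from an envelope produced by wrap()."""
    envelope = json.loads(data.decode("utf-8"))
    return base64.b64decode(envelope["content"]), envelope["metadata"]


# Hypothetical message and metadata
wire = wrap(b"\x00\x01raw-sink-message", "msg-1", {"trace-id": "abc123"})
payload, metadata = unwrap(wire)
print(payload, metadata)
```

Because the payload is opaque to the envelope, new metadata fields can be added later without changing how the inner message is encoded.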

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future OOH, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last OOH

  • HZN-1509: Minion stops sending flow data into Kafka
  • HZN-1511: Meta-data gets deleted when requisition is modified in UI
  • HZN-1516: Add OpenTracing support for Camel (JMS) RPC
  • HZN-1519: Add ability to provide custom tags to OpenTracing by RPC Modules
  • HZN-1529: Wrap Sink Message in Protobuf
  • HZN-1533: Random compilation failures in opennms-base-assembly
  • NMS-9893: Alarm Clear Trigger query language performance improvement
  • NMS-10416: Search on KSC Reports page in WebUI does not work
  • NMS-10511: Disable session timeout by default
  • NMS-10540: After login the favicon appears instead of the starting page
  • NMS-10631: Configuration directives in /etc/sysconfig/sentinel are not being applied
  • NMS-10639: OpenNMS Horizon installs OpenJDK 1.8.0 even if OpenJDK 11 is already installed
  • NMS-10642: DNSResolutionMonitor incorrectly sets port number
  • NMS-10643: MonitoredServiceDaoIT test fail due to database
  • NMS-10644: EventdIT test failure
  • NMS-10645: DuplicatePrimaryAddressIT logs a failure due to logging assertion
  • NMS-10646: The evaluation layer that helps sizing Cassandra is broken
  • NMS-10650: Vaadin geographical maps broke in Chrome
  • NMS-10651: logging methods have incorrect number of arguments.
  • NMS-10653: add isAcknowledged to the alarm model for the rest interface
  • NMS-10654: When editing a surveillance category from Admin flow, lists of nodes are not sorted by node label
  • NMS-10656: remove centric troubleticket plugin
  • NMS-10664: Karaf shell history thrown out with bathwater on upgrade
  • OIA-12: Integration API feature (opennms-integration-api-features) fails to start if Collectd/Pollerd services are not enabled
  • OIA-13: Replace bean implementations with immutables
  • PRIS-146: create groovy script to select nodes by category

by RangerRick at April 22, 2019 05:20 PM

April 15, 2019

OpenNMS On the Horizon – April 15th, 2019 – OCE is Now ALEC, TWiO is Now OOH, RPC Tracing, Horizon 24, Helm, and More!

It’s time for OpenNMS On the Horizon!

In the last week we did a lot of prep work for Horizon 24, including better debug tracing for RPC and geocoder fixes, as well as other ALEC updates.

Oh, and we renamed OCE to ALEC (Architecture for Learning Enabled Correlation) now that it has a TensorFlow-based engine. And I renamed This Week in OpenNMS to OpenNMS On the Horizon because c’mon, you gotta admit it’s a better name. Get it? Horizon? Fine. Well it’s staying this way whether you like it or not.

GitHub Project Updates

  • Internals, APIs, and Documentation
    • Jesse and Ronny continued to work on a CircleCI workflow for OpenNMS builds.
    • Chandra fixed loading a couple of the less common timeseries strategies that got broken in a refactor.
    • Chandra wrapped up OpenTracing support for RPC communications.
    • Markus started in on reworking our geocoder support, including a configuration UI.
    • Dustin worked on making poller configuration applicable to more than one service through wildcards.
    • Matt did more work on running ALEC on the JVM.
    • I updated the dependencies for Horizon 24 so it can be installed without requiring a specific JVM, as well as allowing JDK 8 through 11.
    • Jesse added support for pushing topology edge updates with a Karaf command.
  • Web & UI
    • David did more work on the UI for root cause submission.
    • I worked on more enhancements to Helm, including mouseovers for alarm logs, auto-wildcarding for text search in the filter panel, and a number of bug fixes and maintenance chores.

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future OOH, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last TWiO

  • ALEC-10: Create Situation Icon
  • ALEC-11: Create Network Interface Icon
  • ALEC-55: Add topology(link) support to the OpenNMS Direct datasource
  • ALEC-60: Create Debian packages for OCE
  • ALEC-63: OCE unusable when graph contains large number of deferred IOs
  • HELM-135: Add isAcknowledged to the faults datasource and alarm table
  • HELM-140: update build to use webpack
  • HELM-142: Text control in filter panel disappears on auto-refresh, but other panels remain filtered
  • HELM-143: Add td.title to alarm-table cell contents
  • HZN-1495: Investigate OpenTracing for our RPC communications
  • HZN-1517: Support compression in JestClient for Elasticsearch
  • JS-29: Add isAcknowledged to alarm object and queries
  • NMS-10640: IFTTT feature should also support BSM alarms

by RangerRick at April 15, 2019 06:40 PM

April 09, 2019

OpenNMS.js v1.4.0

This is a small feature release with a few changes targeted primarily to Helm.


  • alarms: HZN-1492 : Add RootCause and Tags to SituationFeedback (#35) (3790072)
  • dao: JS-29 - add support for "isAcknowledged" on alarms (ff1515a)

by RangerRick at April 09, 2019 07:50 PM

April 08, 2019

This Week in OpenNMS – April 8th, 2019 – API Updates, Karaf and RPC Debugging, Java 11, OCE, Helm, and more!

It’s time for This Week in OpenNMS!

In the last week we did more work on an updated graph service, continuous integration, debugging tools for Karaf and RPC communications, Java 11 support, OCE improvements, Helm updates, and more.

GitHub Project Updates

  • Internals, APIs, and Documentation
    • Patrick did more work on domain-specific graph objects in the new graph service.
    • Jesse and Ronny did more work on doing OpenNMS builds and tests in CircleCI.
    • David added root cause feedback to the health check status.
    • Markus did some refactoring of the new graph service.
    • Chandra did more work integrating Jaeger tracing into our RPC communications.
    • Matt did more work on the direct OCE datasource.
    • Jesse updated the JEXL engine to expose step size for runtime calculation purposes.
    • Christian and I did some CLI fixes for running under Java 11.
    • Dustin did more work on arbitrary node metadata support.
    • Jesse worked on improving performance when there are a large number of deferred IO requests pending in OCE.
    • Jesse fixed a lazy initialization issue when using ticketing and alarms in elasticsearch.
    • Christian added support for filtering UEIs for IFTTT triggers.
    • Christian fixed an issue with the RadixTreeSyslogParser and syslog messages with extraneous text.
  • Web & UI
    • I did more work on Helm, including additional filter panel support, fixing pagination-refresh in the alarm table, and updating the build to use webpack.
    • Markus fixed some more issues in the updated Bootstrap 4 UI.

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future TWiO, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last TWiO

  • HZN-1500: Webapp fails to start on Java 9
  • HZN-1504: Enhance RadixTreeSyslogParser to ignore specific characters
  • HZN-1505: Implement domain specific graph objects in New Graph service
  • HZN-1506: Remove PluginManager
  • HZN-1508: Update plugin to work with ES 6.7.x
  • HZN-1510: Refactor GraphProvider to return Graph instead of being the Graph itself
  • HZN-1512: Remove unused indexNew.jsp files
  • NMS-10426: broken xml code in foreign source/imports is not being detected
  • NMS-10594: LazyInitializationException when using ticketing and alarm history in Elastic
  • NMS-10598: Add node/interface/service details as scopes to Meta-DSL
  • NMS-10619: Init script errors when starting Sentinel on RHEL 6.6
  • NMS-10622: Backport SNMP successor validation
  • NMS-10630: Improve test coverage of SNMPv3 traps and informs
  • NMS-10632: The navigation sidebar on the resource graph page is not working after the Bootstrap 4 changes
  • NMS-10633: Allow the “step” (or interval) to be referenced from a Measurement API expression
  • NMS-10637: %interface% & %interfaceresolve% variables do not resolve values in notifications
  • NMS-10641: Starting opennms.service triggers numerous exceptions
  • OIA-11: Expose service for persisting collection sets

by RangerRick at April 08, 2019 06:21 PM

April 01, 2019

This Week in OpenNMS – April 1st, 2019 – CircleCI, OCE Testing, Bug Fixes, Helm and OpenNMS UI, Java 11, and more!

It’s time for This Week in OpenNMS!

In the last week we worked on a new continuous integration workflow, better end-to-end OCE testing, lots of bug fixes, Helm and OpenNMS UI improvements, and more updates for modern Java support.

Github Project Updates

  • Internals, APIs, and Documentation
    • Matt continued his work on integrating the OCE end-to-end test framework.
    • Jesse made more improvements to Syslog parsing.
    • Jesse and Ronny did more work on CircleCI build support.
    • I added support for Canadian Ethernet, which requires sending an additional frame encoded 0x534f525259 when errors occur in a single collision domain.
    • Jesse worked on a number of controller improvements in OIA.
    • Matt continued his work creating a “direct” OCE datasource, facilitating running it in the OpenNMS JVM.
    • Markus did more work on his branch to upgrade our internal CXF to version 3.2.
    • David did some wrapup work on the configurable meta-model support in OCE.
    • David worked on root cause situation feedback support in OCE and Helm.
    • Chandra added support for opentracing to Kafka RPC.
    • Jesse did more work on improving the deep learning correlation engine.
  • Web & UI
    • Markus did more work on improvements to the new Bootstrap 4 UI.
    • Markus continued work on implementing a ReST service for the new graph API.
    • I updated the OpenNMS.js and Helm codebases to fix all audit warnings from yarn audit.
    • I continued to work on Helm improvements including Grafana 6 fixes and more work on the dynamic filter panel.

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future TWiO, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last TWiO

  • HELM-132: Issues when running on Grafana 6
  • HELM-134: Relative date format ‘en-short’ not correctly defined
  • HELM-138: Fix security issues in Helm dependencies
  • HZN-1478: Upgrade CXF to 3.2.x or greater
  • HZN-1485: Minion – RPM Upgrade does not clear out .m2 local directory
  • HZN-1489: Use node category membership in Drools rules for alarms
  • HZN-1498: User defined links
  • HZN-1503: Related alarms are not deleted from situations
  • JS-28: PropertiesCache Does Not Survive Minification
  • JS-30: Refactor FilterCloner from Helm to OpenNMS.js
  • JS-31: Fix security issues in OpenNMS.js dependencies
  • NMS-8444: “Event text contains” should search beyond eventlogmsg
  • NMS-9376: Set Label For Surveillance Category
  • NMS-10435: Minion Status showing wrong in Manage Minions and service minion status out put is empty
  • NMS-10569: Tables do not space columns out correctly (col-* is no longer supported)
  • NMS-10581: Stop gracefully when running in container environment
  • NMS-10628: Slack/Mattermost integration needs an additional option
  • NMS-69420: Canadian Ethernet support
  • OCE-32: Audit CPN Tickets and OpenNMS Situations
  • OCE-41: Configurable “meta” model
  • OCE-46: End-to-end test framework
  • OCE-57: NPE in DirectInventoryDatasource

by RangerRick at April 01, 2019 12:00 AM

March 27, 2019

OpenNMS.js v1.3.1

This is a minor release with a fix for the properties cache and a cleanup of dependencies.

Bug Fixes

  • build: JS-28 - don't mangle function names (07100ca)
  • build: JS-31 - fix all outstanding audit warnings (ace9779)
  • cli: fix alarm cli when no alarms are returned (75c6a9a)


Features

  • api: JS-30 - reconstitute Filter/Clause/Restrictions from JSON (891bbd1)

by RangerRick at March 27, 2019 05:54 PM

March 25, 2019

This Week in OpenNMS – March 25th, 2019 – OCE, Java 9+, UI Improvements, Helm, and more!

It’s time for This Week in OpenNMS!

In the last week we continued work on OCE improvements, Java 9+ changes, UI cleanups, Helm and OpenNMS.js updates, and much more!

Github Project Updates

  • Internals, APIs, and Documentation
    • Patrick continued his refactor of the topology provider to use the new graph service.
    • David continued his work on a configurable meta model for OCE.
    • Dustin kept working on supporting arbitrary metadata on nodes.
    • Markus did more work on the CXF upgrade to 3.2 to facilitate running on newer JDKs.
    • Ronny worked on performing OpenNMS builds in CircleCI.
    • Jesse did more work on fixing an SNMP loop caused by bad agents.
    • Chandra worked on improving our error/exit code handling for graceful JVM shutdown.
    • Jesse updated Alarmd to allow for using node category when handling alarms in Drools.
    • Jesse worked on fixing an issue where related alarms were not properly removed from situations.
    • Matt did more work on the “direct” OCE datasource, which allows running OCE in the OpenNMS JVM.
  • Web & UI
    • Christian worked on updating the node detail page to make the number of services displayed configurable.
    • Markus worked on some new UI changes to the web navbar.
    • Christian updated event and alarm search to include logMessage and description.
    • Jesse did some changes to support user-specified (manual) links in the topology UI.
    • I worked on a number of Helm enhancements, including creating an inventory datasource, as well as making a custom filter panel that can do dynamic filtering on the alarm table.
    • Markus worked on cleaning up some table layout issues in the new Bootstrap 4 UI.
    • I added support for isAcknowledged to OpenNMS.js and fixed some bugs in property caching in ReST DAOs.

OpenNMS Horizon and Meridian March Releases

Last week we put out updated OpenNMS Horizon and Meridian releases. These were, for the most part, small bug fix releases as we prep for Horizon 24 (which Meridian 2019 will be based on).

If you are running Meridian 2018 or Horizon 23, it is strongly recommended you upgrade. An SNMP agent loop bug was fixed in both of those releases, as well as an Alarmd deadlock that affects all Horizon 23 versions.

For a complete list of changes, see the release announcements:

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future TWiO, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last TWiO

  • HZN-1455: Update startup script to bootstrap with Java 9+
  • HZN-1499: ON DELETE CASCADE missing in Metadata database changelog
  • NMS-10570: Alarm details page has problem with footer
  • NMS-10571: The categories card on the node details page has no spacing above the bottom border
  • NMS-10572: Appears to be extra whitespace above bottom border on cards in the Admin page
  • NMS-10575: KSC Create Custom Graph Prefabricated Report selection is too small and wraps
  • NMS-10602: Incorrect date formatting in
  • NMS-10609: The MIB Compiler is unable to parse certain MIBs
  • NMS-10612: Button arrangement on alarm detail page is broken
  • NMS-10613: Sticky and Journal Memo icons look out of place
  • NMS-10614: Alarm Details page is not rendering related alarms and parent situations correctly
  • NMS-10615: Notification switcher is broken
  • NMS-10621: Bad response from SNMP agent leads to infinite loop in SNMP tracker
  • NMS-10623: KSC resource selection is not shown/visualized
  • OCE-44: OCE Documentation

by RangerRick at March 25, 2019 12:00 AM

March 21, 2019

OpenNMS Meridian 2018.1.6 Released

Release 2018.1.6 is an update to Meridian 2018.1.5. It contains a number of changes, including a fix for a ReST issue with truncated numbers, 3rd-party JDBC support in the Minion, a performance fix for the Measurements API, and a fix for bad (looping) SNMP agents.

The codename for 2018.1.6 is Strong breeze.

  • Collection results via Minion is limited to MAX_INT (Issue NMS-10516)
  • JDBC via Minion fails to find 3rd party classes (Issue NMS-10559)
  • Poor performance when using filters in the Measurements API (Issue NMS-10589)
  • Update webapp copyright dates to 2019 (Issue NMS-10591)
  • Bad response from SNMP agent leads to infinite loop in SNMP tracker (Issue NMS-10621)

by RangerRick at March 21, 2019 08:19 PM

OpenNMS Meridian 2017.1.15 Released

Release 2017.1.15 is a small update to 2017.1.14 that fixes a performance issue in the measurements API.

The codename for 2017.1.15 is Warsaw meridian.

  • Poor performance when using filters in the Measurements API (Issue NMS-10589)
  • Update webapp copyright dates to 2019 (Issue NMS-10591)

by RangerRick at March 21, 2019 06:56 PM

OpenNMS Meridian 2016.1.19 Released

Release 2016.1.19 is a small update to 2016.1.18 that fixes a performance issue in the measurements API.

The codename for 2016.1.19 is Mollweide.

  • Poor performance when using filters in the Measurements API (Issue NMS-10589)
  • Update webapp copyright dates to 2019 (Issue NMS-10591)

by RangerRick at March 21, 2019 05:56 PM

March 18, 2019

This Week in OpenNMS – March 18th, 2019 – Karaf Upgrade, Correlation, Bug Fixing, and Much Much More!

It’s time for This Week in OpenNMS!

I’m sorry for missing last week’s TWiO, but I was busy in the Caribbean with a bunch of nerds. Meanwhile, everyone was apparently very busy and much more productive without me. Maybe I should get back on the boat.

In the last couple of weeks we did a ton of bug fixing, updated our embedded Karaf to 4.2, and did a ton of work on correlation. Plus a whole lot of other great stuff.

Github Project Updates

  • Internals, APIs, and Documentation
    • David worked on fixing up flapping syslog time tests.
    • Jesse did more work on fixing potential deadlocks in Alarmd.
    • Chandra did more work making sure 3rd-party JDBC drivers can be loaded on Minion.
    • Markus wrapped up his work to update our embedded Karaf to 4.2.
    • Patrick did more work making improvements to the events:stress command.
    • Antonio made enhancements to bridge topology generation when events are sent.
    • Matt updated OCE test infrastructure to use a vanilla Sentinel container and worked on integrating the old OCE end-to-end tests into OpenNMS’s build.
    • David did more work on the configurable meta-model for OCE.
    • Ronny worked on RPM CI and deployment for OCE.
    • Dustin did more work on handling flow packets that mix templates and data.
    • Antonio fixed an issue with ack escalation events.
    • Matt did more work integrating links from enlinkd into the OCE graph.
    • Matt fixed an issue with the Kafka datasource blocking OCE engine initialization.
    • Ronny started working on a documentation pipeline for OCE.
    • Jesse started work on an alternative OCE plugin that uses TensorFlow for clustering alarms.
    • Chandra fixed an issue with leftover artifacts in .m2 after upgrading the Minion.
    • Jesse made a number of changes to expose useful data including node model updates, exposing requisition providers, and event listeners in the API.
    • Jesse added support for snapshotting data in OCE.
    • Chandra fixed a bug in the Minion status poller config that could cause timeouts or incorrect status.
    • David Hustace improved the documentation for the OCE project.
    • Alejandro improved our Docker images so they can run on OpenShift infrastructure without requiring root permissions. These changes should be available alongside the release of Horizon 24.
  • Web & UI
    • I got a working inventory datasource put together for Helm, and started on a filter panel using it.
    • Chandra fixed an issue in the ReST adapters that could cause values to be truncated to MAX_INT.
    • Markus did a few fixes and improvements in the new Bootstrap 4 UI.
    • Alejandro fixed an issue in the MIB parser UI.

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future TWiO, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last TWiO

  • HELM-131: Template Variables Don’t Support Nested Parenthesis
  • HZN-1474: Node Meta-Data / KVP Support
  • HZN-1475: Extend topology generator and test suite to support bridge topology
  • HZN-1481: Show managed object type and instance on alarm details page
  • HZN-1483: Update topology code to make the topology that is sent via Kafka more convenient for consumption
  • HZN-1484: Templates are dropped if intermixed with data in wrong order
  • HZN-1487: Maximum number of services in node details page should be configurable
  • NMS-10516: Collection results via Minion is limited to MAX_INT
  • NMS-10539: Upgrade to Karaf 4.2.3
  • NMS-10558: Upgrade to Jetty 9.4.12
  • NMS-10559: JDBC via Minion fails to find 3rd party classes
  • NMS-10589: Poor performance when using filters in the Measurements API
  • NMS-10593: Alarmd get stucks in dead-lock and stops processing events
  • NMS-10596: Flapping Syslog Parser Integration Test
  • NMS-10601: @PreserveOnRefresh not working for embedded Vaadin UIs
  • NMS-10603: Fix ack Event Supporting AckAction
  • NMS-10606: HwEntityAlias fails to be persisted
  • OCE-38: Add log rotation support to the kafka-event-mirrorer
  • OCE-40: Links from Enlinkd in the OCE graph
  • OCE-45: Prevent the opennms kafka datasource from blocking the initialization of the engine forever
  • OCE-47: Create documentation pipeline
  • OCE-49: Enable the “Edit this page” feature in the published docs
  • OCE-50: Add Math support in documentation
  • OCE-52: Deep learning engine
  • OCE-56: Kafka streams error when handling edges
  • OIA-3: Extend the SNMP datacollection configuration
  • OIA-6: Add support for provisiond requisition providers
  • OIA-7: Consume events by registering event listeners
  • OIA-9: Expose operinstruct element in EventDefinition interface

by RangerRick at March 18, 2019 12:00 AM

March 14, 2019

Meeting Owl

One of the cool things I get to do working at OpenNMS is to visit customer sites. It is always fun to visit our clients and to work with them to get the most out of the application.

But over the last year I’ve seen a decline in requests for on-site work. This is odd because general interest in OpenNMS is way up, and it finally dawned on me why – fewer and fewer people work in an office.

For example, we work with a large bank in Chicago. However, their monitoring guy moved to Seattle. Rather than lose a great employee, they let him work from home. When I went out for a few days of consulting, we ended up finding a co-working space in which to meet.

Even within our own organization we are distributed. There is the main office in Apex, NC, our Canadian branch in Ottawa, Ontario, our IT guy in Connecticut and our team in Europe (spread out across Germany, Italy and the UK). We tend to communicate via some form of video chat, but that can be a pain if a lot of people are in one room on one end of the conference.

When I was visiting our partner in Australia, R-Group, I got to use this really cool setup they have using Polycom equipment. Video consisted of two cameras. One didn’t move and was focused on the whole room, but the other would move and zoom in on whoever was talking. The view would switch depending on the situation. It really improved the video conferencing experience.

I checked into it when I got back to the United States, and unfortunately it looked real expensive, way more than I could afford to pay. However, in my research I came across something called a Meeting Owl. We bought one for the Apex office and it worked out so well we got another one for Ottawa.

The Meeting Owl consists of a cylindrical speaker topped with a 360° camera. It attaches to any device that can accept a USB camera input. The picture displays a band across the top that shows the whole panorama, but then software “zooms” in on the person talking. The bottom of the screen will split to show up to three people (the last three people who have spoken).

It’s a great solution at a good price, but it had one problem. In the usual setup, the Owl is placed in the center of the conference table, and usually there is a monitor on one side. When the people at the table are listening to someone remote (either via their camera or another Owl), the people seated along the sides end up looking at the large monitor. This means the Owl is pretty much showing everyone’s ear.

It bothers me.

Now, the perfect solution would be to augment the Owl to project a picture as a hologram above the unit so that people could both see the remote person as well as look at the Owl’s camera at the same time.

Barring that, I decided to come up with another solution.

Looking on Amazon I found an inexpensive HDMI signal splitter. This unit will take one HDMI input and split it into four duplicate outputs. I then bought three small 1080p monitors (I wanted the resolution to match the 1080p main screen we already had) which I could then place around the Owl. I set the Owl on the splitter to give it a little height.

Meeting Owl with Three Monitors

Now when someone remote, such as Antonio, is talking, we can look at the small monitors on the table instead of the big one on the side wall. I found that three does a pretty good job of giving everyone a decent view, and if someone is presenting their screen everyone can look at the big monitor in order to make out detail.

Meeting Owl in Call

We tried it this morning and it worked out great. Just thought I’d share in case anyone else is looking for a similar solution.

by Tarus at March 14, 2019 06:54 PM

March 04, 2019

This Week in OpenNMS – March 4th, 2019 – OIA, Netflow, Topology, Java 9+, Helm, and More!

It’s time for This Week in OpenNMS!

Last week we worked on more OpenNMS Integration API implementations, Netflow fixes, topology data updates, the Karaf 4.2 upgrade for Java 9+ support, performance fixes, and Helm updates.

Github Project Updates

  • Internals, APIs, and Documentation
    • Chandra updated the documentation to describe how to add 3rd-party JDBC drivers to the Minion.
    • Matt did more work on getting topology data from Enlinkd into OCE.
    • Patrick updated the events:stress Karaf command to handle node ID and interface options properly when using JEXL.
    • Dustin worked on supporting Netflow 9 and IPFIX flows that intermix templates and data in a single packet.
    • Markus continued his work on updating our embedded Karaf to 4.2.
    • Dustin and David did more work on the feature to add arbitrary metadata to nodes.
    • Chandra continued his work integrating SNMP datacollection configuration into the OIA.
    • David worked on updating OCE to use the JSR-233 inventory model internally.
    • Antonio added some examples to the topology test generator.
    • Jesse worked on fixing a deadlock in Alarmd while waiting for transactions to commit.
  • Web & UI
    • Jesse worked on an update to the measurements API to improve performance when using filters.
    • I fixed a bug in the new Helm templating that broke when using nested parentheses in variables.
    • Chandra updated the web UI to show managed object type and instance on the alarm details page.

Updates to the Discourse Forum

Ronny has updated the knowledge base section of the Discourse forum to be in “wiki” mode, so anyone can update and improve knowledge base posts over time.

Additionally, there is now support for marking a topic as “solved” so if you have asked a question and someone gave a good answer, you can make it easier for others to find the solution.

Thanks to everyone who has been using it, we’ve gotten a ton of great conversations and additions to the knowledge base already.

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future TWiO, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last TWiO

  • HZN-1479: Enable instant refresh in topology generator based on the recent changes in the topology code
  • HZN-1480: Telemetry UdpListener – False Positive Log Message Condition
  • HZN-1482: Update the topology generator so that topology is sent to the TopologyDAO
  • NMS-10591: Update webapp copyright dates to 2019

by RangerRick at March 04, 2019 12:00 AM

February 25, 2019

This Week in OpenNMS – February 25th, 2019 – Correlation, Data Collection, and Topology

It’s time for This Week in OpenNMS!

Last week we worked on Java 9+ fixes, correlation, data collection in OIA, and topology.

Github Project Updates

  • Internals, APIs, and Documentation
    • I wrapped up my fixes to the new refactored opennmsrunjava and find-java scripts.
    • Jesse continued to work on updating the correlation engine to use Drools internally.
    • Chandra continued to work on data collection support in the integration API.
    • Dustin did more work on arbitrary node metadata support.
    • Chandra worked on JDBC driver loading on Minions.
    • Matt did more work on building with Java 9+.
    • David worked on making the inventory model used by the correlation engine configurable.
    • Matt continued to work on adapting topology code to the new refactored Enlinkd and adding tests.
    • Markus continued to work on moving our embedded Karaf to 4.2.
  • Web & UI
    • I worked on wrapping up release stuff for Helm 3.0.

Horizon and Meridian Releases

We released OpenNMS Horizon 23.0.3 and Meridians 2016.1.18, 2017.1.14, and 2018.1.5.

The majority of the changes were bug fixes, along with a number of performance improvements in Horizon and in Meridian 2018.

For a complete list of changes, see the relevant release announcements:

Upcoming Events and Appearances

Until Next Week…

If there’s anything you’d like me to talk about in a future TWiO, or you just have a comment or criticism you’d like to share, don’t hesitate to say hi.

– Ben

Resolved Issues Since Last TWiO

  • HZN-1468: Remove features-maven-plugin from the build
  • HZN-1482: Update the topology generator so that topology is sent to the TopologyDAO
  • NMS-10549: Typo in Northbound registerNorthnounders method
  • NMS-10566: Update default Syslog parser to use the RadixTreeSyslogParser
  • NMS-10567: Browser crashes when browser notifications are enabled and OpenNMS gets unreachable
  • NMS-10579: Start with the start script throws bad substitution error
  • OIA-5: Add support for Ticketing Plugin in Integration API

by RangerRick at February 25, 2019 12:00 AM

February 22, 2019

OpenNMS Meridian 2018.1.5 Released

Release 2018.1.5 is an update to Meridian 2018.1.4. It contains a number of bug fixes including fixes for sending notifications for events without associated nodes, XSS issues, and more. It also includes a number of performance improvements and an update to the latest Jetty web server framework.

The codename for 2018.1.5 is Fresh breeze.

  • JDBC collector event reason provides no useful information (Issue NMS-9633)
  • syslog events are creating notifications and disregarding rules in place (Issue NMS-10486)
  • Node page very slow to load for nodes with more than 1000 events (Issue NMS-10506)
  • SNMP configuration UI should select location “Default” by default, not the first location alphabetically (Issue NMS-10514)
  • Wallboard URLs with board name should be permalinks, but return “Nothing to display” instead (Issue NMS-10515)
  • Event parameters table have strong limits for the columns (Issue NMS-10525)
  • Cross-Site Scripting: Reflected (Issue NMS-10546)
  • Cross-Frame Scripting (Issue NMS-10547)
  • syslog parsing of messages without a year will sometimes infer the wrong year (Issue NMS-10548)

by RangerRick at February 22, 2019 01:32 AM

OpenNMS Meridian 2017.1.14 Released

Release 2017.1.14 is a minor update to OpenNMS Meridian 2017.1.13. It contains a few small bug fixes and documentation enhancements, including fixes for alarm criteria browsing, pinger initialization, and sending notifications for events without associated nodes.

The codename for 2017.1.14 is Tabulae Varadienses.

  • BestMatchPingerFactory returns NullPinger when better options are available (Issue NMS-9659)
  • Alarm Dashlet CriteriaBuilder In-Restriction not working (Issue NMS-10479)
  • syslog events are creating notifications and disregarding rules in place (Issue NMS-10486)

by RangerRick at February 22, 2019 01:31 AM

OpenNMS Meridian 2016.1.18 Released

Release 2016.1.18 is a small update to 2016.1.17 that provides a few documentation updates, a query bug fix, and a fix for an issue with notifications being sent when there is no associated node.

The codename for 2016.1.18 is Strebe 1995.

  • Typo in BSFMonitor Documentation (Issue NMS-10428)
  • Alarm Dashlet CriteriaBuilder In-Restriction not working (Issue NMS-10479)
  • syslog events are creating notifications and disregarding rules in place (Issue NMS-10486)

by RangerRick at February 22, 2019 01:30 AM

January 18, 2019

OpenNMS Meridian 2018.1.4 Released

Release 2018.1.4 is an update to Meridian 2018.1.3.
It contains a number of bug fixes and a few enhancements, including a bunch of performance fixes to topology maps and a number of other smaller changes.

The codename for 2018.1.4 is Moderate breeze.

  • BestMatchPingerFactory returns NullPinger when better options are available (Issue NMS-9659)
  • When selecting a vertex which is neither visible nor in focus the ui state is stuck (Issue NMS-10451)
  • Building the menu takes forever if a visible node has an invalid ip address set (Issue NMS-10452)
  • “Use Default Focus” may not show the “add nodes manual” indicator if “getDefaults().getCriteria()” returns empty list rather than null (Issue NMS-10453)
  • Kafka Producer: Sync timing issues cause erroneous deletes (Issue NMS-10474)
  • When using the events:stress command, the node-id or interface passed as parameters are ignored when using jexl (Issue NMS-10475)
  • Alarm Dashlet CriteriaBuilder In-Restriction not working (Issue NMS-10479)
  • Performance problems with the Topology Map on large networks (Issue NMS-10369)
  • Find out why intial loading of the topology map takes so long, fix for CDP (Issue NMS-10398)
  • Apply initial loading improvements to IsIs, lldp, ospf protocols (Issue NMS-10439)
  • Allow PostgreSQL 11.x (Issue NMS-10450)
  • Support Additional EIF Protocol Version (Issue NMS-10454)
  • Meassure and improve performance of Interface loading and mapping (Issue NMS-10459)
  • Meassure and improve performance of Cdp/Lldp/IsIsElement loading (Issue NMS-10487)

by RangerRick at January 18, 2019 02:23 PM

December 17, 2018

Review: Serval WS Laptop by System76

TL;DR; When I found myself in the market for a beefy laptop, I immediately ordered the Serval WS from System76. I had always had a great experience dealing with them, but times have changed. It has been sent back.

I can’t remember the first time I heard about the Serval laptop by System76. In a world where laptops were getting smaller and thinner, they were producing a monster of a rig. Weighing ten pounds without the power brick, the goal was to squeeze a high performance desktop into a (somewhat) portable form factor.

I never thought I’d need one, as I tend to use desktops most of the time (including a Wild Dog Pro at the office) and I want a light laptop for travel as it pretty much just serves as a terminal and I keep minimal information on it.

Recently we’ve been experimenting with office layouts, and our latest configuration has me trading my office for a desk with the rest of the team, and I needed something that could be moved in case I need to get on a call, record a video or get some extra privacy.

Heh, I thought, perhaps I could use the Serval after all.

I like voting for open source with my wallet. My last two laptops have been Dell “Sputnik” systems (2nd gen and 5th gen) since I wanted to support Dell shipping Linux systems, and when we decided back in 2015 that the iMacs we used for training needed to be replaced, I ordered six Sable Touch “all in one” systems from System76. The ordering process was smooth as silk and the devices were awesome. We still get compliments from our students.

A year later when my HP desktop died, I bought the aforementioned Wild Dog Pro. Again, the customer service rivaled the best in the business, and I was extremely happy with my new computer.

Jump forward to the present. Since I was in the market for a “luggable” system, performance was more important than size or weight, so I ordered a loaded Serval WS, complete with the latest Intel i9 processor, 64GB of speedy RAM, NVidia 1080 graphics card, and oodles of disk space. Bwah ha ha.

When it showed up, even I was surprised at how big it was.

Serval WS and Brick

Here you can see it in comparison to an old Apple keyboard. It was solidly built, and I was eager to plug it in and turn it on.

Serval WS

The screen looked really bright, even though my office was brightly lit at the time. You can see from the picture that it was big enough to contain a full-sized keyboard and a numeric keypad. This didn’t really matter much to me as I was planning on using it with an awesome external monitor and keyboard, but it was a nice touch. I still like having a second screen: we rely heavily on Mattermost, I always like to keep a window in view, and I figured I could use the laptop screen for that.

I had ordered the system with Ubuntu installed. My current favorite operating system is Linux Mint but I decided to play with Ubuntu for a little bit. This was my first experience with Ubuntu post Unity and I must say, I really liked it. Kind of made me eager to try out Pop!_OS which is the System76 distro based on Ubuntu.

When installing Mint I discovered that I made a small mistake when placing my Serval order. I meant to use a 2TB drive as the primary leaving a 1TB drive for use by TimeShift for backup. I reversed them. No real issue, as I was able to install Mint on the 2TB drive just fine after some creative partition manipulation.

Everything was cool until late afternoon when the sun went away. I was rebooting the system and found myself looking at a blank screen (for some reason the screen stays blank for a minute or so after powering on the laptop, I assume due to it having 64GB of RAM). There was a tremendous amount of “bleed” around the edges of the LCD.

Serval WS LCD Bleed


Although it probably wouldn’t have impacted me much in day to day use, especially with an external monitor, I would know about it, and as I’m somewhere on the OCD spectrum it would bother me. Plus I paid a lot of money for this system and want it to be as close to perfect as possible.

For those of you who don’t know, the liquid crystals in LCD displays emit no light of their own and they get their illumination usually from a fluorescent source. If there are issues with the way the LCD panel is constructed, this light can “bleed” around the edges and degrade the display quality (it is also why it is hard to get really black images on LCD displays and this is fueling a lot of the excitement around OLED technology).

I’ve had issues with this before on laptops but nothing this bad. Not to worry, I have System76 to rely on, along with their superlative customer service.

I called the number and soon I was speaking with a support technician. When I described the problem they opened a ticket and asked me to send in a picture. I did and then waited for a response.

And waited.

And waited.

I commented on the ticket.

And I continued to wait.

The next day I waited a bit (Denver is two hours behind where I live) but when I got no response I decided, well, I’ll just return the thing. I called to get an RMA number but this time I wasn’t connected with a person and was asked to leave a message. I did, and I should note that I never got that return call.

At this point I’m frustrated, so I decided an angry tweet was in order. That got a response to my ticket, where they offered to send me a new unit.

Yay, here was a spark of the customer service I was used to getting. I’ve noticed a number of tech companies are willing to deal with defective equipment by sending out a new unit before the old unit is returned. In this day and age of instant gratification it is pretty awesome.

I wrote back that I was willing to try another unit, but would it be possible to put Pop!_OS on the new unit on the 2TB drive so that I could try it out of the box and know that all of the System76 specific packages were installed.

A little while later I got a reply that it wouldn’t be possible to install it on the 2TB drive, so I would end up having to reinstall in any case.


When I complained on Twitter I was told “Sorry to hear this, you’ll receive a phone call before EOD to discuss your case.” I worked until 8pm that night with no phone call, so I just decided to return the thing.

Of course, this would be at my expense and the RMA instructions were strict about requiring shipping insurance: “System76 cannot refund your purchase if the machine arrives damaged. For this reason, it is urgent that you insure your package”. The total cost was well over $100.

So I’m out a chunk of change and I’ve lost faith in a vendor of which I was extremely fond. This is a shame since they are doing some cool things such as building computers in the United States, but since they’ve lost sight of what made them great in the first place I have doubts about their continued success.

In any case, I ordered a Dell Precision 5530, which is one of the models available with Ubuntu. Much smaller and not as powerful as the Serval WS, it is also not as expensive. I’ll post a review in a couple of weeks when I get it.

by Tarus at December 17, 2018 03:36 PM

December 15, 2018

#OSMC 2018 – Day 3: Hackathon

For several years now the OSMC has been extended by one day in the form of a “hackathon”. As I do not consider myself a developer I usually skip this day, but since I wanted to spend more time with Ronny Trommer and to explore the OpenNMS MQTT plugin, I decided to attend this year.

I’m glad I did, especially because the table where we sat was also home to Dave Kempe, and he brought Tim Tams from Australia:

OSMC 2018 Tim Tams


You can find them in the US on occasion, but they aren’t as good.

I have been hearing about MQTT for several years now. According to Wikipedia, MQTT (Message Queuing Telemetry Transport) is a messaging protocol designed for connections with remote locations where a “small code footprint” is required or the network bandwidth is limited, thus making it useful for IoT devices.

Dr. Craig Gallen has been working on a plugin to allow OpenNMS to consume MQTT messages, and I was eager to try it out. First, we needed an MQTT broker.

I found that the OpenHAB project supports an MQTT broker called Mosquitto, so we decided to go with that. This immediately created a discussion about the differences between OpenHAB and Home Assistant, the latter being a favorite of Dave. They looked comparable, but we decided to stick with OpenHAB because a) I already had an instance installed on a Raspberry Pi, and b) it is written in Java, which is probably why others prefer Home Assistant.

Ronny worked on getting the MQTT plugin installed while I created a dummy sensor in OpenHAB called “Gas”.

OSMC 2018 Hackathon

This involved creating a “sitemap” in /etc/openhab2:

sitemap opennms label="My home automation" {
    Frame label="Date" {
        Text item=Date
    }
    Frame label="Gas" {
        Text item=mqtt_kitchen_gas icon="gas"
    }
}

and then an item that we could manipulate with MQTT:

Number mqtt_kitchen_gas "Gas Level [%.1f]" {mqtt="<[mosquitto:Home/Floor1/Kitchen/Gas_Sensor:state:default]"}

To install the MQTT plugin:

Ronny added the following to the configuration to connect to our Mosquitto broker on OpenHAB:

  <client clientinstanceid="client1">
      <topic qos="0" topic="iot/#">

Now that we had a connection between our OpenHAB Mosquitto broker and OpenNMS, we could try to send information. The MQTT plugin handles both event information and data collection. To test both we used the mosquitto_pub command on the CLI.
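MQTT topic filters such as the “iot/#” subscription above use wildcards: “+” matches exactly one topic level, and “#” matches the parent level plus everything below it. The following is a minimal Python sketch of that matching rule, not code from the plugin. The function name topic_matches is made up for illustration, and the sketch ignores the special handling of topics that start with “$”.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter (simplified)."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last filter level,
            # and it also matches the parent level itself.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # A literal level must match exactly; "+" accepts any single level.
            return False
    # Without "#", filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    print(topic_matches("iot/#", "iot/timtam"))
    print(topic_matches("iot/#", "Home/Floor1/Kitchen/Gas_Sensor"))
```

A check like this can help when a message never shows up: if the filter does not cover the topic, the broker will not deliver the message to that subscriber.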

For an event one can use something like this:

mosquitto_pub -u openhabian --pw openhabian -t "iot/timtam" -m "{ \"name\": \"6114163\",  \"sensordatavalues\": [ { \"value_type\": \"Gas\", \"value\": \"$RANDOM\"  } ] }"

On the OpenNMS side you need to configure the MQTT plugin to look for it:

  <messageEventParser foreignSource="$topicLevels[5]" payloadType="JSON" compression="UNCOMPRESSED">

    <xml-groups xmlns="">
      <xml-group name="timtam-mqtt-lab" resource-type="sensors" resource-xpath="/" key-xpath="@name">
        <xml-object name="instanceResourceID" type="string" xpath="@name"/>
        <xml-object name="gas" type="gauge" xpath="sensordatavalues[@value_type='Gas']/value"/>

Note how Ronny worked our Tim Tam obsession into the configuration.

To make this useful, you would want to configure an event definition for the event with the Unique Event Identifier (UEI) of

<events xmlns="">
    <event-label>MQTT: Timtam kitchen lab event</event-label>
    <descr>This is our Timtam kitchen lab event</descr>
    <logmsg dest="logndisplay">
      All the parameters: %parm[all]%
    <alarm-data reduction-key="%uei%:%dpname%:%nodeid%:%interface%:%service%" alarm-type="1" auto-clean="false"/>

Once we had that working, the next step was to use the MQTT plugin to collect performance data from the messages. We used this script:

while [ true ]
do
    mosquitto_pub -u openhabian --pw openhabian -t "Home/Floor1/Kitchen/Gas_Sensor" -m "{ \"name\": \"6114163\",  \"sensordatavalues\": [ { \"value_type\": \"Gas\", \"value\": \"$RANDOM\"  } ] }"
    sleep 10
done

This will create a message including a random number every ten seconds.
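The hand-escaped JSON in the shell command is easy to get wrong, so here is a small Python sketch that builds the same message body with the standard json module. The helper names build_payload and random_reading are hypothetical. The sketch only constructs the payload; actually publishing it would need an MQTT client, which is left out here.

```python
import json
import random


def random_reading() -> int:
    # Bash's $RANDOM yields an integer between 0 and 32767.
    return random.randint(0, 32767)


def build_payload(sensor_name: str, value: int) -> str:
    """Build the JSON message body used in the mosquitto_pub examples."""
    message = {
        "name": sensor_name,
        "sensordatavalues": [
            # The value is sent as a string, as in the shell version.
            {"value_type": "Gas", "value": str(value)}
        ],
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(build_payload("6114163", random_reading()))
```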

To have OpenNMS look for it, the MQTT configuration is:

  <messageDataParser foreignSource="$topicLevels[5]" payloadType="JSON" compression="UNCOMPRESSED">
    <xml-groups xmlns="">
      <xml-group name="timtam-kitchen-sensor" resource-type="sensors" resource-xpath="/" key-xpath="@name">
        <xml-object name="instanceResourceID" type="string" xpath="@name" />
        <xml-object name="gas" type="gauge" xpath="sensordatavalues[@value_type='Gas']/value"/>
    <xmlRrd step="10">

This will store the values in an RRD file which can then be graphed within OpenNMS or through Grafana with the Helm plugin.
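To illustrate what the “gas” expression in the configuration is meant to pick out of each message, here is a rough Python sketch that selects the value of the entry whose value_type is “Gas”. It says nothing about how the plugin evaluates the expression internally; select_value is a made-up helper and the sample message uses a fixed number instead of $RANDOM.

```python
import json
from typing import Optional

# Sample message in the same shape as the mosquitto_pub payload.
MESSAGE = (
    '{ "name": "6114163", '
    '"sensordatavalues": [ { "value_type": "Gas", "value": "12345" } ] }'
)


def select_value(payload: str, value_type: str) -> Optional[float]:
    """Return the numeric value of the first entry with the given value_type."""
    document = json.loads(payload)
    for entry in document.get("sensordatavalues", []):
        if entry.get("value_type") == value_type:
            # Values arrive as strings, so convert before storing or graphing.
            return float(entry["value"])
    return None


if __name__ == "__main__":
    print(select_value(MESSAGE, "Gas"))
```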

It was pretty straightforward to get the OpenNMS MQTT plugin working. While I’ve focused mainly on what was accomplished, it was a lot of fun chatting with others at our table and in the room. As usual, Netways did a great job with organization and I think everyone had fun.

Plus, I got to be reminded of all the amazing stuff being done by the OpenNMS team, and how the view is great up here while standing on the shoulders of giants like Ronny and Craig.

by Tarus at December 15, 2018 05:21 PM

December 13, 2018

Come Discourse With OpenNMS!

Ulf and Discourse

The OpenNMS project was registered on 30th March 2000 on SourceForge, and in these early days everyone talked about CVS or Subversion. Benjamin Reed migrated the project very early to git and this was a big change for our community, but it paid off so well. It also made it pretty easy to move our project from SourceForge to GitHub.

At the time we had an IRC channel hosted on freenode. We tried Slack but we love Open Source and ultimately went with Mattermost. By using an IRC bridge we hav...

December 13, 2018 09:42 AM

December 07, 2018

OpenNMS.js v1.3.0

This release contains a number of new features and a few bug fixes, including support for correlation alarms and feedback, and additional metadata for Helm 3.

Bug Fixes

  • api: support flow data (bd8c5b9)
  • feedback: Serialize the enum as a string (3c2f997)

Features


  • alarms: HELM-110: store managedObject* if present (2b86722)
  • alarms: HELM-114: alarm property for whether it is a situation (a54b627)
  • api: add flow and situation metadata APIs (42d7a58)
  • api: Add more test assertions for AlarmSummaryDTO (0f45e79)
  • api: Add test for AlarmSummary and reductionKey (3329cf7)
  • api: HZN-1357 expose FeedbackDAO (d354040)
  • api: HZN-1357 Pass data and set accept header (0bf0bdd)
  • api: HZN-1357 tests for uri.js (e98cfa4)
  • api: HZN-1357 use AlarmId for Situation (3d9f172)
  • api: Initial work (fa62503)
  • api: Move from 'impacts/causes' to 'relatedAlarms' (3bc5b6e)
  • api: OCE-REST extend Alarm and summary (e855134)
  • api: OCE-REST extend Alarm and summary (24ca0f7)
  • api: OCE-REST remove inSituation attr (d3bb9e9)
  • api: OCE-REST updte tests (abd86d1)
  • cli: improve table rendering (da92bdb)
  • feedback: Expose the feedback type enum values (500632a)

by RangerRick at December 07, 2018 04:37 PM

November 20, 2018

#OSMC 2018 – Day 2

Despite how long the Tuesday night festivities lasted, quite a few people managed to make the first presentation on Wednesday morning. I’m old so I had gone to bed fairly early and was able to see “Make IT Monitoring Ready for Cloud-native Systems” bright and early.

OSMC 2018 RealOpInsight

This presentation focused on a project called RealOpInsight. This seems to be a sort of “Manager of Managers” for multiple monitoring applications, and I didn’t really see a “cloud-native” focus in the presentation. It is open-source so if you find yourself running many instances of disparate monitoring platforms you may find RealOpInsight useful.

This was followed by a presentation from Uber.

OSMC 2018 Uber

One can imagine the number of metrics an organization like Uber collects (and I did restrain myself from making snarky comments like “what database do you use to track celebrities?” and “where do you count the number of assaults by Uber drivers?”). Rob Skillington seemed pretty cool and I didn’t want to put him on the spot.

Uber used to use Cassandra, which is a storage option for OpenNMS, but they found when they hit around 80,000 metrics per second the system couldn’t keep up (one of the largest OpenNMS deployments is 20,000 metrics/sec so 80K is a lot). Their answer was to create a new storage system called M3DB. While it seems pretty impressive, I did ask some questions about how mature it was because at OpenNMS we are always looking out for ways to make things easier for our users, and Rob admitted that while it works well for Uber it needs some work to be generally useful, which is why they open-sourced it. We’ll keep an eye on it.

The next time slot was the “German only” one I mentioned in my last post, so I engaged in the hallway track until lunch.

OSMC 2018 Rihards Olups

It was lovely to see Rihards Olups again. We met at the first OSMC I attended when he was part of the “Latvian Army” at Zabbix. He gave an entertaining talk on dealing with the alerts from your monitoring system, and he ended with the tag line “Make Alerts Meaningful Again (MAMA)”. Seems like a perfect slogan for a ball cap, preferably in red.

OSMC 2018 Dave Kempe

Another delightful human being I got to see was Dave Kempe, who came all the way from Sydney. While we had met at a prior OSMC, this conference we ended up spending a lot more time together (he was in the Prometheus training as well as the Thursday Hackathon). He gave a talk on being a monitoring consultant, and it was interesting to compare his experiences with my own (they were similar).

For most people the conference ended on Wednesday. I said goodbye to people like Peter Eckel and looked forward to the next OSMC so I could see them again.

Speaking of the next OSMC, we are going to be doing OpenNMS training on that first day, November 4th, so save the date. It is the least we could do since they went to the trouble to advertise OpenNMS Horizon® on all their posters (grin).

OSMC 2018 Horizon

Ronny and I were hanging around for the Hackathon on Thursday, and for those attendees there was a nice dinner at a local restaurant called Tapasitos. It was fun to spend more time with the OSMC gang and to get ready for our last day at the conference.

OSMC 2018 Tapasitos

by Tarus at November 20, 2018 04:47 PM

November 16, 2018

#OSMC 2018 – Day 1

The 2018 Open Source Monitoring Conference officially got started on Tuesday. This was my fifth OSMC (based on the number of stars on my badge), although I am happy to have been at the very first OSMC conference with that name.

As usual our host and Master of Ceremonies Bernd Erk started off the festivities.

OSMC 2018 Welcome

This year there were three tracks of talks. Usually there are two, and I’m not sure how I feel about more tracks. Recently I have been attending Network Operator Group (NOG) meetings and they are usually one or two days long but only one track. I like that, as I get exposed to things I normally wouldn’t. One of my favorite open source conferences, All Things Open, has gotten so large that it is unpleasant to navigate the schedule.

In the case of the OSMC, having three tracks was okay, but I still liked the two track format better. One presentation was always in English, although one of the first things Bernd mentioned in his welcome was that Mike Julian was unable to make it for his talk on Wednesday and thus that time slot only had two German language talks.

If they seem interesting I’ll sit in on the German talks, especially if Ronny is there to translate. I am very interested in open source home automation (well, more on the monitoring side than, say, turning lights on and off) so I went to the OpenHAB talk by Marianne Spiller.

OSMC 2018 OpenHAB

I found out that there are mainly two camps in this space: OpenHAB and Home Assistant. The former is in Java which seems to invoke some Java hate, but since I was going to use OpenHAB for our MQTT Hackathon on Thursday I thought I would listen in.

OSMC 2018 Custom MIB

I also went to a talk on using a Python library for instrumenting your own SNMP MIB by Pieter Hollants. We have a drink vending machine that I monitor with OpenNMS. Currently I just output the values to a text file and scrape them via HTTP, but I’d like to propose a formal MIB structure and implement it via SNMP. Pieter’s work looks promising and now I just have to find time to play with it.

Just after lunch I got a call that my luggage had arrived at the hotel. Just in time because otherwise I was going to have to do my talk in the Icinga shirt Bernd gave me. Can’t have that (grin).

My talk was lightly attended, but the people who did come seemed to enjoy it. It was one of the better presentations I’ve created lately, and the first comment was that the talk was much better than the title suggested. I was trying to be funny when I used “OpenNMS Geschäftsbericht” (OpenNMS Annual Report) in my submission. It’s funny because I speak very little German, although it was accurate since I was there to present on all of the cool stuff that has happened with OpenNMS in the past year. It was recorded so I’ll post a link once the videos are available.

In contrast, Bernd’s talk on the current state of Icinga was standing room only.

OSMC 2018 State of Icinga

The OSMC has its roots in Nagios and its fork Icinga, and most people who come to the OSMC are there for Icinga information. It is easy to see why this talk was so popular (even though it was basically “Icinga Geschäftsbericht” – sniff). The cool demo was an integration Bernd did using IBM’s Node-RED, Telegram and an Apple Watch, but unfortunately it didn’t work. I’m hoping we can work up an Apple Watch/OpenNMS integration by next year’s conference (should be possible to add hooks to the Watch from the iOS version of Compass).

The evening event was held at a place called Loftwerk. It was some distance from the conference so a number of buses were chartered to take us there. It was fun if a bit loud.

OSMC 2018 Loftwerk

OSMC celebrations are known to last into the night. The bar across the street from the conference hotel (which I believe has changed hands at least three times in the lifetime of the OSMC) becomes “Checkpoint Jenny” once the main party ends and can go on until nearly dawn, which is why I like to speak on the first day.

by Tarus at November 16, 2018 05:03 PM

November 15, 2018

OpenNMS Meridian 2018.1.3 Released

Release 2018.1.3 is an update to Meridian 2018.1.2. It contains a number of bug fixes and a few enhancements, including additional HTTP proxy support, reliability updates, and UI performance improvements.

The codename for 2018.1.3 is Gentle breeze.

  • Other classes that use Http (Issue NMS-10379)
  • Sink API drops messages when there is no connectivity with Kafka (Issue NMS-10395)
  • Discovery UI should not allow selection of Minions as Foreign Source (Issue NMS-10400)
  • Find out why selecting a node takes so long in a big topology (Issue NMS-10419)
  • Typo in BSFMonitor Documentation (Issue NMS-10428)
  • Default Metaspace configuration is insufficient (Issue NMS-10437)
  • Improve performance of node search (Issue NMS-10445)
  • Change eventconf for newSuspect to include location name in logmsg (Issue HZN-814)
  • Be able to use Proxy for any Monitor or Collector that uses HttpClient (Issue NMS-9710)
  • Detect and Attempt to Restart Failed Drools Engines (Issue NMS-10363)

by RangerRick at November 15, 2018 09:04 PM

OpenNMS Meridian 2017.1.13 Released

Release 2017.1.13 is a minor update to OpenNMS Meridian 2017.1.12. It contains a documentation fix and an enhancement to Drools engine stability.

The codename for 2017.1.13 is Cadiz meridian.

  • Typo in BSFMonitor Documentation (Issue NMS-10428)
  • Detect and Attempt to Restart Failed Drools Engines (Issue NMS-10363)

by RangerRick at November 15, 2018 08:33 PM

November 07, 2018

#OSMC 2018 – Day 0: Prometheus Training

To most people, monitoring is not exciting, but it seems lately that the most exciting thing in monitoring is the Prometheus project. As a project endorsed by the Cloud Native Computing Foundation, Prometheus is getting a lot of attention, especially in the realm of cloud applications and things like monitoring Kubernetes.

At this year’s Open Source Monitoring Conference they offered a one day training course, so I decided to take it to see what all the fuss was about. I apologize in advance that a lot of this post will be comparing Prometheus to OpenNMS, but in case you haven’t guessed I’m biased (and a bit jealous of all the attention Prometheus is getting).

The class was taught by Julien Pivotto who is both a Prometheus user and a decent instructor. The environment consisted of 15 students with laptops set up on a private network to give us something to monitor.

Prometheus is written in Go (I’m never sure if I should call it “Go” or if I need to say “Golang”) which makes it compact and fast. We installed it on our systems by downloading a tarball and simply executing the application.

Like most applications written in the last decade, the user interface is accessed via a browser. The first thing you notice is that the UI is incredibly minimal. At OpenNMS we get a lot of criticism of our UI, but the Prometheus interface is one step above the Google home page. The main use of the web page is for querying collected metrics, and a lot of the configuration is done by editing YAML files from the command line.

Once Prometheus was installed and running, the first thing we looked at was monitoring Prometheus itself. There is no real magic here. Metrics are exposed via a web page that simply lists the variables available and their values. The application will collect all of the values it finds and store them in a time series database called simply the TSDB.

The idea of exposing metrics on a web page is not new. Over a decade ago we at OpenNMS were approached by a company that wanted us to help them create an SNMP agent for their application. We asked them why they needed SNMP and found they just wanted to expose various metrics about their app to monitor its performance. Since it ran on a Linux system with an embedded web server, we suggested that they just write the values to a file, put that in the webroot, and we would use the HTTP Collector to retrieve and store them.

The main difference between that method and Prometheus is that the latter expects the data to be presented in a particular format, whereas the OpenNMS method was more free-form. Prometheus will also collect all values presented without extra configuration, whereas you’ll need to define the values of interest within OpenNMS.
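To make the “particular format” concrete, here is a rough Python sketch that parses a few lines in the Prometheus text exposition format. The sample metrics and the parse_metrics helper are invented for illustration, and the parser assumes that no line carries the optional trailing timestamp.

```python
# A few lines in the text format that a metrics page serves.
SAMPLE = """\
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 12
http_requests_total{code="200",method="get"} 1027
"""


def parse_metrics(text: str) -> dict:
    """Map each metric name (including any labels) to its float value."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(name, value)
```

Because every line follows the same shape, a collector can store whatever it finds without a per-metric definition, which is the contrast with the free-form file approach described above.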

In Prometheus there is no real auto-discovery of devices. You edit a file in which you create a “job” (in our case the job was called “Prometheus”) and then you add “targets” based on IP address and port. As we learned in the class, for each different source of metrics there is usually a custom port. Prometheus stats are on port 9090, node data is exposed on 9100 via the node_exporter, etc. When there is an issue, this can be reflected in the status of the job. For example, if we added all 15 Prometheus instances to the job “Prometheus” and one of them went down, then the job itself would show as degraded.

After we got Prometheus running, we installed Grafana to make it easier to display the metrics that Prometheus was capturing. This is a common practice these days and a good move since more and more people are becoming familiar with it. OpenNMS was the first third-party datasource created for Grafana, and the Helm application brings bidirectional functionality for managing OpenNMS alarms and displaying collected data.

After that we explored various “components” for Prometheus. While a number of applications are exposing their data in a format that Prometheus can consume, there are also other components that can be installed, such as the node_exporter, which exposes server-related metrics, to provide data that isn’t otherwise natively available.

The rest of the class was spent extending the application and playing with various use cases. You can “federate” Prometheus to aggregate some of the collected data from multiple instances under one, and you can separate out your YAML files to make them easier to read and manage.

The final part of the class was working with the notification component called the “alertmanager” to trigger various actions based on the status of metrics within the system.

One thing I wish we could have covered was the “push” aspect of Prometheus. Modern monitoring is moving from a “pull” model (e.g. SNMP) to a “push” model where applications simply stream data into the monitoring system. OpenNMS supports this type of monitoring through the telemetryd feature, and it would be interesting to see if we could become a sink for the Prometheus push format.

Overall I enjoyed the class but I fail to see what all the fuss is about. It’s nice that developers are exposing their data via specially formatted web pages, but OpenNMS has had the ability to collect data from web pages for over a decade, and I’m eager to see if I can get the XML/JSON collector to work with the native format of Prometheus. Please don’t hate on me if you really like Prometheus – it is 100% open source and if it works for you then great – but for something to manage your entire network (including physical servers and especially networking equipment like routers and switches) you will probably need to use something else.

[Note: Julien reached out to me and asked that I mention the SNMP_Exporter which is how Prometheus gathers data from devices like routers and switches. It works well for them and they are actively using it.]

by Tarus at November 07, 2018 07:04 AM

November 05, 2018

#OSMC 2018 – Day -1

The annual Open Source Monitoring Conference (OSMC) held in Nürnberg, Germany each year brings together pretty much everyone who is anyone in the free and open source monitoring space. I really look forward to attending, and so do a number of other people at OpenNMS, but this year I won the privilege, so go me.

The conference is a lot of fun, which must be the reason for the hell trip to get here this year. Karma must be trying to bring things into balance.

As an American Airlines frequent flier whose home airport is RDU, most of my trips to Europe involve Heathrow airport (American has a direct flight from RDU to LHR that I’ve taken more times than I can count).

I hate that airport with the core of my being, and try to avoid it whenever possible. While I could have taken a flight from LHR directly to Nürnberg on British Airways, I decided to fly to Philadelphia and take a direct American flight to Munich. It is just about two hours by train from MUC to Nürnberg Hbf and I like trains, so combine that with getting to skip LHR and it is a win/win.

But it was not to be.

I got to the airport and watched as my flight to PHL got delayed further and further. Chris, at the Admiral’s Club desk, was able to re-route me, but that meant a flight through Heathrow (sigh). Also, the Heathrow flight left five hours later than my flight to Philadelphia, and I ended up waiting it out at the airport (Andrea had dropped me off and I didn’t want to ask her to drive all the way back to get me just for a couple of hours).

Because of the length of this trip I had to check a bag, and I had a lot of trepidation that my bag would not be re-routed properly. Chris even mentioned that American had actually put it on the Philadelphia flight but he had managed to get it removed and put on the England flight, and American’s website showed it loaded on the plane.

That also turns out to be the last record American has on my bag, at least on the website I can access.

American Tracking Website

The flight to London was uneventful. American planes tend to land at Terminal 3 and most British Airways planes take off from Terminal 5, so you have to make your way down a series of long corridors and take a bus to the other terminal. Then you have to go through security, which is usually when my problems begin.

I wear contact lenses, and since my eyes tend to react negatively to the preservatives found in saline solution I use a special, preservative-free brand of saline. Unfortunately, it is only available in 118ml bottles. As most frequent fliers know, the limit for the size of liquid containers for carry on baggage is 100ml, although the security people rarely notice the difference. When they do I usually just explain that I need it for my eyes and I’m allowed to bring it with me. That is, everywhere except Heathrow airport. Due to the preservative-free nature of the saline I can’t move it to another container for fear of contamination.

Back in 2011 was the first time that my saline was ever confiscated at Heathrow. Since then I’ve carried a doctor’s note stating that it is “medically necessary”, but even so I had it confiscated once a few years later at LHR because the screener didn’t like the fact that my note was almost a year old. That said, I have gone through that airport many times with no one noticing the slightly larger size of my saline bottle, but on this trip it was not to be.

When your carry on items get tagged for screening at Heathrow’s Terminal 5, you kind of wait in a little mob of people for the one person to methodically go through your stuff. Since I had several hours between flights it was no big deal for me, but it is still very annoying. Of course when the screener got to my items he was all excited that he had stopped the terrorist plot of the century by discovering my saline bottle was 18ml over the limit, and he truly seemed disappointed when I produced my doctor’s note, freshly updated as of August of this year.

Screeners at Heathrow are not imbued with much decision making ability, so he literally had to take my note and bottle to a supervisor to get it approved. I was then allowed to take it with me, but I couldn’t help thinking that the terrorists had won.

The rest of my stay at the world’s worst airport was without incident, and I squeezed into my window seat on the completely full A319 to head to Munich.

Once we landed I breezed through immigration (Germans run their airports a bit more efficiently than the British) and waited for my bag. And waited. And waited.

When I realized it wouldn’t be arriving with me, I went to look for a BA representative. The sign said to find them at the “Lost and Found” kiosk, but the only two kiosks in the rather small baggage area were not staffed. I eventually left the baggage area and made my way to the main BA desk, where I managed to meet Norbert. After another 15 minutes or so, Norbert brought me a form to fill out and promised that I would receive an e-mail and a text message with a “file number” to track the status of my bag.

I then found the S-Bahn train which would take me to the Munich Hauptbahnhof where I would get my next train to Nürnberg.

I had made a reservation for the train to ensure I had a seat, but of course that was on the 09:55 train which I would have taken had I been on the PHL flight. I changed that to a 15:00 train when I was rerouted, and apparently one change is all you get with Deutsche Bahn, but Ronny had suggested I buy a “flexpreis” ticket so I could take any train from Munich to Nürnberg that I wanted. I saw there were a number of “Inter-City Express (ICE)” trains available, so I figured I would just hop on the first one I found.

When I got to the station I saw that a train was leaving from Platform (Gleis) 20 at 15:28. It was now 15:30 so I ran and boarded just before it pulled out of the station.

It was the wrong train.

Well, not exactly. There are a number of types of trains you can take. The fastest are the ICE trains that run non-stop between major cities, but there are also “Inter-City (IC)” trains that make more stops. I had managed to get on a “Regional Bahn (RB)” train which makes many, many stops, turning my one hour trip into three.


The man who took my ticket was sympathetic, and told me to get off at Ingolstadt and switch to an ICE train. I was chatting on Mattermost with Ronny most of this time, and he was able to verify the proper train and platform I needed to take. That train was packed, but I ended up sitting with some lovely people who didn’t mind chatting with me in English (I so love visiting Germany for this reason).

So, about seven hours later than I had planned I arrived at my hotel, still sans luggage. After getting something to eat I started the long process of trying to locate my bag.

I started on Twitter. Both the people at American and British Airways asked me to DM them. The AA folks said I needed to talk with the BA folks and the BA folks still have yet to reply to me. Seriously BA, don’t reach out to me if you don’t plan to do anything. It sets up expectations you apparently can’t meet.

Speaking of not doing anything, my main issue was that I need a “file reference” in order to track my lost bag, but despite Norbert’s promise I never received a text or e-mail with that information. I ended up calling American, and the woman there was able to tell me that she showed the bag was in the hands of BA at LHR. That was at least a start, so she transferred me to BA customer support, who in turn transferred me to BA delayed baggage, who told me I needed to contact American.


As calmly as I could, I reiterated that I started there, and then the BA agent suggested I visit a particular website and complete a form (similar to the one I did for Norbert I assume) to get my “file reference”. After making sure I had the right URL I ended the call and started the process.

I hit the first snag when trying to enter in my tag number. As you can see from the screenshot above, my tag number starts with “600” and is ten digits long. The website expected a tag number that started with “BA” followed by six digits, so my AA tag was not going to work.

BA Tracking Website - wrong number

But at least this website had a different number to call, so I called it and explained my situation once again. This agent told me that I should have a different tag number, and after looking around my ticket I did find one in the format they were after, except starting with “AA” instead of “BA”. Of course, when I entered that in I got an error.

BA Tracking Website - error

After I explained that to the agent I remained on the phone for about 30 minutes until he was able to, finally, give me a file reference number. At this point I was very tired, so I wrote it down and figured I would call it a night and go to sleep.

But I couldn’t sleep, so I tried to enter that number into the BA delayed bag website. It said it was invalid.


Then I got a hint of inspiration and decided to enter in my first name as my last, and voila! I had a missing bag record.

BA Tracking Website - missing bag

That site said they had found my bag (the agent on the phone had told me it was being “traced”) and it also asked me to enter in some more information about it, such as the brand of the manufacturer.

BA Tracking Website - information required

Of course when I tried to do that, I got an error.

BA Tracking Website - system error

Way to go there, British Airways.

Anyway, at that point I could sleep. As I write this the next morning nothing has been updated since 18:31 last night, but I hold out hope that my bag will arrive today. I travel a lot so I have a change of clothes with me along with all the toiletries I need to not offend the other conference attendees (well, at least with my hygiene), but I can’t help but be soured on the whole experience.

This year I have spent nearly US$20,000 with American Airlines (they track that for me on their website). I paid them for this ticket and they really could have been more helpful instead of just washing their hands and pointing their fingers at BA. British Airways used to be one of the best airlines on the planet, but lately they seem to have turned into Ryanair but without that airline’s level of service. The security breach that exposed the personal information of their customers, stories like this recent issue with a flight from Orlando, and my own experience this trip have really put me off flying them ever again.

Just a hint BA – from a customer service perspective – when it comes to finding a missing bag all we really want (well, besides the bag) is for someone to tell us they know where it is and when we can expect to get it. The fact that I had to spend several hours after a long trip to get something approximating that information is a failure on your part, and you will lose some if not all of my future business because of it.

I also made the decision to further curtail my travel in 2019, because frankly I’m getting too old for this crap.

So, I’m now off to shower and to get into my last set of clean clothes. Here’s hoping my bag arrives today so I can relax and enjoy the magic that is the OSMC.

by Tarus at November 05, 2018 07:23 AM

November 02, 2018

Send notifications with Signal

In some cases it is nice to have notifications from OpenNMS in a separate channel on a smartphone without paying for SMS. Here is a tutorial where I use Signal through the signal-cli tool.

This how-to describes how to download the latest signal-cli tool, link it to your existing Signal account, and configure OpenNMS to use it as a notification target. You should already have OpenNMS Horizon or Meridian running, and you need a Signal account with the Signal app installed and...

November 02, 2018 01:01 PM

October 26, 2018

CarbonROM Install on Pixel XL (marlin)

I am still playing around with alternate ROMs for Android devices, and I recently came across CarbonROM. I had some issues getting it installed (more due to me than the ROM itself) and so I thought I’d post my steps here.

I was looking for a ROM that focused on stability and security, and Carbon seems to fit the bill.

While I have a lot of experience playing with ROMs, I hadn’t really done it on handsets with “Seamless Update”. In this case there are two “slots”, Slot A and Slot B, and this can cause a challenge when installing a new operating system. This procedure worked for me (with help from Christian Oder via the CarbonROM community on Google+).

  1. Install latest 8.1 Factory Image

    This may not be required, but since I ran into issues I went ahead and installed the latest “oreo” factory image. I had already upgraded the phone to Android 9 (pie) and thought that might have caused the problems I was having, but I don’t think that was the case.

  2. Unlock the bootloader

    This is not meant to be a tutorial on installing alternative ROMs, but basically you go to Settings -> System and then locate the build number. Click on that a number of times until you have enabled “developer mode”, then go to the developer options and unlock the bootloader and enable the ability to access the device over USB. Then boot into the bootloader and run “fastboot flashing unlock” and follow the prompts on the screen.

  3. Boot to TWRP using image

    In order to install an alternative ROM it helps to have a better Recovery than stock. I really like TWRP and pretty much just followed the instructions. Using the Android Debugger (adb) you boot into the bootloader and run TWRP from an image file.

  4. Install TWRP zip

    Once you are running TWRP, install it into the boot partition from the .zip file. Use “adb push” to put the .zip file on the /sdcard/ partition.

  5. Reboot to Recovery (to make sure TWRP still works)
  6. Factory reset and erase /system

    Go to “Wipe” and do a factory reset, and then “Advanced Wipe” to nuke the system partition.

    You will also want to erase user data at this point. Once I got Carbon to boot it still asked me for a password which I assumed was the one I set up in the original factory install (you have to get into the factory image to unlock the bootloader). I went back and erased all of the user data and that did what I expected, so you might want to do this at this step.

  7. Install Carbon

    Use “adb push” to send the latest Carbon zip file to the /sdcard/. Install using TWRP.

    This is the point where my issues started. The next step is to reboot back into recovery. You have to do this so that the other Slot gets overwritten with the new operating system. However, with the Carbon install TWRP was overwritten and that hung the device when I tried to reboot into recovery, so

  8. Re-install TWRP

    Use “adb push” to load the TWRP .zip file again and install it while you are still in TWRP, then

  9. Reboot to recovery

    This should get Carbon all happy on your device as it will be copied over into the other Slot. If you try to boot into the system before doing this bad things will happen. (grin)

  10. Install GApps (optional)

    Now, if you want Google applications you need to install a GApps package. I like Open GApps and so I installed the “pico” package. One thing I am experimenting with here is seeing if I can use a minimal amount of Google software without giving Google my entire digital life. The pico package includes just enough to run the Google Play Store.

    This is optional, and if you just want to run, say, F-Droid apps, you can skip this step, but note I’ve been told that you can’t add GApps later, so if you want it, install it now.

  11. Reboot into the System

If everything went well, you should see the Carbon boot screen and eventually get dropped into the “Welcome to Android” Google sign up wizard. Follow the prompts (I turn off almost everything but location services) and then you should be running CarbonROM with a minimal amount of Google-ness.
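As a rough sketch, the host-side commands behind the steps above can be collected into a dry-run plan. This only builds and prints the command list; it does not run anything. The file names are hypothetical placeholders, and the wipe and install steps (5, 6, and the actual zip installs) happen inside the TWRP interface on the phone, so they do not appear here. Using "adb reboot recovery" for step 9 is an assumption; rebooting from the TWRP menu works as well.

```python
# Dry-run sketch: builds the command sequence without executing anything.
# File names are hypothetical placeholders, not real release names.
TWRP_IMG = "twrp-marlin.img"
TWRP_ZIP = "twrp-installer-marlin.zip"
CARBON_ZIP = "carbon-marlin.zip"


def flashing_plan(twrp_img, twrp_zip, rom_zip):
    """Return the host-side commands for the steps above, in order."""
    return [
        ["adb", "reboot", "bootloader"],        # step 2: get into the bootloader
        ["fastboot", "flashing", "unlock"],     # step 2: unlock, then follow the prompts
        ["fastboot", "boot", twrp_img],         # step 3: boot TWRP from an image file
        ["adb", "push", twrp_zip, "/sdcard/"],  # step 4: stage the TWRP installer zip
        ["adb", "push", rom_zip, "/sdcard/"],   # step 7: stage the Carbon zip
        ["adb", "push", twrp_zip, "/sdcard/"],  # step 8: stage TWRP again after the ROM install
        ["adb", "reboot", "recovery"],          # step 9: reboot to recovery (assumed command)
    ]


if __name__ == "__main__":
    for command in flashing_plan(TWRP_IMG, TWRP_ZIP, CARBON_ZIP):
        print(" ".join(command))
```

Keeping the plan as data makes it easy to review the order before touching the device, which matters here because booting into the system before step 9 is where things go wrong.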

The first thing I tried out was “Pokémon Go”. Due to people cheating by spoofing their GPS coordinates, Pokémon Go leverages features of Android to detect if people are running an altered operating system. I’ve found that on some ROMs the application will not work. It worked fine on Carbon and so I’m hoping I can add just a few more “Google” things, like Maps, and then use F-Droid for everything else.

Note that I didn’t “root” my operating system. When you boot into TWRP you can access the entire device with root privileges so I never feel the need to have root while I’m running the device. Seems to be a good security practice and it also allows me to still run Pokémon Go.

Many thanks to the CarbonROM team for working on this. I’m eager to see how soon security updates are released as well as what they do with Android 9, but it looks promising.

by Tarus at October 26, 2018 02:58 PM

October 18, 2018

OpenNMS Meridian 2018.1.2 Released

Release 2018.1.2 is an update to Meridian 2018.1.1. It contains a number of bug fixes and a few enhancements, including improvements to VMware connection pooling.

The codename for 2018.1.2 is Light breeze.

  • Wrong data type for certain Cassandra JMX counters (Issue NMS-10352)
  • Cannot override TTL when running the Karaf Command collections:collect through Minions (Issue NMS-10367)
  • Erroneous INFO-level log messages during every forced node rescan (Issue NMS-10370)
  • Wrong JMX MBeans for minions (Issue NMS-10372)
  • doesn’t understand newer JDK output (Issue NMS-10401)
  • int overflow in InstallerDb causes bamboo failures (Issue NMS-10402)
  • Be able to use Proxy for any Monitor or Collector that uses HttpClientWrapper directly (Issue NMS-10312)
  • Be able to use Proxy for any Monitor or Collector that uses HttpClient via UrlFactory (Issue NMS-10313)
  • Improve concurrency in Vmware Connection Pool (Issue NMS-10373)

by RangerRick at October 18, 2018 06:17 PM

October 08, 2018


I love tech conferences, especially when I get to be a speaker. Nothing makes me happier than to be given a platform to run my mouth.

For the last year or so I’ve been attending various Network Operators Group (NOG) meetings, and I recently got the opportunity to speak at the UK version, which they refer to as a Network Operators Forum (UKNOF). It was a lot of fun, so I thought I’d share what I learned.

UKNOF41 was held in Edinburgh, Scotland. I’d never been to Scotland before and I was looking forward to the visit, but Hurricane Florence required me to return home early. I ended up spending more time in planes and airports than I did in that city, and totally missed out on both haggis and whisky (although I did drink an Irn-Bru). I arrived Monday afternoon and met up with Dr. Craig Gallen, the OpenNMS Project representative in the UK. We had a nice dinner and then got ready for the meeting on Tuesday.

Like most NOG/NOF events, the day consisted of one track and a series of presentations of interest to network operators. I really like this format. The presentations tend to be relatively short and focused, and this exposes you to concepts you might have missed if there were multiple tracks.

UKNOF is extremely well organized, particularly from a speaker’s point of view. There was a ton of information on what to expect and how to present your slides, and everything was run from a single laptop. While this did mean your slides were due early (instead of, say, being written on the plane or train to the conference) it did make the day flow smoothly. The sessions were recorded, and I’ll include links to the presentations and the videos in the descriptions below.

UKNOF41 - Keith Mitchell

The 41st UKNOF was held at the Edinburgh International Conference Centre, which is located in a newer section of the city and was a pretty comfortable facility in which to hold a conference. Keith Mitchell kicked off the day with the usual overview of the schedule and events (slides), and then we got right into the talks.

UKNOF41 - Kurtis Lindqvist

The first talk was from Kurtis Lindqvist who works for a service provider called LINX (video|slides). LINX deployed a fairly new technology called EVPN (Ethernet VPN). EVPN is “a multi-tenant BGP-based control plane for layer-2 (bridging) and layer-3 (routing) VPNs. It’s the unifying L2+L3 equivalent of the traditional L3-only MPLS/VPN control plane.” I can’t say that I understood 100% of this talk, but the gist is that EVPN allows for better use of available network resources which allowed LINX to lower its prices, considerably.

UKNOF41 - Neil McRae

The next talk was from Neil McRae from BT (video|slides). While this was my first UKNOF I quickly identified Mr. McRae as someone who is probably very involved with the organization as people seemed to know him. I’m not sure if this was in a good way or a bad way (grin), probably a mixture of both, because being a representative from such a large incumbent as BT is bound to attract attention and commentary.

I found this talk pretty interesting. It was about securing future networks using quantum key distribution. Current encryption, such as TLS, is based on public-key cryptography. The security of public-key cryptography is predicated on the idea that it is difficult to factor large numbers. However, quantum computing promises several orders of magnitude more performance than traditional binary systems, and the fear is that at some point in the future the mathematically complex operations that make things like TLS work will become trivial. This presentation talked about some of the experiments that BT has been undertaking with quantum cryptography. While I don’t think this is going to be an issue in the next year or even the next decade, assuming I stay healthy I expect it to be an issue in my lifetime. It is good to know that people are working on solving it.
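To make the factoring point concrete, here is a toy RSA-style example. The numbers are my own illustration and not from the talk: with a tiny modulus, anyone who can factor it can rebuild the private key and read the message. Real keys use moduli thousands of bits long, where trial division like this is hopeless on today's hardware.

```python
def trial_factor(n):
    """Find a factor pair of n by trial division (only feasible for tiny n)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n is prime")


# Toy key pair: p and q are the secret primes, (n, e) is the public key.
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (modular inverse, Python 3.8+)

message = 65
ciphertext = pow(message, e, n)

# An attacker sees only n, e, and the ciphertext. Factoring n is enough
# to recompute the private exponent and decrypt.
fp, fq = trial_factor(n)
d_recovered = pow(e, -1, (fp - 1) * (fq - 1))
recovered = pow(ciphertext, d_recovered, n)

print(recovered == message)
```

The whole scheme rests on the factoring step being impractical, which is exactly the assumption a sufficiently capable quantum computer would undermine.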

At this point in time I would like to offer one minor criticism. Both of the presenters thus far were obviously using a slide deck created for a purpose other than UKNOF. I don’t have a huge problem with that, but it did bother me a little. As a speaker I always consider the opportunity to speak to be a privilege. While I joke about writing the slides on the way to the conference, I do put a lot of time into my presentations, and even if I am using some material from other decks I make sure to customize it for that particular conference. Ultimately what is important is the content and not the deck itself and perhaps UKNOF is a little more casual than other such meetings, but it still struck me as, well, rude, to skim through a whole bunch of slides to fit the time slot and the audience.

UKNOF41 - Julian Palmer

After a break the next presentation was from Julian Palmer of Corero (video|slides). Corero is a DDOS protection and mitigation company, which I assume means they compete with companies such as Cloudflare. I am always fascinated by the actions of those trying to break into networks and those trying to defend them, so I really enjoyed this presentation. It was interesting to see how much larger the DDOS attacks have grown over time and even more surprising to see how network providers can deal with them.

UKNOF41 - Stuart Clark

This was followed by Stuart Clark from Cisco Devnet giving a talk on using “DevOps” technologies with respect to network configurations (video|slides). This is a theme I’ve seen at a number of NOG conferences: let’s leverage configuration management tools designed for servers and apply them to networking gear. It makes sense, and it is interesting to note that the underlying technologies between both have become so similar that using these tools actually works. I can remember a time when accessing network gear required proprietary software running on Solaris or HP-UX. Now with Linux (and Linux-like) operating systems underpinning almost everything, it has become easier to migrate, say, Ansible to work on routers as well as servers.

It was my turn after Mr. Clark spoke. My presentation covered some of the new stuff we have released in OpenNMS, specifically things like the Minion and Drift, as well as a few of the newer things on which we are actively working (video|slides). I’m not sure how well it was received, but a number of people came up to me afterward and said they enjoyed it. During the question and answer session Mr. McRae did state something that bothered me. He said, basically, that the goal of network monitoring should be to get rid of people. I keep hearing that, especially from large companies, but I have to disagree. Technology is moving too fast to ever get rid of people. In just half a day I was introduced to technologies such as EVPN and quantum key distribution, not to mention dealing with the ever-morphing realm of DDOS attacks, and there is just no way monitoring software will ever evolve fast enough to cover everything new just to get rid of people.

Instead, we should be focusing on enabling those people in monitoring to be able to do a great job. Eliminate the drudgery and give them the tools they need to deal with the constant changes in the networking space. I think it is a reasonable goal to use tools to reduce the need to hire more and more people for monitoring, but getting rid of them altogether does not seem likely, nor should we focus on it.

I was the last presentation before lunch (so I finished on time, ‘natch).

UKNOF41 - Chris Russell

The second half of the conference began with a presentation by Chris Russell (video|slides). The title was “Deploying an Atlas Probe (the Hard Way)”, which is kind of funny. RIPE NCC is the Internet Registry for Europe, and they have a program for deploying hardware probes to measure network performance. What’s funny is that you just plug them in. Done. While this presentation did include discussion of deploying an Atlas probe, it was more about splitting out a network and converting it to IPv6. IPv6 is the future (it is supported by OpenNMS) but in my experience organizations are very slowly migrating from IPv4 (the word “glacier” comes to mind). Sometimes it takes a strong use case to justify the trouble and this presentation was an excellent case study for why to do it and the pitfalls.

UKNOF41 - Andrew Ingram

Speaking of splitting out networks, the next presentation dealt with a similar situation. Presented by Andrew Ingram from High Tide Consulting, his session dealt with a company that acquired another company, then almost immediately spun it back out (video|slides). He was brought in to deal with the challenges of dealing with a partially combined network that needed to be separated in a very short amount of time with minimal downtime.

I sat next to Mr. Ingram for most of the conference and learned this was his first time presenting. I thought he did a great job. He sent me a note after the conference that he has “managed to get OpenNMS up and running in Azure with an NSG (Network Security Gateway) running in front for security and a Minion running on site. It all seams to be working very nicely”


UKNOF41 - Sara Dickinson

The following presentation would have to be my favorite of the day. Given by Sara Dickinson of Sinodun IT, it talked about ways to secure DNS traffic (video|slides).

The Internet wouldn’t work without DNS. It translates domain names into addresses, yet in most cases that traffic is sent in the clear. It’s metadata that can be an issue with respect to privacy. Do you think Google runs two of the most popular DNS servers out of the goodness of their heart? Nope, they can use that data to track what people are doing on the network. What’s worse is that every network provider on the path between you and your DNS server can see what you are doing. It is also an attack vector as well as a tool for censorship. DNS traffic can be “spoofed” to send users to the wrong server, and it can be blocked to prevent users from accessing specific sites.

To solve this, one answer is to encrypt that traffic, and Ms. Dickinson talked about a couple of options: DoT (DNS over TLS) and DoH (DNS over HTTPS).

The first one seems like such a no-brainer that I’m surprised it took me so long to deploy it. DoT encrypts the traffic between you and your DNS server. Now, you still have to trust your DNS provider, but this prevents passive surveillance of DNS traffic. I use a pfSense router at home and decided to set up DoT to the Quad9 servers. It was pretty simple. Of all of the major free DNS providers, Quad9 seems to have the strongest privacy policy.

The second protocol, DoH, is DNS straight from the browser. Instead of using a specific port, it can use an existing HTTPS connection. You can’t block it because if you do you’ll block all HTTPS traffic, and you can’t see the traffic separately from normal browsing. You still have to deal with privacy issues since that domain name has to be resolved somewhere and they will get header information, such as User-Agent, from the query, so there are tradeoffs.

While I learned a lot at UKNOF this has been the only thing I’ve actually implemented.
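To show how the two transports differ on the wire, here is a small sketch that builds one DNS query and packages it both ways. It does no networking. DoT sends the message over TLS on port 853 with a two-byte length prefix, while the DoH GET form puts the same message, base64url-encoded without padding, into a "dns" URL parameter. The resolver URL is a placeholder and the function names are my own.

```python
import base64
import struct


def build_query(name, qtype=1):
    """Build a minimal DNS query in wire format (qtype 1 = A record)."""
    # Header: ID 0, flags 0x0100 (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class 1 = IN
    return header + question


def dot_frame(query):
    """DoT: the message goes over TLS prefixed with its length as two bytes."""
    return struct.pack("!H", len(query)) + query


def doh_get_url(resolver_url, query):
    """DoH (GET form): base64url-encode the message, strip padding, pass as ?dns=."""
    encoded = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"


query = build_query("example.com")
frame = dot_frame(query)
# Placeholder resolver URL for illustration only.
url = doh_get_url("https://resolver.example/dns-query", query)

print(len(query), len(frame))
print(url)
```

The DNS message itself is identical in both cases. What changes is the wrapper, which is why DoH traffic blends in with ordinary HTTPS while DoT is easy to spot (and block) by its dedicated port.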

After a break we entered into the all too common “regulatory” section of the conference. Governments are adding more and more restrictions and requirements for network operators and these NOG meetings are often a good forum for talking about them.

UKNOF41 - Jonathan Langley

Jonathan Langley from the Information Commissioner’s Office (ICO) gave a talk on the Network and Information Systems Directive (NIS) (video|slides). NIS includes a number of requirements including things such as incident reporting. I thought it was interesting that NIS is an EU directive and the UK is leaving the EU, although it was stressed that NIS will apply post-Brexit. While there were a lot of regulations and procedures, it wasn’t as onerous as, say, TICSA in New Zealand.

UKNOF41 - Huw Saunders

This was followed by another regulatory presentation by Huw Saunders from The Office of Communications (Ofcom) (video|slides). This was fairly short and dealt primarily with Ofcom’s role in NIS.

UKNOF41 - Askar Sheibani

Askar Sheibani presented an introduction to the UK Fibre Connectivity Forum (video|slides). This is a trade organization that wants to deploy fiber connectivity to every commercial and residential building in the country. My understanding is that it will help facilitate such deployments among the various stakeholders.

UKNOF41 - David Johnston

The next to the last presentation struck a chord with me. Given by David Johnston, it talked about the progress the community of Balquhidder in rural Scotland is making in deploying its own Internet infrastructure (video|slides). I live in rural North Carolina, USA, and even though the golf course community one mile from my house has 300 Mbps service from Spectrum, I’m stuck with an unreliable DSL connection from CenturyLink, which, when it works, is a little over 10 Mbps. Laws in North Carolina currently make it illegal for a municipality to provide broadband service to its citizens, but should that law get overturned I’ve thought about trying to spearhead some sort of grassroots service here. It was interesting to learn how they are doing it in rural Scotland.

UKNOF41 - Charlie Boisseau

The final presentation was funny. Given by Charlie Boisseau, it was about “Layer 0” or “The Dirty Layer” (video|slides). It covered how cable and fiber are deployed in the UK. The access chambers for conduit have covers that state the names of the organizations that own them, and with mergers, acquisitions and bankruptcies those change (but the covers do not). While I was completely lost, the rest of the crowd had fun guessing the progression of one company to another. Anyone in the UK can deploy their own network infrastructure, but it isn’t exactly cheap, and the requirements were covered in the talk.

After the conference they served beer and snacks, and then I headed back to the hotel to get ready for my early morning flight home.

I had a lot of fun at UKNOF and look forward to returning some day. If you are a network provider in the UK it is worth it to attend. They hold two meetings a year, with one always being in London, so there is a good chance one will come near you at some point in time.

by Tarus at October 08, 2018 02:51 PM

October 04, 2018

Grafana Performance Dashboards

As an administrator, it is sometimes necessary to compare performance characteristics across different servers. There are two diagnostic dashboards which can be used to compare performance metrics from SNMP agents running on Microsoft Windows and Linux. The performance metrics are collected with the out-of-the-box configuration of OpenNMS Horizon and OpenNMS Meridian, and the dashboards are published on the Grafana dashboard repository.

You can just install them by importing the dashboards with an...

October 04, 2018 06:32 PM

September 24, 2018

Meridian 2018

It is hard to believe that our first release of OpenNMS Meridian was over three years ago.

Meridian Logo

We were struggling with trying to balance the needs of a support organization with the open source desire to “release early, release often”. How do you deal with wanting to be as cutting edge as possible but to support customers who really need a stable platform? We did have a “development” release, but no one really used it.

Our answer was to model OpenNMS on Red Hat, the most successful open source company in existence. While Red Hat has hundreds of products, their main offering is Red Hat Enterprise Linux (RHEL). This is derived, in large part, from the Fedora Linux distribution. New things hit Fedora first and, once vetted, make their way into RHEL.

We decided to do the same thing with OpenNMS. OpenNMS was split into two main branches: Horizon and Meridian. Horizon was the Fedora equivalent, while Meridian was modeled on RHEL.

This has been very successful. While we were averaging a new major OpenNMS release every 18 months, now we do three or four Horizon releases per year. Tons of new features are hitting Horizon, from the ability to deal with telemetry data, new correlation features to condense alarms into “situations” based on unsupervised machine learning, to the first steps toward a microservices architecture.

We do our best to release code as production-ready as possible. Our users are very creative and use OpenNMS in unique ways. By offering up rapid Horizon releases it allows us to find and fix issues quickly and work out how to best implement new functionality.

But what about our users who are more interested in stability than the “new shiny”? They needed a system that was rock solid and easy to maintain. That’s why we created Meridian. Meridian lags Horizon on features but by the time a feature hits Meridian, it has been tested thoroughly and can immediately be deployed into production.

There is one major Meridian release a year, with usually three or four point updates. Anyone who has ever upgraded OpenNMS understands that dealing with configuration file changes can be problematic. With Meridian, moving from one point release to another rarely changes configuration, so upgrades can happen in minutes and users can rest assured that their systems are up to date and secure. Each Meridian release is supported for three years.

There is a cost associated with using Meridian. Similar to RHEL, it is offered as a subscription. While still 100% open source, you pay a fee to access the update servers, and the idea is that you are paying for the effort it takes to refine Horizon into Meridian and get the most stable version of OpenNMS possible. We are so convinced that Meridian is worth it, it is available without having to buy a support contract. Meridian users get access to OpenNMS Connect, which is a forum for asking questions about using Meridian.

It seems like it was just yesterday that we did this but it has now been over three years. That means support will sunset on Meridian 2015 at the end of the year. Never fear, the latest releases are just as stable and even more feature rich.

The main feature in Meridian 2018 is support for the OpenNMS Minion. The Minion is a stateless application that allows for remote distribution of OpenNMS functionality. For example, I used to run an OpenNMS instance at my house to monitor my devices. Now I just have a Minion. Even though my network is not reachable from our production OpenNMS instance, the Minion allows me to test service availability, as well as collect data and traps, and then forward them on to the main application. The Minion itself is stateless – it connects to a messaging broker on the OpenNMS server in order to get its list of tasks.

A Minion is defined by its “Location”. You can have multiple Minions for a given location and they will access the broker via a “competitive consumer queue”. This way if a particular Minion goes down, there can be another to do the work. By default OpenNMS ships with ActiveMQ as the broker, but it is also possible to use an external Kafka instance. Kafka can be clustered for both load balancing and reliability, and the combination of a Kafka cluster and multiple Minions can make the number of devices OpenNMS monitors virtually limitless (we are working on a proof of concept for one user with over 8 million discrete devices).
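The competing-consumer idea can be illustrated with a small simulation. This is a simplified model and not how OpenNMS or the broker is implemented: the names are hypothetical, and tasks are handed out round-robin, whereas a real broker gives each message to whichever consumer is ready. The point is only that each task is consumed by exactly one Minion, and that the remaining Minions pick up the work when one goes down.

```python
from collections import deque


def dispatch(tasks, minions, down=()):
    """Hand each queued task to the next Minion that is still up.

    Simplified model: Minions at one location share a queue, and every
    task is consumed by exactly one of them.
    """
    alive = [m for m in minions if m not in down]
    if not alive:
        raise RuntimeError("no Minion available at this location")
    queue = deque(tasks)
    handled = {m: [] for m in alive}
    turn = 0
    while queue:
        worker = alive[turn % len(alive)]
        handled[worker].append(queue.popleft())
        turn += 1
    return handled


tasks = ["poll-1", "poll-2", "poll-3", "poll-4"]
minions = ["minion-a", "minion-b"]

# Both Minions up: the work is shared between them.
print(dispatch(tasks, minions))

# One Minion down: the other handles every task.
print(dispatch(tasks, minions, down={"minion-a"}))
```

Adding a Minion to a location therefore adds both capacity and redundancy without any change on the sending side.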

There are a number of other features in Meridian 2018, so check out the release notes for more details. It is an exciting addition to the OpenNMS product line.

by Tarus at September 24, 2018 06:01 PM

August 05, 2018

Smart Debug Logging

Sometimes you have to set a specific OpenNMS daemon to DEBUG mode to find issues in your configuration. Depending on the size of the environment, OpenNMS can create many log entries while in DEBUG mode. In some cases the logs rotate faster than an editor can open a log file, and the volume of log output grows quickly. Collectd in particular can be extremely chatty, and it can seem impossible to grep the needed parts out of it.

But the solution is simple. Since OpenNMS uses Log4j2 it's p...

August 05, 2018 06:32 PM

July 09, 2018

Usage of Operator Instructions in Alarms

Monitoring services or metrics and getting alarms isn't complicated. A more interesting question is: how do you fix them (fast)?

Monitoring systems, even in small or medium-sized environments, create a lot of different alarms. When you are working in a team, sometimes the person who creates a test or configures a threshold that throws an alarm is not the same person who has to understand what happened and what to do next.

Either way, you should have some kind of documentation for when you need...

July 09, 2018 05:42 PM

How to build Docker images from branches

At OpenNMS we use Atlassian Bamboo, which runs all our tests and also builds the packages that can be installed on different operating systems. It plays an important role as a quality gate for changes going into the code base. Our Bamboo is publicly available and can be seen by everyone.

There are two types of branches, one following a pattern like "features/sentinel" and another like "jira/HZN-1307". The feature branches are work-in-progress branches and are used to collect many smaller changes to...

July 09, 2018 10:00 AM

July 03, 2018

iNOG and RIPE NCC Hackathon

Our UK OpenNMS ambassador, Craig Gallen, gave us a hint about a meeting of the Irish Network Operators Group (iNOG) followed by a two-day Network Operators Tools Hackathon co-hosted by RIPE NCC. I've attended a few NOG meetings already and like the tech-driven and very friendly atmosphere. Luckily, The OpenNMS Group sponsored the trip, so I was able to visit Dublin for the first time.

The iNOG meeting was hosted by Workday in their office in Dublin. We started at 6:00 PM with some...

July 03, 2018 03:33 PM

May 30, 2018

Running an OpenNMS Minion with Docker

Running a Minion with Docker is relatively easy; you just need to have Docker installed. It also makes updating the Minion very easy, because you can follow the tag latest for the latest stable version or try a bleeding-edge snapshot. You can configure everything through environment variables. At a bare minimum you need something like this:

docker run --rm -d \
  -e "TZ=Europe/Berlin" \
  -e "MINION_LOCATION=Apex-Office" \
  -e "OPENNMS_BROKER_URL=tcp://opennms-ip:61616" \

May 30, 2018 12:23 PM

May 21, 2018

OpenNMS.js v1.2.2

Just a small release to fix a bug encountered in OpenNMS Helm when the new Drift ReST API returned no exporters.

Bug Fixes

  • dao: HELM-91: fix handling non-array results (0707ac7)

by RangerRick at May 21, 2018 08:37 PM

April 13, 2018

OpenNMS.js v1.2.1

Bug Fixes

  • rest: fix HTTP timeout configuration to be more consistent (c8e7162)

by RangerRick at April 13, 2018 05:36 PM

April 09, 2018

OUCE 2018

We are happy to announce our new OpenNMS User Conference in 2018 in Munich. The OpenNMS User Conference Europe is a series of conferences focused on everything around monitoring and network management. Our conferences create a time and place for the community to share information, discuss ideas and work together to improve monitoring with the free software OpenNMS. The two-day event offers expert presentations and technical workshops around OpenNMS, running from Thursday, 20th September until Friday, 21st September 2018.

April 09, 2018 05:00 PM

March 30, 2018

OpenNMS.js v1.2.0

Bug Fixes

  • http: add timeout to GrafanaHTTP (22bdd70)


  • rest: NMS-9783: add X-Requested-With header to requests (e803726)

by RangerRick at March 30, 2018 02:35 AM

March 16, 2018

OpenNMS.js v1.1.1

1.1.0 had a build dependency that snuck into the runtime dependencies and caused a number of unnecessary projects to get pulled in. This release is identical to 1.1.0 with that dependency fixed.

by RangerRick at March 16, 2018 09:45 PM

OpenNMS.js v1.1.0

This release adds support for telemetry (Netflow) APIs which will be introduced in OpenNMS 22.x. It also includes a number of build optimizations and updates since v1.0.3.

by RangerRick at March 16, 2018 09:44 PM

February 13, 2018

OpenNMS.js v1.0.3

This is a small release with documentation updates, an API for stringifying model objects, and enabling search properties in the NodeDAO.

by RangerRick at February 13, 2018 03:37 PM

January 09, 2018

Running OpenNMS Horizon in Docker

Running applications in containers provides many benefits, and it's not just hype. You can roll out changes faster while keeping a service available, and scale your software by just spinning up more instances to handle load. Container images allow you to manage your application dependencies and bundle them all into a portable and standardised runnable image. The infrastructure can be used as a commodity, and containers allow you to manage resource usage at process-level granularity. A...

January 09, 2018 09:48 PM