The Network Automation Forum (NAF), a community of engineers interested in advancing the state of network automation, held its first event, AutoCon 0, in Denver at the end of last year. Panels and talks tackled two questions: Why haven’t we seen full adoption of network automation yet? And how can we advance network automation best practices across all types of organizations?
On the mini track “How Can We Be Better?”, speakers from Zayo, Major League Baseball, PacketFabric, and others discussed the challenges enterprises and technology vendors face when trying to achieve a fully automated network. The following is a panel transcript, edited for length and clarity.
Moderator
Anna Claiborne, SVP Packet and Product Software Engineering, Zayo
Panelists
Graham Vaughan, Senior Director, Network Architecture, PacketFabric
Aaron Werley, SVP of Engineering, Zayo
Jeremy Schulman, Senior Director of Solution Engineering, Infrastructure, Major League Baseball
Richard Boucher, Product Manager, NetBox Labs
Kirk Byers, Network Automation, Twin Bridges Technology
Levi Perigo, Professor of Network Engineering, CU Boulder
Claiborne: What is the current top-order problem in the network automation space that is blocking or limiting progress?
Schulman: I think buying good-quality products is our first move. We want to buy good-quality products to solve our problems. We don’t want to build all the things. That’s not the value proposition. It would be so helpful to have really good-quality multivendor products that help the common, traditional network engineer who wants to learn a little bit of code to sprinkle around the edges of a really good product that has a really good API.
Byers: [T]he biggest problem that many people need to overcome is to start thinking of automation as a system, and I think this is the root of a lot of things we’re failing to do. So why are we failing to do CI/CD [continuous integration/continuous delivery]? Why do we have our work siloed only on a contributor’s laptop? Why do our projects die when one person leaves? …[W]e’re failing to think of automation as an ongoing system of making your work public so that other people can work on it and automatically building in tests, which is what CI/CD and making Git commits enable for us.
Vaughan: I think one of the top-order problems we’re missing is a unified configuration and telemetry abstraction layer…. At PacketFabric, we went from a greenfield network to a brownfield network, and we had to basically write this abstraction layer because it didn’t exist, not in a usable way anyway. Every time we onboard a new vendor, we have to add to that abstraction layer. We have to enhance our template engine so it knows what to do when it receives a standard payload. It would be much easier if we could just pass that payload directly to the device because it’s already agnostic. If I had that, I wouldn’t have to spend all this time writing templates for everything I want to do for every product I have for every vendor in the network.
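Vaughan doesn’t detail PacketFabric’s implementation, so the following is only a minimal, hypothetical Python sketch of the pattern he describes: a vendor-neutral payload (the `InterfaceIntent` schema here is invented for illustration) rendered into device syntax by one template function per vendor. The IOS-style and Junos-style lines are simplified examples, not complete or validated configurations.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class InterfaceIntent:
    """Hypothetical vendor-neutral 'standard payload' for one interface."""
    name: str          # device-specific interface name (also differs per vendor)
    description: str
    mtu: int


def render_ios_style(intent: InterfaceIntent) -> str:
    # Hierarchical, IOS-style interface stanza.
    return "\n".join([
        f"interface {intent.name}",
        f" description {intent.description}",
        f" mtu {intent.mtu}",
    ])


def render_junos_style(intent: InterfaceIntent) -> str:
    # Flat, Junos-style "set" commands.
    return "\n".join([
        f'set interfaces {intent.name} description "{intent.description}"',
        f"set interfaces {intent.name} mtu {intent.mtu}",
    ])


# The abstraction layer: onboarding a new vendor means writing and
# registering another renderer here, for every kind of intent supported.
RENDERERS: Dict[str, Callable[[InterfaceIntent], str]] = {
    "ios": render_ios_style,
    "junos": render_junos_style,
}


def render(vendor: str, intent: InterfaceIntent) -> str:
    """Translate the neutral payload into one vendor's configuration text."""
    try:
        renderer = RENDERERS[vendor]
    except KeyError:
        raise ValueError(f"no template for vendor {vendor!r}") from None
    return renderer(intent)


if __name__ == "__main__":
    print(render("junos", InterfaceIntent("ge-0/0/1", "customer uplink", 9000)))
```

The cost Vaughan points to shows up in the `RENDERERS` table: every new vendor, and every new kind of intent, needs another hand-written renderer. A device that accepted the neutral payload directly would make that table unnecessary.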
Claiborne: Are we really ready for hands-off? And if so, how do we get there? And by hands-off, I mean the self-driving network…so that means unattended upgrades, ML-assisted [machine learning] troubleshooting and diagnostics, even long before ML was cool, and eventually reaching that state where you have the self-driving, self-healing network that will do everything for itself…So why is it so difficult to have a single network source of truth, and does it matter? …[Or] should we instead learn to live in a fuzzy world where there are multiple potential sources of truth and learn to reconcile between those?
Boucher: [T]o be honest, I’m not a big fan of the single source of truth. I think that’s a pipe dream. The reality is that there will be multiple because there are multiple stakeholders, [and] there are multiple reasons to do that.
But fundamentally, I think one of the big challenges around single or network sources of truth is not all organizations are actually on board [with] the fact that they actually need one…. And so it becomes very much an organizational alignment challenge because there are multiple sources of truth in organizations, [whether they’re] spreadsheets, databases, whether [they’re] asset management systems, whether [they’re] IPAM [IP address management] tools and so on…and they have different stakeholders, different purposes….[A]ligning all those together and making sure all of those stakeholders’ needs are taken into account, and then figuring out what is the authoritative source of truth for any piece of data? That’s a complex organizational discussion that is sometimes painful…. I think it’s a very challenging proposition, other than [for] the organizations that are really committed to network automation.
Vaughan: Fully automated upgrades? Yes. Fully automated troubleshooting? I don’t think we’re quite there yet.
There are only so many things you can control. Anything outside your scope of control, that’s the limit of what your automation can do. For example, you can detect a failure in a router, you can automatically open a case with a vendor, but then you’re kind of left to wait because you’ve gone as far as you can.
Can you do automated network upgrades? That’s one of the easier things, and we’ve accomplished that. I think the harder part is putting some intelligence around scheduling. So we can take a device and say, “Upgrade this device at this time,” and it will send out all the notifications to all the customers that are on that device. It will actually kick off the upgrade, do pre- and post-checks, and send the operator a Slack message showing the differential between the two. And most of the time, that’s unattended, and it works fine. So the part that is harder is scheduling. It takes a network engineer sitting down to schedule 500 devices, and they don’t have any intelligence around that. Even if they have the tool to do it, it’s still a heavy task.
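The pre- and post-check step Vaughan mentions can be illustrated with a small, hypothetical sketch built on Python’s standard `difflib` module. It assumes the check outputs (for example, the text of show commands) have already been collected into dictionaries keyed by check name; gathering them from devices and posting the report to Slack are out of scope, and the function name `diff_snapshots` is invented.

```python
import difflib
from typing import Dict


def diff_snapshots(pre: Dict[str, str], post: Dict[str, str]) -> str:
    """Return a unified diff of each check's output before vs. after an upgrade."""
    sections = []
    for check in sorted(set(pre) | set(post)):
        before = pre.get(check, "").splitlines()
        after = post.get(check, "").splitlines()
        if before == after:
            continue  # unchanged checks are omitted to keep the report short
        diff = difflib.unified_diff(
            before, after,
            fromfile=f"pre/{check}", tofile=f"post/{check}",
            lineterm="",  # lines from splitlines() carry no newline
        )
        sections.append("\n".join(diff))
    if not sections:
        return "No differences between pre- and post-checks."
    return "\n\n".join(sections)
```

Showing only the checks that changed keeps the operator’s message short enough to review at a glance, which matters when most upgrades run unattended.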
And as far as troubleshooting goes, I think, as has been touched on, machine learning in this space is not really machine learning; it’s more anomaly detection. So we do have an anomaly detection tool…. [A]n operator can kick off the anomaly detection tool, and it can go out to the right port on the right device for the right customer, but it’s only so useful because it can only detect so many anomalies. It can detect level shifting or bits per second (bps) going to zero, things like that. That’s somewhat useful, but it’s not complete. I think we have a long way to go there.
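The two anomaly types Vaughan lists can be approximated with simple threshold rules rather than ML. The sketch below is a hypothetical illustration, not PacketFabric’s tool: it compares the average of the most recent `window` samples of an interface’s bits-per-second series against the preceding window, and the `window` and `shift_ratio` defaults are arbitrary assumptions.

```python
from statistics import mean
from typing import List, Sequence


def detect_anomalies(bps: Sequence[float], window: int = 5,
                     shift_ratio: float = 0.5) -> List[str]:
    """Flag a drop to zero or a level shift in a bits-per-second series."""
    findings: List[str] = []
    if len(bps) < 2 * window:
        return findings  # not enough history to compare two windows
    baseline = mean(bps[-2 * window:-window])  # the window before the latest one
    recent = mean(bps[-window:])               # the latest window
    if baseline > 0 and all(sample == 0 for sample in bps[-window:]):
        # Traffic stopped entirely after previously flowing.
        findings.append("traffic dropped to zero")
    elif baseline > 0 and abs(recent - baseline) / baseline > shift_ratio:
        # Sustained change in level larger than shift_ratio of the baseline.
        findings.append(f"level shift: {baseline:.0f} -> {recent:.0f} bps")
    return findings
```

Rules like these catch only the patterns they were written for, which is the limitation Vaughan describes.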
Claiborne: What are your predictions of the state of network automation in five years?
Schulman: I think we are going to get closer to the concept [of] self-driving networks. It may not be like hands-off, go to sleep, be drunk, and take me home, but I think we’re going to be much, much closer than we think we are today. I think we’re going to see advancements in natural language processing that democratize information about the network, so that we don’t have to ask the questions from our point of view; the stakeholders can ask them from their point of view and get the answers they need. That takes the burden off the experts in the networking industry who need to work on the really hard problems.
Perigo: I’m encouraged by how much we’ve progressed in the last five years. I mean, I think we’ve come leaps and bounds…and part of my job is R&D, and I think it’s going to be AI and ML for network automation going forward.
Vaughan: True ML will actually become possible as the barrier to entry for training your own models with your own data is lowered. And we’re starting to see that now. You’ve got OpenAI, which will allow you to bring your own training data and train your models, and as that becomes more prevalent, we’ll be able to actually use the data we have to train models and get to the point where we have self-driving networks. Also, everything will be on the blockchain.
Werley: I’m going to say everything’s going to be like Skynet…. But no, I think in five years you’re going to see a broader spectrum of automation. Some organizations are going to be so far behind, having not moved at all, and some are going to be [at] the forefront, and the gap is going to be pretty material. For those [at] the forefront, I think the domain starts to expand. You start getting closer and closer to the physical layer.