
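This blog on network operations is the fourth in my Back to Basics series. The first three blogs covered Gathering Requirements, Network Design, and Network Implementation.
Once ACME's data center network has been implemented, it will need care and feeding, which consists of operations and optimization. Successful network operations has a few key ingredients.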
Documentation
ACME’s data center project produced several documents that should be stored in two locations. First, all project documents should be stored in a read-only project archive. It will be used to answer questions like “Why did you do it this way?” or “What were you thinking?” This archive may prove useful to future projects as they try to follow established standards. Second, a copy of all documents deemed useful to network operations (i.e., cable plans, diagrams, configuration templates, IP addressing plans, and naming and numbering conventions) should be turned over to the network operations team.
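Some of ACME's resources would rather be launched from one of the company's giant slingshots than update documentation, but treating these as living documents is critical to the success of the entire team. No network change should be considered complete until all the corresponding paperwork has been done.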
Unless there is some security policy preventing it, sharing read-only versions of these documents with other operations and support teams goes a long way to helping them understand what they are using. Other teams are far more likely to blame the network for all their problems when all they can visualize is a black box. Demystify the network whenever and wherever you can.
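Network Visibility
Network visibility comes in many forms. Some are critical to the success of the network and can shift many network operations from reactive to proactive. The following sections describe the forms of network visibility that should be pursued for ACME. The general categories of network visibility tools are root cause analysis, capacity planning, and security.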
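Syslog Collector: A syslog collector that can index, store, and allow queries against all syslog messages from all network devices should be implemented. To be most effective, this collector should gather logs not only from network devices but also from servers, storage systems, databases, applications, and security devices. Querying such a collection of syslog messages helps correlate events when troubleshooting and performing root cause analysis.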
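To make the idea of correlating events concrete, here is a minimal sketch, assuming the collector has already parsed messages into a timestamp, a host, and a message. The `LogEvent` type, the `correlate` function, the 30-second window, and all host names and messages are hypothetical and stand in for whatever query interface a real collector provides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogEvent:
    """One parsed log message (hypothetical structure for this sketch)."""
    timestamp: datetime
    host: str
    message: str


def correlate(events, anchor, window=timedelta(seconds=30)):
    """Return events from other hosts logged within `window` of the anchor, oldest first."""
    nearby = [
        e for e in events
        if e.host != anchor.host and abs(e.timestamp - anchor.timestamp) <= window
    ]
    return sorted(nearby, key=lambda e: e.timestamp)


# Hypothetical sample data; host names and messages are invented for illustration.
events = [
    LogEvent(datetime(2024, 1, 1, 10, 0, 5), "core-sw1", "Interface Ethernet1/1 changed state to down"),
    LogEvent(datetime(2024, 1, 1, 10, 0, 7), "db01", "connection to app01 lost"),
    LogEvent(datetime(2024, 1, 1, 10, 0, 9), "app01", "upstream timeout"),
    LogEvent(datetime(2024, 1, 1, 11, 30, 0), "fw01", "policy reload complete"),
]

# Start from the switch event and list what other systems logged around the same time.
for event in correlate(events, anchor=events[0]):
    print(event.timestamp.isoformat(), event.host, event.message)
```

The query only works because logs from the switch, the database, and the application sit in one place, which is the argument for collecting from more than just network devices.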
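Configuration Management: Every network device requires a configuration. To recover quickly and easily during device replacements and failures, all configurations should be stored and archived along with a history of the changes that have occurred. This type of tool is invaluable during audits.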
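As a rough illustration of what "archived with a change history" means, the sketch below keeps every distinct version of a device configuration in memory and produces a diff between the last two. The `ConfigArchive` class, the device name, and the configuration lines are hypothetical; a real tool would also retrieve configurations from devices and persist them, which is out of scope here.

```python
import difflib
from datetime import datetime, timezone


class ConfigArchive:
    """In-memory archive of device configurations with change history (illustrative only)."""

    def __init__(self):
        self._history = {}  # device name -> list of (timestamp, config text)

    def save(self, device, config_text, when=None):
        """Store a new version; return False if nothing changed since the last save."""
        versions = self._history.setdefault(device, [])
        if versions and versions[-1][1] == config_text:
            return False
        versions.append((when or datetime.now(timezone.utc), config_text))
        return True

    def latest(self, device):
        """Return the most recent configuration, e.g. to restore a replaced device."""
        return self._history[device][-1][1]

    def last_change(self, device):
        """Return a unified diff between the two most recent versions, or [] if fewer than two."""
        versions = self._history.get(device, [])
        if len(versions) < 2:
            return []
        old, new = versions[-2][1], versions[-1][1]
        return list(difflib.unified_diff(
            old.splitlines(), new.splitlines(),
            fromfile="previous", tofile="current", lineterm="",
        ))


# Hypothetical device name and configuration text.
archive = ConfigArchive()
archive.save("core-sw1", "hostname core-sw1\nlogging host 192.0.2.5")
archive.save("core-sw1", "hostname core-sw1\nlogging host 192.0.2.5\nntp server 192.0.2.10")

print("\n".join(archive.last_change("core-sw1")))
```

During an audit, the stored diff answers what changed and when; during a failure, `latest` gives the configuration to push to the replacement device.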
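Network Performance Monitor (NPM): Network performance monitors are a category of tools that can help take the guesswork out of capacity planning. These tools can report on link utilization, latency, and round-trip time, and generate alerts when thresholds are crossed. Data from these tools can be used as input for sizing security devices and can help create and maintain accurate quality of service (QoS) policies. Network statistics can be collected from both real-world traffic and synthetic traffic, often in the form of IP Flow Information Export (IPFIX).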
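The link utilization and threshold alerting described above can be sketched with simple arithmetic on two interface byte-counter readings. This is a minimal sketch, assuming counters that do not wrap between samples; the function names, the link names, the sample values, and the 80 percent threshold are all hypothetical.

```python
def utilization_percent(octets_start, octets_end, interval_s, speed_bps):
    """Average link utilization over an interval, from two byte-counter readings.

    Assumes the counter did not wrap or reset between the two readings.
    """
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval_s and speed_bps must be positive")
    bits = (octets_end - octets_start) * 8
    return bits * 100 / (interval_s * speed_bps)


def links_over_threshold(samples, threshold_percent=80.0):
    """Return (link name, utilization) pairs at or above the alert threshold."""
    alerts = []
    for name, (start, end, interval_s, speed_bps) in samples.items():
        pct = utilization_percent(start, end, interval_s, speed_bps)
        if pct >= threshold_percent:
            alerts.append((name, pct))
    return alerts


# Hypothetical samples: (counter at start, counter at end, seconds between, link speed in bit/s).
samples = {
    "uplink-1": (0, 225_000_000_000, 300, 10_000_000_000),
    "uplink-2": (0, 337_500_000_000, 300, 10_000_000_000),
}

for name, pct in links_over_threshold(samples):
    print(f"ALERT {name}: {pct:.1f}% utilized")
```

Trending the same figure over weeks rather than alerting on a single sample is what turns it into capacity-planning input.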
Application Performance Monitor (APM): Application performance monitors take performance information a few steps deeper and can provide information about applications that an NPM tool cannot. An APM can collect information about transaction times and report on the overall user experience.
Health Monitor: A useful health monitor reports on information such as up/down status, CPU utilization, link/bandwidth utilization, memory utilization, power consumption, and temperature, to name a few. This product should be evaluated to make sure it meets the needs of monitoring a highly available data center network.
Packet Capture: Packet captures are one of the main tools used to perform a deep dive into network and application performance problems. Many packet capture tools are available; some are open source, and some are sold for a fee.
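Network Visibility Fabric (NVF): Implementing an NVF in ACME's data center network will provide an excellent foundation for network visibility. Through a series of test access points (TAPs) installed at all layers of the network, copies of all packets can be sent, out-of-band, to a central aggregation point or packet broker, where they can be used by the tools mentioned above and more. Because the NVF sends packet copies out-of-band, there is no performance impact on production traffic or the production network. Traffic continues to flow through the network unimpeded while it is copied for monitoring, security, and analysis. In addition, because there is a central collection point for all packets, brokering packets to the tools that need them is simpler. Organizations typically need fewer of each type of tool (i.e., monitoring, security, analysis) because the initial packet collection is performed by the broker rather than by a specific tool, and no re-design is required to implement a new tool. The key is the central collection of packets.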
An NVF can place tools in-line with production traffic which allows tools such as security devices (i.e., intrusion prevention system (IPS), anti-malware, data loss prevention (DLP)) to be implemented without a network re-design.
Using virtual TAPs, an NVF can TAP into virtualized environments where it has been historically challenging to troubleshoot.
IPFIX generation can be very resource intensive on network devices and sending IPFIX statistics to a collector consumes bandwidth. Offloading IPFIX generation to the NVF removes both problems.
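The value of a central packet broker can be pictured as a fan-out: packets are collected once and each tool receives only the copies it asked for. The sketch below is purely conceptual, assuming packets are represented as small dictionaries; the `PacketBroker` class, the tool names, and the filter rules are hypothetical and do not describe any vendor's product.

```python
from collections import defaultdict


class PacketBroker:
    """Conceptual model of a packet broker that fans packet copies out to tools."""

    def __init__(self):
        self._subscriptions = []  # list of (tool name, filter function)

    def subscribe(self, tool_name, wants):
        """Register a tool with a filter that decides which packets it receives."""
        self._subscriptions.append((tool_name, wants))

    def distribute(self, packets):
        """Return a mapping of tool name to the packet copies that tool receives."""
        delivered = defaultdict(list)
        for packet in packets:
            for tool_name, wants in self._subscriptions:
                if wants(packet):
                    delivered[tool_name].append(dict(packet))  # each tool gets its own copy
        return dict(delivered)


# Hypothetical tools and filters.
broker = PacketBroker()
broker.subscribe("ids", lambda p: True)                      # security tool sees everything
broker.subscribe("apm", lambda p: p["dst_port"] == 443)      # application monitor sees HTTPS only
broker.subscribe("voip-analyzer", lambda p: p["protocol"] == "udp")

# Hypothetical packet summaries as they might arrive from the TAPs.
tapped = [
    {"protocol": "tcp", "dst_port": 443},
    {"protocol": "udp", "dst_port": 5060},
    {"protocol": "tcp", "dst_port": 22},
]

for tool, copies in broker.distribute(tapped).items():
    print(tool, len(copies))
```

Adding a new tool is one more `subscribe` call against the same collected traffic, which mirrors the point that no network re-design is needed to bring a new tool online.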
Figure 1 displays what an NVF would look like in ACME’s data center network. The green lines show where TAPs have been installed in the traffic path and are cabled to a central collector or packet broker. The orange lines show how traffic from the packet broker is sent to the appropriate tools.
Figure 1: Network Visibility Fabric
Change is the Only Constant
As business and application requirements evolve, the network will also need to evolve. Security vulnerabilities will no doubt be exposed that need to be remediated to avoid exploitation. Network operating system lifecycles and bug fixes will force upgrades. Capacity may need to be added. Whatever the reason, there will be change.
All significant changes should send the network through a spate of tests to ensure it still supports all business requirements. Periodic testing ensures that even minor changes are tested, in batches. In a perfect world, every aspect of the data center network would be duplicated in a lab environment so that all changes could first be tested there. Even if ACME had the funds to do this, it could not duplicate the types of traffic, along with the load and flow, that are experienced in production. Because some level of risk will remain with every change regardless, ACME has provided only a subset of the network in a lab environment.
Documentation should not be exempt from change. Be a team player and document, document, document.
Read My Other Blogs in the Series