1. 网络监控概述

网络监控是现代IT基础设施运维的核心组成部分,通过实时监控网络流量、连接状态、性能指标和故障情况,可以确保网络服务的稳定性和高效性。本文将详细介绍网络监控的核心指标、监控工具、性能分析方法以及最佳实践。

1.1 网络监控的重要性

  1. 服务保障: 确保网络服务的可用性和稳定性
  2. 性能优化: 识别网络瓶颈,优化网络配置
  3. 安全防护: 检测异常流量和潜在安全威胁
  4. 容量规划: 为网络扩容提供数据支持
  5. 故障诊断: 快速定位和解决网络问题

1.2 核心监控指标

  • 带宽使用率: 网络接口的流量使用情况
  • 连接状态: TCP/UDP连接的数量和状态
  • 延迟时间: 网络延迟和响应时间
  • 丢包率: 网络数据包丢失情况
  • 错误率: 网络错误和重传情况
  • 吞吐量: 网络数据传输速率

1.3 监控层次

  1. 物理层: 网络接口状态、链路状态
  2. 数据链路层: MAC地址、VLAN信息
  3. 网络层: IP地址、路由信息
  4. 传输层: TCP/UDP连接状态
  5. 应用层: 应用协议性能

2. 网络监控工具详解

2.1 netstat命令详解

netstat是基础的网络连接查看工具,提供网络连接和统计信息。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# 查看所有网络连接
netstat -an

# 查看TCP连接
netstat -ant

# 查看UDP连接
netstat -anu

# 查看监听端口
netstat -tlnp

# 查看路由表
netstat -rn

# 查看网络接口统计
netstat -i

# 查看网络统计信息
netstat -s

# 按状态统计连接数
netstat -an | awk '/^tcp/ {print $6}' | sort | uniq -c

# 查看特定端口的连接
netstat -an | grep :80

# 查看特定IP的连接
netstat -an | grep 192.168.1.100

2.2 ss命令详解

ss是netstat的现代替代品,提供更快的网络连接信息。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# 查看所有连接
ss -a

# 查看TCP连接
ss -t

# 查看UDP连接
ss -u

# 查看监听端口
ss -l

# 查看进程信息
ss -p

# 查看连接状态统计
ss -s

# 查看特定端口
ss -tlnp | grep :80

# 查看特定状态
ss -t state established

# 查看网络接口
ss -i

# 查看连接详细信息
ss -tulpn

2.3 iftop流量监控

iftop提供实时网络流量监控。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# 安装iftop
yum install iftop # CentOS/RHEL
apt install iftop # Ubuntu/Debian

# 启动iftop
iftop

# 指定网络接口
iftop -i eth0

# 显示端口信息
iftop -P

# 显示主机名
iftop -n

# 显示端口号
iftop -p

# 过滤特定主机
iftop -f "host 192.168.1.100"

# 批处理模式
iftop -t -s 10

2.4 nload流量监控

nload提供网络接口流量监控。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# 安装nload
yum install nload # CentOS/RHEL
apt install nload # Ubuntu/Debian

# 启动nload
nload

# 指定网络接口
nload eth0

# 设置刷新间隔
nload -t 2000

# 显示所有接口
nload -a

# 设置时间窗口
nload -t 1000 -i 10000

2.5 网络监控脚本

创建自定义的网络监控脚本:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
#!/bin/bash
# network_monitor.sh - 网络监控脚本

# 配置参数
LOG_FILE="/var/log/network_monitor.log"
ALERT_BANDWIDTH_THRESHOLD=80
ALERT_CONNECTION_THRESHOLD=1000
CHECK_INTERVAL=60

# 日志函数
log_message() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> $LOG_FILE
}

# 检查网络接口状态
check_interface_status() {
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
status=$(cat /sys/class/net/$interface/operstate)
if [ "$status" != "up" ]; then
log_message "WARNING: Interface $interface is $status"
fi
fi
done
}

# 检查带宽使用率
check_bandwidth_usage() {
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
# 获取接口统计信息
rx_bytes=$(cat /sys/class/net/$interface/statistics/rx_bytes)
tx_bytes=$(cat /sys/class/net/$interface/statistics/tx_bytes)

# 计算带宽使用率(简化计算)
total_bytes=$((rx_bytes + tx_bytes))
if [ $total_bytes -gt 0 ]; then
log_message "Interface $interface: RX=${rx_bytes}bytes, TX=${tx_bytes}bytes"
fi
fi
done
}

# 检查连接数
check_connection_count() {
tcp_connections=$(ss -t | wc -l)
udp_connections=$(ss -u | wc -l)
total_connections=$((tcp_connections + udp_connections))

if [ $total_connections -gt $ALERT_CONNECTION_THRESHOLD ]; then
log_message "WARNING: High connection count: $total_connections"
fi

log_message "Connection count: TCP=$tcp_connections, UDP=$udp_connections, Total=$total_connections"
}

# 检查网络错误
check_network_errors() {
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
rx_errors=$(cat /sys/class/net/$interface/statistics/rx_errors)
tx_errors=$(cat /sys/class/net/$interface/statistics/tx_errors)
rx_dropped=$(cat /sys/class/net/$interface/statistics/rx_dropped)
tx_dropped=$(cat /sys/class/net/$interface/statistics/tx_dropped)

if [ $rx_errors -gt 0 ] || [ $tx_errors -gt 0 ] || [ $rx_dropped -gt 0 ] || [ $tx_dropped -gt 0 ]; then
log_message "WARNING: Interface $interface errors - RX_ERR:$rx_errors, TX_ERR:$tx_errors, RX_DROP:$rx_dropped, TX_DROP:$tx_dropped"
fi
fi
done
}

# 检查网络延迟
check_network_latency() {
# 检查到网关的延迟
gateway=$(ip route | grep default | awk '{print $3}' | head -1)
if [ -n "$gateway" ]; then
latency=$(ping -c 3 -W 1 $gateway 2>/dev/null | grep "avg" | awk -F'/' '{print $5}')
if [ -n "$latency" ]; then
if (( $(echo "$latency > 100" | bc -l) )); then
log_message "WARNING: High latency to gateway: ${latency}ms"
else
log_message "Latency to gateway: ${latency}ms"
fi
fi
fi
}

# 主监控循环
main() {
log_message "Network monitor started"

while true; do
check_interface_status
check_bandwidth_usage
check_connection_count
check_network_errors
check_network_latency

sleep $CHECK_INTERVAL
done
}

# 启动监控
main

3. 网络性能分析

3.1 带宽分析

3.1.1 带宽使用率分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
# 实时带宽监控
#!/bin/bash
# bandwidth_monitor.sh
INTERFACE="eth0"
INTERVAL=1

while true; do
# 获取当前时间戳
current_time=$(date +%s)

# 读取接口统计信息
rx_bytes=$(cat /sys/class/net/$INTERFACE/statistics/rx_bytes)
tx_bytes=$(cat /sys/class/net/$INTERFACE/statistics/tx_bytes)

if [ -n "$prev_time" ]; then
# 计算时间差
time_diff=$((current_time - prev_time))

# 计算字节差
rx_diff=$((rx_bytes - prev_rx_bytes))
tx_diff=$((tx_bytes - prev_tx_bytes))

# 计算速率 (bytes per second)
rx_rate=$((rx_diff / time_diff))
tx_rate=$((tx_diff / time_diff))

# 转换为更友好的单位
rx_mbps=$(echo "scale=2; $rx_rate * 8 / 1024 / 1024" | bc)
tx_mbps=$(echo "scale=2; $tx_rate * 8 / 1024 / 1024" | bc)

echo "$(date '+%H:%M:%S') RX: ${rx_mbps}Mbps TX: ${tx_mbps}Mbps"
fi

# 保存当前值
prev_time=$current_time
prev_rx_bytes=$rx_bytes
prev_tx_bytes=$tx_bytes

sleep $INTERVAL
done

3.1.2 带宽趋势分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#!/bin/bash
# bandwidth_trend.sh
INTERFACE="eth0"
LOG_FILE="/var/log/bandwidth_trend.log"
INTERVAL=60

log_bandwidth() {
local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
local rx_bytes=$(cat /sys/class/net/$INTERFACE/statistics/rx_bytes)
local tx_bytes=$(cat /sys/class/net/$INTERFACE/statistics/tx_bytes)

echo "$timestamp,$rx_bytes,$tx_bytes" >> $LOG_FILE
}

# 生成带宽趋势报告
generate_trend_report() {
echo "=== Bandwidth Trend Report ==="
echo "Time,RX_Bytes,TX_Bytes,RX_MBps,TX_MBps"

tail -n 100 $LOG_FILE | while IFS=',' read timestamp rx_bytes tx_bytes; do
if [ -n "$prev_rx_bytes" ]; then
rx_diff=$((rx_bytes - prev_rx_bytes))
tx_diff=$((tx_bytes - prev_tx_bytes))

rx_mbps=$(echo "scale=2; $rx_diff / 1024 / 1024" | bc)
tx_mbps=$(echo "scale=2; $tx_diff / 1024 / 1024" | bc)

echo "$timestamp,$rx_bytes,$tx_bytes,$rx_mbps,$tx_mbps"
fi

prev_rx_bytes=$rx_bytes
prev_tx_bytes=$tx_bytes
done
}

# 主循环
while true; do
log_bandwidth
sleep $INTERVAL
done

3.2 连接分析

3.2.1 连接状态分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#!/bin/bash
# connection_analyzer.sh
echo "=== Connection Analysis ==="

# 按状态统计连接
echo "Connection Status Distribution:"
ss -s

echo -e "\nTCP Connection States:"
ss -t -a | awk 'NR>1 {print $1}' | sort | uniq -c | sort -nr

echo -e "\nTop 10 Local Ports:"
ss -tln | awk 'NR>1 {print $4}' | cut -d: -f2 | sort | uniq -c | sort -nr | head -10

echo -e "\nTop 10 Remote IPs:"
ss -tn | awk 'NR>1 {print $4}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10

echo -e "\nEstablished Connections by Port:"
ss -t state established | awk 'NR>1 {print $4}' | cut -d: -f2 | sort | uniq -c | sort -nr

echo -e "\nTime Wait Connections:"
ss -t state time-wait | wc -l

echo -e "\nClose Wait Connections:"
ss -t state close-wait | wc -l

3.2.2 连接性能分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
#!/bin/bash
# connection_performance.sh
echo "=== Connection Performance Analysis ==="

# 检查连接建立时间
echo "Connection Establishment Time:"
time nc -z google.com 80

# 检查连接保持时间
echo -e "\nConnection Keep-Alive Analysis:"
ss -t -o | grep keepalive | wc -l

# 检查重传情况
echo -e "\nRetransmission Analysis:"
cat /proc/net/snmp | grep Tcp | awk '{print "RetransSegs:", $13}'

# 检查窗口大小
echo -e "\nWindow Size Analysis:"
ss -t -i | grep -o "wscale:[0-9]*" | sort | uniq -c

# 检查RTT
echo -e "\nRTT Analysis:"
ss -t -i | grep -o "rtt:[0-9.]*" | sort | uniq -c

3.3 延迟分析

3.3.1 网络延迟监控

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#!/bin/bash
# latency_monitor.sh
TARGETS=("8.8.8.8" "1.1.1.1" "google.com" "baidu.com")
INTERVAL=10

while true; do
echo "=== Latency Check $(date) ==="

for target in "${TARGETS[@]}"; do
echo -n "Pinging $target: "

# 执行ping测试
ping_result=$(ping -c 3 -W 1 $target 2>/dev/null)

if [ $? -eq 0 ]; then
# 提取延迟信息
avg_latency=$(echo "$ping_result" | grep "avg" | awk -F'/' '{print $5}')
min_latency=$(echo "$ping_result" | grep "avg" | awk -F'/' '{print $4}')
max_latency=$(echo "$ping_result" | grep "avg" | awk -F'/' '{print $6}')

echo "Min: ${min_latency}ms, Avg: ${avg_latency}ms, Max: ${max_latency}ms"

# 检查延迟是否过高
if (( $(echo "$avg_latency > 100" | bc -l) )); then
echo "WARNING: High latency detected for $target"
fi
else
echo "FAILED"
fi
done

echo ""
sleep $INTERVAL
done

3.3.2 延迟趋势分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#!/bin/bash
# latency_trend.sh
TARGET="8.8.8.8"
LOG_FILE="/var/log/latency_trend.log"
INTERVAL=60

while true; do
timestamp=$(date '+%Y-%m-%d %H:%M:%S')

# 执行ping测试
ping_result=$(ping -c 5 -W 1 $TARGET 2>/dev/null)

if [ $? -eq 0 ]; then
avg_latency=$(echo "$ping_result" | grep "avg" | awk -F'/' '{print $5}')
packet_loss=$(echo "$ping_result" | grep "packet loss" | awk '{print $6}' | tr -d '%')

echo "$timestamp,$avg_latency,$packet_loss" >> $LOG_FILE
echo "$timestamp: Latency=${avg_latency}ms, Loss=${packet_loss}%"
else
echo "$timestamp: Ping failed" >> $LOG_FILE
echo "$timestamp: Ping failed"
fi

sleep $INTERVAL
done

4. 网络问题诊断

4.1 连接问题诊断

4.1.1 连接超时诊断

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
#!/bin/bash
# connection_timeout_diagnosis.sh
TARGET_HOST="$1"
TARGET_PORT="$2"

if [ -z "$TARGET_HOST" ] || [ -z "$TARGET_PORT" ]; then
echo "Usage: $0 <host> <port>"
exit 1
fi

echo "=== Connection Timeout Diagnosis ==="
echo "Target: $TARGET_HOST:$TARGET_PORT"

# 检查DNS解析
echo -e "\n1. DNS Resolution:"
nslookup $TARGET_HOST

# 检查路由
echo -e "\n2. Route Trace:"
traceroute $TARGET_HOST

# 检查端口连通性
echo -e "\n3. Port Connectivity:"
timeout 10 nc -zv $TARGET_HOST $TARGET_PORT

# 检查防火墙
echo -e "\n4. Firewall Check:"
iptables -L -n | grep $TARGET_PORT

# 检查本地端口占用
echo -e "\n5. Local Port Check:"
ss -tlnp | grep :$TARGET_PORT

# 检查系统资源
echo -e "\n6. System Resources:"
echo "Memory usage:"
free -h
echo "CPU usage:"
top -bn1 | grep "Cpu(s)"
echo "File descriptors:"
lsof | wc -l

4.1.2 连接重置诊断

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#!/bin/bash
# connection_reset_diagnosis.sh
echo "=== Connection Reset Diagnosis ==="

# 检查连接重置统计
echo "Connection Reset Statistics:"
cat /proc/net/snmp | grep Tcp | awk '{print "RstSegs:", $14}'

# 检查网络错误
echo -e "\nNetwork Interface Errors:"
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
rx_errors=$(cat /sys/class/net/$interface/statistics/rx_errors)
tx_errors=$(cat /sys/class/net/$interface/statistics/tx_errors)
echo "$interface: RX_ERR=$rx_errors, TX_ERR=$tx_errors"
fi
done

# 检查TCP参数
echo -e "\nTCP Parameters:"
echo "tcp_keepalive_time: $(cat /proc/sys/net/ipv4/tcp_keepalive_time)"
echo "tcp_keepalive_intvl: $(cat /proc/sys/net/ipv4/tcp_keepalive_intvl)"
echo "tcp_keepalive_probes: $(cat /proc/sys/net/ipv4/tcp_keepalive_probes)"

# 检查连接状态
echo -e "\nConnection States:"
ss -s

# 检查TIME_WAIT连接
echo -e "\nTIME_WAIT Connections:"
ss -t state time-wait | wc -l

4.2 性能问题诊断

4.2.1 带宽瓶颈诊断

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#!/bin/bash
# bandwidth_bottleneck_diagnosis.sh
echo "=== Bandwidth Bottleneck Diagnosis ==="

# 检查网络接口状态
echo "Network Interface Status:"
ip link show

# 检查接口统计
echo -e "\nInterface Statistics:"
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
echo "Interface: $interface"
cat /sys/class/net/$interface/statistics/rx_bytes
cat /sys/class/net/$interface/statistics/tx_bytes
cat /sys/class/net/$interface/statistics/rx_packets
cat /sys/class/net/$interface/statistics/tx_packets
echo ""
fi
done

# 检查网络队列
echo "Network Queue Status:"
tc qdisc show

# 检查网络缓冲区
echo -e "\nNetwork Buffer Status:"
cat /proc/sys/net/core/rmem_max
cat /proc/sys/net/core/wmem_max
cat /proc/sys/net/core/rmem_default
cat /proc/sys/net/core/wmem_default

# 检查TCP窗口大小
echo -e "\nTCP Window Size:"
cat /proc/sys/net/ipv4/tcp_window_scaling
cat /proc/sys/net/ipv4/tcp_rmem
cat /proc/sys/net/ipv4/tcp_wmem

4.2.2 延迟问题诊断

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
#!/bin/bash
# latency_diagnosis.sh
TARGET="$1"
if [ -z "$TARGET" ]; then
TARGET="8.8.8.8"
fi

echo "=== Latency Diagnosis for $TARGET ==="

# 基础ping测试
echo "1. Basic Ping Test:"
ping -c 10 $TARGET

# 详细ping测试
echo -e "\n2. Detailed Ping Test:"
ping -c 20 -i 0.2 $TARGET

# 路由跟踪
echo -e "\n3. Route Trace:"
traceroute $TARGET

# 检查MTU
echo -e "\n4. MTU Check:"
ping -M do -s 1472 $TARGET
ping -M do -s 1500 $TARGET

# 检查网络接口MTU
echo -e "\n5. Interface MTU:"
ip link show | grep mtu

# 检查TCP参数
echo -e "\n6. TCP Parameters:"
echo "tcp_congestion_control: $(cat /proc/sys/net/ipv4/tcp_congestion_control)"
echo "tcp_no_delay_ack: $(cat /proc/sys/net/ipv4/tcp_no_delay_ack)"
echo "tcp_low_latency: $(cat /proc/sys/net/ipv4/tcp_low_latency)"

4.3 安全威胁诊断

4.3.1 异常流量检测

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#!/bin/bash
# anomaly_traffic_detection.sh
echo "=== Anomaly Traffic Detection ==="

# 检查异常连接数
echo "1. Connection Count Analysis:"
total_connections=$(ss -t | wc -l)
echo "Total TCP connections: $total_connections"

if [ $total_connections -gt 1000 ]; then
echo "WARNING: High connection count detected"
fi

# 检查异常端口
echo -e "\n2. Port Analysis:"
ss -tln | awk 'NR>1 {print $4}' | cut -d: -f2 | sort | uniq -c | sort -nr | head -10

# 检查异常IP
echo -e "\n3. IP Analysis:"
ss -tn | awk 'NR>1 {print $4}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10

# 检查SYN flood
echo -e "\n4. SYN Flood Check:"
syn_count=$(ss -t state syn-sent | wc -l)
echo "SYN connections: $syn_count"

if [ $syn_count -gt 100 ]; then
echo "WARNING: Potential SYN flood detected"
fi

# 检查连接状态异常
echo -e "\n5. Connection State Analysis:"
ss -t -a | awk 'NR>1 {print $1}' | sort | uniq -c | sort -nr

4.3.2 网络攻击检测

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#!/bin/bash
# network_attack_detection.sh
echo "=== Network Attack Detection ==="

# 检查端口扫描
echo "1. Port Scan Detection:"
ss -tln | awk 'NR>1 {print $4}' | cut -d: -f2 | sort | uniq -c | sort -nr | head -20

# 检查异常连接模式
echo -e "\n2. Connection Pattern Analysis:"
ss -tn | awk 'NR>1 {print $4}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -20

# 检查DDoS攻击
echo -e "\n3. DDoS Detection:"
# 检查大量连接来自同一IP
ss -tn | awk 'NR>1 {print $4}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -5

# 检查网络错误
echo -e "\n4. Network Error Analysis:"
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
rx_errors=$(cat /sys/class/net/$interface/statistics/rx_errors)
tx_errors=$(cat /sys/class/net/$interface/statistics/tx_errors)
rx_dropped=$(cat /sys/class/net/$interface/statistics/rx_dropped)
tx_dropped=$(cat /sys/class/net/$interface/statistics/tx_dropped)

if [ $rx_errors -gt 0 ] || [ $tx_errors -gt 0 ] || [ $rx_dropped -gt 0 ] || [ $tx_dropped -gt 0 ]; then
echo "$interface: ERR_RX=$rx_errors, ERR_TX=$tx_errors, DROP_RX=$rx_dropped, DROP_TX=$tx_dropped"
fi
fi
done

5. 网络优化策略

5.1 带宽优化

5.1.1 带宽分配优化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
#!/bin/bash
# bandwidth_optimization.sh
echo "=== Bandwidth Optimization ==="

# 检查当前带宽配置
echo "1. Current Bandwidth Configuration:"
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
speed=$(cat /sys/class/net/$interface/speed 2>/dev/null || echo "Unknown")
duplex=$(cat /sys/class/net/$interface/duplex 2>/dev/null || echo "Unknown")
echo "$interface: Speed=$speed, Duplex=$duplex"
fi
done

# 优化网络缓冲区
echo -e "\n2. Network Buffer Optimization:"
echo "Current buffer settings:"
cat /proc/sys/net/core/rmem_max
cat /proc/sys/net/core/wmem_max

# 设置更大的缓冲区
echo "Setting larger buffers..."
echo 16777216 > /proc/sys/net/core/rmem_max
echo 16777216 > /proc/sys/net/core/wmem_max
echo 16777216 > /proc/sys/net/core/rmem_default
echo 16777216 > /proc/sys/net/core/wmem_default

# 优化TCP参数
echo -e "\n3. TCP Parameter Optimization:"
echo "Current TCP settings:"
cat /proc/sys/net/ipv4/tcp_rmem
cat /proc/sys/net/ipv4/tcp_wmem

# 设置TCP窗口大小
echo "4096 87380 16777216" > /proc/sys/net/ipv4/tcp_rmem
echo "4096 65536 16777216" > /proc/sys/net/ipv4/tcp_wmem

# 启用TCP窗口缩放
echo 1 > /proc/sys/net/ipv4/tcp_window_scaling

# 启用TCP时间戳
echo 1 > /proc/sys/net/ipv4/tcp_timestamps

# 启用TCP选择性确认
echo 1 > /proc/sys/net/ipv4/tcp_sack

5.1.2 流量控制优化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
#!/bin/bash
# traffic_control_optimization.sh
echo "=== Traffic Control Optimization ==="

# 检查当前队列规则
echo "1. Current Queue Rules:"
tc qdisc show

# 设置HTB队列规则
echo -e "\n2. Setting HTB Queue Rules:"
INTERFACE="eth0"

# 删除现有规则
tc qdisc del dev $INTERFACE root 2>/dev/null

# 添加HTB根队列
tc qdisc add dev $INTERFACE root handle 1: htb default 30

# 添加根类别
tc class add dev $INTERFACE parent 1: classid 1:1 htb rate 1000mbit

# 添加子类别
tc class add dev $INTERFACE parent 1:1 classid 1:10 htb rate 800mbit ceil 1000mbit
tc class add dev $INTERFACE parent 1:1 classid 1:20 htb rate 150mbit ceil 200mbit
tc class add dev $INTERFACE parent 1:1 classid 1:30 htb rate 50mbit ceil 100mbit

# 添加过滤器
tc filter add dev $INTERFACE parent 1: protocol ip prio 1 u32 match ip dport 80 0xffff flowid 1:10
tc filter add dev $INTERFACE parent 1: protocol ip prio 2 u32 match ip dport 443 0xffff flowid 1:10
tc filter add dev $INTERFACE parent 1: protocol ip prio 3 u32 match ip dport 22 0xffff flowid 1:20

echo "Traffic control rules configured successfully"

5.2 延迟优化

5.2.1 网络延迟优化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#!/bin/bash
# latency_optimization.sh
echo "=== Latency Optimization ==="

# 优化TCP参数
echo "1. TCP Parameter Optimization:"
echo "Current TCP settings:"
cat /proc/sys/net/ipv4/tcp_no_delay_ack
cat /proc/sys/net/ipv4/tcp_low_latency

# 启用TCP_NODELAY
echo 1 > /proc/sys/net/ipv4/tcp_no_delay_ack

# 启用低延迟模式
echo 1 > /proc/sys/net/ipv4/tcp_low_latency

# 优化TCP拥塞控制
echo "2. TCP Congestion Control Optimization:"
echo "Current congestion control:"
cat /proc/sys/net/ipv4/tcp_congestion_control

# 设置BBR拥塞控制算法
echo bbr > /proc/sys/net/ipv4/tcp_congestion_control

# 优化网络接口
echo -e "\n3. Network Interface Optimization:"
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
# 启用GRO
echo 1 > /sys/class/net/$interface/gro_flush_timeout

# 启用TSO
ethtool -K $interface tso on 2>/dev/null

# 启用GSO
ethtool -K $interface gso on 2>/dev/null

echo "Optimized interface: $interface"
fi
done

# 优化中断处理
echo -e "\n4. Interrupt Handling Optimization:"
echo "Current interrupt settings:"
cat /proc/interrupts | grep eth0

# 设置中断亲和性
echo 2 > /proc/irq/24/smp_affinity # 假设eth0的中断号是24

5.2.2 应用层延迟优化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#!/bin/bash
# application_latency_optimization.sh
echo "=== Application Latency Optimization ==="

# 优化DNS解析
echo "1. DNS Resolution Optimization:"
echo "Current DNS settings:"
cat /etc/resolv.conf

# 添加本地DNS缓存
echo "nameserver 127.0.0.1" > /etc/resolv.conf.local
echo "nameserver 8.8.8.8" >> /etc/resolv.conf.local
echo "nameserver 1.1.1.1" >> /etc/resolv.conf.local

# 优化连接池
echo -e "\n2. Connection Pool Optimization:"
echo "Current connection limits:"
cat /proc/sys/net/core/somaxconn
cat /proc/sys/net/ipv4/tcp_max_syn_backlog

# 增加连接队列大小
echo 65535 > /proc/sys/net/core/somaxconn
echo 65535 > /proc/sys/net/ipv4/tcp_max_syn_backlog

# 优化TCP连接复用
echo -e "\n3. TCP Connection Reuse Optimization:"
echo "Current TCP reuse settings:"
cat /proc/sys/net/ipv4/tcp_tw_reuse
cat /proc/sys/net/ipv4/tcp_tw_recycle

# 启用TIME_WAIT重用
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

# 优化TCP快速打开
echo -e "\n4. TCP Fast Open Optimization:"
echo "Current TCP fast open settings:"
cat /proc/sys/net/ipv4/tcp_fastopen

# 启用TCP快速打开
echo 3 > /proc/sys/net/ipv4/tcp_fastopen

5.3 安全优化

5.3.1 网络安全加固

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
#!/bin/bash
# network_security_hardening.sh
echo "=== Network Security Hardening ==="

# 配置防火墙规则
echo "1. Firewall Configuration:"
# 清空现有规则
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X

# 设置默认策略
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# 允许回环接口
iptables -A INPUT -i lo -j ACCEPT

# 允许已建立的连接
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# 允许SSH连接
iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# 允许HTTP和HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# 限制连接速率
iptables -A INPUT -p tcp --dport 80 -m limit --limit 25/minute --limit-burst 100 -j ACCEPT

# 防止SYN flood攻击
iptables -A INPUT -p tcp --syn -m limit --limit 1/s --limit-burst 3 -j ACCEPT

echo "Firewall rules configured"

# 配置TCP参数
echo -e "\n2. TCP Parameter Hardening:"
# 启用SYN cookies
echo 1 > /proc/sys/net/ipv4/tcp_syncookies

# 减少SYN重试次数
echo 1 > /proc/sys/net/ipv4/tcp_syn_retries

# 减少FIN超时时间
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

# 限制TIME_WAIT连接数
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

# 配置IP转发
echo -e "\n3. IP Forwarding Configuration:"
echo "Current IP forwarding:"
cat /proc/sys/net/ipv4/ip_forward

# 禁用IP转发(如果不是路由器)
echo 0 > /proc/sys/net/ipv4/ip_forward

# 禁用ICMP重定向
echo 0 > /proc/sys/net/ipv4/conf/all/accept_redirects
echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects

# 禁用源路由
echo 0 > /proc/sys/net/ipv4/conf/all/accept_source_route

5.3.2 网络监控安全

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
#!/bin/bash
# network_monitoring_security.sh
echo "=== Network Monitoring Security ==="

# 配置网络监控日志
echo "1. Network Monitoring Log Configuration:"
LOG_FILE="/var/log/network_security.log"

# 监控异常连接
monitor_connections() {
while true; do
timestamp=$(date '+%Y-%m-%d %H:%M:%S')

# 检查异常连接数
connection_count=$(ss -t | wc -l)
if [ $connection_count -gt 1000 ]; then
echo "$timestamp - WARNING: High connection count: $connection_count" >> $LOG_FILE
fi

# 检查异常端口
ss -tln | awk 'NR>1 {print $4}' | cut -d: -f2 | sort | uniq -c | sort -nr | head -5 | while read count port; do
if [ $count -gt 100 ]; then
echo "$timestamp - WARNING: High connection count on port $port: $count" >> $LOG_FILE
fi
done

sleep 60
done
}

# 监控网络错误
monitor_errors() {
while true; do
timestamp=$(date '+%Y-%m-%d %H:%M:%S')

for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
rx_errors=$(cat /sys/class/net/$interface/statistics/rx_errors)
tx_errors=$(cat /sys/class/net/$interface/statistics/tx_errors)

if [ $rx_errors -gt 0 ] || [ $tx_errors -gt 0 ]; then
echo "$timestamp - WARNING: Interface $interface errors - RX:$rx_errors, TX:$tx_errors" >> $LOG_FILE
fi
fi
done

sleep 60
done
}

# 启动监控
monitor_connections &
monitor_errors &

echo "Network security monitoring started"
echo "Log file: $LOG_FILE"

6. 网络监控最佳实践

6.1 监控策略

6.1.1 分层监控策略

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
#!/bin/bash
# layered_network_monitoring.sh
echo "=== Layered Network Monitoring Strategy ==="

# 物理层监控
monitor_physical_layer() {
echo "1. Physical Layer Monitoring:"
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
status=$(cat /sys/class/net/$interface/operstate)
speed=$(cat /sys/class/net/$interface/speed 2>/dev/null || echo "Unknown")
duplex=$(cat /sys/class/net/$interface/duplex 2>/dev/null || echo "Unknown")

echo "Interface: $interface, Status: $status, Speed: $speed, Duplex: $duplex"

if [ "$status" != "up" ]; then
echo "WARNING: Interface $interface is $status"
fi
fi
done
}

# 数据链路层监控
monitor_data_link_layer() {
echo -e "\n2. Data Link Layer Monitoring:"
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
mac_address=$(cat /sys/class/net/$interface/address)
mtu=$(cat /sys/class/net/$interface/mtu)

echo "Interface: $interface, MAC: $mac_address, MTU: $mtu"
fi
done
}

# 网络层监控
monitor_network_layer() {
echo -e "\n3. Network Layer Monitoring:"
echo "Routing table:"
ip route show

echo -e "\nIP addresses:"
ip addr show

echo -e "\nARP table:"
ip neigh show
}

# 传输层监控
monitor_transport_layer() {
echo -e "\n4. Transport Layer Monitoring:"
echo "TCP connections:"
ss -t -s

echo -e "\nUDP connections:"
ss -u -s

echo -e "\nConnection states:"
ss -t -a | awk 'NR>1 {print $1}' | sort | uniq -c | sort -nr
}

# 应用层监控
monitor_application_layer() {
echo -e "\n5. Application Layer Monitoring:"
echo "Listening ports:"
ss -tlnp | head -20

echo -e "\nTop connections by port:"
ss -tln | awk 'NR>1 {print $4}' | cut -d: -f2 | sort | uniq -c | sort -nr | head -10
}

# 执行所有监控
monitor_physical_layer
monitor_data_link_layer
monitor_network_layer
monitor_transport_layer
monitor_application_layer

6.1.2 关键指标监控

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
#!/bin/bash
# key_metrics_monitoring.sh
echo "=== Key Network Metrics Monitoring ==="

# 带宽使用率监控
monitor_bandwidth() {
echo "1. Bandwidth Usage Monitoring:"
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
rx_bytes=$(cat /sys/class/net/$interface/statistics/rx_bytes)
tx_bytes=$(cat /sys/class/net/$interface/statistics/tx_bytes)
rx_packets=$(cat /sys/class/net/$interface/statistics/rx_packets)
tx_packets=$(cat /sys/class/net/$interface/statistics/tx_packets)

echo "Interface: $interface"
echo " RX: $rx_bytes bytes, $rx_packets packets"
echo " TX: $tx_bytes bytes, $tx_packets packets"
fi
done
}

# 连接数监控
monitor_connections() {
echo -e "\n2. Connection Count Monitoring:"
tcp_connections=$(ss -t | wc -l)
udp_connections=$(ss -u | wc -l)
established_connections=$(ss -t state established | wc -l)
listening_ports=$(ss -tln | wc -l)

echo "TCP connections: $tcp_connections"
echo "UDP connections: $udp_connections"
echo "Established connections: $established_connections"
echo "Listening ports: $listening_ports"
}

# 延迟监控
monitor_latency() {
echo -e "\n3. Latency Monitoring:"
targets=("8.8.8.8" "1.1.1.1")

for target in "${targets[@]}"; do
echo -n "Pinging $target: "
ping_result=$(ping -c 3 -W 1 $target 2>/dev/null)

if [ $? -eq 0 ]; then
avg_latency=$(echo "$ping_result" | grep "avg" | awk -F'/' '{print $5}')
echo "${avg_latency}ms"
else
echo "FAILED"
fi
done
}

# 错误率监控
monitor_errors() {
echo -e "\n4. Error Rate Monitoring:"
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
rx_errors=$(cat /sys/class/net/$interface/statistics/rx_errors)
tx_errors=$(cat /sys/class/net/$interface/statistics/tx_errors)
rx_dropped=$(cat /sys/class/net/$interface/statistics/rx_dropped)
tx_dropped=$(cat /sys/class/net/$interface/statistics/tx_dropped)

if [ $rx_errors -gt 0 ] || [ $tx_errors -gt 0 ] || [ $rx_dropped -gt 0 ] || [ $tx_dropped -gt 0 ]; then
echo "Interface $interface:"
echo " RX Errors: $rx_errors, TX Errors: $tx_errors"
echo " RX Dropped: $rx_dropped, TX Dropped: $tx_dropped"
fi
fi
done
}

# 执行所有监控
monitor_bandwidth
monitor_connections
monitor_latency
monitor_errors

6.2 告警机制

6.2.1 网络告警脚本

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
#!/bin/bash
# network_alert.sh
ALERT_EMAIL="admin@company.com"
ALERT_WEBHOOK="https://hooks.slack.com/services/xxx"
LOG_FILE="/var/log/network_alerts.log"

# 日志函数
log_alert() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> $LOG_FILE
}

# 邮件告警
send_email_alert() {
local subject="$1"
local message="$2"
echo "$message" | mail -s "$subject" $ALERT_EMAIL
log_alert "Email alert sent: $subject"
}

# Slack告警
send_slack_alert() {
local message="$1"
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$message\"}" \
$ALERT_WEBHOOK
log_alert "Slack alert sent: $message"
}

# 检查网络接口状态
check_interface_status() {
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
status=$(cat /sys/class/net/$interface/operstate)
if [ "$status" != "up" ]; then
local alert_msg="CRITICAL: Interface $interface is $status"
send_email_alert "Network Interface Down" "$alert_msg"
send_slack_alert "$alert_msg"
fi
fi
done
}

# 检查带宽使用率
check_bandwidth_usage() {
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
# 获取接口速度
speed=$(cat /sys/class/net/$interface/speed 2>/dev/null)
if [ -n "$speed" ] && [ "$speed" != "Unknown" ]; then
# 计算使用率(简化计算)
rx_bytes=$(cat /sys/class/net/$interface/statistics/rx_bytes)
tx_bytes=$(cat /sys/class/net/$interface/statistics/tx_bytes)

# 这里需要更复杂的计算来获取实际使用率
# 简化示例
if [ $rx_bytes -gt 1000000000 ]; then
local alert_msg="WARNING: High bandwidth usage on interface $interface"
send_email_alert "High Bandwidth Usage" "$alert_msg"
send_slack_alert "$alert_msg"
fi
fi
fi
done
}

# 检查连接数
check_connection_count() {
total_connections=$(ss -t | wc -l)
if [ $total_connections -gt 1000 ]; then
local alert_msg="WARNING: High connection count: $total_connections"
send_email_alert "High Connection Count" "$alert_msg"
send_slack_alert "$alert_msg"
fi
}

# 检查网络延迟
check_network_latency() {
gateway=$(ip route | grep default | awk '{print $3}' | head -1)
if [ -n "$gateway" ]; then
latency=$(ping -c 3 -W 1 $gateway 2>/dev/null | grep "avg" | awk -F'/' '{print $5}')
if [ -n "$latency" ]; then
if (( $(echo "$latency > 100" | bc -l) )); then
local alert_msg="WARNING: High latency to gateway: ${latency}ms"
send_email_alert "High Network Latency" "$alert_msg"
send_slack_alert "$alert_msg"
fi
fi
fi
}

# 检查网络错误
check_network_errors() {
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
rx_errors=$(cat /sys/class/net/$interface/statistics/rx_errors)
tx_errors=$(cat /sys/class/net/$interface/statistics/tx_errors)

if [ $rx_errors -gt 0 ] || [ $tx_errors -gt 0 ]; then
local alert_msg="WARNING: Network errors on interface $interface - RX:$rx_errors, TX:$tx_errors"
send_email_alert "Network Errors" "$alert_msg"
send_slack_alert "$alert_msg"
fi
fi
done
}

# 主检查逻辑
main() {
log_alert "Network alert check started"

check_interface_status
check_bandwidth_usage
check_connection_count
check_network_latency
check_network_errors

log_alert "Network alert check completed"
}

main

6.3 自动化运维

6.3.1 网络自动恢复

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
#!/bin/bash
# network_auto_recovery.sh
LOG_FILE="/var/log/network_recovery.log"
MAX_RETRIES=3
RETRY_INTERVAL=60

log_recovery() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> $LOG_FILE
}

# 重启网络接口
restart_interface() {
local interface="$1"
local retry_count=0

while [ $retry_count -lt $MAX_RETRIES ]; do
log_recovery "Attempting to restart interface $interface (attempt $((retry_count + 1)))"

# 关闭接口
ip link set $interface down
sleep 5

# 启动接口
ip link set $interface up
sleep 10

# 检查接口状态
status=$(cat /sys/class/net/$interface/operstate)
if [ "$status" = "up" ]; then
log_recovery "Interface $interface restarted successfully"
return 0
fi

retry_count=$((retry_count + 1))
sleep $RETRY_INTERVAL
done

log_recovery "Failed to restart interface $interface after $MAX_RETRIES attempts"
return 1
}

# 重启网络服务
restart_network_service() {
log_recovery "Restarting network service"

systemctl restart network
sleep 30

# 检查网络服务状态
if systemctl is-active --quiet network; then
log_recovery "Network service restarted successfully"
return 0
else
log_recovery "Failed to restart network service"
return 1
fi
}

# 检查网络连通性
check_connectivity() {
local target="$1"
if [ -z "$target" ]; then
target="8.8.8.8"
fi

ping -c 3 -W 1 $target > /dev/null 2>&1
return $?
}

# 主恢复逻辑
main() {
log_recovery "Network auto-recovery started"

# 检查网络连通性
if ! check_connectivity; then
log_recovery "Network connectivity check failed"

# 检查接口状态
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
status=$(cat /sys/class/net/$interface/operstate)
if [ "$status" != "up" ]; then
log_recovery "Interface $interface is $status, attempting restart"
restart_interface "$interface"
fi
fi
done

# 再次检查连通性
if ! check_connectivity; then
log_recovery "Still no connectivity, restarting network service"
restart_network_service
fi
else
log_recovery "Network connectivity is normal"
fi

log_recovery "Network auto-recovery completed"
}

main

7. 网络监控工具集成

7.1 Prometheus集成

7.1.1 Node Exporter网络指标

1
2
3
4
5
6
7
8
9
10
# prometheus.yml
global:
scrape_interval: 15s

scrape_configs:
- job_name: 'node-exporter'
static_configs:
- targets: ['localhost:9100']
scrape_interval: 5s
metrics_path: /metrics

7.1.2 自定义网络监控指标

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
#!/bin/bash
# network_metrics_exporter.sh
METRICS_FILE="/tmp/network_metrics.prom"
METRICS_PORT=8080

generate_metrics() {
cat > $METRICS_FILE << EOF
# HELP network_interface_up Interface up status
# TYPE network_interface_up gauge
EOF

for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
status=$(cat /sys/class/net/$interface/operstate)
up_value=0
if [ "$status" = "up" ]; then
up_value=1
fi
echo "network_interface_up{interface=\"$interface\"} $up_value" >> $METRICS_FILE
fi
done

cat >> $METRICS_FILE << EOF

# HELP network_interface_rx_bytes_total Total bytes received
# TYPE network_interface_rx_bytes_total counter
EOF

for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
rx_bytes=$(cat /sys/class/net/$interface/statistics/rx_bytes)
echo "network_interface_rx_bytes_total{interface=\"$interface\"} $rx_bytes" >> $METRICS_FILE
fi
done

cat >> $METRICS_FILE << EOF

# HELP network_interface_tx_bytes_total Total bytes transmitted
# TYPE network_interface_tx_bytes_total counter
EOF

for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
tx_bytes=$(cat /sys/class/net/$interface/statistics/tx_bytes)
echo "network_interface_tx_bytes_total{interface=\"$interface\"} $tx_bytes" >> $METRICS_FILE
fi
done

cat >> $METRICS_FILE << EOF

# HELP network_tcp_connections_total Total TCP connections
# TYPE network_tcp_connections_total gauge
EOF

tcp_connections=$(ss -t | wc -l)
echo "network_tcp_connections_total $tcp_connections" >> $METRICS_FILE

cat >> $METRICS_FILE << EOF

# HELP network_udp_connections_total Total UDP connections
# TYPE network_udp_connections_total gauge
EOF

udp_connections=$(ss -u | wc -l)
echo "network_udp_connections_total $udp_connections" >> $METRICS_FILE
}

# 启动HTTP服务器
start_metrics_server() {
while true; do
generate_metrics
sleep 10
done &

python3 -c "
import http.server
import socketserver
import os

class MetricsHandler(http.server.SimpleHTTPRequestHandler):
def do_GET(self):
if self.path == '/metrics':
self.send_response(200)
self.send_header('Content-type', 'text/plain')
self.end_headers()
with open('$METRICS_FILE', 'r') as f:
self.wfile.write(f.read().encode())
else:
self.send_response(404)
self.end_headers()

with socketserver.TCPServer(('', $METRICS_PORT), MetricsHandler) as httpd:
httpd.serve_forever()
"
}

start_metrics_server

7.2 Grafana仪表板

7.2.1 网络监控仪表板配置

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
{
"dashboard": {
"title": "Network Monitoring Dashboard",
"panels": [
{
"title": "Network Interface Status",
"type": "stat",
"targets": [
{
"expr": "network_interface_up",
"legendFormat": "{{interface}}"
}
]
},
{
"title": "Network Traffic",
"type": "graph",
"targets": [
{
"expr": "rate(network_interface_rx_bytes_total[5m])",
"legendFormat": "{{interface}} RX"
},
{
"expr": "rate(network_interface_tx_bytes_total[5m])",
"legendFormat": "{{interface}} TX"
}
]
},
{
"title": "TCP Connections",
"type": "graph",
"targets": [
{
"expr": "network_tcp_connections_total",
"legendFormat": "TCP Connections"
}
]
},
{
"title": "UDP Connections",
"type": "graph",
"targets": [
{
"expr": "network_udp_connections_total",
"legendFormat": "UDP Connections"
}
]
}
]
}
}

8. 实战案例

8.1 Web服务器网络监控

8.1.1 Nginx网络监控

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
#!/bin/bash
# nginx_network_monitor.sh
echo "=== Nginx Network Monitoring ==="

# 检查Nginx进程
nginx_pids=$(pgrep nginx)
if [ -z "$nginx_pids" ]; then
echo "ERROR: Nginx is not running!"
exit 1
fi

echo "Nginx PIDs: $nginx_pids"

# 检查监听端口
echo -e "\nListening Ports:"
ss -tlnp | grep nginx

# 检查连接统计
echo -e "\nConnection Statistics:"
echo "HTTP connections (port 80):"
ss -t | grep :80 | wc -l

echo "HTTPS connections (port 443):"
ss -t | grep :443 | wc -l

# 检查连接状态
echo -e "\nConnection States:"
ss -t | awk 'NR>1 {print $1}' | sort | uniq -c | sort -nr

# 检查网络错误
echo -e "\nNetwork Errors:"
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
rx_errors=$(cat /sys/class/net/$interface/statistics/rx_errors)
tx_errors=$(cat /sys/class/net/$interface/statistics/tx_errors)

if [ $rx_errors -gt 0 ] || [ $tx_errors -gt 0 ]; then
echo "Interface $interface: RX_ERR=$rx_errors, TX_ERR=$tx_errors"
fi
fi
done

# 检查带宽使用
echo -e "\nBandwidth Usage:"
for interface in $(ip link show | grep -E "^[0-9]+:" | cut -d: -f2 | tr -d ' '); do
if [ "$interface" != "lo" ]; then
rx_bytes=$(cat /sys/class/net/$interface/statistics/rx_bytes)
tx_bytes=$(cat /sys/class/net/$interface/statistics/tx_bytes)

rx_mbps=$(echo "scale=2; $rx_bytes * 8 / 1024 / 1024" | bc)
tx_mbps=$(echo "scale=2; $tx_bytes * 8 / 1024 / 1024" | bc)

echo "Interface $interface: RX=${rx_mbps}Mbps, TX=${tx_mbps}Mbps"
fi
done

8.1.2 Apache网络监控

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#!/bin/bash
# apache_network_monitor.sh
echo "=== Apache Network Monitoring ==="

# 检查Apache进程
apache_pids=$(pgrep httpd)
if [ -z "$apache_pids" ]; then
echo "ERROR: Apache is not running!"
exit 1
fi

echo "Apache PIDs: $apache_pids"

# 检查监听端口
echo -e "\nListening Ports:"
ss -tlnp | grep httpd

# 检查连接统计
echo -e "\nConnection Statistics:"
echo "HTTP connections (port 80):"
ss -t | grep :80 | wc -l

echo "HTTPS connections (port 443):"
ss -t | grep :443 | wc -l

# 检查连接状态
echo -e "\nConnection States:"
ss -t | awk 'NR>1 {print $1}' | sort | uniq -c | sort -nr

# 检查模块状态
echo -e "\nModule Status:"
httpd -M 2>/dev/null | head -20

# 检查性能统计
echo -e "\nPerformance Statistics:"
if [ -f /var/log/httpd/access_log ]; then
echo "Recent requests:"
tail -10 /var/log/httpd/access_log
fi

8.2 数据库网络监控

8.2.1 MySQL网络监控

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#!/bin/bash
# mysql_network_monitor.sh
echo "=== MySQL Network Monitoring ==="

# 检查MySQL进程
mysql_pids=$(pgrep mysqld)
if [ -z "$mysql_pids" ]; then
echo "ERROR: MySQL is not running!"
exit 1
fi

echo "MySQL PIDs: $mysql_pids"

# 检查监听端口
echo -e "\nListening Ports:"
ss -tlnp | grep mysql

# 检查连接统计
echo -e "\nConnection Statistics:"
mysql -e "SHOW STATUS LIKE 'Connections';" 2>/dev/null
mysql -e "SHOW STATUS LIKE 'Threads_connected';" 2>/dev/null
mysql -e "SHOW STATUS LIKE 'Threads_running';" 2>/dev/null

# 检查网络连接
echo -e "\nNetwork Connections:"
ss -t | grep :3306 | wc -l

# 检查连接状态
echo -e "\nConnection States:"
ss -t | grep :3306 | awk '{print $1}' | sort | uniq -c | sort -nr

# 检查网络错误
echo -e "\nNetwork Errors:"
mysql -e "SHOW STATUS LIKE 'Aborted_connects';" 2>/dev/null
mysql -e "SHOW STATUS LIKE 'Aborted_clients';" 2>/dev/null

# 检查网络统计
echo -e "\nNetwork Statistics:"
mysql -e "SHOW STATUS LIKE 'Bytes_received';" 2>/dev/null
mysql -e "SHOW STATUS LIKE 'Bytes_sent';" 2>/dev/null

8.2.2 Redis网络监控

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#!/bin/bash
# redis_network_monitor.sh
echo "=== Redis Network Monitoring ==="

# 检查Redis进程
redis_pids=$(pgrep redis-server)
if [ -z "$redis_pids" ]; then
echo "ERROR: Redis is not running!"
exit 1
fi

echo "Redis PIDs: $redis_pids"

# 检查监听端口
echo -e "\nListening Ports:"
ss -tlnp | grep redis

# 检查连接统计
echo -e "\nConnection Statistics:"
redis-cli info clients 2>/dev/null

# 检查网络连接
echo -e "\nNetwork Connections:"
ss -t | grep :6379 | wc -l

# 检查连接状态
echo -e "\nConnection States:"
ss -t | grep :6379 | awk '{print $1}' | sort | uniq -c | sort -nr

# 检查网络统计
echo -e "\nNetwork Statistics:"
redis-cli info stats 2>/dev/null | grep -E "(total_connections_received|total_commands_processed|instantaneous_ops_per_sec)"

# 检查内存使用
echo -e "\nMemory Usage:"
redis-cli info memory 2>/dev/null | grep -E "(used_memory|used_memory_peak|used_memory_rss)"

8.3 负载均衡器网络监控

8.3.1 HAProxy网络监控

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#!/bin/bash
# haproxy_network_monitor.sh
echo "=== HAProxy Network Monitoring ==="

# 检查HAProxy进程
haproxy_pids=$(pgrep haproxy)
if [ -z "$haproxy_pids" ]; then
echo "ERROR: HAProxy is not running!"
exit 1
fi

echo "HAProxy PIDs: $haproxy_pids"

# 检查监听端口
echo -e "\nListening Ports:"
ss -tlnp | grep haproxy

# 检查连接统计
echo -e "\nConnection Statistics:"
echo "HAProxy connections:"
ss -t | grep haproxy | wc -l

# 检查后端服务器状态
echo -e "\nBackend Server Status:"
if [ -f /var/run/haproxy/admin.sock ]; then
echo "show stat" | socat stdio /var/run/haproxy/admin.sock | head -20
fi

# 检查负载均衡统计
echo -e "\nLoad Balancing Statistics:"
if [ -f /var/run/haproxy/admin.sock ]; then
echo "show stat" | socat stdio /var/run/haproxy/admin.sock | grep -E "(srv|backend)"
fi

9. 总结

网络监控是现代IT基础设施运维的核心技能,通过合理的监控策略和工具使用,可以:

  1. 保障服务可用性: 及时发现网络故障,确保服务稳定运行
  2. 优化网络性能: 识别网络瓶颈,优化网络配置
  3. 提升安全防护: 检测异常流量和潜在安全威胁
  4. 支持容量规划: 为网络扩容提供数据支持
  5. 快速故障诊断: 快速定位和解决网络问题

9.1 关键要点

  • 全面监控: 覆盖物理层到应用层的全方位监控
  • 实时告警: 建立完善的告警机制,及时发现问题
  • 自动化运维: 实现网络自动恢复和健康检查
  • 性能分析: 使用专业工具进行深度性能分析
  • 持续优化: 根据监控数据持续优化网络配置

9.2 最佳实践

  1. 分层监控: 从物理层到应用层的全方位监控
  2. 阈值设置: 合理设置监控阈值,避免误报
  3. 工具集成: 集成多种监控工具,提供统一视图
  4. 文档记录: 详细记录监控配置和故障处理流程
  5. 定期演练: 定期进行故障演练,验证监控有效性

通过本文的学习和实践,您将掌握企业级网络监控的核心技能,能够有效监控和管理生产环境中的各种网络设备和应用,确保网络服务稳定高效运行。