1. Overview of Nginx Operations and Monitoring

As a high-performance web server and reverse proxy, Nginx requires disciplined operations, monitoring, and management in production. This article walks through a complete solution covering Nginx cluster deployment, load balancing, SSL security, performance tuning, and monitoring/alerting, to help operations engineers manage Nginx effectively.

1.1 Core Challenges

  1. Load balancing: distribute traffic efficiently across backend servers
  2. Reverse proxying: configure flexible proxying and caching policies
  3. SSL security: set up HTTPS and manage SSL certificates
  4. Performance tuning: optimize Nginx throughput and concurrency
  5. Monitoring and alerting: track Nginx status and performance metrics in real time

1.2 Technical Architecture

Nginx cluster → Load balancing → Reverse proxy → Backend services → Database
↓ ↓ ↓ ↓ ↓
SSL configuration → Caching policy → Rate limiting → Health checks → Log analysis
↓ ↓ ↓ ↓ ↓
Monitoring & alerting → Performance tuning → Auto scaling → Failover → Ops records

2. Nginx Installation and Basic Configuration

2.1 Compiling and Installing Nginx

#!/bin/bash
# Nginx source build and installation script
# @author 运维实战

# Download the Nginx source
NGINX_VERSION="1.24.0"
cd /usr/local/src
wget https://nginx.org/download/nginx-${NGINX_VERSION}.tar.gz
tar -zxvf nginx-${NGINX_VERSION}.tar.gz
cd nginx-${NGINX_VERSION}

# Install build dependencies
yum install -y gcc gcc-c++ pcre pcre-devel zlib zlib-devel openssl openssl-devel

# Configure the build
./configure \
    --prefix=/usr/local/nginx \
    --user=nginx \
    --group=nginx \
    --with-http_ssl_module \
    --with-http_v2_module \
    --with-http_realip_module \
    --with-http_addition_module \
    --with-http_sub_module \
    --with-http_dav_module \
    --with-http_flv_module \
    --with-http_mp4_module \
    --with-http_gunzip_module \
    --with-http_gzip_static_module \
    --with-http_random_index_module \
    --with-http_secure_link_module \
    --with-http_stub_status_module \
    --with-http_auth_request_module \
    --with-threads \
    --with-stream \
    --with-stream_ssl_module \
    --with-http_slice_module \
    --with-file-aio

# Compile and install
make && make install

# Create the nginx runtime user (referenced by --user/--group above)
useradd -r -s /sbin/nologin nginx

# Create the systemd unit file
cat > /etc/systemd/system/nginx.service << 'EOF'
[Unit]
Description=The nginx HTTP and reverse proxy server
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
PIDFile=/usr/local/nginx/logs/nginx.pid
ExecStartPre=/usr/local/nginx/sbin/nginx -t
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true

[Install]
WantedBy=multi-user.target
EOF

# Start Nginx
systemctl daemon-reload
systemctl enable nginx
systemctl start nginx

echo "Nginx installation complete"
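
After the script finishes, it is worth verifying the build before putting the instance into service. The following is a minimal verification sketch, assuming the /usr/local/nginx install prefix used above; the module list in the grep is illustrative.

#!/bin/bash
# Verify the compiled binary and the running service (sketch)

# Show version and confirm key modules were compiled in
/usr/local/nginx/sbin/nginx -V 2>&1 | tr ' ' '\n' | grep -E 'http_ssl|http_v2|stub_status|stream'

# Confirm the configuration parses and the service answers locally
/usr/local/nginx/sbin/nginx -t
systemctl is-active nginx
curl -sI http://127.0.0.1/ | head -n 1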

2.2 Main Nginx Configuration File

# nginx.conf - main Nginx configuration file
# @author 运维实战
# Note: the paths below assume /etc/nginx (config) and /var/log/nginx (logs)
# exist, e.g. created manually or symlinked to the /usr/local/nginx prefix.

# Run-as user
user nginx nginx;

# Number of worker processes (auto = one per CPU core)
worker_processes auto;

# Error log
error_log /var/log/nginx/error.log warn;

# PID file (must match PIDFile in the systemd unit)
pid /usr/local/nginx/logs/nginx.pid;

# Maximum number of open files per worker process
worker_rlimit_nofile 65535;

events {
    # Use the epoll event model
    use epoll;

    # Maximum connections per worker process
    worker_connections 65535;

    # Accept multiple new connections at once
    multi_accept on;
}

http {
    # MIME types
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Log format
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    '$request_time $upstream_response_time';

    # Access log
    access_log /var/log/nginx/access.log main;

    # Efficient file transfer
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    # Keep-alive settings
    keepalive_timeout 65;
    keepalive_requests 100;

    # Client request header timeout
    client_header_timeout 10;

    # Client request body timeout
    client_body_timeout 10;

    # Response send timeout
    send_timeout 10;

    # Client request body size limits
    client_max_body_size 50m;
    client_body_buffer_size 128k;

    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_min_length 1k;
    gzip_buffers 4 16k;
    gzip_http_version 1.1;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript application/xml+rss
               application/rss+xml font/truetype font/opentype
               application/vnd.ms-fontobject image/svg+xml;
    gzip_proxied any;
    gzip_disable "msie6";

    # Hide the Nginx version number
    server_tokens off;

    # Shared zones for rate and connection limiting
    limit_req_zone $binary_remote_addr zone=req_limit:10m rate=10r/s;
    limit_conn_zone $binary_remote_addr zone=conn_limit:10m;

    # Upstream server pool
    upstream backend_servers {
        # Load balancing policy: ip_hash / least_conn / weighted round robin
        least_conn;

        # Backend servers
        server 192.168.1.101:8080 weight=5 max_fails=3 fail_timeout=30s;
        server 192.168.1.102:8080 weight=3 max_fails=3 fail_timeout=30s;
        server 192.168.1.103:8080 weight=2 max_fails=3 fail_timeout=30s backup;

        # Keep idle connections to the upstreams
        keepalive 32;
        keepalive_timeout 60s;
        keepalive_requests 100;
    }

    # Proxy cache
    proxy_cache_path /var/cache/nginx/proxy
                     levels=1:2
                     keys_zone=proxy_cache:100m
                     max_size=10g
                     inactive=60m
                     use_temp_path=off;

    # Include additional configuration files
    include /etc/nginx/conf.d/*.conf;
}
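
To confirm that the req_limit zone above behaves as intended, a quick client-side loop can be run against any virtual host that applies it. This is a rough sketch, assuming the server is reachable at http://127.0.0.1/ and that requests beyond 10 r/s plus the configured burst are rejected with the default 503 status.

#!/bin/bash
# Fire 50 rapid requests and count responses per status code (sketch)
for i in $(seq 1 50); do
  curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1/
done | sort | uniq -c
# Expect a mix of 200s and 503s once the 10r/s rate plus burst is exceeded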

2.3 Virtual Host Configuration

# /etc/nginx/conf.d/default.conf
# @author 运维实战

server {
    listen 80;
    listen [::]:80;
    server_name example.com www.example.com;

    # Force HTTPS ($host preserves whichever name the client requested)
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name example.com www.example.com;

    # Document root
    root /var/www/html;
    index index.html index.htm;

    # SSL certificate
    ssl_certificate /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;

    # SSL protocols and cipher suites
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384';
    ssl_prefer_server_ciphers on;

    # SSL session cache
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;
    ssl_session_tickets off;

    # HSTS
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "no-referrer-when-downgrade" always;

    # Access and error logs
    access_log /var/log/nginx/example.com.access.log main;
    error_log /var/log/nginx/example.com.error.log warn;

    # Rate and connection limiting (zones defined in nginx.conf)
    limit_req zone=req_limit burst=20 nodelay;
    limit_conn conn_limit 10;

    # Static file caching
    location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2|ttf|eot)$ {
        expires 7d;
        add_header Cache-Control "public, immutable";
        access_log off;
    }

    # API reverse proxy
    location /api/ {
        # Proxy to the backend pool; the trailing slash strips the /api/
        # prefix before the request is forwarded upstream
        proxy_pass http://backend_servers/;

        # Proxy headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Proxy timeouts
        proxy_connect_timeout 30s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;

        # Proxy buffering
        proxy_buffering on;
        proxy_buffer_size 4k;
        proxy_buffers 8 4k;
        proxy_busy_buffers_size 8k;

        # Proxy caching
        proxy_cache proxy_cache;
        proxy_cache_valid 200 304 10m;
        proxy_cache_valid 404 1m;
        proxy_cache_key "$scheme$request_method$host$request_uri";
        proxy_cache_bypass $http_pragma $http_authorization;
        proxy_no_cache $http_pragma $http_authorization;

        # Expose cache status. Note: because add_header appears in this
        # location, the server-level security headers above are not inherited
        # here; repeat them in this block if they are required for /api/.
        add_header X-Cache-Status $upstream_cache_status;

        # WebSocket support. Sending Connection "upgrade" unconditionally
        # disables upstream keepalive for plain HTTP requests; for mixed
        # traffic, map $http_upgrade to the Connection header instead.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    # PHP-FPM (if required)
    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
    }

    # Deny access to hidden files
    location ~ /\. {
        deny all;
        access_log off;
        log_not_found off;
    }

    # Nginx status endpoint (internal network only)
    location /nginx_status {
        stub_status on;
        access_log off;
        allow 192.168.1.0/24;
        deny all;
    }
}
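
Once the certificate is in place, the TLS setup can be spot-checked from any client host with openssl. A minimal sketch, assuming the site is reachable as example.com on port 443 and that the local OpenSSL supports TLS 1.3:

#!/bin/bash
# Check certificate expiry and accepted protocol versions (sketch)
HOST=example.com

# Certificate validity window
echo | openssl s_client -connect ${HOST}:443 -servername ${HOST} 2>/dev/null \
  | openssl x509 -noout -dates

# Verify that TLSv1.0/1.1 are rejected and TLSv1.2/1.3 are accepted
for proto in tls1 tls1_1 tls1_2 tls1_3; do
  echo | openssl s_client -connect ${HOST}:443 -servername ${HOST} -${proto} >/dev/null 2>&1 \
    && echo "${proto}: accepted" || echo "${proto}: rejected"
done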

3. Nginx Load Balancing Configuration

3.1 Load Balancing Strategies

# /etc/nginx/conf.d/load-balance.conf
# @author 运维实战

# 1. Round robin (default)
upstream backend_rr {
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
    server 192.168.1.103:8080;
}

# 2. Weighted round robin
upstream backend_weight {
    server 192.168.1.101:8080 weight=5;
    server 192.168.1.102:8080 weight=3;
    server 192.168.1.103:8080 weight=2;
}

# 3. IP hash (pins each client to the same server)
upstream backend_ip_hash {
    ip_hash;
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
    server 192.168.1.103:8080;
}

# 4. Least connections
upstream backend_least_conn {
    least_conn;
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
    server 192.168.1.103:8080;
}

# 5. Consistent hashing (built into ngx_http_upstream_module since 1.7.2)
upstream backend_hash {
    hash $request_uri consistent;
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
    server 192.168.1.103:8080;
}

# 6. Advanced configuration (passive health checks, backup server)
upstream backend_advanced {
    least_conn;

    # Primary servers
    server 192.168.1.101:8080 weight=5 max_fails=3 fail_timeout=30s;
    server 192.168.1.102:8080 weight=3 max_fails=3 fail_timeout=30s;

    # Backup server (used only when all primary servers are unavailable)
    server 192.168.1.103:8080 backup;

    # Server taken out of rotation
    # server 192.168.1.104:8080 down;

    # Keep idle connections to the upstreams
    keepalive 32;
    keepalive_timeout 60s;
    keepalive_requests 100;
}

# Load balancing virtual host
server {
    listen 80;
    server_name lb.example.com;

    location / {
        proxy_pass http://backend_advanced;

        # Proxy headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Required for the upstream keepalive pool to take effect
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Active health checks (require the third-party
        # nginx_upstream_check_module)
        # check interval=3000 rise=2 fall=3 timeout=1000 type=http;
        # check_http_send "GET /health HTTP/1.0\r\n\r\n";
        # check_http_expect_alive http_2xx http_3xx;
    }

    # Health check status page
    # location /upstream_status {
    #     check_status;
    #     access_log off;
    #     allow 192.168.1.0/24;
    #     deny all;
    # }
}
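
A simple way to observe how backend_advanced distributes traffic is to tally which upstream answered a batch of requests. The sketch below assumes the backends identify themselves somehow, for example via an X-Backend response header added by the application; adjust the extraction to whatever your backends actually return.

#!/bin/bash
# Send 100 requests through the load balancer and tally the answering backend (sketch)
LB_URL=http://lb.example.com/

for i in $(seq 1 100); do
  # Assumes the backend echoes its identity in an X-Backend response header
  curl -s -D - -o /dev/null "$LB_URL" | awk -F': ' 'tolower($1)=="x-backend" {print $2}'
done | sort | uniq -c | sort -rn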

3.2 Dynamic Load Balancing (Nginx Plus or Third-Party Modules)

# /etc/nginx/conf.d/dynamic-lb.conf
# @author 运维实战
# Note: the shared memory zone works in open-source Nginx, but health_check,
# api write=on and dashboard.html below are NGINX Plus features; on open
# source, similar behavior can be built with lua-nginx-module.

# Dynamic upstream group
upstream backend_dynamic {
    zone backend_dynamic 64k;

    # Initial server list
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
}

server {
    listen 80;
    server_name dynamic-lb.example.com;

    location / {
        proxy_pass http://backend_dynamic;

        # Active health check (NGINX Plus)
        health_check interval=5s fails=3 passes=2 uri=/health;
    }

    # REST API endpoint for adding/removing servers (NGINX Plus)
    location /api {
        api write=on;
        allow 192.168.1.0/24;
        deny all;
    }

    # Dashboard (NGINX Plus)
    location = /dashboard.html {
        root /usr/share/nginx/html;
    }
}
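
With the api location enabled, upstream members can be inspected and changed at runtime over REST. The following is a hedged NGINX Plus-only sketch: the version segment in the path (9 here) and the exact payload fields depend on the installed NGINX Plus release, so check the api module documentation for your version before using it.

#!/bin/bash
# Inspect and modify the dynamic upstream via the NGINX Plus API (sketch)
API=http://dynamic-lb.example.com/api/9

# List current servers in the upstream group
curl -s ${API}/http/upstreams/backend_dynamic/servers

# Add a server to the group
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"server": "192.168.1.103:8080"}' \
  ${API}/http/upstreams/backend_dynamic/servers

# Remove the server with ID 2
curl -s -X DELETE ${API}/http/upstreams/backend_dynamic/servers/2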

4. Nginx Monitoring and Management

4.1 Nginx Monitoring Service

// Imports assumed by this class (Spring + Lombok)
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

import java.io.BufferedReader;
import java.io.FileReader;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Nginx monitoring service.
 * NginxStatus and AccessLogEntry are simple project-local DTOs (not shown).
 * @author 运维实战
 */
@Service
@Slf4j
public class NginxMonitorService {

    @Value("${nginx.status.url}")
    private String nginxStatusUrl;

    @Autowired
    private RestTemplate restTemplate;

    /**
     * Poll the Nginx status endpoint.
     */
    @Scheduled(fixedRate = 30000)
    public void monitorNginxStatus() {
        try {
            // Fetch the stub_status page
            String statusResponse = restTemplate.getForObject(nginxStatusUrl, String.class);

            // Parse the status text
            NginxStatus status = parseNginxStatus(statusResponse);

            log.info("Nginx status: active={}, accepts={}, handled={}, requests={}",
                    status.getActiveConnections(),
                    status.getAccepts(),
                    status.getHandled(),
                    status.getRequests());

            // Record metrics
            recordMetrics("nginx.connections.active", status.getActiveConnections());
            recordMetrics("nginx.connections.accepts", status.getAccepts());
            recordMetrics("nginx.connections.handled", status.getHandled());
            recordMetrics("nginx.requests.total", status.getRequests());
            recordMetrics("nginx.connections.reading", status.getReading());
            recordMetrics("nginx.connections.writing", status.getWriting());
            recordMetrics("nginx.connections.waiting", status.getWaiting());

            // Check for abnormal conditions
            checkNginxHealth(status);

        } catch (Exception e) {
            log.error("Failed to poll Nginx status", e);
            sendAlert("Nginx status monitoring failed", "critical");
        }
    }

    /**
     * Parse the stub_status response.
     */
    private NginxStatus parseNginxStatus(String response) {
        NginxStatus status = new NginxStatus();

        // Example response:
        // Active connections: 291
        // server accepts handled requests
        // 16630948 16630948 31070465
        // Reading: 6 Writing: 179 Waiting: 106

        String[] lines = response.split("\n");

        // Parse the active connection count
        if (lines.length > 0) {
            String activeConn = lines[0].replaceAll("Active connections: ", "").trim();
            status.setActiveConnections(Long.parseLong(activeConn));
        }

        // Parse accepts / handled / requests
        if (lines.length > 2) {
            String[] stats = lines[2].trim().split("\\s+");
            if (stats.length >= 3) {
                status.setAccepts(Long.parseLong(stats[0]));
                status.setHandled(Long.parseLong(stats[1]));
                status.setRequests(Long.parseLong(stats[2]));
            }
        }

        // Parse reading / writing / waiting
        if (lines.length > 3) {
            String line = lines[3];
            status.setReading(extractNumber(line, "Reading:"));
            status.setWriting(extractNumber(line, "Writing:"));
            status.setWaiting(extractNumber(line, "Waiting:"));
        }

        return status;
    }

    /**
     * Extract the number that follows a prefix such as "Reading:".
     */
    private long extractNumber(String text, String prefix) {
        int start = text.indexOf(prefix);
        if (start == -1) return 0;

        // Skip the prefix and the whitespace that follows it
        String rest = text.substring(start + prefix.length()).trim();
        int end = rest.indexOf(' ');
        String number = (end == -1) ? rest : rest.substring(0, end);
        return Long.parseLong(number);
    }

    /**
     * Evaluate Nginx health from the parsed status.
     */
    private void checkNginxHealth(NginxStatus status) {
        // Connection handling ratio
        if (status.getAccepts() > 0 && status.getHandled() > 0) {
            double handleRate = (double) status.getHandled() / status.getAccepts();
            if (handleRate < 0.95) {
                log.warn("Nginx connection handling ratio is low: {}%", handleRate * 100);
                sendAlert("Nginx connection handling ratio is low", "warning");
            }
        }

        // Active connections
        if (status.getActiveConnections() > 10000) {
            log.warn("Too many active Nginx connections: {}", status.getActiveConnections());
            sendAlert("Too many active Nginx connections", "warning");
        }

        // Waiting connections
        if (status.getWaiting() > 1000) {
            log.warn("Too many waiting Nginx connections: {}", status.getWaiting());
            sendAlert("Too many waiting Nginx connections", "warning");
        }
    }

    /**
     * Analyze Nginx logs.
     */
    @Scheduled(fixedRate = 60000)
    public void monitorNginxLogs() {
        try {
            // Analyze the access log
            analyzeAccessLog();

            // Analyze the error log
            analyzeErrorLog();

        } catch (Exception e) {
            log.error("Failed to analyze Nginx logs", e);
        }
    }

    /**
     * Analyze the access log.
     */
    private void analyzeAccessLog() {
        try {
            String logFile = "/var/log/nginx/access.log";

            // Only consider entries from the last minute
            LocalDateTime oneMinuteAgo = LocalDateTime.now().minusMinutes(1);

            Map<Integer, Long> statusCodes = new HashMap<>();
            long totalRequests = 0;
            double totalResponseTime = 0;

            // Read the log file (production code should tail the file or use a
            // dedicated log pipeline instead of re-reading it every minute)
            try (BufferedReader reader = new BufferedReader(new FileReader(logFile))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Parse one log line
                    AccessLogEntry entry = parseAccessLogEntry(line);

                    if (entry != null && entry.getTimestamp().isAfter(oneMinuteAgo)) {
                        totalRequests++;
                        totalResponseTime += entry.getResponseTime();

                        statusCodes.merge(entry.getStatusCode(), 1L, Long::sum);
                    }
                }
            }

            // Record metrics
            if (totalRequests > 0) {
                double avgResponseTime = totalResponseTime / totalRequests;
                recordMetrics("nginx.requests.rate", totalRequests);
                recordMetrics("nginx.response.time.avg", avgResponseTime);

                // Per-status-code counters
                statusCodes.forEach((code, count) -> {
                    recordMetrics("nginx.status." + code, count);
                });

                // 5xx error rate
                long errorCount = statusCodes.entrySet().stream()
                        .filter(e -> e.getKey() >= 500)
                        .mapToLong(Map.Entry::getValue)
                        .sum();

                double errorRate = (double) errorCount / totalRequests;
                if (errorRate > 0.05) {
                    log.error("Nginx error rate is too high: {}%", errorRate * 100);
                    sendAlert("Nginx error rate is too high", "critical");
                }
            }

        } catch (Exception e) {
            log.error("Failed to analyze the access log", e);
        }
    }

    /**
     * Parse a single access log entry.
     */
    private AccessLogEntry parseAccessLogEntry(String line) {
        // Implement parsing for the configured log_format
        return null;
    }

    /**
     * Analyze the error log.
     */
    private void analyzeErrorLog() {
        try {
            String logFile = "/var/log/nginx/error.log";

            // Window for "recent" errors (per-line timestamp filtering against
            // this value is omitted here for brevity)
            LocalDateTime oneMinuteAgo = LocalDateTime.now().minusMinutes(1);

            List<String> criticalErrors = new ArrayList<>();

            // Read the error log
            try (BufferedReader reader = new BufferedReader(new FileReader(logFile))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Collect critical entries
                    if (line.contains("[crit]") || line.contains("[alert]") || line.contains("[emerg]")) {
                        criticalErrors.add(line);
                    }
                }
            }

            // Alert if anything critical was found
            if (!criticalErrors.isEmpty()) {
                log.error("Found {} critical Nginx errors", criticalErrors.size());
                sendAlert("Critical Nginx errors: " + criticalErrors.size(), "critical");
            }

        } catch (Exception e) {
            log.error("Failed to analyze the error log", e);
        }
    }

    /**
     * Record a metric.
     */
    private void recordMetrics(String metricName, Number value) {
        // Implement metric recording (Prometheus, InfluxDB, etc.)
        log.debug("Recording metric: {}={}", metricName, value);
    }

    /**
     * Send an alert.
     */
    private void sendAlert(String message, String level) {
        // Implement alert delivery (DingTalk, email, etc.)
        log.info("Sending alert: message={}, level={}", message, level);
    }
}
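
The same stub_status text that the Java service parses can also be scraped from the shell, which is handy for ad-hoc checks or for feeding a simple exporter. A minimal sketch, assuming the /nginx_status location from section 2.3 is reachable from the local host:

#!/bin/bash
# Parse stub_status output into key=value pairs (sketch)
STATUS_URL=http://127.0.0.1/nginx_status

curl -s "$STATUS_URL" | awk '
  /Active connections/ { print "active=" $3 }
  /^ *[0-9]+ +[0-9]+ +[0-9]+ *$/ { print "accepts=" $1; print "handled=" $2; print "requests=" $3 }
  /Reading/ { print "reading=" $2; print "writing=" $4; print "waiting=" $6 }
'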

4.2 Nginx Management Service

// Imports assumed by this class (Spring + Lombok)
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;

/**
 * Nginx management service.
 * @author 运维实战
 */
@Service
@Slf4j
public class NginxManagementService {

    private static final String NGINX_CONF_DIR = "/etc/nginx";
    private static final String NGINX_BIN = "/usr/local/nginx/sbin/nginx";

    /**
     * Reload the Nginx configuration.
     */
    public boolean reloadNginx() {
        try {
            log.info("Reloading the Nginx configuration");

            // Test the configuration first
            if (!testNginxConfig()) {
                log.error("Nginx configuration test failed");
                return false;
            }

            // Reload
            ProcessBuilder pb = new ProcessBuilder(NGINX_BIN, "-s", "reload");
            Process process = pb.start();
            int exitCode = process.waitFor();

            if (exitCode == 0) {
                log.info("Nginx configuration reloaded");
                return true;
            } else {
                log.error("Nginx reload failed, exit code: {}", exitCode);
                return false;
            }

        } catch (Exception e) {
            log.error("Failed to reload the Nginx configuration", e);
            return false;
        }
    }

    /**
     * Test the Nginx configuration.
     */
    public boolean testNginxConfig() {
        try {
            ProcessBuilder pb = new ProcessBuilder(NGINX_BIN, "-t");
            pb.redirectErrorStream(true);
            Process process = pb.start();

            // Read the output
            StringBuilder output = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append("\n");
                }
            }

            int exitCode = process.waitFor();

            if (exitCode == 0) {
                log.info("Nginx configuration test passed");
                return true;
            } else {
                log.error("Nginx configuration test failed: {}", output.toString());
                return false;
            }

        } catch (Exception e) {
            log.error("Failed to test the Nginx configuration", e);
            return false;
        }
    }

    /**
     * Back up the Nginx configuration (daily at 02:00).
     */
    @Scheduled(cron = "0 0 2 * * ?")
    public void backupNginxConfig() {
        try {
            log.info("Backing up the Nginx configuration");

            String backupDir = "/opt/backup/nginx/" +
                    LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd"));

            // Create the backup directory
            Files.createDirectories(Paths.get(backupDir));

            // Archive the configuration directory
            String backupFile = backupDir + "/nginx-config.tar.gz";

            ProcessBuilder pb = new ProcessBuilder(
                    "tar", "-czf", backupFile,
                    "-C", NGINX_CONF_DIR,
                    "."
            );
            Process process = pb.start();
            int exitCode = process.waitFor();

            if (exitCode == 0) {
                log.info("Nginx configuration backup complete: {}", backupFile);

                // Remove backups older than 30 days
                cleanOldBackups("/opt/backup/nginx", 30);
            } else {
                log.error("Nginx configuration backup failed");
            }

        } catch (Exception e) {
            log.error("Failed to back up the Nginx configuration", e);
        }
    }

    /**
     * Remove old backups.
     */
    private void cleanOldBackups(String backupDir, int days) {
        try {
            LocalDateTime cutoffDate = LocalDateTime.now().minusDays(days);

            Files.walk(Paths.get(backupDir))
                    .filter(Files::isDirectory)
                    // Never delete the backup root itself
                    .filter(path -> !path.equals(Paths.get(backupDir)))
                    .filter(path -> {
                        try {
                            FileTime lastModified = Files.getLastModifiedTime(path);
                            LocalDateTime modifiedTime = LocalDateTime.ofInstant(
                                    lastModified.toInstant(),
                                    ZoneId.systemDefault()
                            );
                            return modifiedTime.isBefore(cutoffDate);
                        } catch (IOException e) {
                            return false;
                        }
                    })
                    .forEach(path -> {
                        try {
                            Files.walk(path)
                                    .sorted(Comparator.reverseOrder())
                                    .forEach(p -> {
                                        try {
                                            Files.delete(p);
                                        } catch (IOException e) {
                                            log.error("Failed to delete file: {}", p, e);
                                        }
                                    });
                        } catch (IOException e) {
                            log.error("Failed to clean backup directory: {}", path, e);
                        }
                    });

            log.info("Old backups cleaned up");

        } catch (Exception e) {
            log.error("Failed to clean up old backups", e);
        }
    }

    /**
     * Update the servers of an upstream group.
     */
    public boolean updateUpstreamServers(String upstreamName, List<String> servers) {
        try {
            log.info("Updating upstream servers: upstream={}, servers={}", upstreamName, servers);

            // Read the configuration file
            String confFile = NGINX_CONF_DIR + "/conf.d/upstream.conf";
            String content = new String(Files.readAllBytes(Paths.get(confFile)));

            // Build the new upstream block
            StringBuilder newUpstream = new StringBuilder();
            newUpstream.append("upstream ").append(upstreamName).append(" {\n");
            newUpstream.append("    least_conn;\n");

            for (String server : servers) {
                newUpstream.append("    server ").append(server)
                        .append(" weight=1 max_fails=3 fail_timeout=30s;\n");
            }

            newUpstream.append("    keepalive 32;\n");
            newUpstream.append("}\n");

            // Replace the block (simplified regex-based rewrite; a production
            // implementation should parse the configuration more robustly)
            String pattern = "upstream " + upstreamName + " \\{[^}]+\\}";
            content = content.replaceAll(pattern, newUpstream.toString());

            // Write the configuration back
            Files.write(Paths.get(confFile), content.getBytes());

            // Test and reload
            if (testNginxConfig()) {
                return reloadNginx();
            } else {
                log.error("Failed to update upstream servers: configuration test failed");
                return false;
            }

        } catch (Exception e) {
            log.error("Failed to update upstream servers", e);
            return false;
        }
    }
}
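
The test-then-reload pattern in NginxManagementService can also be expressed as a small shell routine for changes made by hand. A sketch, assuming the same paths as above (/etc/nginx for configuration, /usr/local/nginx/sbin/nginx for the binary); the snapshot location is illustrative.

#!/bin/bash
# Apply a config change with automatic rollback on failure (sketch)
NGINX_BIN=/usr/local/nginx/sbin/nginx
CONF_DIR=/etc/nginx
SNAPSHOT=/tmp/nginx-conf-$(date +%s).tar.gz

# Snapshot the current configuration
tar -czf "$SNAPSHOT" -C "$CONF_DIR" .

# ... edit files under $CONF_DIR here ...

if $NGINX_BIN -t; then
  $NGINX_BIN -s reload
  echo "Reloaded"
else
  echo "Config test failed, rolling back"
  tar -xzf "$SNAPSHOT" -C "$CONF_DIR"
  $NGINX_BIN -t && echo "Rollback verified"
fi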

5. Nginx Performance Optimization

5.1 System-Level Optimization

#!/bin/bash
# System-level performance tuning for Nginx
# @author 运维实战

# 1. Kernel parameters
cat >> /etc/sysctl.conf << 'EOF'
# Network tuning
net.core.netdev_max_backlog = 262144
net.core.somaxconn = 262144
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
# TCP timestamps must stay enabled for tcp_tw_reuse to work
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_syn_retries = 1
# tcp_tw_recycle is unsafe behind NAT and was removed in Linux 4.12
# net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_max_tw_buckets = 10000
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 300
net.ipv4.ip_local_port_range = 1024 65535

# File descriptors
fs.file-max = 2097152
EOF

# Apply the settings
sysctl -p

# 2. Raise file descriptor limits
cat >> /etc/security/limits.conf << 'EOF'
* soft nofile 1024000
* hard nofile 1024000
* soft nproc 1024000
* hard nproc 1024000
EOF

# 3. Limits for the nginx user
cat >> /etc/security/limits.d/nginx.conf << 'EOF'
nginx soft nofile 1024000
nginx hard nofile 1024000
EOF

echo "System tuning complete"
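
After running the script, it is worth confirming that the new values are actually in effect, since typos in /etc/sysctl.conf can fail silently. A quick verification sketch, assuming Nginx is already running:

#!/bin/bash
# Verify the applied kernel and limit settings (sketch)
sysctl -n net.core.somaxconn net.ipv4.tcp_max_syn_backlog net.ipv4.tcp_tw_reuse fs.file-max

# Open-file limit of the running master process
NGINX_PID=$(pgrep -o -x nginx)
grep "Max open files" /proc/${NGINX_PID}/limits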

5.2 Nginx Performance Tuning Configuration

# nginx-performance.conf
# @author 运维实战

# Worker process tuning
worker_processes auto;
worker_cpu_affinity auto;
worker_rlimit_nofile 1024000;

events {
    use epoll;
    worker_connections 102400;
    multi_accept on;
    accept_mutex off;
}

http {
    # File transfer
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    # Connections
    keepalive_timeout 65;
    keepalive_requests 10000;

    # Buffers
    client_header_buffer_size 4k;
    large_client_header_buffers 4 32k;
    client_body_buffer_size 128k;
    client_max_body_size 50m;

    # Timeouts
    client_header_timeout 30s;
    client_body_timeout 30s;
    send_timeout 30s;

    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_min_length 1k;
    gzip_buffers 16 8k;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript application/xml+rss;

    # Open file cache
    open_file_cache max=200000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

    # Proxy buffering
    proxy_buffering on;
    proxy_buffer_size 8k;
    proxy_buffers 32 8k;
    proxy_busy_buffers_size 16k;
    proxy_temp_file_write_size 16k;

    # Upstream connection pooling (requires proxy_http_version 1.1 and an
    # empty Connection header in the proxy location to take effect)
    upstream backend {
        least_conn;

        server 192.168.1.101:8080 weight=5;
        server 192.168.1.102:8080 weight=3;

        keepalive 300;
        keepalive_timeout 60s;
        keepalive_requests 10000;
    }
}
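
To judge whether a tuning change helps, measure before and after with the same load. The sketch below uses ApacheBench, assuming it is installed (httpd-tools on RHEL/CentOS) and that http://127.0.0.1/ is a representative URL; wrk or hey work equally well.

#!/bin/bash
# Quick before/after benchmark (sketch)
URL=http://127.0.0.1/

# 10,000 requests, 100 concurrent, with keep-alive
ab -n 10000 -c 100 -k "$URL"

# Watch connection state totals on the server while the test runs
ss -s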

6. Nginx Operations Automation Script

#!/bin/bash
# Nginx operations automation script
# @author 运维实战

NGINX_BIN="/usr/local/nginx/sbin/nginx"
NGINX_CONF="/etc/nginx/nginx.conf"
NGINX_LOG="/var/log/nginx"
BACKUP_DIR="/opt/backup/nginx"

# Check whether Nginx is running
check_nginx() {
    if pgrep -x "nginx" > /dev/null; then
        echo "Nginx is running"
        return 0
    else
        echo "Nginx is not running"
        return 1
    fi
}

# Start Nginx
start_nginx() {
    echo "Starting Nginx..."
    $NGINX_BIN
    sleep 2
    check_nginx
}

# Stop Nginx
stop_nginx() {
    echo "Stopping Nginx..."
    $NGINX_BIN -s stop
    sleep 2
}

# Restart Nginx
restart_nginx() {
    echo "Restarting Nginx..."
    stop_nginx
    start_nginx
}

# Reload the configuration
reload_nginx() {
    echo "Reloading the Nginx configuration..."
    $NGINX_BIN -t && $NGINX_BIN -s reload
}

# Rotate logs
rotate_logs() {
    echo "Rotating Nginx logs..."

    DATE=$(date -d "yesterday" +%Y%m%d)

    # Move the current logs aside
    mv ${NGINX_LOG}/access.log ${NGINX_LOG}/access.log.${DATE}
    mv ${NGINX_LOG}/error.log ${NGINX_LOG}/error.log.${DATE}

    # Ask Nginx to reopen its log files
    $NGINX_BIN -s reopen

    # Compress the rotated logs
    gzip ${NGINX_LOG}/access.log.${DATE}
    gzip ${NGINX_LOG}/error.log.${DATE}

    # Delete compressed logs older than 30 days
    find ${NGINX_LOG} -name "*.gz" -mtime +30 -delete

    echo "Log rotation complete"
}

# Monitor Nginx
monitor_nginx() {
    echo "Monitoring Nginx..."

    # Check the process
    if ! check_nginx; then
        echo "Nginx process is down, attempting to start it..."
        start_nginx
        send_alert "Nginx process was down and has been restarted"
    fi

    # Check the listening ports
    if ! netstat -tuln | grep -q ":80\|:443"; then
        echo "Nginx ports are not listening"
        send_alert "Nginx ports are not listening"
    fi

    # Check the error log
    ERROR_COUNT=$(tail -n 100 ${NGINX_LOG}/error.log | grep -c "\[error\]")
    if [ "$ERROR_COUNT" -gt 10 ]; then
        echo "Too many recent Nginx errors: $ERROR_COUNT"
        send_alert "Too many recent Nginx errors: $ERROR_COUNT"
    fi
}

# Send an alert
send_alert() {
    MESSAGE="$1"
    echo "Sending alert: $MESSAGE"
    # Implement alert delivery here (DingTalk, email, etc.)
}

# Entry point
main() {
    case "$1" in
        start)
            start_nginx
            ;;
        stop)
            stop_nginx
            ;;
        restart)
            restart_nginx
            ;;
        reload)
            reload_nginx
            ;;
        status)
            check_nginx
            ;;
        rotate)
            rotate_logs
            ;;
        monitor)
            monitor_nginx
            ;;
        *)
            echo "Usage: $0 {start|stop|restart|reload|status|rotate|monitor}"
            exit 1
            ;;
    esac
}

main "$@"
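
To run the script unattended, its rotate and monitor actions can be scheduled from cron. A sketch, assuming the script is installed as /opt/scripts/nginx-ops.sh (the path is illustrative); note that this replaces the invoking user's crontab, so merge the entries instead if one already exists.

#!/bin/bash
# Schedule log rotation (daily at 00:00) and monitoring (every minute) (sketch)
cat << 'EOF' | crontab -
0 0 * * * /opt/scripts/nginx-ops.sh rotate >> /var/log/nginx-ops.log 2>&1
* * * * * /opt/scripts/nginx-ops.sh monitor >> /var/log/nginx-ops.log 2>&1
EOF
crontab -l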

7. Summary

As a high-performance web server and reverse proxy, Nginx requires disciplined operations, monitoring, and management in production. This article covered:

  1. Installation and configuration: source build, main configuration, virtual hosts
  2. Load balancing: multiple balancing strategies, health checks, dynamic configuration
  3. Monitoring and management: status monitoring, log analysis, configuration management
  4. Performance optimization: system-level tuning and Nginx configuration tuning
  5. Operations automation: automation scripts, log rotation, monitoring and alerting

With sound Nginx operations and management, you can significantly improve performance and stability and provide a highly available web serving layer for the business.


Key operational takeaways:

  • Tune the Nginx configuration to the specific workload
  • Choose a load balancing algorithm that matches the traffic pattern
  • Use modern, secure TLS protocols and ciphers for SSL
  • Monitor across multiple dimensions: status, logs, and performance
  • Automation raises the efficiency and consistency of day-to-day operations

Technical notes:

  • Nginx achieves high concurrency with the epoll event model
  • Multiple load balancing algorithms and health checks are supported
  • SSL/TLS tuning improves HTTPS performance
  • Caching and compression reduce bandwidth consumption
  • Log analysis surfaces latent problems