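[Recommended] Monitoring EC2 memory and disk metrics with the CloudWatch agent
The previous post showed how to download scripts onto an EC2 instance and run two Perl files to monitor that instance's memory usage. Today we'll look at the officially recommended approach: installing the CloudWatch agent (CW for short) to monitor (snoop on) an EC2 instance's memory and disk usage.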
-- D.C
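As usual: create an IAM role for the EC2 instance
- Open the IAM console, click Roles on the left, then click Create role in the right-hand pane. Choose EC2 as the use case, click Next:Permissions, search for and select CloudWatchAgentServerPolicy, keep clicking Next, and name the role CloudWatchAgentServerRole.
- Select the target EC2 instance, click Actions - Instance Settings - Attach/Replace IAM Role, choose CloudWatchAgentServerRole, and click Apply.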
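- A side note: if the instance previously relied on an IAM role for its access to S3 and has no AK/SK credentials configured locally, replacing the role will break its S3 access.
Fixes:
Option 1. Attach one more policy to the CloudWatchAgentServerRole role, such as AmazonS3ReadOnlyAccess or AmazonS3FullAccess.
Option 2. Run aws configure on the instance and fill in all the fields, including the AK/SK (not recommended).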
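For readers who prefer the command line, the console steps above can be approximated with the AWS CLI. This is only a sketch, not the method used in this post: the instance ID is a placeholder, the instance profile reuses the role name by assumption, and the script just prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch: create CloudWatchAgentServerRole and attach it to an instance.
# Assumptions: INSTANCE_ID is a placeholder; the instance profile name
# mirrors the role name; commands are only printed unless DRY_RUN=0.
ROLE_NAME="CloudWatchAgentServerRole"
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"
TRUST_FILE="${TRUST_FILE:-/tmp/ec2-trust-policy.json}"
DRY_RUN="${DRY_RUN:-1}"

# Trust policy that lets the EC2 service assume the role.
cat > "$TRUST_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run aws iam create-role --role-name "$ROLE_NAME" \
  --assume-role-policy-document "file://$TRUST_FILE"
run aws iam attach-role-policy --role-name "$ROLE_NAME" \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
# Option 1 from the text: keep S3 read access on the same role.
run aws iam attach-role-policy --role-name "$ROLE_NAME" \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
run aws iam create-instance-profile --instance-profile-name "$ROLE_NAME"
run aws iam add-role-to-instance-profile \
  --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME"
run aws ec2 associate-iam-instance-profile --instance-id "$INSTANCE_ID" \
  --iam-instance-profile "Name=$ROLE_NAME"
```

In the China regions the ARNs use the aws-cn partition, so the policy ARNs would need adjusting. If the instance already has a profile attached, aws ec2 replace-iam-instance-profile-association is the corresponding call instead of associate-iam-instance-profile.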
Log in to the EC2 instance and install the CW agent
- Log in to the instance and download the CW agent package (57M).
Amazon Linux:
wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
If the download is too slow, you can wget this link instead:
https://publicuse.s3.cn-northwest-1.amazonaws.com.cn/amazon-cloudwatch-agent.rpm
or download the package from Baidu cloud drive and upload it to the instance.
Agent links for other operating systems:
CentOS:
https://s3.amazonaws.com/amazoncloudwatch-agent/centos/amd64/latest/amazon-cloudwatch-agent.rpm
Ubuntu:
https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
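The choice between the three links above can be sketched as a small helper that maps the distribution ID from /etc/os-release to the matching URL. The function name agent_url and the fallback to "amzn" are assumptions made for illustration; only the three distributions listed in this post are handled.

```shell
#!/bin/bash
# Sketch: map a distribution ID (as found in /etc/os-release) to the
# amd64 agent package URL listed above. agent_url is a hypothetical
# helper; IDs other than amzn, centos and ubuntu are rejected.
agent_url() {
  local base="https://s3.amazonaws.com/amazoncloudwatch-agent"
  case "$1" in
    amzn)   echo "$base/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm" ;;
    centos) echo "$base/centos/amd64/latest/amazon-cloudwatch-agent.rpm" ;;
    ubuntu) echo "$base/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb" ;;
    *)      echo "unsupported distribution: $1" >&2; return 1 ;;
  esac
}

# Read ID from /etc/os-release when it exists; fall back to "amzn".
if [ -r /etc/os-release ]; then
  distro_id="$(. /etc/os-release && echo "$ID")"
else
  distro_id="amzn"
fi

# Print the URL; on the instance it can be handed to wget, e.g.:
#   wget "$(agent_url "$distro_id")"
agent_url "$distro_id" || echo "pick a URL from the list manually"
```

Note that the Ubuntu package is a .deb, so it is installed with dpkg (for example sudo dpkg -i -E ./amazon-cloudwatch-agent.deb) rather than with rpm.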
- Install the agent package and run the command below to start monitoring; as long as no error appears, it worked.
$ sudo rpm -U ./amazon-cloudwatch-agent.rpm
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source default --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default.tmp
Start configuration validation...
/opt/aws/amazon-cloudwatch-agent/bin/config-translator --input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json --input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --output /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
2020/02/06 08:16:03 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default.tmp ...
Valid Json input schema.
I! Detecting runasuser...
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
Created symlink from /etc/systemd/system/multi-user.target.wants/amazon-cloudwatch-agent.service to /etc/systemd/system/amazon-cloudwatch-agent.service.
Redirecting to /bin/systemctl restart amazon-cloudwatch-agent.service
Usage notes for the amazon-cloudwatch-agent-ctl script:
usage: amazon-cloudwatch-agent-ctl -a stop|start|status|fetch-config|append-config|remove-config [-m ec2|onPremise|auto] [-c default|ssm:<parameter-store-name>|file:<file-path>] [-s]
e.g.
1. apply a SSM parameter store config on EC2 instance and restart the agent afterwards:
amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:AmazonCloudWatch-Config.json -s
2. append a local json config file on onPremise host and restart the agent afterwards:
amazon-cloudwatch-agent-ctl -a append-config -m onPremise -c file:/tmp/config.json -s
3. query agent status:
amazon-cloudwatch-agent-ctl -a status
-a: action
stop: stop the agent process.
start: start the agent process.
status: get the status of the agent process.
fetch-config: use this json config as the agent's only configuration.
append-config: append json config with the existing json configs if any.
remove-config: remove json config based on the location (ssm parameter store name, file name)
-m: mode
ec2: indicate this is on ec2 host.
onPremise: indicate this is on onPremise host.
auto: use ec2 metadata to determine the environment, may not be accurate if ec2 metadata is not available for some reason on EC2.
-c: configuration
default: default configuration for quick trial.
ssm:<parameter-store-name>: ssm parameter store name
file:<file-path>: file path on the host
-s: optionally restart after configuring the agent configuration
this parameter is used for 'fetch-config', 'append-config', 'remove-config' action only.
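The -c file: option described above can be tried with a small hand-written config. The following is a sketch under assumptions: the file path and the 60-second interval are arbitrary choices, and the JSON keeps only the memory and disk measurements discussed in this post. Because fetch-config makes the given file the agent's only configuration, it replaces the default one.

```shell
#!/bin/bash
# Sketch: write a minimal agent config that collects only memory and
# disk usage, then load it with "-c file:". CONFIG_PATH is an assumed
# location; the 60-second interval is an arbitrary choice.
CONFIG_PATH="${CONFIG_PATH:-/tmp/cwagent-mem-disk.json}"
CTL=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl

# Quoted heredoc so the shell leaves the ${aws:...} placeholders alone.
cat > "$CONFIG_PATH" <<'EOF'
{
  "agent": { "metrics_collection_interval": 60 },
  "metrics": {
    "append_dimensions": {
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "mem":  { "measurement": ["mem_used_percent"] },
      "disk": { "measurement": ["used_percent"], "resources": ["*"] }
    }
  }
}
EOF

# Apply the config only where the agent is actually installed.
if [ -x "$CTL" ]; then
  sudo "$CTL" -a fetch-config -m ec2 -c "file:$CONFIG_PATH" -s
else
  echo "agent not installed; config written to $CONFIG_PATH"
fi
```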
- Use vim to create a test bash script, testmem.sh, that consumes 1 GB of memory for one hour; it will be used later.
Memory test script:
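#!/bin/bash
# Create a mount point and mount a 1 GiB RAM-backed tmpfs on it.
mkdir /tmp/memory
mount -t tmpfs -o size=1024M tmpfs /tmp/memory
# Fill the tmpfs with zeros; dd stops with "No space left on device" once it is full, which is expected.
dd if=/dev/zero of=/tmp/memory/block
# Hold the memory for one hour, then release it and clean up.
sleep 3600
rm /tmp/memory/block
umount /tmp/memory
rmdir /tmp/memory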
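To cross-check the CloudWatch graph from the instance itself, memory usage can be estimated locally before and after running the test script. The helper below is a sketch: mem_used_percent is a hypothetical function name, and the formula (MemTotal minus MemAvailable, over MemTotal) is an approximation that may not match exactly how the agent computes its own figure.

```shell
#!/bin/bash
# Sketch: estimate used memory as (MemTotal - MemAvailable) / MemTotal.
# This is an approximation for cross-checking; the agent's own
# memory percentage may be computed differently.
mem_used_percent() {
  # $1: path to a meminfo-style file (defaults to /proc/meminfo)
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      if (total > 0) printf "%.1f\n", (total - avail) * 100 / total
      else exit 1
    }
  ' "${1:-/proc/meminfo}"
}

# Print the current estimate when running on Linux.
if [ -r /proc/meminfo ]; then
  mem_used_percent
fi
```

Running it once before starting testmem.sh and once while the script is sleeping should show the percentage rise by roughly the share of RAM that 1 GB represents on the instance.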
View the metrics in CloudWatch
- In the Services search box at the top of the AWS home page, search for CloudWatch. Once in the CW console, click Metrics on the left.
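- [Option 1] A CWAgent entry now appears in the right-hand pane. Click into it and choose ImageId, InstanceId, InstanceType to see the list of instances reporting memory metrics, or choose ImageId, InstanceId, InstanceType, device, fstype, path to see the list of instances reporting disk metrics.
- [Option 2] Type the ID of the instance you want to monitor into the search box and press Enter to quickly find the metrics related to that machine. Click CWAgent > ImageId, InstanceId, InstanceType to see the instances reporting memory metrics, or CWAgent > ImageId, InstanceId, InstanceType, device, fstype, path to see the instances reporting disk metrics.
- Select the instance, and the graph above shows the live memory/disk curves.
- Now go back to the instance, run the test script testmem.sh, and return to CW to watch the memory metric change.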
$ sudo bash testmem.sh &
[1] 3337
2097153+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 2.78481 s, 386 MB/s
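The same lookup can be sketched from the command line with the AWS CLI. This is an illustration only: the instance ID is a placeholder, mem_used_percent is the name the agent's memory measurement normally reports under (drop --metric-name if your config collects something else), and the command is printed rather than executed unless DRY_RUN=0 is set and credentials are configured.

```shell
#!/bin/bash
# Sketch: build an AWS CLI call that lists the CWAgent memory metric for
# one instance. INSTANCE_ID is a placeholder; the command is printed
# rather than executed unless DRY_RUN=0.
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"
DRY_RUN="${DRY_RUN:-1}"

# Keep the command in an array so arguments stay correctly quoted.
cmd=(aws cloudwatch list-metrics
     --namespace CWAgent
     --metric-name mem_used_percent
     --dimensions "Name=InstanceId,Value=$INSTANCE_ID")

if [ "$DRY_RUN" = "1" ]; then
  echo "${cmd[*]}"
else
  "${cmd[@]}"
fi
```

An empty result from the real call would suggest the agent has not published anything for that instance yet, which usually points back to the IAM role or the agent status.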
Persistence matters: unless you truly have no other choice, don't give up!