Create an HPC cluster in 10 minutes with one command: pcluster (ParallelCluster)
A brief introduction to pcluster on AWS
-- D.C
As the saying goes: pics or it didn't happen.
A few words up front
- For bioinformatics work the first step is not to read on, but to find the correct AMI. That image must come with the necessary pcluster components pre-installed, so this step is critical.
- Each pcluster version (currently 2.7.0) has its own set of AMIs, so make sure you pick the matching one. On the first screen of the EC2 launch wizard, search for parallelcluster-2.7.0; at the time of writing this returns 41 AMIs. Note down the AMI ID for the operating system you want. AMI IDs also differ between regions (a CLI alternative to the console search is sketched after the example table).
Example:
pcluster version | OS | Ningxia region AMI | Beijing region AMI |
---|---|---|---|
2.5.1 | ubuntu | ami-0202652c7cb199eb6 | ami-00881ffc995032786 |
2.5.1 | linux | ami-05038e0c41061b799 | ami-0bbadb6ff64415ab3 |
2.5.1 | ubuntu | ami-0b0ebbfcd0c50f225 | ami-0d3a6e7dd85085042 |
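If you prefer the CLI to the console search, here is a minimal sketch for listing the official AMIs (the --owners amazon filter and the region are assumptions; adjust or drop them if nothing comes back):
# list official ParallelCluster 2.7.0 AMIs in the Ningxia region (AMI ID + name)
aws ec2 describe-images \
    --region cn-northwest-1 \
    --owners amazon \
    --filters "Name=name,Values=aws-parallelcluster-2.7.0*" \
    --query 'Images[].[ImageId,Name]' \
    --output table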
- This walkthrough involves several machines:
Name | Description |
---|---|
Template machine | Launched from the official pcluster AMI; after you install your own software it is baked into the analysis AMI |
Control machine | Does not need to be launched from a pcluster AMI; used to install the pcluster CLI and control the cluster |
Master node | Schedules the cluster's jobs |
Compute nodes | Run the analysis workloads |
[Must read] Create the analysis AMI
- Launch an EC2 instance as the template machine (e.g. t2.micro), starting from the official ParallelCluster AMI [e.g. aws-parallelcluster-2.7.0-amzn-hvm-202005172030 (ami-0c7a09bc17088086c)].
- Deploy your analysis pipeline on this template machine, then create a new AMI from this EC2 instance and note the new AMI ID (e.g. ami-xxxxxxxxxxxxxx). This AMI becomes the launch AMI of the pcluster cluster, i.e. custom_ami in the template below (a CLI sketch for creating the AMI follows the flow summary).
- Note: for the one-command cluster to work, the ParallelCluster version installed on the control machine and the ParallelCluster version of the AMI chosen for the template machine must match exactly, e.g. both 2.7.0.
- So the overall flow is:
Official pcluster AMI (2.7, ami-0c7a09bc17088086c) --> launch template machine --> install analysis pipeline --> create an AMI snapshot --> obtain custom_ami
Control machine (default Amazon Linux 2 AMI) --> install pcluster 2.7 --> write the config --> run the create command to launch the cluster
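Baking the custom AMI can also be done from the CLI; a rough sketch where the instance ID and AMI name are placeholders for your own template machine:
# create an AMI from the template machine
aws ec2 create-image \
    --region cn-northwest-1 \
    --instance-id i-0123456789abcdef0 \
    --name "pcluster-2.7.0-my-pipeline" \
    --description "ParallelCluster 2.7.0 base AMI plus analysis pipeline"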
Launch a VM to act as the pcluster control server (v2.7.0)
- Pick an EC2 instance type
eg. t2.micro
Any OS works; Amazon Linux is the default choice.
- Remember your key pair name
eg. mykey.pem
- Log in to this machine, for example:
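(ec2-user is the default login for Amazon Linux; the IP is a placeholder for your control machine's public address.)
ssh -i mykey.pem ec2-user@<control-machine-public-ip>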
Install the pcluster control software
- Install python3, pip3, and pcluster
# install the python3 build dependencies
sudo yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
sudo yum install gcc -y
# install python3 and pip3
sudo yum -y install python3
python3 -m pip install --upgrade pip --user
# switch pip to the Tsinghua mirror (optional; faster inside China)
pip3 install pip -U
pip3 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# install parallelcluster
pip3 install aws-parallelcluster --upgrade --user   # installs the latest pcluster version by default
pcluster version
TIPS:
If you installed the wrong pcluster version:
pip3 uninstall aws-parallelcluster          # remove the wrong version
pip3 install aws-parallelcluster==2.7.0     # install the exact version you need
- Configure the AWS credentials
$ aws configure
AWS Access Key ID [None]: AKXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: wJalrXUtnFEMI/KXXXXXXXXXXXXXXXXX
Default region name [us-east-1]: cn-northwest-1
Default output format [None]: json
# confirm the access key / secret key works (you can see your data on S3)
$ aws s3 ls
2020-01-29 11:10:27 lovevideo
Generate and edit the cluster config file, then launch pcluster with one command
mkdir -p ~/.parallelcluster && cd ~/.parallelcluster
vim config
Paste the configuration below into the file and edit it as indicated by the comments.
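As an aside, pcluster 2.x also ships an interactive generator that can write a starter version of this file for you; the hand-edited template below gives finer control:
pcluster configure    # interactive prompts for region, key pair, scheduler, and VPC/subnet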
Config template:
[aws]
aws_region_name = cn-northwest-1 # change if you want
[global]
cluster_template = myname # change if you want, MUST remember this name!
update_check = true
sanity_check = true
[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
[cluster myname] # change if you changed the name of cluster_template in [global] settings above
key_name = mytest # change to your keypair name
master_instance_type = c5.2xlarge # master node type
compute_instance_type = c5.large # compute node type
pre_install = https://awshcls.s3.cn-northwest-1.amazonaws.com.cn/efsfix.sh # needed when using EFS in the China regions
pre_install_args = NONE # needed when using EFS in the China regions
initial_queue_size = 1 # number of compute nodes when the cluster is created
max_queue_size = 30 # maximum number of compute nodes
maintain_initial_size = true # keep 1 compute node even when no jobs are submitted
master_root_volume_size = 25 # master root volume size in GB, 17 GB by default
compute_root_volume_size = 25 # compute root volume size in GB, 17 GB by default
cluster_type = spot # ondemand/spot
spot_price = 0.4 # set when using spot compute nodes; check the latest spot price for your instance type in the EC2 console
base_os = alinux2 # change if your template AMI does not use Amazon Linux 2
scheduler = sge # scheduler: sge, torque, slurm, or awsbatch
custom_ami = ami-xxxxxxxxxxxxxx # ami-0c7a09bc17088086c 2.7.0; ami-0c081e1551e30ee5a 2.6.1 ; change to your customized AMI based on pcluster ami
s3_read_resource = NONE
s3_read_write_resource = NONE
placement = compute
vpc_settings = default
#ebs_settings = custom1, custom2 # use EBS to be shared as NFS
efs_settings = custom1 # use EFS as the shared file system [recommended]
[vpc default]
vpc_id = vpc-6b111111 # change to your vpc id
master_subnet_id = subnet-a23v24c # change to your subnet id
#compute_subnet_id = subnet-a23v24c # if you want to put compute in private subnet for security
#[ebs custom1] # change or add more if you want
#shared_dir = data1 # the dir will show in your master or compute nodes
#volume_type = gp2
#volume_size = 80 # GB
#[ebs custom2] # change or add more if you want
#shared_dir = data2
#volume_type = gp2
#volume_size = 200 # GB
[efs custom1]
shared_dir = myefs # change as you like; it will be mounted as /myefs on the master and compute nodes
encrypted = false
performance_mode = generalPurpose
[scaling custom]
scaledown_idletime = 10 # compute nodes idle for longer than this many minutes are terminated to control cost
- Launch the cluster with:
pcluster <create/delete> <cluster_template>
In the example above cluster_template = myname. Cluster creation takes roughly 10 minutes.
$ pcluster create -c config myname
Beginning cluster creation for cluster: myname
Creating stack named: parallelcluster-myname
Status: parallelcluster-myname - CREATE_COMPLETE
MasterPublicIP: 161.111.111.111
ClusterUser: ec2-user
MasterPrivateIP: 172.11.11.11
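A few other pcluster 2.x subcommands are handy once the cluster is up (the cluster name and key file follow the example above):
pcluster list                        # list clusters in the current region
pcluster status myname               # CloudFormation status of the cluster
pcluster ssh myname -i mytest.pem    # SSH to the master node via the [aliases] entry
pcluster stop myname                 # stop the compute fleet; the master keeps running
pcluster delete myname               # tear the whole cluster down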
- PS: once the cluster has been created, the control machine can be shut down and started again only when needed, to save cost.
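If you prefer the CLI to the console for that (the instance ID is a placeholder for your control machine):
aws ec2 stop-instances --instance-ids i-0123456789abcdef0     # shut it down when idle
aws ec2 start-instances --instance-ids i-0123456789abcdef0    # bring it back when needed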
Log in to the master node and verify the cluster
- Go to the AWS console and check the status of the master and compute nodes.
- SSH to the master node:
ssh -i 'xx.pem' ec2-user@192.111.11.11
- If using Slurm (recommended):
$ cat > test.slurm <<'EOF'
#!/bin/bash
#SBATCH -J array
#SBATCH -p compute
#SBATCH -N 1
#SBATCH --cpus-per-task=1
#SBATCH -t 5:00
#SBATCH -a 0-2
input=(foo bar baz)
echo "This is job #\${SLURM_ARRAY_JOB_ID}, with parameter ${input[$SLURM_ARRAY_TASK_ID]}"
echo "There are \${SLURM_ARRAY_TASK_COUNT} task(s) in the array."
echo " Max index is \${SLURM_ARRAY_TASK_MAX}"
echo " Min index is \${SLURM_ARRAY_TASK_MIN}"
sleep 5
EOF
Submit the job with: sbatch test.slurm
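To confirm the array job actually ran, a couple of standard Slurm checks (sacct only returns data if job accounting is enabled):
squeue                # jobs still pending or running
sacct -j <jobid>      # state of a finished job, if accounting is enabled
ls slurm-*_*.out      # per-task output files written to the submit directory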
- If using PBS (Torque):
$ cat > test.pbs <<'EOF'
#!/bin/bash
#PBS -l nodes=1:ppn=2
sleep 600
EOF
qsub test.pbs
qstat
- If using SGE:
$ cat > test.sh <<'EOF'
#!/bin/bash
sleep 600
EOF
qsub -cwd -pe smp 2 -l vf=2G test.sh
qhost           # check the cluster execution hosts
df -h           # check the mounted volumes
qsub test.sh    # confirm that job submission works
qstat -f        # see job status
Submit your real jobs with something like: qsub -cwd -S /bin/bash -V -l vf=2G -pe smp 4 -o output -e output -q all.q yourscript.sh
About shared storage
We can usually build an NFS share on gp2 EBS volumes, which maps cleanly onto an on-premises setup, but high-IOPS workloads may hit a bottleneck there. Fortunately pcluster also offers other shared-storage options:
- EFS, the Elastic File System (the example above already uses EFS)
- FSx for Lustre
Taking EFS as an example, replace the EBS settings with the following:
efs_settings = customfs
[efs customfs]
shared_dir = efs
encrypted = false
performance_mode = generalPurpose
A cluster built this way gets reliable shared-storage performance from EFS, and capacity is elastic and billed by the space actually used. If you also make good use of EFS lifecycle management (configured on the EFS console page, or via the CLI sketch below), the cost stays well under control.
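For reference, the lifecycle policy can also be set from the CLI; a minimal sketch with a placeholder file system ID (AFTER_30_DAYS is one of the supported transition values):
# move files not accessed for 30 days to the cheaper EFS IA storage class
aws efs put-lifecycle-configuration \
    --file-system-id fs-xxxxxxxx \
    --lifecycle-policies TransitionToIA=AFTER_30_DAYS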
Note: reusing an existing EFS file system also works (settings below), but you must delete all of its mount targets beforehand. I have been bitten by this: when you create an EFS file system manually in the console, it automatically adds a mount target in every Availability Zone of the region. That is convenient for ordinary use, but such a file system cannot be used by pcluster; presumably the backend creates a mount target in the cluster's Availability Zone during cluster creation, and if one already exists the creation simply hangs.
The documentation says:
Specifying this option voids all other Amazon EFS options except for shared_dir. If you set this option to config_sanity, it only supports file systems:
That don't have a mount target in the stack's Availability Zone
OR
That do have an existing mount target in the stack's Availability Zone, with inbound and outbound NFS traffic allowed from 0.0.0.0/0.
To delete the mount targets: EFS console - click the file system ID - Network (bottom right) - Manage - remove the mount target in every Availability Zone - Save. The same can be done from the CLI, as sketched below.
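A rough CLI equivalent (using the same file system ID as the example below; each mount target ID comes from the describe call):
aws efs describe-mount-targets --file-system-id fs-302c28d5     # list the existing mount targets
aws efs delete-mount-target --mount-target-id fsmt-xxxxxxxx     # delete each one by its ID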
efs_settings = customfs
[efs customfs]
shared_dir = efs
efs_fs_id = fs-302c28d5
For debugging
vi ~/.bashrc
# add the following two lines:
export SGE_ROOT=/opt/sge
PATH=/opt/sge/bin:/opt/sge/bin/lx-amd64:/opt/amazon/openmpi/bin:$PATH
source ~/.bashrc
# restart the SGE master if needed:
sudo /etc/init.d/sgemaster.p6444 <start/stop/restart>
For autoscaling with a mix of instance types [advanced]
wget https://awshcls.s3.cn-northwest-1.amazonaws.com.cn/pcluster/asgmodify.json
# edit the ASG name, the LaunchTemplateName, and the instance types you want
vi asgmodify.json
aws autoscaling update-auto-scaling-group --cli-input-json file://asgmodify.json --profile zhy
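I am not reproducing the exact contents of asgmodify.json here, but the JSON that update-auto-scaling-group --cli-input-json expects looks roughly like the sketch below; the ASG name, launch template name, and instance types are placeholders to replace with your own values:
{
  "AutoScalingGroupName": "parallelcluster-myname-ComputeFleet-XXXXXXXX",
  "MixedInstancesPolicy": {
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {
        "LaunchTemplateName": "your-compute-launch-template",
        "Version": "$Latest"
      },
      "Overrides": [
        {"InstanceType": "c5.large"},
        {"InstanceType": "c5.xlarge"},
        {"InstanceType": "c4.large"}
      ]
    },
    "InstancesDistribution": {
      "SpotAllocationStrategy": "lowest-price"
    }
  }
}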
The road has no end. Just get it done.