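1. Introduction
Ceph is an open-source distributed storage system that provides three services: object storage, block device storage, and a file system. Its core components are Ceph OSDs, Monitors, Managers, and MDSs. A Ceph storage cluster requires at least one Ceph Monitor, one Ceph Manager, and one Ceph OSD (object storage daemon). A Ceph Metadata Server (MDS) is also required when running Ceph Filesystem clients.
Ceph OSDs: The Ceph OSD daemon (ceph-osd) stores data, handles replication, recovery, backfilling, and rebalancing, and provides monitoring information to the Ceph Monitors by checking the heartbeats of other OSD daemons. At least three Ceph OSDs are normally required for redundancy and high availability. If the cluster is configured to keep two replicas, at least two OSD daemons are needed for it to reach the active+clean state (Ceph keeps three replicas by default, but the replica count is adjustable).
Monitors: The Ceph Monitor (ceph-mon) maintains the maps that describe the cluster state: the monitor map, the OSD map, the placement group (PG) map, and the CRUSH map. Ceph keeps a history of every state change on the Monitors, OSDs, and PGs (each version is called an epoch). Monitors also manage authentication between daemons and clients. At least three monitors are normally required for redundancy and high availability.
Managers: The Ceph Manager daemon (ceph-mgr) tracks runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. It also hosts Python-based modules that manage and expose cluster information, including the web-based Ceph Manager Dashboard and a REST API. At least two managers are normally required for high availability.
MDSs: The Ceph Metadata Server (MDS) stores metadata on behalf of the Ceph file system (Ceph block devices and Ceph object storage do not use the MDS). Metadata servers let POSIX file system users run basic commands such as ls and find without placing a burden on the Ceph storage cluster.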
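The recommendation of three monitors follows from how monitors agree on cluster state: a majority of them must be reachable to form a quorum. A minimal sketch of that arithmetic is shown here; the function names are illustrative and not part of Ceph.

```shell
# Smallest number of monitors that forms a majority of n monitors.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of monitor failures that n monitors can tolerate while keeping quorum.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
    echo "monitors=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With one or two monitors no failure can be tolerated, which is why three is the usual minimum for a highly available cluster.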
Official documentation:
2. Preparation
This guide uses ceph-deploy to deploy a three-node Ceph cluster. The nodes are:
192.168.100.116 ceph01
192.168.100.117 ceph02
192.168.100.118 ceph03
All operations are performed on the ceph01 node.
A. Configure hosts
# cat /etc/hosts
192.168.100.116 ceph01
192.168.100.117 ceph02
192.168.100.118 ceph03
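If the entries still need to be added, a small helper can append them without creating duplicates on repeated runs. This is only a sketch: `add_host_entry` is a hypothetical name, and the demonstration writes to a scratch file rather than /etc/hosts.

```shell
# Append "IP hostname" to a hosts file unless the hostname is already mapped there.
add_host_entry() {
    ip="$1"; name="$2"; file="${3:-/etc/hosts}"
    if ! grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Demonstration on a scratch file; pass /etc/hosts (as root) for real use.
hosts_file="$(mktemp)"
add_host_entry 192.168.100.116 ceph01 "$hosts_file"
add_host_entry 192.168.100.117 ceph02 "$hosts_file"
add_host_entry 192.168.100.118 ceph03 "$hosts_file"
add_host_entry 192.168.100.116 ceph01 "$hosts_file"   # already present, so nothing is added
cat "$hosts_file"
```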
B. Configure passwordless SSH
# ssh-keygen -t rsa -P ''
# ssh-copy-id -i .ssh/id_rsa.pub root@192.168.100.117
# ssh-copy-id -i .ssh/id_rsa.pub root@192.168.100.118
C. Install Ansible
# yum -y install ansible
# cat /etc/ansible/hosts | grep -v ^# | grep -v ^$
[node]
192.168.100.117
192.168.100.118
D. Disable SELinux and the firewall
# ansible node -m copy -a 'src=/etc/hosts dest=/etc/'
# sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
# ansible node -m copy -a 'src=/etc/selinux/config dest=/etc/selinux/'
# systemctl stop firewalld
# systemctl disable firewalld
# ansible node -a 'systemctl stop firewalld'
# ansible node -a 'systemctl disable firewalld'
Note that the SELINUX=disabled setting in /etc/selinux/config only takes effect after a reboot.
E. Install NTP
# yum -y install ntp ntpdate ntp-doc
# systemctl start ntpdate
# systemctl start ntpd
# systemctl enable ntpd ntpdate
# ansible node -a 'yum -y install ntp ntpdate ntp-doc'
# ansible node -a 'systemctl start ntpdate'
# ansible node -a 'systemctl start ntpd'
# ansible node -a 'systemctl enable ntpdate'
# ansible node -a 'systemctl enable ntpd'
F. Add the required repositories
# wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo ### Install the EPEL repo
# ansible node -m copy -a 'src=/etc/yum.repos.d/epel.repo dest=/etc/yum.repos.d/'
# yum -y install yum-plugin-priorities
# yum -y install snappy leveldb gdisk python-argparse gperftools-libs
# rpm --import 'https://mirrors.aliyun.com/ceph/keys/release.asc'
# vim /etc/yum.repos.d/ceph.repo ### Add the Aliyun Ceph repo
[Ceph]
name=Ceph packages for $basearch
baseurl=https://mirrors.aliyun.com/ceph/rpm-mimic/el7/$basearch
enabled=1
gpgcheck=1
priority=1
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
[Ceph-noarch]
name=Ceph noarch packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-mimic/el7/noarch
enabled=1
gpgcheck=1
priority=1
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
[ceph-source]
name=Ceph source packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-mimic/el7/SRPMS
enabled=1
gpgcheck=1
priority=1
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
# yum makecache
3. Deploy the Ceph cluster
A. Install ceph-deploy
# yum -y install ceph-deploy
B. Start a new Ceph cluster (the official documentation recommends creating a dedicated user for deployment)
# mkdir /etc/ceph && cd /etc/ceph
# ceph-deploy new ceph{01,02,03}
# ls
ceph.conf ceph-deploy-ceph.log ceph.log ceph.mon.keyring
# ceph-deploy new --cluster-network 192.168.100.0/24 --public-network 192.168.100.0/24 ceph{01,02,03}
C. Install the Mimic release of Ceph and review the configuration file
# ceph-deploy install --release mimic ceph{01,02,03}
# ceph --version
ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
# ls
ceph.conf ceph-deploy-ceph.log ceph.log ceph.mon.keyring rbdmap
# cat ceph.conf
[global]
fsid = afd8b41f-f9fa-41da-85dc-e68a1612eba9
public_network = 192.168.100.0/24
cluster_network = 192.168.100.0/24
mon_initial_members = ceph01, ceph02, ceph03
mon_host = 192.168.100.116,192.168.100.117,192.168.100.118
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
D. Bring up the initial monitors
# ceph-deploy mon create-initial
# ls
ceph.bootstrap-mds.keyring  ceph.bootstrap-osd.keyring  ceph.client.admin.keyring  ceph-deploy-ceph.log  ceph.mon.keyring
ceph.bootstrap-mgr.keyring  ceph.bootstrap-rgw.keyring  ceph.conf  ceph.log  rbdmap
E. Check cluster health
# ceph health
HEALTH_OK
# ceph -s
  cluster:
    id:     afd8b41f-f9fa-41da-85dc-e68a1612eba9
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
F. Copy the admin key to every node
# ceph-deploy admin ceph{01,02,03}
# ansible node -a 'ls /etc/ceph/'
192.168.100.117 | SUCCESS | rc=0 >>
ceph.client.admin.keyring
ceph.conf
rbdmap
tmpUxAfBs
192.168.100.118 | SUCCESS | rc=0 >>
ceph.client.admin.keyring
ceph.conf
rbdmap
tmp5Sx_4n
G. Create the Ceph manager daemons
# ceph-deploy mgr create ceph{01,02,03}
# ceph -s
  cluster:
    id:     afd8b41f-f9fa-41da-85dc-e68a1612eba9
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03
    mgr: ceph01(active), standbys: ceph02, ceph03
    osd: 0 osds: 0 up, 0 in
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
H. Create the OSDs
# ceph-deploy osd create --data /dev/sdb ceph01
# ceph-deploy osd create --data /dev/sdb ceph02
# ceph-deploy osd create --data /dev/sdb ceph03
# ceph -s
  cluster:
    id:     afd8b41f-f9fa-41da-85dc-e68a1612eba9
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03
    mgr: ceph01(active), standbys: ceph02, ceph03
    osd: 3 osds: 3 up, 3 in
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 57 GiB / 60 GiB avail
    pgs:
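When these steps are scripted, it can help to pause until the cluster reports HEALTH_OK before continuing. The following is a sketch under assumptions: `wait_for_health_ok` is a hypothetical helper, and the default retry count and delay are arbitrary choices. It relies only on the `ceph health` command used earlier.

```shell
# Poll `ceph health` until it reports HEALTH_OK or the attempts run out.
wait_for_health_ok() {
    attempts="${1:-30}"   # assumed default: 30 tries
    delay="${2:-10}"      # assumed default: 10 seconds between tries
    i=0
    status=""
    while [ "$i" -lt "$attempts" ]; do
        status="$(ceph health 2>/dev/null)"
        case "$status" in
            HEALTH_OK*) echo "cluster healthy after $i retries"; return 0 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    echo "cluster not healthy: ${status:-no response}" >&2
    return 1
}
```

For example, `wait_for_health_ok 60 5` would wait up to about five minutes and return a non-zero status if the cluster never becomes healthy.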
3. View cluster information
A. Check the running services
# systemctl status ceph-mon.target
# systemctl status ceph\*.service
# systemctl status | grep ceph
● ceph01
│ └─4349 grep --color=auto ceph
├─system-ceph\x2dosd.slice
│ └─ceph-osd@0.service
│   └─4131 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
├─system-ceph\x2dmgr.slice
│ └─ceph-mgr@ceph01.service
│   └─3715 /usr/bin/ceph-mgr -f --cluster ceph --id ceph01 --setuser ceph --setgroup ceph
├─system-ceph\x2dmon.slice
│ └─ceph-mon@ceph01.service
│   └─3219 /usr/bin/ceph-mon -f --cluster ceph --id ceph01 --setuser ceph --setgroup ceph
B. Check Ceph storage capacity
# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    60 GiB     57 GiB     3.0 GiB      5.01
POOLS:
    NAME     ID     USED     %USED     MAX AVAIL     OBJECTS
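The GLOBAL figures are raw capacity, so with replication the space available for user data is smaller. As a rough sketch (it ignores full-ratio reserves and uneven data distribution, and `usable_gib` is a hypothetical helper), dividing by the replica count gives an estimate:

```shell
# Approximate usable capacity when every object is stored `replicas` times.
usable_gib() {
    raw_gib="$1"; replicas="$2"
    echo $(( raw_gib / replicas ))
}

usable_gib 57 3    # 57 GiB raw available, default of three replicas
```

Once pools exist, the MAX AVAIL column in the POOLS section is the figure to rely on.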
C. View monitor information
# ceph mon stat ## Show monitor status
e1: 3 mons at {ceph01=192.168.100.116:6789/0,ceph02=192.168.100.117:6789/0,ceph03=192.168.100.118:6789/0}, election epoch 10, leader 0 ceph01, quorum 0,1,2 ceph01,ceph02,ceph03
# ceph quorum_status ## Show the monitor election (quorum) status
# ceph mon dump ## Show the monitor map
dumped monmap epoch 1
epoch 1
fsid afd8b41f-f9fa-41da-85dc-e68a1612eba9
last_changed 2018-07-27 21:35:33.405099
created 2018-07-27 21:35:33.405099
0: 192.168.100.116:6789/0 mon.ceph01
1: 192.168.100.117:6789/0 mon.ceph02
2: 192.168.100.118:6789/0 mon.ceph03
# ceph daemon mon.ceph01 mon_status ## Show detailed monitor status
D. View OSD information
# ceph osd stat ## Show OSD status
3 osds: 3 up, 3 in
# ceph osd dump ## Show the OSD map
# ceph osd perf ## Show OSD latency
# ceph osd df ## Show usage for every disk in the cluster
ID CLASS WEIGHT  REWEIGHT SIZE   USE     AVAIL  %USE VAR  PGS
 0   hdd 0.01949  1.00000 20 GiB 1.0 GiB 19 GiB 5.01 1.00   0
 1   hdd 0.01949  1.00000 20 GiB 1.0 GiB 19 GiB 5.01 1.00   0
 2   hdd 0.01949  1.00000 20 GiB 1.0 GiB 19 GiB 5.01 1.00   0
                    TOTAL 60 GiB 3.0 GiB 57 GiB 5.01
MIN/MAX VAR: 1.00/1.00  STDDEV: 0
# ceph osd tree ## Show the OSD tree
# ceph osd getmaxosd ## Show the maximum number of OSDs
max_osd = 3 in epoch 60
E. View PG information
# ceph pg dump ## Show the PG map
# ceph pg stat ## Show PG status
0 pgs: ; 0 B data, 3.0 GiB used, 57 GiB / 60 GiB avail
# ceph pg dump --format plain ## Show statistics for all PGs in the cluster; available formats are plain (default) and json
4. Mount the Ceph file system on a client
A. Deploy the MDS (metadata server) on ceph01
# ceph-deploy mds create ceph01
B. Create the pools
A Ceph file system needs at least two RADOS pools, one for data and one for metadata.
# ceph osd pool create cephfs_data 128
pool 'cephfs_data' created
# ceph osd pool create cephfs_metadata 128
pool 'cephfs_metadata' created
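The value 128 is the placement group count for each pool. A commonly cited rule of thumb is (number of OSDs × 100) / replica count, rounded up to a power of two. The sketch below applies it; the function names and the default target of 100 PGs per OSD are assumptions for illustration, not Ceph commands.

```shell
# Round n up to the next power of two.
next_pow2() {
    p=1
    while [ "$p" -lt "$1" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

# Rule-of-thumb PG count: (OSDs * target PGs per OSD) / replicas, rounded up to a power of two.
suggest_pg_num() {
    osds="$1"; replicas="$2"; per_osd="${3:-100}"
    next_pow2 $(( osds * per_osd / replicas ))
}

suggest_pg_num 3 3    # three OSDs, three replicas
```

For this cluster the rule yields 128. On clusters with many pools the result is normally treated as a total to divide among them rather than a per-pool value.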
C. Create the file system
# ceph fs new cephfs cephfs_metadata cephfs_data
new fs with metadata pool 3 and data pool 2
D. List the newly created CephFS
# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
E. Check the MDS status
# ceph mds stat
cephfs-1/1/1 up {0=ceph01=up:active}
F. Check and load the kernel module
# lsmod | grep rbd
# modprobe rbd
# lsmod | grep rbd
rbd 83889 0
libceph 282661 2 rbd,ceph
G. Create the mount point
# mkdir /mnt/mycephfs
H. Look up the key on the Ceph cluster
# ceph auth get-key client.admin
AQC9H1tbHjW8IhAA9B9tMP+9EEw7xbNQ857KVQ==
I. Mount the Ceph file system with cephx authentication enabled
# mount -t ceph 192.168.100.116:6789:/ /mnt/mycephfs -o name=admin,secret=AQC9H1tbHjW8IhAA9B9tMP+9EEw7xbNQ857KVQ==
# df -h|grep ceph
192.168.100.116:6789:/ 60G 3.5G 57G 6% /mnt/mycephfs
J. With several mon nodes, list more than one in the mount command so that CephFS stays reachable if a monitor goes down
# mount -t ceph 192.168.100.116:6789,192.168.100.117:6789,192.168.100.118:6789:/ /mnt/mycephfs -o name=admin,secret=AQC9H1tbHjW8IhAA9B9tMP+9EEw7xbNQ857KVQ==
K. For better security, save the key to a file and mount with that file instead of passing the key on the command line
# ceph auth get-key client.admin > /etc/ceph/admin.secret
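Since the file holds the admin key, it is worth making sure only its owner can read it. A minimal sketch follows; `save_secret` is a hypothetical helper, and the demonstration uses a placeholder string and a scratch path instead of the real key.

```shell
# Write a secret read from stdin to a file that only its owner can read.
save_secret() {
    target="$1"
    ( umask 077 && cat > "$target" )   # new files are created without group/other access
    chmod 600 "$target"                # also tighten a file that already existed
}

# Demonstration with a placeholder value in a scratch directory.
demo_file="$(mktemp -d)/admin.secret"
printf '%s' 'PLACEHOLDER-KEY' | save_secret "$demo_file"
ls -l "$demo_file"
```

On the cluster node the same helper could be fed from the real command, for example `ceph auth get-key client.admin | save_secret /etc/ceph/admin.secret`.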
L. Mount CephFS on the client using the secret file
# yum install -y ceph ceph-common
# ls /usr/sbin/mount.ceph
/usr/sbin/mount.ceph
# scp 192.168.100.116:/etc/ceph/admin.secret /root/admin.secret
# mount -t ceph 192.168.100.116:6789:/ /mnt/mycephfs -o name=admin,secretfile=/root/admin.secret
M. Mount with all monitor nodes listed, for high availability
# mount -t ceph 192.168.100.116:6789,192.168.100.117:6789,192.168.100.118:6789:/ /mnt/mycephfs -o name=admin,secretfile=/root/admin.secret
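To make the mount persist across reboots, the same parameters can go into /etc/fstab. The sketch below only prints a candidate line. `cephfs_fstab_line` is a hypothetical helper, and the `noatime,_netdev` options and the trailing `0 2` fields follow the form commonly shown for kernel CephFS mounts, so adjust them to your environment.

```shell
# Build an /etc/fstab line for a kernel CephFS mount.
cephfs_fstab_line() {
    mons="$1"; mountpoint="$2"; user="$3"; secretfile="$4"
    printf '%s:/ %s ceph name=%s,secretfile=%s,noatime,_netdev 0 2\n' \
        "$mons" "$mountpoint" "$user" "$secretfile"
}

cephfs_fstab_line \
    192.168.100.116:6789,192.168.100.117:6789,192.168.100.118:6789 \
    /mnt/mycephfs admin /root/admin.secret
```

Review the printed line before appending it to /etc/fstab, then check it with `mount -a`.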
N. Unmount the Ceph file system
# umount /mnt/mycephfs
For details, see:
5. Enable the dashboard
Enable the dashboard module with the following command:
# ceph mgr module enable dashboard
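By default, all HTTP connections to the dashboard are secured with SSL/TLS.
To get the dashboard up and running quickly, generate and install a self-signed certificate with the following built-in command:
# ceph dashboard create-self-signed-cert
Self-signed certificate created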
Create a user with the administrator role:
# ceph dashboard set-login-credentials admin admin
Username and password updated
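Check the ceph-mgr services:
By default, the daemon hosting the dashboard (the currently active manager) binds to TCP port 8443 or 8080.
# ceph mgr services
{
    "dashboard": "https://ceph01:8080/"
}
Open https://ceph01:8080 in a browser and log in with username admin and password admin.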
View the cluster status:
6. Enable the Prometheus module
Enable the Prometheus monitoring module:
# ceph mgr module enable prometheus
# ss -tlnp|grep 9283
LISTEN 0 5 :::9283 :::* users:(("ceph-mgr",pid=3715,fd=70))
# ceph mgr services
{
    "dashboard": "https://ceph01:8080/",
    "prometheus": "http://ceph01:9283/"
}
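A quick way to confirm that the exporter returns useful data is to read one metric from it. The sketch below assumes the module exposes a `ceph_health_status` gauge where 0, 1, and 2 stand for HEALTH_OK, HEALTH_WARN, and HEALTH_ERR; verify this against your version. `health_from_metrics` is a hypothetical helper and the sample input is illustrative, not captured output.

```shell
# Read Prometheus text-format metrics on stdin and translate ceph_health_status.
health_from_metrics() {
    awk '$1 == "ceph_health_status" {
             v = $2 + 0
             if (v == 0) print "HEALTH_OK"
             else if (v == 1) print "HEALTH_WARN"
             else print "HEALTH_ERR"
             found = 1
             exit
         }
         END { if (!found) { print "UNKNOWN"; exit 1 } }'
}

# Illustrative input only; on a live cluster pipe in the exporter output instead:
#   curl -s http://ceph01:9283/metrics | health_from_metrics
printf '# TYPE ceph_health_status untyped\nceph_health_status 0.0\n' | health_from_metrics
```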
Install Prometheus:
# tar -zxvf prometheus-*.tar.gz
# cd prometheus-*
# cp prometheus promtool /usr/local/bin/
# prometheus --version
prometheus, version 2.3.2 (branch: HEAD, revision: 71af5e29e815795e9dd14742ee7725682fa14b7b)
  build user:       root@5258e0bd9cc1
  build date:       20180712-14:02:52
  go version:       go1.10.3
# mkdir /etc/prometheus && mkdir /var/lib/prometheus
# vim /usr/lib/systemd/system/prometheus.service ### Create the systemd unit
[Unit]
Description=Prometheus
Documentation=https://prometheus.io
[Service]
Type=simple
WorkingDirectory=/var/lib/prometheus
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/
[Install]
WantedBy=multi-user.target
# vim /etc/prometheus/prometheus.yml ## Create the configuration file
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.100.116:9090']
  - job_name: 'ceph'
    static_configs:
      - targets:
          - 192.168.100.116:9283
          - 192.168.100.117:9283
          - 192.168.100.118:9283
# systemctl daemon-reload
# systemctl start prometheus
# systemctl status prometheus
Install Grafana:
# wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.2-1.x86_64.rpm
# yum -y localinstall grafana-5.2.2-1.x86_64.rpm
# systemctl start grafana-server
# systemctl status grafana-server
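Log in with the default username and password (both admin). Grafana prompts you to change the default password; after changing it you reach the main interface, where you can add a data source:
Set the data source name, type, and URL.
Then click "Save & Test" at the bottom. The message "Data source is working" means the data source is connected.
Download a suitable dashboard file from the Grafana website and import it into Grafana.
The final result looks like this: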