Ceph

Ceph consists of the following components:

RADOS (Reliable Autonomic Distributed Object Store) is the object store at the core of Ceph. RADOS takes care of distributing objects across the whole storage cluster and replicating them for fault tolerance. It is built from three major components (a short usage sketch follows the list):

  1. Object Storage Daemon (OSD): the storage daemon running the RADOS service; this is where your data lives. This daemon must run on every storage server in the cluster. Each OSD has an associated hard drive. For performance it is often better to pool the drives with RAID arrays, LVM or btrfs, in which case a single daemon runs per server. By default, three pools are created: data, metadata and rbd.
  2. Meta-Data Server (MDS): this is where the metadata is stored. MDSs build a POSIX file system on top of objects for Ceph clients. However, if you are not using the Ceph File System, you do not need a metadata server.
  3. Monitor (MON): this lightweight daemon handles all communication with external applications and clients. It also provides consensus for distributed decision-making in a Ceph/RADOS cluster; for instance, when you mount a Ceph share on a client you point it at the address of a MON server. It checks the state and consistency of the data. In an ideal setup you run at least 3 ceph-mon daemons on separate servers. Quorum decisions are made by majority vote, which is why an odd number of monitors is needed.
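
To make the role of RADOS as a plain object store concrete, here is a minimal sketch using the python-rados bindings. The configuration path and object name are assumptions for the example; the pool name refers to the default data pool mentioned above.

  import rados

  # Connect to the cluster; the ceph.conf path is an assumption for this example.
  cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
  cluster.connect()

  # Open an I/O context on the default 'data' pool and store/read one object.
  ioctx = cluster.open_ioctx('data')
  ioctx.write_full('example-object', b'hello rados')  # object name is made up
  print(ioctx.read('example-object'))

  ioctx.close()
  cluster.shutdown()

Any machine with a client keyring and ceph.conf can run this; the monitors are contacted first, after which reads and writes go directly to the OSDs.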

Without RAID you need to deploy one OSD daemon per physical disk.

Transport services:

  1. Ceph RGW (Object Gateway / RADOS Gateway): an HTTP API compatible with Amazon S3 and OpenStack Swift.
  2. Ceph RBD (RADOS Block Device): block devices for virtual machines, including snapshotting, provisioning and compression (see the RBD sketch after this list).
  3. CephFS (File System): distributed POSIX NAS storage. Mounting is done via FUSE.
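
As an RBD illustration, the sketch below creates and writes to an image through the python-rbd bindings. The pool name rbd is the default RBD pool; the image name and size are made up for the example.

  import rados
  import rbd

  cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
  cluster.connect()
  ioctx = cluster.open_ioctx('rbd')          # 'rbd' is the default RBD pool

  # Create a 4 GiB image and write 512 bytes at offset 0; the image name is made up.
  rbd.RBD().create(ioctx, 'vm-disk-0', 4 * 1024 ** 3)
  image = rbd.Image(ioctx, 'vm-disk-0')
  image.write(b'x' * 512, 0)
  image.close()

  ioctx.close()
  cluster.shutdown()

In practice a hypervisor (for example QEMU/KVM with librbd support) attaches such an image directly as a virtual disk instead of writing to it from a script like this.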

For a production system it is recommended to use five physical or virtual servers: one server for data (OSD), one server for metadata (MDS), two monitor servers (MON) and an admin server (the first client).

For RGW and RBD only the OSD and MON daemons are needed, since the metadata server (MDS) is required only for CephFS (running multiple metadata servers is still under development and experimental).

Radosgw is an HTTP REST gateway for the RADOS object store, a part of the Ceph distributed storage system. It is implemented as a FastCGI module using libfcgi, and can be used in conjunction with any FastCGI capable web server.
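
Because the gateway speaks the S3 protocol, an ordinary S3 client library can talk to it. Below is a minimal sketch with the boto library; the hostname, access keys and bucket name are placeholders for this example.

  import boto
  import boto.s3.connection

  # Access keys and gateway hostname are placeholders for this example.
  conn = boto.connect_s3(
      aws_access_key_id='ACCESS_KEY',
      aws_secret_access_key='SECRET_KEY',
      host='radosgw.example.com',
      is_secure=False,
      calling_format=boto.s3.connection.OrdinaryCallingFormat(),
  )

  # Create a bucket through the gateway and list all buckets.
  conn.create_bucket('test-bucket')
  for bucket in conn.get_all_buckets():
      print(bucket.name)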

For every terabyte of stored data an OSD should have about one gigabyte of memory (needed mainly during recovery and similar operations). Metadata servers need 1 GB per daemon instance. Most "slow OSD" issues arise from running an operating system, multiple OSDs, and/or multiple journals on the same drive. Ceph must write to the journal before it can ACK the write. The btrfs filesystem can write journal data and object data simultaneously, whereas XFS and ext4 cannot. In any case, OSD data and the OSD journal should definitely be placed on separate disks.
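
The rule of thumb above translates into a simple sizing calculation; the helper below and its example numbers are purely illustrative.

  def osd_node_ram_gb(storage_tb):
      """Roughly 1 GB of RAM per TB of storage served by the node's OSDs."""
      return storage_tb * 1

  def mds_node_ram_gb(mds_instances):
      """Roughly 1 GB of RAM per MDS daemon instance."""
      return mds_instances * 1

  # Example: a node with 12 disks of 3 TB each (one OSD per disk, no RAID).
  print(osd_node_ram_gb(12 * 3))  # -> 36 (GB)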

Ceph OSDs run the RADOS service and calculate data placement with CRUSH; for this an OSD node needs at least 4 cores.
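
The point of CRUSH is that placement is computed on demand by clients and OSDs instead of being looked up on a central server, which is part of why OSD nodes need CPU headroom. The toy sketch below only illustrates the idea of deterministic, hash-based placement (in the spirit of rendezvous hashing); it is not the CRUSH algorithm itself.

  import hashlib

  def toy_placement(object_name, osd_ids, replicas=3):
      """Deterministically pick OSDs for an object. Illustration only, not CRUSH."""
      ranked = sorted(
          osd_ids,
          key=lambda osd: hashlib.md5(f'{object_name}:{osd}'.encode()).hexdigest(),
      )
      return ranked[:replicas]

  # Every client and OSD computes the same answer without consulting a central table.
  print(toy_placement('example-object', range(6)))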

[Image: Ceph-topo.jpg (Ceph cluster topology diagram)]

Links

Building a test cluster: http://ceph.com/docs/master/start/quick-ceph-deploy/

Building another cluster: http://www.server-world.info/en/note?os=CentOS_6&p=ceph

Performance benchmarks: https://software.intel.com/en-us/blogs/2013/10/25/measure-ceph-rbd-performance-in-a-quantitative-way-part-i

Nice diagrams, though not in English: http://wiki.zionetrix.net/informatique:systeme:ha:ceph

Another possible setup: http://www.openclouddesign.org/articles/vdi-storage/ceph-highly-scalable-open-source-distributed-file-system

Some benchmarks: http://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/

Performance tips: http://www.slideshare.net/Inktank_Ceph/ceph-performance

Another useful build guide: http://www.sebastien-han.fr/blog/2012/06/10/introducing-ceph-to-openstack/

Tests with RAID controllers: http://ceph.com/community/ceph-performance-part-1-disk-controller-write-throughput/