Amazon (with S3) and Cleversafe are object storage vendors. Here we are going to focus on Cleversafe.
First of all, what is object storage?
Object storage means storing files under a unique identifier instead of using a hierarchical directory path like \\mountpoint\a\b\c\d\file.txt.
The object model scales better than the hierarchical model and, Cleversafe says, it scales better than Apache Hadoop.
Hadoop's file system (HDFS) uses a hierarchical file structure and spreads files across multiple nodes. Cleversafe also spreads files across multiple nodes. The company says its model can scale almost without limit. One of its customers is the special effects studio that worked on the film Gravity. Object storage is well suited to large image files like those used to make movies, and to anyone who works with video editing or streaming. For example, Netflix uses object storage (Amazon S3) to store its massive collection of video files.
Another reason that object storage can be better than hierarchical storage is that you can define your own metadata, which is data about data. (I say “can be” because object storage is not appropriate for all applications, for example database files such as an Oracle tablespace.) With directory storage you are limited to:
- File name
- Number of bytes
- Date created
- Date modified
- Directory location
- Name of the person who created the file
But with objects you can define custom metadata fields. For example, you could have:
- Customer name
- Storage retention days
- Codec encoding scheme
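To make the idea concrete, here is a minimal sketch of a flat object store with custom metadata. It is an in-memory stand-in, not Cleversafe's or Amazon's API; the names `store`, `put_object`, and `get_object`, and the metadata values, are hypothetical.

```python
import uuid

# Hypothetical in-memory stand-in for an object store: a flat namespace
# keyed by unique ID, with free-form metadata attached to each object.
store = {}

def put_object(data: bytes, metadata: dict) -> str:
    """Store data under a newly generated unique ID and return that ID."""
    object_id = str(uuid.uuid4())
    store[object_id] = {"data": data, "metadata": dict(metadata)}
    return object_id

def get_object(object_id: str) -> dict:
    """Look the object up by ID alone; no directory path is involved."""
    return store[object_id]

# Illustrative metadata fields matching the list above.
video_id = put_object(
    b"...video bytes...",
    {"customer_name": "Example Studio", "retention_days": 365, "codec": "H.264"},
)
print(get_object(video_id)["metadata"]["codec"])
```

The caller keeps only the returned ID. Everything else it needs to know about the object travels with it as metadata.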
Cleversafe, like Amazon, uses REST web services. These let you add, delete, and move a file from one bucket to another. There is no in-place update function: to change an object, you write a new copy of it. This is not a database. Object storage vendors generally expose a REST interface of this kind.
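As a rough illustration of what these REST calls look like, the sketch below builds (but does not send) HTTP requests with Python's standard library. The endpoint `storage.example.com`, the bucket names, and the helper functions are all hypothetical, and a real service would also require authentication headers. A move is modeled here simply as a write to the destination followed by a delete from the source; real APIs may offer a server-side copy instead.

```python
import urllib.request

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://storage.example.com"

def build_put(bucket: str, key: str, data: bytes) -> urllib.request.Request:
    """Request that would add (or wholly replace) an object."""
    return urllib.request.Request(f"{BASE_URL}/{bucket}/{key}", data=data, method="PUT")

def build_get(bucket: str, key: str) -> urllib.request.Request:
    """Request that would retrieve an object."""
    return urllib.request.Request(f"{BASE_URL}/{bucket}/{key}", method="GET")

def build_delete(bucket: str, key: str) -> urllib.request.Request:
    """Request that would delete an object."""
    return urllib.request.Request(f"{BASE_URL}/{bucket}/{key}", method="DELETE")

def build_move(src_bucket: str, dst_bucket: str, key: str, data: bytes) -> list:
    """A move sketched as two requests: write to the destination, then delete the source."""
    return [build_put(dst_bucket, key, data), build_delete(src_bucket, key)]

for req in build_move("incoming", "archive", "trailer.mp4", b"..."):
    print(req.get_method(), req.full_url)
```

Nothing is sent over the network here. Passing one of these requests to `urllib.request.urlopen` would perform the call against a real endpoint.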
OpenStack Swift is an open source project that provides software for object storage. Some storage vendors use it, or some variation of it, when writing their storage software.
Cleversafe’s approach to storage is different from that of other cloud vendors, because it sells its own appliances instead of renting you space on its systems. The Cleversafe Accesser device is for retrieving and writing data; it works like an HTTP web server, since REST is based on HTTP. The Slicestor is used to divide up the data into pieces they call “slices” using the Dispersal Storage Algorithm and then store them to disk.
The Cleversafe algorithm provides whatever degree of redundancy (called a “threshold”) the customer wants. The company says this provides data integrity and redundancy, plus it uses much less storage than Hadoop. By default, Hadoop writes three copies of every data block.
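The storage saving can be seen with some simple arithmetic. The sketch below compares the raw storage consumed per byte of data under full replication and under a dispersal scheme. The parameter names and the 16-slice, 10-needed configuration are assumptions chosen for illustration; they are not figures published by Cleversafe.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of data when every block is copied in full."""
    return float(copies)

def dispersal_overhead(total_slices: int, slices_needed: int) -> float:
    """Raw bytes stored per byte of data when the data is spread over
    total_slices slices and any slices_needed of them can rebuild it."""
    return total_slices / slices_needed

# Three full copies, as in Hadoop's default.
print(replication_overhead(3))

# Hypothetical dispersal setup: 16 slices, any 10 rebuild the data,
# so up to 6 slices can be lost.
print(dispersal_overhead(16, 10))
```

Under these assumed numbers, replication stores 3 bytes for every byte of data while dispersal stores 1.6, and the dispersal setup survives more simultaneous failures (six lost slices against two lost copies).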
The Cleversafe dispersal algorithm borrows ideas from the fields of television and radio to provide redundancy and error checking. A radio cannot go back to the station and ask it to retransmit data that was garbled in transmission by, for example, electrical interference from lightning. It’s a one-way conversation. The same is true for TV. So what broadcasters do instead is send each signal more than once. For example, if the TV pixel or radio signal is “1,” the station might send “111,” that is, three copies. If what is received is “110,” the assumption is that “1” was probably what was sent, because there are more 1s than 0s.
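The send-it-three-times idea is known as a repetition code, and the receiver decodes it by majority vote. Here is a minimal sketch; the function names `encode` and `decode` are made up for illustration.

```python
def encode(bits: str, copies: int = 3) -> str:
    """Repeat every bit `copies` times, e.g. "1" becomes "111"."""
    return "".join(bit * copies for bit in bits)

def decode(received: str, copies: int = 3) -> str:
    """Recover each bit by majority vote over its group of copies."""
    decoded = []
    for i in range(0, len(received), copies):
        group = received[i:i + copies]
        decoded.append("1" if group.count("1") > copies // 2 else "0")
    return "".join(decoded)

print(encode("1"))    # prints 111
print(decode("110"))  # prints 1, since two of the three copies are 1
```

The cost is plain to see: every bit is sent three times, and only one corrupted copy per group can be tolerated.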
Cleversafe uses a more efficient algorithm called Reed-Solomon. The Slicestor uses this and Cleversafe’s own dispersal algorithm to divide a block of characters into bits and then store them across different Slicestor appliances. They call each subset of the data a “slice.” Since Reed-Solomon already adds redundancy to the data, and since there is some duplication of the slices, the system can recover from the failure of up to n devices. The customer chooses the value n to provide the degree of protection they want.
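Reed-Solomon itself is too involved for a short example, so the sketch below swaps in a much simpler technique, a single XOR parity slice, to show the general principle of rebuilding a lost slice from the survivors. This is not Cleversafe's algorithm and it tolerates only one lost slice, whereas Reed-Solomon can be configured to tolerate several. All names here are hypothetical.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_slices(data: bytes, k: int) -> list:
    """Split data into k equal-length slices (zero-padded) plus one XOR parity slice."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    slices = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]

def recover(slices: list, missing: int) -> bytes:
    """Rebuild the slice at index `missing` by XOR-ing all the other slices."""
    others = [s for i, s in enumerate(slices) if i != missing]
    return reduce(xor_bytes, others)

slices = make_slices(b"object storage", 4)  # 4 data slices + 1 parity slice
lost = slices[2]
slices[2] = None                            # simulate one failed device
print(recover(slices, 2) == lost)           # prints True
```

Five slices are stored for four slices' worth of data, an overhead of 1.25x, yet any single device can fail without losing anything.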
Cleversafe is an example of private cloud storage as opposed to public (like Amazon). This means each storage device is dedicated to one customer and not shared, as would be the case with virtualized servers like those in Amazon Elastic Compute Cloud (EC2).
Cleversafe is growing rapidly in the market for cloud storage. Because it uses less storage than Hadoop, and because the customer buys appliances rather than renting space, it could save customers money over other storage solutions. In a future post I will discuss how Amazon S3 creates redundancy, so that we can compare it to Cleversafe to see which vendor uses less space. That comparison matters less to Amazon customers, though, because they pay based on the amount of data stored and not the storage devices required to store it.