Tensor Storage

amitsingh19975 edited this page Jul 20, 2019 · 4 revisions

Introduction

A data structure is a particular way of organizing data in a computer so that it can be used effectively and stored efficiently. A tensor storage is a predefined data structure for storing tensor data efficiently, but there is a trade-off between storing data compactly and the time needed to access it: as the complexity of the storage scheme increases, access time tends to increase as well.

There are three types of storage:

  1. Dense Storage
  2. Band Storage
  3. Sparse Storage

Dense Storage

It stores values in a contiguous, sequential block of memory in which every element is represented, including zeros, which makes it heavy on memory. If a tensor has a large number of non-zero elements, or far more non-zeros than zeros, dense storage is preferred, since there is nothing to gain from the other storage types. Because the memory is contiguous, it caches well, making operations faster than with the other containers.

Band Storage

It is a sparse storage whose non-zero entries are confined to a diagonal band comprising the main diagonal and zero or more diagonals on either side. It is stored in the same way as a sparse tensor.

Sparse Storage

It is intended for large data sets dominated by zeros, i.e. where the number of zero elements is much greater than the number of non-zero elements; it is then faster to perform computations by iterating only over the non-zero elements. The data is compressed using various algorithms or data structures such as CSR, maps, etc.

Current Implementation

  1. tensor_storage

It is an empty base class used to identify whether a type is a tensor storage.

  2. dense_storage

It is an empty class, inheriting from tensor_storage, used to identify whether a type is a dense tensor storage.

  3. sparse_storage

It is an empty class, inheriting from tensor_storage, used to identify whether a type is a sparse tensor storage.

  4. band_storage

It is an empty class, inheriting from tensor_storage, used to identify whether a type is a band tensor storage.

compressed_map

compressed_map is a sparse storage that uses std::unordered_map to store data. It maps the linear (stride-based) index of each element to its value, ignoring zeros and storing only the non-zero elements.

Prototype

template <typename T, typename A = std::allocator<std::pair<const ptrdiff_t, T>>>
struct compressed_map;

Example

Let's say A = [1,1,0,0,0,0,0,0,0,2,3,4] // container size = 12
Now with compressed_map the storage looks like
C = [(0,1),(1,1),(9,2),(10,3),(11,4)] => ( idx, el ) // container size = 5
auto t1 = tensor<float,dynamic_extents<>,first_order, compressed_map<float> >(dynamic_extents<>{1,2,3,4},0.f);
auto t2 = tensor<float,dynamic_extents<>,first_order, compressed_map<float> >(dynamic_extents<>{1,2,3,4},4.f);
auto t3 = tensor<float,dynamic_extents<>,first_order, compressed_map<float> >(dynamic_extents<>{1,2,3,4});

t2[0] = 0;
t2[1] = 0;

t2.prune(); // prunes zeroes from the `std::unordered_map` after modifying the data

t3[0] = 1;

Conclusion from above code

  1. t1 has a container size of 0.
  2. t2 had a container size of 24, but after two elements are set to zero and prune is called, it has a container size of 22.
  3. t3 initially had a container size of 0, but after setting an element to 1 it has a container size of 1.

Problems with compressed_map and Sparse Storage

std::unordered_map does not provide a data member function that returns a pointer to the memory where its elements are stored. This is not specific to this implementation: sparse storage and band storage in general keep their data in compressed form, possibly in non-contiguous memory, which clashes with a limitation of the current tensor implementation. tensor relies heavily on a pointer to contiguous, uncompressed data for the prod functions, algorithms, etc. There is also a decrease in the performance of tensor contractions and other tensor operations, which is the trade-off between access time and memory.

default_storage (Dense Storage)

It decides which dense structure to use, std::array or std::vector, according to the extents type (static extents or dynamic extents). tensor uses default_storage as its default storage type, but if you want a different or user-defined storage, you can simply pass your own storage type; inherit from one of the three tag types dense_storage, sparse_storage or band_storage for better support.

Prototype

template <typename T, typename E, typename A, typename = void>
struct default_storage;

Custom Storage Types

  1. First, decide to which category your storage belongs: dense_storage, sparse_storage or band_storage.
  2. Inherit from that storage tag for better support, especially for sparse and band storage.
  3. Implement your storage with an interface similar to std::vector.
  4. Pass the struct or class to tensor.

Example

using namespace boost::numeric::ublas;
template<typename T>
struct custom_storage : storage::sparse_storage{ /* Implementation */ };

auto t = tensor< float, dynamic_extents<>, first_order, custom_storage<float> >{1,2,3,4};

Missing

To mitigate the problems mentioned above with compressed_map and sparse storage, we have yet to come up with an interface, similar to std::iterator, that allows access to the data elements. Sparse storage is a compressed form of data that cannot be iterated simply by obtaining a pointer, which is what the current tensor exploits for its operations; for sparse storage we have to employ some other interface. Designing one is a huge, time-consuming task that cannot be completed within the GSoC period, because everyone has to agree on and design a new interface for sparse and band storage.
