BibMartin/datasetter

DataSetter

DataSetter helps you create datasets with Python and serve them through an HTTP API.

A dataset is a product based on a homogeneous set of data. It is not only a table of data, but also its associated metadata:

  • Name: A unique name that identifies the product.
  • Description: A short explanation of the product: what it contains, how it was collected, what it can (or cannot) be used for...
  • Columns: A description of the product's components, with column names, types and descriptions.
  • Facets: The list of columns that can be used to filter and/or aggregate data. In large datasets, it can be a strict subset of the columns.
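
To make the four metadata fields above concrete, here is a minimal sketch of how they fit together. The `DatasetMetadata` class is hypothetical and is not part of DataSetter; it only illustrates the rule that facets must be a subset of the declared columns.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical container for the four metadata fields (not DataSetter's API)."""

    name: str
    description: str
    columns: list  # list of {"name": ..., "type": ..., "description": ...} dicts
    facets: list = field(default_factory=list)

    def __post_init__(self):
        # Every facet must refer to a declared column; facets may be a strict subset.
        column_names = {column["name"] for column in self.columns}
        unknown = [facet for facet in self.facets if facet not in column_names]
        if unknown:
            raise ValueError(f"facets not found in columns: {unknown}")


metadata = DatasetMetadata(
    name="Random letters",
    description="A simple dataset with letters, greek letters and integers.",
    columns=[
        {"name": "letter", "type": "string", "description": "A column with letters."},
        {"name": "greek", "type": "string", "description": "A column with greek letters."},
        {"name": "number", "type": "integer", "description": "A column with numbers."},
    ],
    facets=["letter", "greek"],  # "number" is a column but not a facet
)
```

Here `number` is described as a column but cannot be used for filtering or aggregation, since it is not listed in the facets.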

For example, you can define a dataset based on a pandas.DataFrame object:

>>> import pandas as pd
>>> from datasetter.pandas_dataset import PandasDataset
>>>
>>> # The raw table: one row per (letter, greek letter) pair.
>>> dataframe = pd.DataFrame([
...     ['A', 'alpha', 1],
...     ['A', 'beta', 13],
...     ['A', 'gamma', 8],
...     ['B', 'alpha', 1],
...     ['B', 'beta', 31],
...     ['C', 'gamma', 9],
...     ['C', 'alpha', 2],
...     ['D', 'beta', 21],
...     ['D', 'gamma', 0],
...     ], columns=['letter', 'greek', 'number'])
>>>
>>> # Wrap the table together with its metadata.
>>> dataset = PandasDataset(
...     dataframe,
...     name="Random letters",
...     description="A simple dataset with letters, greek letters and integers.",
...     columns=[
...         {"name": "letter", "type": "string", "description": "A column with letters."},
...         {"name": "greek", "type": "string", "description": "A column with greek letters."},
...         {"name": "number", "type": "integer", "description": "A column with numbers."},
...         ],
...     facets=['letter', 'greek'])

Then access its methods in a standard way:

>>> dataset.count()
9

>>> dataset.count(letter="A")
3

>>> dataset.sample(2, greek="gamma")
  letter  greek  number
2      A  gamma       8
5      C  gamma       9

>>> dataset.count_by('greek')
alpha    3
beta     3
gamma    3
Name: greek, dtype: int64

>>> dataset.metadata()
{'name': 'Random letters',
 'description': 'A simple dataset with letters, greek letters and integers.',
 'columns': [{'name': 'letter',
   'type': 'string',
   'description': 'A column with letters.'},
  {'name': 'greek',
   'type': 'string',
   'description': 'A column with greek letters.'},
  {'name': 'number',
   'type': 'integer',
   'description': 'A column with numbers.'}],
 'facets': ['letter', 'greek']}
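
The filtering and counting semantics shown above can be sketched in plain Python. This is an illustration under stated assumptions, not DataSetter's implementation: the helper names `select`, `count` and `count_by` are hypothetical, and rejecting filters on non-facet columns is an assumption based on the definition of facets.

```python
from collections import Counter

# Same rows as the example DataFrame, as plain dictionaries.
ROWS = [
    {"letter": "A", "greek": "alpha", "number": 1},
    {"letter": "A", "greek": "beta", "number": 13},
    {"letter": "A", "greek": "gamma", "number": 8},
    {"letter": "B", "greek": "alpha", "number": 1},
    {"letter": "B", "greek": "beta", "number": 31},
    {"letter": "C", "greek": "gamma", "number": 9},
    {"letter": "C", "greek": "alpha", "number": 2},
    {"letter": "D", "beta": None, "greek": "beta", "number": 21},
    {"letter": "D", "greek": "gamma", "number": 0},
]
FACETS = ("letter", "greek")


def select(rows, **filters):
    """Keep the rows whose facet values match every keyword filter."""
    unknown = set(filters) - set(FACETS)
    if unknown:
        # Assumption: only facet columns may be used as filters.
        raise KeyError(f"not a facet: {sorted(unknown)}")
    return [row for row in rows if all(row[k] == v for k, v in filters.items())]


def count(rows, **filters):
    """Number of rows matching the filters."""
    return len(select(rows, **filters))


def count_by(rows, facet, **filters):
    """Number of matching rows for each distinct value of one facet."""
    return dict(Counter(row[facet] for row in select(rows, **filters)))


print(count(ROWS))              # 9
print(count(ROWS, letter="A"))  # 3
print(count_by(ROWS, "greek"))  # {'alpha': 3, 'beta': 3, 'gamma': 3}
```

The counts match the `dataset.count()` and `dataset.count_by('greek')` results above; the `sample` method is left out because its output is random.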
