Assets

About

Assets is the collective term used for anything that appears in a project. It is vague on purpose, because StatWrap tries to be accommodating of any type of 'thing' that belongs to your project. This more traditionally includes files and folders, but can also be URLs, databases, web services, etc.

The only things assets need to have are a uri and a type.

The structure of an asset is as follows:

Item	Type	Description
`uri`	String	The location of the asset. This must be unique within the project.
`name`	String	Optional display name. If no name is provided, this will default to the `uri`
`type`	String	The type of the asset. This is a more basic list that captures how the asset is interacted with, not necessarily what it contains. Examples include directory, file, URL, etc.
`contentTypes`	Array	Describe the type of content within the asset. This is more about what the asset does, not where it's stored. Is an array as the type of content can be considered different for the same file extension (e.g. HTML is both documentation and code)
`attributes`	Array	A collection of objects containing additional attributes about the asset. The contents and composition of attributes is driven by the asset type

This structure applies to the asset when it's in memory, and when it is saved in configuration/metadata files. An important consideration is that for assets to be resolvable across multiple users, we cannot store the uri as the absolute path, and when it is in memory it is inefficient to have it only be relative (requiring us to make it absolute). To work around this, we will adopt the following convention:

READING - whenever a class reads asset information, and it contains file/directory assets, the class is responsible for translating the relative path to an absolute paths.

WRITING - whenever a class is writing asset information, and it contains file/directory assets, the class is responsible for translating the absolute path to a relative path.

This will centralize the responsibility of the asset URI translation, and allows all other classes to assume that a URI is truly a fully-qualified path. From a code standpoint, the actual reading/writing should only be done within the main.dev.js implementation. Supporting classes used within here should clearly document if they are responsible for modifying paths. If not, the class assumes it will be given absolute paths (if it uses paths for any of its work).

Asset Attributes

The master list of attributes is loaded from app/constants/assets-config.js. This returns the entire configuration object for assets, and the attributes element of this object has the attribute configuration.

Each attribute entry is configured as follows:

Item	Type	Description
`id`	String	Unique human-readable identifier for the attribute. This should be unique across all attributes within StatWrap.
`display`	String	The display label to use when rendering the attribute
`details`	String	Extended descriptive information about how to interpret the attribute
`type`	String	The type of attribute. This can include: - bool - a boolean/checkbox - text - a short string
`appliesTo`	Array	The asset types that use this attribute

Discovering Assets

Asset discovery is going to be more than just files and folders, but that is where we're going to start initially.
For our initial work, we will start at the root of the project's file folder and recursively scan every file and folder underneath it. We want to make sure we're not being overly intrusive on anything the user has in their folders, so scanning needs to be a 'light touch'.

AssetService

The service in charge of asset discovery can be found in app/services/assets/assets.js (AssetService). It will have registered within it all of the specialized classes that is able to process each type of asset (called 'handlers'). Each one of these handlers can live in type-specific subdirectories under app/services/assets/handlers.

The main entrypoint for AssetService is via scan(), which takes the root project folder as its only parameter. This kicks off a series of subsequent processing steps to get more information about each of the assets.

The AssetService is responsible for first navigating the list of all assets in a given project. As noted above, this will start with just files and folders, but can extend to other types of assets (e.g., URLs, databases) in the future. From a design standpoint, it seems to make sense to have this class be the one responsible for identifying all assets. A potential downside we will need to watch out for is if there's a lot of duplicate code in this class and in other asset handler classes.

The list of assets that are built internally will be of this general structure:

{
  id: 'uuid',  - Assigned by StatWrap when added/indexed within the project
  uri: 'the provided uri to scan',
  type: 'file | directory | socket | symlink | ... others TBD ... | other | unknown',
  metadata: Array,
  children: Array (optional) - only if type is 'directory'
}

Because we will only ever send in a directory as the URI to scan, we always return an object instead of an array. Nested assets will be contained within the children attribute (if applicable), and can be recursively navigated.

Once this structure of assets is built, it is sent to our collection of handlers for additional annotation.

NOTE: We don't assume that only one handler applies to any asset. This will allow us to be a little more flexible in how we define and implement the asset handlers, while realizing that we will need to avoid signficant extra processing overhead.

Every handler should implement the following interface:

Function	Parameters	Return	Description
`scan`	`asset` (Object)	Object	Performs the main work of the asset handler in identifying relevant metadata and other information for a root asset specified by `asset`. It is assumed that each handler will recursively process all descendant assets. Each handler will add an object to the `metadata` collection, if it handles the specified asset. This is expected to include an `id` attribute that is the value of `id()`.
`id`	(None)	string	Returns a descriptive identifier for the handler, used to track which handler produced specific results.

Asset Metadata

Multiple elements can be looked at, which can vary by asset type. For example, what we look at for a folder would differ from a Python code file, and those would differ from a web service/API or a database. Understanding then that metadata is going to be both flowing and growing for each type of asset, here are some of the things we are looking for.

Python Code Files

We collect 3 categories of metadata for Python code files:

Inputs

Category	Example Functions	Data Type
Figures/images	`imread`	`figure`
Pandas	`read_*`	`data`
Python I/O	`open`	`data`

Outputs

Category	Example Functions	Data Type
Figures/images - PyPlot, ImageMagick	`savefig`, `imwrite`	`figure`
Pandas	`to_*`	`data`
Python I/O	`open`

Libraries

Packages and modules, including aliases

R Code Files

We collect 3 categories of metadata for R code files:

Inputs

Category	Example Functions	Data Type
R data import	`read.*`	`data`
readr	`read_*`	`data`
R connections	`file`, `bzfile`	`data`
Source	`source`	`code`

Outputs

Category	Example Functions	Data Type
R plots	`pdf`, `png`	`figure`
ggplot	`ggsave`	`figure`
R data export	`write.*`	`data`
readr	`write_*`	`data`
R connections	`file`, `bzfile`	`data`

Libraries

Regular library includes

SAS Code Files

We collect 3 categories of metadata for SAS code files:

Inputs

Category	Example Functions	Data Type
SAS PROC	`PROC IMPORT`	`data`
SAS infile	`infile`	`data`

Outputs

Category	Example Functions	Data Type
ODS figures	`ODS PDF`, `ODS PS`	`figure`
ODS data	`ODS CSV`, `ODS HTML`	`data`
SAS PROC	`PROC EXPORT`	`data`

Libraries

References to macros and libraries via path or via fileref.

Stata Code Files

We collect 3 categories of metadata for Stata code files:

Inputs

Category	Example Functions	Data Type
Stata imports	`import excel`, `infile`	`data`

Outputs

Category	Example Functions	Data Type
Stata graph	`graph export`	`figure`
Stata logs	`log using`	`log`
Stata export	`export excel`, `outfile`	`data`
Document export	`putdocx`, `putpdf`	`data`
estout package	`estout using`	`data`
table1 package	`table1 saving`	`data`

Libraries

External programs and plugins, and references to Do files to run via another script.

Asset Groups

By default StatWrap mimics the traditional hierarchical file system view. However, we realize that not all assets will be within a single file system (or may not even be files / folders). Also, we want to allow users to establish other groups of assets that make sense to them. Asset Groups will be a way for users to do this, and StatWrap will store these within the Project metadata.

For a project, the asset groups will be stored as part of an array within the assetGroups attribute, and each object will have the following structure:

Item	Type	Description
`id`	`uuid`	A generated unique identifier for the asset group within the project
`name`	`string`	The display name of the asset group. Within the user interface we will require the user to make this unique to avoid confusion.
`details`	`string`	User-provided details and description of what the asset group contains/represents
`assets`	`array`	Array of asset URIs that are part of the group. Assets that are related to file system objects will be converted to relative paths for portability.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assets.md

Assets.md

Assets

About

Asset Attributes

Discovering Assets

AssetService

Asset Metadata

Python Code Files

R Code Files

SAS Code Files

Stata Code Files

Asset Groups

Files

Assets.md

Latest commit

History

Assets.md

File metadata and controls

Assets

About

Asset Attributes

Discovering Assets

AssetService

Asset Metadata

Python Code Files

R Code Files

SAS Code Files

Stata Code Files

Asset Groups