
HDF5 smFRET format Draft 0.2 (WIP)

talaurence edited this page Aug 9, 2014 · 47 revisions

This document describes "Draft 0.2" of the HDF5-smFRET file format.

NOTE: This is currently a work in progress. Comments and suggestions from all interested parties are encouraged.

Some points open for discussion

  • Is there any important information missing? I believe we can make this file format more general without sacrificing usability. Rather than an "smFRET" file format, I envision something more generic: a "Photon List". A photon list can be visualized as a 2D table where each photon is a row and each piece of information is a column. A photon list always has a "Photon Time" column. Additional columns such as "channel" and "nanotime" can be defined at will. "Virtual columns" are defined as calculations on existing columns. For example, the laser alternation in "ALEX" methods can be calculated directly from the "Photon Time" column. For confocal scans, the x, y, and z positions can likewise be calculated from the "Photon Time". I think this format can handle a large variety of photon data.
  • Discuss which fields are optional and which are mandatory (sample and instrumentation info?)
  • Discuss general layout: which groups (i.e. "folders") to create inside the file
  • Discuss naming conventions (use short or long names?)
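The "Photon List" idea proposed above (stored columns plus computed "virtual columns") can be sketched in a few lines. This is a hypothetical in-memory model, not part of the draft: the class and method names are illustrative, and the 12.5 ns tick used in the usage below is an assumed value.

```python
class PhotonList:
    """A photon list: one row per photon, one column per quantity.

    Stored columns hold one value per photon; virtual columns are
    computed on demand from other columns and never stored.
    """

    def __init__(self, photon_time):
        # The "Photon Time" column is always present.
        self.columns = {"photon_time": list(photon_time)}
        self.virtual = {}

    def add_column(self, name, values):
        values = list(values)
        if len(values) != len(self.columns["photon_time"]):
            raise ValueError("every column must have one value per photon")
        self.columns[name] = values

    def add_virtual_column(self, name, func):
        # func receives the PhotonList and returns one value per photon.
        self.virtual[name] = func

    def __getitem__(self, name):
        if name in self.columns:
            return self.columns[name]
        return self.virtual[name](self)


photons = PhotonList([10, 250, 420, 990])
photons.add_column("channel", [0, 1, 0, 1])
# Hypothetical virtual column: photon time in seconds, assuming 12.5 ns per tick.
photons.add_virtual_column("time_s", lambda p: [t * 12.5e-9 for t in p["photon_time"]])
```

Keeping virtual columns as functions means derived quantities (such as the ALEX alternation phase) cost no storage and cannot fall out of sync with the stored data.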

Metadata

An attribute of the root node must include a string with the name of the format (e.g. 'HDF5-smFRET' or something more descriptive) and the version of the format, to preserve backward and forward compatibility.
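A reader could check these root attributes before touching any data. The sketch below assumes hypothetical attribute names (format_name, format_version), since the draft does not fix them yet, and assumes a reader rejects files newer than the version it supports. A plain dict stands in for the root node's attributes; with h5py this would be the dict-like f.attrs.

```python
# Hypothetical attribute names: the draft does not fix them yet.
FORMAT_NAME_ATTR = "format_name"
FORMAT_VERSION_ATTR = "format_version"


def parse_version(version):
    """Turn a version string such as '0.2' into a comparable tuple (0, 2)."""
    return tuple(int(part) for part in version.split("."))


def check_root_attributes(attrs, expected_name, max_supported):
    """Validate the root-node attributes and return the parsed version.

    Raises ValueError if the format name is missing or wrong, if the
    version is missing, or if the file is newer than this reader supports.
    """
    name = attrs.get(FORMAT_NAME_ATTR)
    if name != expected_name:
        raise ValueError(f"not a {expected_name} file (format name: {name!r})")
    version_str = attrs.get(FORMAT_VERSION_ATTR)
    if version_str is None:
        raise ValueError("missing format version attribute")
    version = parse_version(version_str)
    if version > parse_version(max_supported):
        raise ValueError(f"file version {version_str} is newer than supported {max_supported}")
    return version
```

Comparing tuples rather than strings avoids the trap where "0.10" sorts before "0.2".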

Data fields

General fields:

  • timestamps_unit: (float) time in seconds of a 1-unit increment in timestamps. Normally, timestamps are integers and the unit increment is determined by the acquisition electronics. However, timestamps can also be floats expressing the time in seconds; in this case timestamps_unit is set to 1.
  • number_confocal_spots: (integer) number of excitation spots: 1 for single-spot measurements, larger than 1 for multi-spot measurements.
  • ALEX: (boolean) indicates whether the measurement used alternating excitation.
  • alternation_period: (integer or float) the duration of the excitation alternation period, in the same units as the timestamps. The alternation period in seconds is obtained by multiplying alternation_period by timestamps_unit. This field is present only for ALEX data.
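The unit conventions above reduce to two small calculations: converting timestamps to seconds, and finding where each timestamp falls within the alternation period (the ALEX "virtual column"). This is a sketch; the offset parameter is a hypothetical start-of-period shift not defined by the draft, which also does not say which part of the period belongs to donor or acceptor excitation.

```python
def timestamps_to_seconds(timestamps, timestamps_unit):
    """Convert raw timestamps to seconds (timestamps_unit is seconds per tick)."""
    return [t * timestamps_unit for t in timestamps]


def alternation_phase(timestamps, alternation_period, offset=0):
    """Position of each timestamp within its excitation-alternation period.

    timestamps and alternation_period share the same units, so no unit
    conversion is needed. offset is a hypothetical start-of-period shift.
    """
    return [(t - offset) % alternation_period for t in timestamps]


# Assumed values: 12.5 ns ticks and a 4000-tick period (4000 * 12.5e-9 s = 50 us).
phases = alternation_phase([0, 150, 4100, 8050], 4000)
```

Because alternation_period is stored in timestamp units, the phase can be computed on the raw integers and only converted to seconds when needed.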

Timestamps and detector: basic layout

Timestamps and corresponding detectors:

  • timestamps_t: (array of int or float) contains all the recorded timestamps.
  • detectors_t: (array of integers) contains the detector number for each timestamp in timestamps_t. Each physical detector (for example, donor and acceptor channels) needs a unique label (a non-negative integer). For example, measurements of smFRET and polarization anisotropy with a single donor-acceptor pair have 4 detectors and need 4 different labels.

Donor/acceptor detector information:

  • detectors_donor: (array of ints) list of detectors for the donor channel. A standard smFRET measurement will have only one value. A smFRET with polarization (4 detectors) will have 2 values. For a multi-spot measurement it will contain the list of donor-channel detectors. The order matters.
  • detectors_acceptor: (array of ints) list of detectors for the acceptor channel. A standard smFRET measurement will have only one value. A smFRET with polarization (4 detectors) will have 2 values. For a multi-spot measurement it will contain the list of acceptor-channel detectors. The order matters.

NOTE: If only a single spectral channel is acquired, the detector(s) can be put in either detectors_donor or detectors_acceptor, but not in both.
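With the basic layout, donor and acceptor timestamps can be separated by looking up each entry of detectors_t in the two detector lists. The sketch below uses plain Python lists in place of HDF5 arrays and also enforces the rule that a detector may not appear in both lists.

```python
def split_by_channel(timestamps_t, detectors_t, detectors_donor, detectors_acceptor):
    """Return (donor_timestamps, acceptor_timestamps) from the basic layout."""
    donor = set(detectors_donor)
    acceptor = set(detectors_acceptor)
    # A detector may belong to at most one spectral channel.
    overlap = donor & acceptor
    if overlap:
        raise ValueError(f"detectors listed as both donor and acceptor: {sorted(overlap)}")
    if len(timestamps_t) != len(detectors_t):
        raise ValueError("timestamps_t and detectors_t must have the same length")
    donor_ts = [t for t, d in zip(timestamps_t, detectors_t) if d in donor]
    acceptor_ts = [t for t, d in zip(timestamps_t, detectors_t) if d in acceptor]
    return donor_ts, acceptor_ts
```

Timestamps from detectors listed in neither field are simply ignored, which keeps the function usable when extra detectors are present.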

Polarization information:

  • detectors_parallel_polarization: (array of ints) list of detectors for the parallel polarization.
  • detectors_perpendicular_polarization: (array of ints) list of detectors for the perpendicular polarization.

NOTE: If no polarization information is acquired these fields should be empty.
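Combining the spectral and polarization fields gives each detector a (channel, polarization) label. This is a sketch with plain lists; when both polarization fields are empty, the polarization entry is None.

```python
def detector_labels(detectors_donor, detectors_acceptor,
                    detectors_parallel_polarization=(),
                    detectors_perpendicular_polarization=()):
    """Map each detector number to a (channel, polarization) pair.

    polarization is None when the detector is in neither polarization
    list, e.g. when no polarization information was acquired.
    """
    labels = {}
    for channel, dets in (("donor", detectors_donor), ("acceptor", detectors_acceptor)):
        for d in dets:
            if d in detectors_parallel_polarization:
                pol = "parallel"
            elif d in detectors_perpendicular_polarization:
                pol = "perpendicular"
            else:
                pol = None
            labels[d] = (channel, pol)
    return labels


# smFRET with polarization: 4 detectors, assumed labels 0-3.
labels = detector_labels([0, 1], [2, 3], [0, 2], [1, 3])
```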

Timestamps and detector: multi-spot layout

In multi-spot measurements the basic layout can be used. However, to reduce memory requirements and speed up reading, it is convenient to store timestamps in separate arrays, one for each spot.

In this case we have a group /timestamps that contains a series of arrays:

  • ts_0, ts_1, ... ts_(N-1) (where N is the number of spots)

Each array contains all the timestamps (donor + acceptor) for the given spot.

The acquisition channel (i.e. donor or acceptor) of each timestamp is stored in a boolean mask for the acceptor channel (a timestamp is from the acceptor channel if the boolean is True). These boolean masks are a series of arrays in the group /acceptor_mask:

  • A_mask_0, A_mask_1, ... A_mask_(N-1) (where N is the number of spots)

As for the /timestamps group, there is one array per excitation spot. Each array in /acceptor_mask is the boolean mask for the corresponding array in /timestamps (A_mask_0 -> ts_0, etc.).
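Converting basic-layout data to the multi-spot layout is a regrouping by spot. The sketch below assumes, hypothetically, that spot i is observed by detectors_donor[i] and detectors_acceptor[i] (i.e. that donor and acceptor detectors are paired by position). The two returned dicts mirror the /timestamps and /acceptor_mask groups.

```python
def to_multispot(timestamps_t, detectors_t, detectors_donor, detectors_acceptor):
    """Build the multi-spot arrays ts_<i> and A_mask_<i> from the basic layout.

    Assumes spot i uses detectors_donor[i] and detectors_acceptor[i].
    """
    if len(detectors_donor) != len(detectors_acceptor):
        raise ValueError("need one donor and one acceptor detector per spot")
    timestamps, acceptor_mask = {}, {}
    for i, (d_det, a_det) in enumerate(zip(detectors_donor, detectors_acceptor)):
        ts, mask = [], []
        for t, det in zip(timestamps_t, detectors_t):
            if det in (d_det, a_det):
                ts.append(t)
                mask.append(det == a_det)  # True -> acceptor channel
        timestamps[f"ts_{i}"] = ts
        acceptor_mask[f"A_mask_{i}"] = mask
    return timestamps, acceptor_mask
```

Each ts_i keeps the original time order, so the mask stays aligned element by element with its timestamp array.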

When using the "multi-spot layout" (which in principle can also be used for single-spot data), the following fields specific to the "basic layout" should not be present:

  • timestamps_t
  • detectors_t
  • detectors_donor
  • detectors_acceptor
  • detectors_parallel_polarization
  • detectors_perpendicular_polarization
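A validator for the multi-spot layout can check both rules: the basic-layout fields are absent, and every spot has a timestamp array and a mask. This sketch assumes the file structure is given as a hypothetical set of absolute node paths and that the basic-layout fields would live at the root.

```python
BASIC_LAYOUT_FIELDS = (
    "timestamps_t", "detectors_t", "detectors_donor", "detectors_acceptor",
    "detectors_parallel_polarization", "detectors_perpendicular_polarization",
)


def check_multispot_layout(paths, number_confocal_spots):
    """Return a list of problems in a multi-spot file (empty if none).

    paths is a set of absolute node paths such as "/timestamps/ts_0".
    """
    problems = []
    for name in BASIC_LAYOUT_FIELDS:
        if f"/{name}" in paths:
            problems.append(f"basic-layout field present: /{name}")
    for i in range(number_confocal_spots):
        for path in (f"/timestamps/ts_{i}", f"/acceptor_mask/A_mask_{i}"):
            if path not in paths:
                problems.append(f"missing: {path}")
    return problems
```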

TCSPC-specific fields (optional)

Used in conjunction with the "basic layout":

  • nanotime: TCSPC photon arrival time (nanotime)
  • nanotime_params (group): TCSPC hardware and lifetime data parameters
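A typical first use of the nanotime field is a TCSPC decay histogram. This sketch assumes nanotimes are integer bin indices in [0, num_bins); num_bins is a hypothetical stand-in for a value that would come from nanotime_params, whose contents this draft does not yet specify.

```python
def nanotime_histogram(nanotimes, num_bins):
    """Count photons per TCSPC bin to build a fluorescence-decay histogram.

    Assumes nanotimes are integer bin indices in [0, num_bins).
    """
    counts = [0] * num_bins
    for n in nanotimes:
        if not 0 <= n < num_bins:
            raise ValueError(f"nanotime {n} outside [0, {num_bins})")
        counts[n] += 1
    return counts
```

Since nanotime has one entry per timestamp, the same boolean selection used to split donor and acceptor timestamps can be applied to nanotime before histogramming.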

Simulation-specific fields (optional)

When data comes from a simulation and it is known which particle emitted each timestamp, this information can be saved in the following field.

Used in conjunction with the "basic layout":

  • particles: particle label (number) for each timestamp.
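With the particles field, per-particle statistics are simple lookups. This sketch uses plain lists aligned with timestamps_t.

```python
from collections import Counter


def photons_per_particle(particles):
    """Number of timestamps emitted by each simulated particle."""
    return dict(Counter(particles))


def timestamps_of_particle(timestamps_t, particles, label):
    """Timestamps emitted by the particle with the given label."""
    return [t for t, p in zip(timestamps_t, particles) if p == label]
```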

Sample fields

The group /sample_parameters contains the following fields describing the sample:

  • number_of_dyes: (int) number of different dyes present in the sample. For a standard single-pair FRET measurement the value is 2. For donor-only or acceptor-only measurements the value should be 1. Values larger than 2 are allowed but not currently covered in this document.
  • donor_dye: (string) name of the donor dye, or empty string if no donor dye is present.
  • acceptor_dye: (string) name of the acceptor dye, or empty string if no acceptor dye is present.
  • buffer: (string) free-form description of the sample buffer. For example 'TE50 + 1mM of TROLOX'.
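The sample fields constrain each other: the number of non-empty dye names should equal number_of_dyes for the cases this draft covers. The sketch below checks that rule, with a plain dict standing in for the /sample_parameters group.

```python
def check_sample_parameters(params):
    """Return a list of inconsistencies in the sample parameters (empty if none)."""
    n = params["number_of_dyes"]
    # Empty strings mean "no such dye", so only non-empty names count.
    named = sum(1 for key in ("donor_dye", "acceptor_dye") if params.get(key, ""))
    problems = []
    if n in (1, 2) and named != n:
        problems.append(f"number_of_dyes is {n} but {named} dye name(s) given")
    if n > 2:
        problems.append("more than 2 dyes is not covered by this draft")
    return problems
```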

Measurement setup fields

The group /setup_parameters contains the following fields describing the measurement setup:

  • excitation_wavelength_donor: (float) excitation wavelength in S.I. units (meters) for the donor dye.
  • excitation_wavelength_acceptor: (float) excitation wavelength in S.I. units (meters) for the acceptor dye.

Optional fields (they may not exist):

  • excitation_power_donor: (float) excitation power in S.I. units (W) for the donor dye.
  • excitation_power_acceptor: (float) excitation power in S.I. units (W) for the acceptor dye.
  • detector_type: (table) the first column is an integer containing the detector label. The second column is a 128-char string with the detector name. For example 'MPD red-enhanced gen. 1'.
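The detector_type table has fixed-size rows, which map naturally onto an HDF5 compound datatype. As a stdlib-only sketch of that row layout, the code below packs rows with Python's struct module; the int32 width, little-endian byte order, and ASCII encoding are assumptions, since the draft only says "integer" and "128-char string".

```python
import struct

# One row: little-endian int32 detector label + 128-byte, NUL-padded name.
# Width, byte order, and encoding are assumptions, not part of the draft.
DETECTOR_ROW = struct.Struct("<i128s")


def pack_detector_type(rows):
    """Pack (label, name) pairs into fixed-size records."""
    out = bytearray()
    for label, name in rows:
        encoded = name.encode("ascii")
        if len(encoded) > 128:
            raise ValueError(f"detector name longer than 128 chars: {name!r}")
        out += DETECTOR_ROW.pack(label, encoded)  # struct pads with NUL bytes
    return bytes(out)


def unpack_detector_type(data):
    """Inverse of pack_detector_type."""
    return [(label, raw.rstrip(b"\0").decode("ascii"))
            for label, raw in DETECTOR_ROW.iter_unpack(data)]
```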

Additional fields

Any additional user-defined fields should be allowed. To ensure that future versions of the format can introduce new names without conflicting with user-defined fields, all custom data should be contained in a dedicated group, named for example user_data.
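A writer can enforce this rule by routing every name it does not recognize into the user group. In this sketch the set of standard names is taken from this draft, but their placement at the root and the flat path-to-value output are assumptions made for illustration.

```python
# Field and group names defined by this draft (assumed to live at the root).
STANDARD_FIELDS = {
    "timestamps_unit", "number_confocal_spots", "ALEX", "alternation_period",
    "timestamps_t", "detectors_t", "detectors_donor", "detectors_acceptor",
    "detectors_parallel_polarization", "detectors_perpendicular_polarization",
    "timestamps", "acceptor_mask", "nanotime", "nanotime_params", "particles",
    "sample_parameters", "setup_parameters",
}


def to_file_layout(fields, user_group="user_data"):
    """Map field names to file paths, moving non-standard fields under /user_data."""
    layout = {}
    for name, value in fields.items():
        if name in STANDARD_FIELDS:
            layout[f"/{name}"] = value
        else:
            layout[f"/{user_group}/{name}"] = value
    return layout
```

Because custom names never reach the root, a later version of the format can add a new standard field without colliding with existing user files.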
