We are working with Eli to develop a pipeline for uploading his XAS beam line data into aimmdb. In particular, this issue should accomplish the following:
Summary
Our endpoint (for now) will be a .dat file containing comment lines that start with # (some of which carry critical metadata), with the remaining lines holding space-delimited columns of data.
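As a rough illustration of that file layout, here is a minimal sketch that separates the comment lines from the columnar data. Several things are assumptions, not facts about Eli's files: the `key: value` shape of the comment lines, the `uid` key name, the column names being on the first non-comment line, and the helper name `read_dat` itself.

```python
import io

import pandas as pd

# Made-up sample; real files from the beam line may look different.
SAMPLE = """\
# uid: 00000000-made-up
# free-form note without a key
energy i0 it iff
8000.0 100.0 50.0 5.0
8001.0 100.0 25.0 10.0
"""


def read_dat(text):
    """Split .dat text into (keyed metadata, free-form notes, data frame)."""
    metadata = {}
    notes = []
    for line in text.splitlines():
        if not line.startswith("#"):
            continue
        body = line[1:].strip()
        # Assumption: metadata comments look like "# key: value".
        key, sep, value = body.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
        else:
            notes.append(body)
    # Fully commented lines are skipped, so the first remaining line
    # is treated as the column header.
    df = pd.read_csv(io.StringIO(text), sep=r"\s+", comment="#")
    return metadata, notes, df


metadata, notes, df = read_dat(SAMPLE)
```

Keeping the unkeyed comment lines in a separate list means nothing from the file header is silently dropped while the real metadata format is still being settled.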
Using the existing xas schema, each channel (essentially, every column other than the energy) will be read into aimmdb. The energy column is self-explanatory, and the mu column will hold whichever of the many channels is being read. The chosen channel will be indicated in the metadata as measurement_type. Eli's code below provides a good starting point. It is partly pseudocode and still needs some work.
import numpy as np
import pandas as pd

MEASUREMENT_INSTRUCTIONS = {
    "transmission": {
        "name": "transmission",
        "numerator": "it",
        "denominator": "i0",
        "log": True,
        "invert": True,
        "col_name": "mu_trans",
    },
    "fluorescence": {
        "name": "fluorescence",
        "numerator": "iff",
        "denominator": "i0",
        "log": False,
        "invert": False,
        "col_name": "mu_fluo",
    },
}


def extract_mu(path, measurement_kind):
    # The .dat file is space-delimited and its metadata lines start with "#".
    # Assumes the column names are on the first non-comment line.
    df = pd.read_csv(path, sep=r"\s+", comment="#")
    measurement_description = MEASUREMENT_INSTRUCTIONS[measurement_kind]
    energy = df["energy"]
    mu = (
        df[measurement_description["numerator"]]
        / df[measurement_description["denominator"]]
    )
    if measurement_description["log"]:
        mu = np.log10(mu)
    if measurement_description["invert"]:
        mu = -mu

    # Also read the metadata from the file, include all commented lines, but
    # we need to pick out the particularly important databroker unique id
    metadata = ...

    # process data frame...
    return df, metadata
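Written out as formulas, the two recipes in MEASUREMENT_INSTRUCTIONS amount to the following. The symbols $I_0$, $I_t$, and $I_f$ are shorthand chosen here for the i0, it, and iff columns (only the column names come from the code). Note that the code as written takes a base-10 logarithm.

```latex
\mu_{\text{trans}} = -\log_{10}\!\left(\frac{I_t}{I_0}\right),
\qquad
\mu_{\text{fluo}} = \frac{I_f}{I_0}
```

For instance, with $I_t = 10$ and $I_0 = 100$ the transmission recipe gives $\mu_{\text{trans}} = -\log_{10}(0.1) = 1$.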
Specific steps
Create a module aimmdb.ingest
Create a particular file aimmdb/ingest/eli.py (we'll rename this to the name of Eli's beam line later)
Create a single function (ingest) which takes a single path as an argument and returns a pd.DataFrame (the data) and a dict (the metadata).
Don't forget that the pd.DataFrame columns must be energy and mu. The actual column we use for mu will change depending on the channel we're looking at.
In Eli's examples, we only have "transmission" and "fluorescence". Eli has provided instructions (code above) on how to process these particular types of data and how they should be represented in aimmdb.
We MUST document every type of processing we do (see above code) before it gets uploaded into aimmdb. I recommend a README file in aimmdb.ingest for now, until we move to a more standard documentation solution.
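The steps above could be sketched roughly as follows. This is only one possible shape for aimmdb/ingest/eli.py, not a settled design. The `measurement_kind` keyword (with a default, so the function can still be called with a single path), the `comments` metadata key, the `RECIPES` name, and the sample numbers are all assumptions made for illustration. The recipes mirror the flags in the code above.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Mirrors the processing flags from the issue's MEASUREMENT_INSTRUCTIONS.
RECIPES = {
    "transmission": {"numerator": "it", "denominator": "i0", "log": True, "invert": True},
    "fluorescence": {"numerator": "iff", "denominator": "i0", "log": False, "invert": False},
}


def ingest(path, measurement_kind="transmission"):
    """Return (data, metadata) for one channel of a .dat file."""
    # Keep every comment line so no metadata is lost.
    with open(path) as handle:
        comments = [line[1:].strip() for line in handle if line.startswith("#")]

    # Assumption: column names sit on the first non-comment line.
    raw = pd.read_csv(path, sep=r"\s+", comment="#")

    recipe = RECIPES[measurement_kind]
    mu = raw[recipe["numerator"]] / raw[recipe["denominator"]]
    if recipe["log"]:
        mu = np.log10(mu)
    if recipe["invert"]:
        mu = -mu

    # The output columns must be exactly energy and mu.
    data = pd.DataFrame({"energy": raw["energy"], "mu": mu})
    metadata = {"measurement_type": measurement_kind, "comments": comments}
    return data, metadata


# Demo on a throwaway file with made-up numbers.
sample = "# made-up header comment\nenergy i0 it iff\n8000 100 10 5\n8001 100 1 20\n"
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.dat")
    with open(sample_path, "w") as handle:
        handle.write(sample)
    trans, trans_meta = ingest(sample_path, "transmission")
    fluo, fluo_meta = ingest(sample_path, "fluorescence")
```

Calling the function once per channel keeps the energy/mu contract simple: each call yields one two-column frame, and measurement_type in the metadata records which channel produced mu.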
Building the ingestion pipeline