Write out simulation truths to jsonl files #70

maxnoe · 2018-02-15T14:23:15Z

Important simulation truths are missing from the output:

- energy
- source position
- pointing position

At least these three are needed to use the files for training an energy estimator or a source-reconstruction algorithm.
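A minimal sketch of how such records could be consumed for training, assuming each JSON line carried the truths as flat keys. The key names and units (`energy_GeV`, `source_zd_deg`, ...) are hypothetical; the current phs files do not contain them.

```python
import io
import json

# Hypothetical key names for the proposed truths; not part of the current phs format.
TRUTH_KEYS = [
    "energy_GeV",
    "source_zd_deg", "source_az_deg",
    "pointing_zd_deg", "pointing_az_deg",
]


def read_truths(lines):
    """Collect the truth columns from an iterable of JSON lines."""
    columns = {key: [] for key in TRUTH_KEYS}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        for key in TRUTH_KEYS:
            columns[key].append(event[key])
    return columns


example = io.StringIO(
    '{"event_id": 1, "energy_GeV": 850.0, "source_zd_deg": 20.5, '
    '"source_az_deg": 180.2, "pointing_zd_deg": 20.0, "pointing_az_deg": 180.0}\n'
    '{"event_id": 2, "energy_GeV": 1200.0, "source_zd_deg": 19.7, '
    '"source_az_deg": 179.5, "pointing_zd_deg": 20.0, "pointing_az_deg": 180.0}\n'
)
truths = read_truths(example)
print(truths["energy_GeV"])  # [850.0, 1200.0]
```

For the gzipped files, `gzip.open(path, "rt")` yields text lines that could be passed to `read_truths` directly.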

relleums · 2018-02-15T15:44:00Z

The CORSIKA simulation truth lives in separate files, which must contain at least the headers of the CORSIKA output. This was done on purpose, to avoid mental mapping and to avoid discarding any simulation truth just because it is not used at the moment.
There is an example in the README.

```python
import photon_stream as ps
import pandas as pd

sim_reader = ps.SimulationReader(
    photon_stream_path='tests/resources/011014.phs.jsonl.gz',
    mmcs_corsika_path='tests/resources/011014.ch'
)

for event in sim_reader:
    # process event ...
    # extract Hillas and other features ....
    # do deep learning ...
    pass

thrown_events = pd.DataFrame(sim_reader.thrown_events())
```

There is a reader which merges this information:

```python
reader = photon_stream.SimulationReader(photon_stream_path, mmcs_corsika_path)
```
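The merging that such a reader does can be pictured as a join on event identity between per-event features and the thrown-event truth. A minimal pandas sketch with toy data; the column names (`run_id`, `event_id`, `size`, `energy`) are hypothetical, and this is not how photon_stream implements it internally.

```python
import pandas as pd

# Hypothetical per-event features, as one might extract from the phs file.
features = pd.DataFrame({
    "run_id": [11014, 11014, 11014],
    "event_id": [1, 2, 3],
    "size": [310.0, 95.0, 1204.0],
})

# Hypothetical thrown-event truth, as one might read from the CORSIKA headers.
truth = pd.DataFrame({
    "run_id": [11014, 11014, 11014, 11014],
    "event_id": [1, 2, 3, 4],
    "energy": [850.0, 300.0, 4100.0, 120.0],
})

# Inner join on the event identity; validate raises if a key repeats on either side.
training = pd.merge(
    features,
    truth,
    on=["run_id", "event_id"],
    how="inner",
    validate="one_to_one",
)
print(training)
```

In this toy data, event 4 has truth but no features (for instance because it did not trigger), so the inner join drops it.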

maxnoe · 2018-02-15T15:45:35Z (edited)

I know this!

I think it is a huge advantage not to have to give people the CORSIKA files so that they can do event reconstruction. That's the point.

> This was done on purpose to avoid mental-mapping and not remove any simulation-truth only it is not used at the moment.

A little mental mapping for us is much less work than explaining to every new bachelor student what CORSIKA is, why these strange other files exist, why things are called phi and theta rather than azimuth and zenith, and why they cannot simply read the JSON lines.
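Part of what has to be explained is geometry that holds in any frame: reading theta as zenith distance and phi as azimuth, the quantity a source-reconstruction model needs (the angle between pointing and true source) follows from unit vectors. A minimal sketch, assuming both directions are already expressed in the same frame; it deliberately does not encode the actual CORSIKA or CERES axis conventions.

```python
import math


def direction(zenith, azimuth):
    """Unit vector for a direction given by zenith distance and azimuth (radians).

    Assumes z points to the zenith and x/y span the ground plane; the azimuth
    origin and orientation are whatever frame the inputs share.
    """
    return (
        math.sin(zenith) * math.cos(azimuth),
        math.sin(zenith) * math.sin(azimuth),
        math.cos(zenith),
    )


def angular_distance(zd1, az1, zd2, az2):
    """Angle (radians) between two directions given in the same frame."""
    a = direction(zd1, az1)
    b = direction(zd2, az2)
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))


# Source 0.7 deg further from the zenith than the pointing, same azimuth.
theta = angular_distance(math.radians(20.0), math.radians(180.0),
                         math.radians(20.7), math.radians(180.0))
print(math.degrees(theta))  # ~0.7
```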

relleums · 2018-02-15T16:11:12Z

The CORSIKA files are tiny once the photon blocks are removed, as I did for the simulation sample here: https://ihp-pc41.ethz.ch/public/phs/sim/. So I do not see a problem with giving the CORSIKA files to the users.
This way we avoid mental mapping and keep all the information from CORSIKA. The users can even reproduce the air showers based on the run header.
Is it because one needs multiple files to achieve one task? In that case, a tape archive (tar) is your friend. Personally, I prefer to map hierarchy in the file system. If you have bad feelings because of the DRS-file mayhem in FACT, I agree: that is very bad. But having two files here is a very different matter, since they have the same names and differ only in their suffixes.
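The tar suggestion can be made concrete with Python's standard `tarfile` module. A minimal sketch: the helper name `bundle_run` and the `.tar` naming are illustrative, and the files created here are empty placeholders named like the ones in `tests/resources`.

```python
import os
import tarfile
import tempfile


def bundle_run(phs_path, corsika_path, tar_path):
    """Pack a photon-stream file and its CORSIKA header file into one archive."""
    # Plain "w" mode: the .phs.jsonl.gz part is already gzip-compressed.
    with tarfile.open(tar_path, "w") as tar:
        for path in (phs_path, corsika_path):
            # Store only the file name so no local directories end up in the archive.
            tar.add(path, arcname=os.path.basename(path))


with tempfile.TemporaryDirectory() as workdir:
    phs = os.path.join(workdir, "011014.phs.jsonl.gz")
    ch = os.path.join(workdir, "011014.ch")
    for path in (phs, ch):
        with open(path, "wb") as f:
            f.write(b"placeholder")

    archive = os.path.join(workdir, "011014.tar")
    bundle_run(phs, ch, archive)

    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
    print(names)  # ['011014.ch', '011014.phs.jsonl.gz']
```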

maxnoe · 2018-02-15T16:23:16Z

No. I just spent an hour explaining to someone why these files exist, what theta and phi mean, why there are different coordinate systems, what one has to do to convert between them, and many other things.

How is that easier than just providing five additional numbers in the JSONL?
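On the writer side, the proposal amounts to merging five numbers into each event record before it is serialized. A minimal sketch; the helper `add_truth` and all key names are hypothetical and not part of the phs format.

```python
import json


def add_truth(event_line, truth):
    """Return a JSON line with the truth values merged into the event record."""
    event = json.loads(event_line)
    # Refuse to silently overwrite a key the event already has.
    clash = sorted(set(event) & set(truth))
    if clash:
        raise ValueError("truth keys already present: " + ", ".join(clash))
    event.update(truth)
    return json.dumps(event)


# Hypothetical names and units for the five numbers.
truth = {
    "energy_GeV": 850.0,
    "source_zd_deg": 20.5,
    "source_az_deg": 180.2,
    "pointing_zd_deg": 20.0,
    "pointing_az_deg": 180.0,
}
line = add_truth('{"run_id": 11014, "event_id": 1}', truth)
print(line)
```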

relleums · 2018-02-15T16:41:01Z (edited by dneise)

It is not easier, and yes, I know that it takes hours. But solving this is beyond the scope of the photon-stream. For a specific task, five numbers with key names known to two or three people are fine. But in general we want to have the full simulation truth: we do not know which piece of it might be relevant for the user. The user might know the CORSIKA manual; for our own stuff, no manual even exists.
However, I agree that additional functions which transform between the reference frames on the fly make sense. These functions encode our knowledge about the different reference frames of CERES and CORSIKA. But I fear these functions do not belong in the photon-stream: they involve CERES heavily and belong to a level above CORSIKA, CERES, and the photon-stream. For instance, for my thesis I recently simulated many FACT events with a tool other than CERES, which did not introduce a new reference frame but kept the CORSIKA one.
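One way to picture such a layer above the file format: a generator wraps any event iterator and applies a conversion lazily, so the stored records keep their native frame. A minimal sketch; `transformed` and `zenith_from_theta` are hypothetical names, and the example transform is only a placeholder rename (assuming theta equals zenith distance), not the real CERES/CORSIKA mapping.

```python
from typing import Callable, Dict, Iterable, Iterator

Event = Dict[str, float]


def transformed(events: Iterable[Event],
                transform: Callable[[Event], Event]) -> Iterator[Event]:
    """Yield each event after applying a frame transform, on the fly."""
    for event in events:
        # Shallow copy so the original record (in its original frame) is untouched.
        yield transform(dict(event))


def zenith_from_theta(event: Event) -> Event:
    """Placeholder transform: rename theta to zenith, assuming they coincide."""
    event["zenith_deg"] = event.pop("theta_deg")
    return event


raw = [{"theta_deg": 20.0}, {"theta_deg": 35.0}]
converted = list(transformed(raw, zenith_from_theta))
print(converted)  # [{'zenith_deg': 20.0}, {'zenith_deg': 35.0}]
print(raw)        # [{'theta_deg': 20.0}, {'theta_deg': 35.0}]
```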

kbruegge · 2018-02-15T16:45:41Z

> This way we avoid mental mapping and keep all the information from CORSIKA. The users can even reproduce the air-showers based on the run-header.

I agree. Providing some method to perform the transformation might be helpful, or even doing it on the fly, as you mention.

relleums · 2018-02-15T16:47:47Z

See issue #71 for the discussion of the different reference frames.

maxnoe · 2018-02-15T19:36:00Z

I whole-heartedly disagree.

Having energy, pointing direction, and source direction directly at hand, in a well-defined coordinate system, is such a huge usability boost that we shouldn't say:

> It's in those other binary files, in another coordinate system. Here is some code to read it and to convert it.

That's insane.

relleums · 2018-02-18T17:28:55Z

Can we compromise? We agree to find ways to provide all pointings in one 'well-defined' reference frame, but we will not put them into the 'phs' files. Can we decouple the reference-frame issue from the format issue?

maxnoe changed the title from "Write out simulation truth's to jsonl files" to "Write out simulation truths to jsonl files" on Feb 15, 2018.