This repository has been archived by the owner on Jan 31, 2022. It is now read-only.

Discussion: Software Routine for Determining Proper SBIT Timing Parameters #61

Open
1 of 2 tasks
bdorney opened this issue Oct 23, 2018 · 6 comments

Comments

@bdorney
Contributor

bdorney commented Oct 23, 2018

Brief summary of issue

We have seen an issue with the sbit mapping in V3 electronics. This issue persists when using GEBv3c+OHv3c hardware:

While the situation with complete v3c hardware is improved, it is still not as desired. Additionally, this is just for the short detector and we will also need a set of parameters for a long detector. Then for GE2/1 there will be 8 sets of parameters, and ME0 will contribute another set. So we need a software routine that can automatically determine the correct set of timing registers.

The registers of interest are:

  • GEM_AMC.OH.OHX.FPGA.TRIG.CTRL.SOT_INVERT
  • GEM_AMC.OH.OHX.FPGA.TRIG.TIMING.SOT_TAP_DELAY_VFATY
  • GEM_AMC.OH.OHX.FPGA.TRIG.TIMING.TAP_DELAY_VFATY_BITZ
  • GEM_AMC.OH.OHX.FPGA.TRIG.CTRL.VFATY_TU_INVERT

According to @andrewpeck

{X} is the Optohybrid number, which is determined by the CTP7 fiber mapping.
{Y} is the VFAT number in software units
{Z} In the same convention that Tuomas explained, is TXD_{Z}. The firmware calls Z=0 s-bits 0-7 (corresponding to VFAT channels 0-15), Z=1 is s-bits 8-15 (corresponding to VFAT channels 16-31) and so on.
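
The convention above can be sketched in code as follows. This is only an illustration: the helper names `sbitToTxdPair` and `tapDelayRegName` are hypothetical, and substituting X, Y and Z with plain decimal numbers in the register name is an assumption about the address table, not something confirmed in this issue.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: index Z of the TXD pair that carries a given s-bit.
// Follows the convention quoted above: Z=0 carries s-bits 0-7, Z=1 carries 8-15, ...
inline uint32_t sbitToTxdPair(uint32_t sbit) { return sbit / 8; }

// Hypothetical helper: name of the TAP_DELAY_VFATY_BITZ register for a given
// OH (X), VFAT (Y) and TXD pair (Z). Replacing X, Y, Z by decimal numbers is
// an assumption about how the address table names these nodes.
inline std::string tapDelayRegName(uint32_t ohN, uint32_t vfatN, uint32_t bitZ) {
    return "GEM_AMC.OH.OH" + std::to_string(ohN) +
           ".FPGA.TRIG.TIMING.TAP_DELAY_VFAT" + std::to_string(vfatN) +
           "_BIT" + std::to_string(bitZ);
}
```

For example, s-bit 40 on VFAT 16 would belong to pair Z=5 under this convention.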

The 100-pin panasonic connector looks like:

[Figure: geb-v3b-100pin, pinout of the 100-pin Panasonic connector]

The convention for the trigger unit that Tuomas has explained (@andrewpeck's email above) is shown as:

[Figure: geb_trigger_layout_100_pin_panasonic, trigger unit layout on the 100-pin Panasonic connector]

So GEM_AMC.OH.OHX.FPGA.TRIG.TIMING.TAP_DELAY_VFATY_BITZ follows the hardware.

We already have one tool that checks the mapping:

void checkSbitMappingWithCalPulseLocal(localArgs *la, uint32_t *outData, uint32_t ohN, uint32_t vfatN, uint32_t mask, bool useCalPulse, bool currentPulse, uint32_t calScaleFactor, uint32_t nevts, uint32_t L1Ainterval, uint32_t pulseDelay){

I would be against modifying this tool to try to correct the mapping, since if you modify the 4 registers above incorrectly you can affect not just the Z^th bit but all 8 SBITs, due to how the OH is expecting them. So what I would propose is the following procedure:

  1. When deploying new firmware or using new hardware for the first time the checkSbitMappingWithCalPulseLocal() function should be used to check that the sbit mapping is correct (this is easily done with checkSbitMappingAndRate.py),
  2. Analyze this data with anaSBitMonitor.py, this produces a list of mis-mapped sbits (see example here),
  3. Use this list as input for some new function correctSBitMappingErrorsLocal(...),
  4. Apply corrected timing and inverted register settings determined from analysis of correctSBitMappingErrorsLocal(...),
  5. Check the mapping is now correct with another call of checkSbitMappingAndRate.py.

The correctSBitMappingErrorsLocal(...) would call checkSbitMappingWithCalPulseLocal(), with a small event count, after making modifications so this could eliminate step 5 above.

Types of issue

  • Bug report (report an issue with the code)
  • Feature request (request for change which adds functionality)

Expected Behavior

How I expect correctSBitMappingErrors(...) and correctSBitMappingErrorsLocal(...) to function. General flow is shown below.

Unless otherwise noted for the code that will be added to ctp7_modules this should be placed in calibration_routines.h and calibration_routines.cc.

The calling function on the DAQ Machine

This is a new development and I'm not sure if the calling function should be created in the legacy xhal branch or placed in cmsgemos. In my eyes this is a calibration routine, so it doesn't really fit in a HwDevice; some input from @mexanick and @jsturdy would be appreciated here.

However, the general overview should be something like:

  • Read in a text file that has a table of mis-mapped sbits,
  • Take a user defined cut such that if N_Mismatches > cutVal the sbit is assumed to have inverted polarity,
  • Parse the table into two sets of keys for the RPC message, once the table is parsed loop over all VFATs and if VFATN is in the table set the key in the following cases:
    • SBITs that have bad timing (key should be MappingVFATN and this should set a std::vector<uint32_t> to this key),
    • SBITs that are assumed to be inverted (key should be InvertedVFATN and this should set a std::vector<uint32_t> to this key),
  • Set the keys above and send the RPC message
  • The RPC response is expected to have the following keys and all 24 VFATs are reported (do not skip any since we need all values):
    • SOT_INVERT, one integer
    • SOT_TAP_DELAY_VFATY, 24 integers
    • TAP_DELAY_VFATY_BITS, 24*8 = 192 integers
    • VFATY_TU_INVERT 24 integers
    • Note that while the above do not use all 32 bits, the RPC service can only send information as 32 bit chunks (unless you write as binary data but that's probably not needed here)
    • These should be then written to a text file so this can be used in configuring later or in uploading to the DB

Input parameters should be:

  • User defined cut: if N_Mismatches exceeds the cut value the sbit is assumed to have inverted polarity.
  • Physical filename to the file that specifies the mismatched table (format below) that lists the wrongly mapped SBITs.

Example table format is something like:

vfatN vfatSBIT SBIT_Size N_Mismatches
14 8 0 3428
14 8 1 2326
14 8 2 989
14 8 3 489
14 8 4 257
14 63 0 366
16 16 7 25200
16 40 7 25200
17 0 7 25200
17 16 0 4

Here any sbit with N_Mismatches beyond 25k can be assumed to have an inverted polarity but this strongly depends on the event count used when checkSbitMappingAndRate.py was called.
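
A minimal sketch of how the table could be split into the two sets of keys is shown below. The function name `splitMismatchTable` and the `SbitMap` alias are hypothetical, and two choices here are assumptions rather than part of the proposal: an sbit is treated as inverted if any of its rows exceeds the cut, and an sbit classified as inverted is not also listed as mis-timed.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

using SbitMap = std::map<std::string, std::vector<uint32_t>>;

// Splits the mismatch table into "MappingVFATN" and "InvertedVFATN" keys.
// Rows are "vfatN vfatSBIT SBIT_Size N_Mismatches"; rows that do not parse
// as four unsigned integers (e.g. the header) are skipped.
inline void splitMismatchTable(std::istream &table, uint32_t cutVal,
                               SbitMap &mapping, SbitMap &inverted) {
    std::map<uint32_t, std::set<uint32_t>> bad, inv;  // vfatN -> unique sbits
    std::string line;
    while (std::getline(table, line)) {
        std::istringstream row(line);
        uint32_t vfatN, sbit, clusterSize, nMismatches;
        if (!(row >> vfatN >> sbit >> clusterSize >> nMismatches)) continue;
        (nMismatches > cutVal ? inv : bad)[vfatN].insert(sbit);
    }
    for (const auto &entry : inv) {
        inverted["InvertedVFAT" + std::to_string(entry.first)]
            .assign(entry.second.begin(), entry.second.end());
    }
    for (const auto &entry : bad) {
        std::vector<uint32_t> sbits;
        const auto invIt = inv.find(entry.first);
        for (uint32_t sbit : entry.second) {
            // Assumption: an sbit already classified as inverted is not also
            // reported as having bad timing.
            if (invIt == inv.end() || !invIt->second.count(sbit)) sbits.push_back(sbit);
        }
        if (!sbits.empty()) mapping["MappingVFAT" + std::to_string(entry.first)] = sbits;
    }
}
```

With the example table and a cut of 25000, this would place sbits 16 and 40 of VFAT16 and sbit 0 of VFAT17 under the inverted keys, and sbits 8 and 63 of VFAT14 and sbit 16 of VFAT17 under the mapping keys.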

Outline of correctSBitMappingErrors(...)

Here we are getting the information from the RPC request, and it falls in two categories:

  • SBits whose timing is wrong in relation to the start of transmission (SOT),
  • SBits that are assumed to be inverted.

For the first case (wrong timing) we will construct a std::map<std::string,std::vector<uint32_t> > from the input MappingVFATN keys:

  1. For OH X of interest, loop over all 24 VFATs,
  2. For each vfatN check if a key "MappingVFATN" exists in the rpc message,
  3. If this key exists it gets a std::vector of mis-mapped sbits from the get_word_array function,
  4. This vector is stored in a std::map where the key is "MappingVFATN" or just "VFATN" for simplicity

Similarly we should construct a second map (as above) from the "InvertedVFATN" keys.

This should then get the vfatmask using:

uint32_t getOHVFATMaskLocal(localArgs * la, uint32_t ohN){

It should then loop over all unmasked vfats and, for each iteration, call the local function with the constructed maps as input. The local function correctSBitMappingErrorsLocal() should take the following input parameters:

  • ohN,
  • vfatN
    • Note if a vfat is in the constructed maps but also is in the vfatmask this should probably raise either an error or a warning (by setting the "error" key or "warning" key in the RPC response), this probably means the hardware isn't configured correctly (e.g. VFATs out of sync and user needs to be told)
  • the std::vector stored in the std::map for the "MappingVFATN" key,
  • the std::vector stored in the std::map for the "InvertedVFATN" key, and
  • A boolean which if true the mapping is attempted to be corrected, if false it just reads the relevant timing and invert registers and returns them.
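
The consistency check mentioned in the note on vfatN could look roughly like this. It is a sketch only: `maskedButListed` is a hypothetical name, and it assumes that bit N of the vfatmask being set means VFAT N is masked.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Returns the VFATs that are listed in the mis-mapping table but are also masked.
// Assumption: bit N of vfatMask set means VFAT N is masked; nVfats must be <= 32.
inline std::vector<uint32_t> maskedButListed(
    uint32_t vfatMask,
    const std::map<std::string, std::vector<uint32_t>> &mapping,
    uint32_t nVfats = 24) {
    std::vector<uint32_t> conflicts;
    for (uint32_t vfatN = 0; vfatN < nVfats; ++vfatN) {
        const bool masked = (vfatMask >> vfatN) & 0x1;
        if (masked && mapping.count("MappingVFAT" + std::to_string(vfatN))) {
            conflicts.push_back(vfatN);
        }
    }
    return conflicts;
}
```

A non-empty result would be the trigger for setting the "error" or "warning" key in the RPC response.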

The local function could then return for this vfatN an std::map<std::string, std::vector<uint32_t> > whose keys are:

  • SOT_TAP_DELAY_VFATY, this is a vector of one element
  • TAP_DELAY_VFATY_BITS, this is a vector of 8 elements
  • VFATY_TU_INVERT this is a vector of one element
These three should then be added to three maps (initialized before the loop above). After all unmasked VFATs have been looped over, these maps will contain all timing and invert register values that give the correct configuration, e.g.:
std::map<std::string, uint32_t > map_sotTapDelay; //key here is `VFATN`, stores at most MAX_VFAT number of values (24 for GE1/1).
std::map<std::string, std::vector<uint32_t> > map_vfatTapDelay; //key here is `VFATN`, each vector has 8 elements
std::map<std::string, uint32_t> map_vfatInvert; //key here is `VFATN`, stores at most MAX_VFAT number of values (24 for GE1/1).

After everything is said and done there should be a read of SOT_INVERT and this should be placed in the RPC response as a data word.

Then the three final maps (map_sotTapDelay, map_vfatTapDelay, map_vfatInvert) should be looped over (they will all have the same keys so one loop is sufficient) and stored in the RPC response, e.g.:

for(int vfat = 0; vfat < 24; ++vfat){
   std::string strVFAT = stdsprintf("VFAT%i",vfat); //key shared by all three maps
   rsp.set_word(stdsprintf("SOT_TAP_DELAY_VFAT%i",vfat),map_sotTapDelay[strVFAT]);
   rsp.set_word_array(stdsprintf("TAP_DELAY_VFAT%i_BITS",vfat),map_vfatTapDelay[strVFAT]);
   rsp.set_word(stdsprintf("VFAT%i_TU_INVERT",vfat),map_vfatInvert[strVFAT]);
}

The function on the DAQ machine now has the correct configuration for this link.

Outline of correctSBitMappingErrorsLocal(...)

The local function will then be where the actual "meat" of the algorithm is done. This function should look like:

std::map<std::string,std::vector<uint32_t> > correctSBitMappingErrorsLocal(int ohN, int vfatN, std::vector<uint32_t> mismappedSBits, std::vector<uint32_t> invertedSBITs, bool correctMapping)

This function at the end should always read the following registers:

  • SOT_TAP_DELAY_VFATY,
  • TAP_DELAY_VFATY_BITZ, and
  • VFATY_TU_INVERT

This could be done by having a dedicated RPC method in vfat3.h/vfat3.cc (for reading one VFAT) and optohybrid.h/optohybrid.cc (for reading all VFATs) and the one in vfat3.h should be called by correctSBitMappingErrorsLocal.

It should only try to correct the mapping if correctMapping is true.

First we should loop over those members of invertedSBITs and write the corresponding bits in VFATY_TU_INVERT. This should be done by:

  1. Determining which 24TU_TXD_P<N> and 24TU_TXD_N<N> pair the i^th element of invertedSBITs refers to using the convention @andrewpeck illustrates above.
  2. Then flip the bit in VFATY_TU_INVERT that corresponds to this 24TU_TXD_P<N> and 24TU_TXD_N<N> pair,
    • Note you need to track whether a bit has already been flipped: multiple elements of invertedSBITs can share this pair, so once you flip the bit the first time, any other elements of invertedSBITs that correspond to this pair should not cause the bit to be flipped again
  3. Repeat steps 1 & 2 for all elements of invertedSBITs.
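
The flip-once bookkeeping in the steps above can be sketched as follows. The function name `flipInvertBits` is hypothetical, and the sketch assumes that bit Z of VFATY_TU_INVERT controls the TXD pair Z and that Z = sbit / 8 (s-bit numbers below 64), following the convention described earlier in this issue.

```cpp
#include <cstdint>
#include <vector>

// Returns the new value of VFATY_TU_INVERT after flipping, at most once per
// pair, the invert bit of every TXD pair referenced by invertedSBITs.
// Assumptions: bit Z of the register controls pair Z, Z = sbit / 8, sbit < 64.
inline uint32_t flipInvertBits(uint32_t tuInvert, const std::vector<uint32_t> &invertedSBITs) {
    uint32_t alreadyFlipped = 0;  // bit Z is set once pair Z has been flipped
    for (uint32_t sbit : invertedSBITs) {
        const uint32_t pairBit = 1u << (sbit / 8);
        if (alreadyFlipped & pairBit) continue;  // this pair was handled already
        tuInvert ^= pairBit;
        alreadyFlipped |= pairBit;
    }
    return tuInvert;
}
```

The returned value would then be written back to the register; the write itself is left out because it depends on the register access helpers available in ctp7_modules.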

After this you should call:

/*! \fn void checkSbitMappingWithCalPulseLocal(localArgs *la, uint32_t *outData, uint32_t ohN, uint32_t vfatN, uint32_t mask, bool useCalPulse, bool currentPulse, uint32_t calScaleFactor, uint32_t nevts, uint32_t L1Ainterval, uint32_t pulseDelay)
* \brief With all but one channel masked, pulses a given channel, and then checks which sbits are seen by the CTP7, repeats for all channels on vfatN; reports the (vfat,chan) pulsed and (vfat,sbit) observed where sbit=floor(chan/2); additionally reports if the cluster was valid.
* \details The SBIT Monitor stores the 8 SBITs that are sent from the OH (they are all sent at the same time and correspond to the same clock cycle). Each SBIT cluster read out from the SBIT Monitor is a 16 bit word with bits [0:10] being the sbit address and bits [12:14] being the sbit size, bits 11 and 15 are not used.
* \details The possible values of the SBIT Address are [0,1535]. Clusters with address less than 1536 are considered valid (e.g. there was an sbit); otherwise an invalid (no sbit) cluster is returned. The SBIT address maps to a given trigger pad following the equation \f$sbit = addr % 64\f$. There are 64 such trigger pads per VFAT. Each trigger pad corresponds to two VFAT channels. The SBIT to channel mapping follows \f$sbit=floor(chan/2)\f$. You can determine the VFAT position of the sbit via the equation \f$vfatPos=7-int(addr/192)+int((addr%192)/64)*8\f$.
* \details The SBIT size represents the number of additional adjacent trigger pads that are part of this cluster. The SBIT address always reports the lowest trigger pad number in the cluster. The sbit size takes values [0,7]. So an sbit cluster with address 13 and with size of 2 includes 3 trigger pads for a total of 6 vfat channels and starts at channel \f$13*2=26\f$ and continues to channel \f$(2*15)+1=31\f$.
* \param la Local arguments structure
* \param outData pointer to an array of size (24*128*8*nevts) which stores the results of the scan, bits [0,7] channel pulsed; bits [8:15] sbit observed; bits [16:20] vfat pulsed; bits [21,25] vfat observed; bit 26 isValid; bits [27,29] are the cluster size
* \param ohN Optical link
* \param vfatN specific vfat position to be tested
* \param mask VFATs to be excluded from the trigger
* \param useCalPulse true (false) checks sbit mapping with calpulse on (off); useful for measuring noise
* \param currentPulse Selects whether to use current or voltage pulse
* \param calScaleFactor
* \param nevts the number of cal pulses to inject per channel
* \param L1Ainterval How often to repeat signals (only for enable = true)
* \param pulseDelay delay between CalPulse and L1A
*/
void checkSbitMappingWithCalPulseLocal(localArgs *la, uint32_t *outData, uint32_t ohN, uint32_t vfatN, uint32_t mask, bool useCalPulse, bool currentPulse, uint32_t calScaleFactor, uint32_t nevts, uint32_t L1Ainterval, uint32_t pulseDelay){

Care should be taken to construct the input arguments properly (see function documentation). Additionally you don't need a lot of events (nevts=10 is probably sufficient). Also using the calpulse in voltageStepPulse mode should be fine (e.g. currentPulse = false). You then should analyze the outData container to see if any of the bits you flipped suffer from mis-mapping. To do this see this example:

For any new mismatches you find you should add these to mismappedSBits, e.g.:

mismappedSBits.push_back(outData[idxOfNewMisMatch]);
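
Since outData holds packed words, each entry has to be decoded before it can be compared with the pulsed channel. The sketch below follows the bit layout given in the \param outData documentation above; the struct and function names are hypothetical, and the mismatch criterion (same VFAT, and sbit observed equal to floor(chan/2)) is an assumption based on the documented sbit to channel relation. If mismappedSBits is meant to hold sbit numbers, the decoded sbit rather than the raw word is probably what should be pushed back.

```cpp
#include <cstdint>

// Decoded form of one outData word (hypothetical struct).
struct SbitCheckResult {
    uint32_t chanPulsed;
    uint32_t sbitObserved;
    uint32_t vfatPulsed;
    uint32_t vfatObserved;
    uint32_t clusterSize;
    bool isValid;
};

// Unpacks one word using the layout from the function documentation.
inline SbitCheckResult decodeSbitCheckWord(uint32_t word) {
    SbitCheckResult r;
    r.chanPulsed   = word & 0xff;          // bits [0:7]
    r.sbitObserved = (word >> 8) & 0xff;   // bits [8:15]
    r.vfatPulsed   = (word >> 16) & 0x1f;  // bits [16:20]
    r.vfatObserved = (word >> 21) & 0x1f;  // bits [21:25]
    r.isValid      = (word >> 26) & 0x1;   // bit 26
    r.clusterSize  = (word >> 27) & 0x7;   // bits [27:29]
    return r;
}

// Assumed criterion: a valid cluster is mis-mapped if it shows up on another
// VFAT or on an sbit different from floor(chanPulsed / 2).
inline bool isMisMapped(const SbitCheckResult &r) {
    return r.isValid &&
           (r.vfatObserved != r.vfatPulsed || r.sbitObserved != r.chanPulsed / 2);
}
```

Invalid clusters are not flagged by `isMisMapped`; whether a missing sbit should also count as a mismatch is left open here.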

Now here is where the hard part is. For each element of mismappedSBits the delays should be such that:

  • TU_SOT_P_24_TU_SOT_P_24 arrives first in the FPGA, followed by:
  • 24TU_TXD_P<0>, followed by
  • 24TU_TXD_P<1>, followed by
  • ...
  • ...
  • 24TU_TXD_P<7>

Note I've suppressed the negative part of the pair. To ensure this you need to manipulate:

  • SOT_TAP_DELAY_VFATY, and
  • TAP_DELAY_VFATY_BITZ

This must be done for the current element of mismappedSBits. However, other elements of mismappedSBits may share the same pair (e.g. 24TU_TXD_P<N> and 24TU_TXD_N<N>) as the current element, so you should track which pairs you've already modified to prevent subsequent modification. Additionally, and more importantly, an element of mismappedSBits that comes later in the VFAT may be affected by your modification of an earlier bit. I would propose the following:

  1. Determine which differential pair the element of mismappedSBits comes from, this determines TAP_DELAY_VFATY_BITZ.
  2. Add 1 to this TAP_DELAY_VFATY_BITZ register (not sure of the size of this register, but you should stop at the max...),
  3. For all subsequent TAP_DELAY_VFATY_BITZ registers where Z_prime > Z also add 1.
  4. Call checkSbitMappingWithCalPulseLocal(...) with a low event count,
  5. Decode the outData and remove any element from mismappedSBits which is now correctly mapped, add an sbit that is now incorrectly mapped, and
  6. Repeat steps 1-5 until all sbits have the correct mapping.
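
Steps 2 and 3 can be sketched on an in-memory copy of the eight TAP_DELAY_VFATY_BITZ values. The function name and the saturation value `kAssumedMaxTap` are hypothetical; the real register width is not stated in this issue and would have to be taken from the firmware.

```cpp
#include <array>
#include <cstdint>

// Hypothetical saturation value; the real register width is not known here.
constexpr uint32_t kAssumedMaxTap = 31;

// Adds 1 to the delay of pair Z and of every later pair (Z_prime > Z),
// saturating at maxTap. Returns false if pair Z itself is already at the
// maximum (or bitZ is out of range), in which case nothing is changed.
inline bool bumpTapDelays(std::array<uint32_t, 8> &tapDelay, uint32_t bitZ,
                          uint32_t maxTap = kAssumedMaxTap) {
    if (bitZ >= tapDelay.size() || tapDelay[bitZ] >= maxTap) return false;
    for (uint32_t z = bitZ; z < tapDelay.size(); ++z) {
        if (tapDelay[z] < maxTap) ++tapDelay[z];
    }
    return true;
}
```

A `false` return gives the calling loop a natural stopping condition when a delay cannot be increased any further.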

Some input here from @andrewpeck is needed to see if the above makes sense, particularly steps 2 & 3. For Step 5 I would suggest to use the Erase-Remove Idiom; you can find examples on stackoverflow.
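
For reference, the Erase-Remove Idiom for Step 5 could be applied like this. The function name `dropFixedSbits` is hypothetical, and `stillBad` is assumed to be the list of sbits that the latest check still reports as mis-mapped.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Removes from mismappedSBits every sbit that no longer appears in stillBad.
inline void dropFixedSbits(std::vector<uint32_t> &mismappedSBits,
                           const std::vector<uint32_t> &stillBad) {
    mismappedSBits.erase(
        std::remove_if(mismappedSBits.begin(), mismappedSBits.end(),
                       [&stillBad](uint32_t sbit) {
                           // true means "now correctly mapped, remove it"
                           return std::find(stillBad.begin(), stillBad.end(), sbit) ==
                                  stillBad.end();
                       }),
        mismappedSBits.end());
}
```

Newly mis-mapped sbits from `stillBad` that are not yet in `mismappedSBits` would still need to be appended separately, as Step 5 describes.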

Then afterward this function should read the following registers:

  • SOT_TAP_DELAY_VFATY,
  • TAP_DELAY_VFATY_BITZ, and
  • VFATY_TU_INVERT

Store these in an std::map<std::string,std::vector<uint32_t> > and return it. The mapping should now be correct.

Current Behavior

You have to do the above by hand using gem_reg.py (bad).

Context (for feature requests)

The sbit mapping is wrong. We need a software solution to correct this for both GE1/1 and future upgrades (GE2/1 & ME0).

@lpetre-ulb
Contributor

lpetre-ulb commented Oct 25, 2018

May I suggest another solution for aligning the sbits? I think we could find a firmware/software solution which is more robust (and faster) than a pure software routine.

Here is how I see things :

  1. Instead of using fixed delay taps, one could dynamically configure the delays in order to always sample the signal in the middle of the eye. We could implement a system similar to XAPP585, particularly per-bit deskew. It would be aligned in near realtime and could correct for voltage and temperature variations.

  2. Once we can reliably sample the time-multiplexed sbits, it would be possible to align all of them with a training phase. More specifically, configure the VFAT as follows: mask all channels except 0&1, 16&17, ... so the enabled sbits would be 0, 8, ... Also set the THR_ARM_DAC to a very low value (e.g. 0x1) in order to constantly measure noise. Therefore, on each sbit differential pair (and SOT also, I think), one would see 10000000. The signal is aligned using a simple bitslip. This is also where we see if there is an inversion in the polarity and correct for it.

  3. Once the training phase is done, the VFAT can return to "normal" operating mode. The alignment can be continuously checked by looking at the SOT frame.
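
As a software model of the training in step 2, the search for the bitslip and the polarity could look like the sketch below. It is not firmware and all names are hypothetical; it assumes the 8 bits of one frame are packed MSB first into a byte, so the training pattern 10000000 is 0x80.

```cpp
#include <cstdint>

// Result of the training search (hypothetical struct).
struct BitslipResult {
    bool found;     // true if some rotation matches the expected pattern
    bool inverted;  // true if the match required inverting the polarity
    int slips;      // number of one-bit left rotations needed
};

// Rotates an 8-bit word left by n bits.
inline uint8_t rotl8(uint8_t v, int n) {
    n &= 7;
    return static_cast<uint8_t>((v << n) | (v >> ((8 - n) & 7)));
}

// Tries all 8 rotations of the observed word, first as received and then with
// inverted polarity, and reports the first one that equals the expected pattern.
inline BitslipResult findBitslip(uint8_t observed, uint8_t expected = 0x80) {
    for (int inv = 0; inv < 2; ++inv) {
        const uint8_t word = inv ? static_cast<uint8_t>(~observed) : observed;
        for (int n = 0; n < 8; ++n) {
            if (rotl8(word, n) == expected) return {true, inv == 1, n};
        }
    }
    return {false, false, 0};
}
```

A pair that never matches in either polarity would point to a sampling problem rather than a pure bitslip or inversion.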

I also think that such a solution can easily be ported to GE2/1 & ME0. In the absence of an FPGA on the OH, step 1 should be done by the LpGBT (GBTX already phase aligns the data if I'm not mistaken). Step 2 should be done in the backend firmware.

As the correct sbit mapping is required for the new TDC with full granularity, I would not be able to reliably test the new TDC firmware before the sbit mapping issue is solved. I could try to implement the previously described solution next week.

@andrewpeck

andrewpeck commented Oct 26, 2018 via email

@lpetre-ulb
Contributor

Hi Andrew,

Thank you very much for your very detailed reply.

GBT does not phase align data. It has a similar fixed delay, and we do a hand-scan of phase values to find the window and then fuse a hard-coded sampling phase into the chip.

I had seen in some slides the possibility for the GBTX to automatically choose the correct phase. Reading the manual more carefully, I see that this method is not resistant to SEUs. Too bad...

S-bit "mapping" only creates a rotation of the S-bits so that 01234567 becomes 70123456. The TDC just uses the OR of the entire VFAT, correct? In which case you should be able to proceed as is, right?

Indeed, the actual version of the TDC uses the OR of an entire VFAT. This is how we made the first measurement with the v3 electronics (see this elog). You can notice that we still have a lot of improvement to do, both on the setup and on the detector configuration.

However, the final aim is to measure the time resolution with the full Sbit granularity, that is by using the "Sbits word" coming from the "Sbits cluster packer". Modifying the TDC module is not difficult, but it is nearly impossible to test it without the proper Sbit mapping.

Regarding the rotation of the Sbits, are you sure that it is not possible that one Sbit is not correctly associated to the correct BX ? If you look at the histogram slide 7 of this presentation, it looks like there are three peaks. The leftmost one is roughly separated from the main one by 25ns, that is 1 BX.
While working on the v2a with an old firmware which did not time align the Sbits, I observed a similar behavior. The fix (for the timing measurement) was to OR the Sbits on the VFAT2 itself and use only 1 Sbit transmission line.

On the GEB v3c, you can see perhaps a timing issue in VFAT18, VFAT8, VFAT0, VFAT11, VFAT3 but all of these problems would not be an issue if you are just using the OR of the VFAT for timing measurement

VFAT14 has something else very wrong that could not be explained by timing or inversion.

One remark about this plot; I don't know if it is written somewhere, but on this GEBv3c plot posted by Brian, the firmware uses the v3b taps configuration. More precisely, this is the first TDC firmware, based on version 3.1.2B. So that configuration has mixed hardware/firmware. It might explain a behavior different from that observed on other GEBv3c.

We will be working on the timing question in the next few weeks and should have an idea soon how well things work, how consistent the parameters are across time, temperature, etc and hopefully should be able to fix some of these issues through firmware (but certainly not all of them).

Let me know if I can help you in any way with this issue. I also think we received one long GEB at ULB.

Best regards,
Laurent

@bdorney
Contributor Author

bdorney commented Oct 29, 2018

However, the final aim is to measure the time resolution with the full Sbit granularity, that is by using the "Sbits word" coming from the "Sbits cluster packer". Modifying the TDC module is not difficult, but it is nearly impossible to test it without the proper Sbit mapping.

This is both alarmist and also not true. You are able to see which sbits are mapped correctly using the checkSbitMappingAndRate.py tool. For those vfats that have mismapped sbits mask them from the trigger block in the OH using the instructions here. This enables you to make tests of your FW module seamlessly.

It's possible I didn't understand the conversation above due to ignorance. But it should be explicitly clear that we will not make design choices to the optohybrid firmware just to accommodate this TDC module. If you are interested in working on solving this sbit mis-mapping issue, which is a critical path issue for P5, please use the RPC module approach I've outlined above. Also since I think @andrewpeck has targeted this for his student you should discuss with him on how to contribute so we don't have two different people trying to solve the same problem (as that would be inefficient).

@lpetre-ulb
Contributor

However, the final aim is to measure the time resolution with the full Sbit granularity, that is by using the "Sbits word" coming from the "Sbits cluster packer". Modifying the TDC module is not difficult, but it is nearly impossible to test it without the proper Sbit mapping.

This is both alarmist and also not true. You are able to see which sbits are mapped correctly using the checkSbitMappingAndRate.py tool. For those vfats that have mismapped sbits mask them from the trigger block in the OH using the instructions here. This enables you to make tests of your FW module seamlessly.

Sure, I can mask the unused VFATs. However, there is only one position where it is possible to connect a VFAT on the GEM chamber at ULB and if the mapping of that VFAT is wrong, it won't help to mask it. And masking VFATs will not allow to test how the TDC behaves with the full detector : will slow controls sustain the acquisition rate ? won't the (little amount of) noise mask the signal since noise will be picked up from the full detector ? ...

It's possible I didn't understand the conversation above due to ignorance. But it should be explicitly clear that we will not make design choices to the optohybrid firmware just to accommodate this TDC module. If you are interested in working on solving this sbit mis-mapping issue, which is a critical path issue for P5, please use the RPC module approach I've outlined above. Also since I think @andrewpeck has targeted this for his student you should discuss with him on how to contribute so we don't have two different people trying to solve the same problem (as that would be inefficient).

Of course, it is not to accommodate the TDC module. The conversation above was about the Sbit mapping issue due to bad Sbit timing parameters in all generality. Yes, it is best if we collaborate on fixing the issue; that is the meaning of the last sentence of my previous post.

@andrewpeck

Indeed, the actual version of the TDC uses the OR of an entire VFAT. This is how we made the first measurement with the v3 electronics (see this elog). You can notice that we still have a lot of improvement to do, both on the setup and on the detector configuration.

However, the final aim is to measure the time resolution with the full Sbit granularity, that is by using the "Sbits word" coming from the "Sbits cluster packer". Modifying the TDC module is not difficult, but it is nearly impossible to test it without the proper Sbit mapping.

Regarding the rotation of the Sbits, are you sure that it is not possible that one Sbit is not correctly associated to the correct BX ? If you look at the histogram slide 7 of this presentation, it looks like there are three peaks. The leftmost one is roughly separated from the main one by 25ns, that is 1 BX.
While working on the v2a with an old firmware which did not time align the Sbits, I observed a similar behavior. The fix (for the timing measurement) was to OR the Sbits on the VFAT2 itself and use only 1 Sbit transmission line.

Yes, the timing is a whole separate issue that will need to be addressed as well.

The bx is determined by the alignment of the SoT relative to the 40MHz clock.

But right now the 40MHz clock phase is completely arbitrary, so depending on the phase the S-bits will end up split randomly into different bunches. We need to phase shift the 40MHz clock (done on the GBTx) to center the data so that the S-bits that are supposed to be synchronous are falling in the same bx. Nobody has ever done this step (you are the first person besides me to even mention it... :(

One remark about this plot; I don't know if it is written somewhere, but on this GEBv3c plot posted by Brian, the firmware uses the v3b taps configuration. More precisely, this is the first TDC firmware, based on version 3.1.2B. So that configuration has mixed hardware/firmware. It might explain a behavior different from that observed on other GEBv3c.

Supposedly, based on the design files, the v3c and v3b should be the same, but this doesn't seem to be the case in reality :( So we need to figure out what it is supposed to be.. but naively, to first order, they should be the same, to the best of our knowledge, hence why Brian was using the v3b config on v3c electronics.

Our student is starting today with getting things setup.. hopefully it won't take very long to get some working config for the v3c

@andrewpeck andrewpeck removed their assignment Nov 15, 2021
8 participants