Discussion: Software Routine for Determining Proper SBIT Timing Parameters #61
May I suggest another solution for aligning the sbits? I think we could find a firmware/software solution which is more robust (and faster) than a pure software routine. Here is how I see things:
I also think that such a solution can easily be ported to GE2/1 & ME0. In the absence of an FPGA on the OH, step 1 should be done by the LpGBT (the GBTX already phase aligns the data, if I'm not mistaken). Step 2 should be done in the backend firmware. As the correct sbit mapping is required for the new TDC with full granularity, I would not be able to reliably test the new TDC firmware before the sbit mapping issue is solved. I could try to implement the previously described solution next week. |
Hi Laurent,
Thanks for the feedback. I have some comments inline below:
On Thu, Oct 25, 2018 at 2:43 AM lpetre-ulb ***@***.***> wrote:
May I suggest another solution for aligning the sbits? I think we could
find a firmware/software solution which is more robust (and faster) than a
pure software routine.
Here is how I see things:
Instead of using fixed delay taps, one could dynamically configure the
delays in order to always sample the signal in the middle of the eye. We
could implement a system similar to XAPP585
<https://www.xilinx.com/support/documentation/application_notes/xapp585-lvds-source-synch-serdes-clock-multiplication.pdf>,
particularly per-bit deskew. It would be aligned in near realtime and could
correct for voltage and temperature variations.
The firmware right now uses dynamic configuration of the delays (based on
https://www.xilinx.com/support/documentation/application_notes/xapp881_V6_4X_Asynch_OverSampling.pdf
)
It centers the data inside the eye automatically, based on the SOT pulse
which is received every clock. We wanted to phase length match the
different S-bits coming from a VFAT so that in principle the same alignment
state machine could be used for all 9 pairs coming from a single VFAT. The
SOT would determine the timing and the corresponding S-bit traces would be
automatically aligned to it because they have the same timing.
The phase alignment that was done on the PCBs, however, is not good, so
there is some skew from channel to channel and we hoped to just correct
that with fixed delays that simply align the S-bits coming from a single
VFAT so that they are completely in sync with each other. Temperature
drift and so on should affect all 9 pairs equally (at least within the
tolerance of the very large 3.125ns eye).
So the process of timing in these delays should just need to be done once
in the lab and we are over with it for the whole detector, and do not need
special routines at the beginning of every hard reset. We did it already on
v3a by hand and it worked well but nobody ever repeated the exercise on v3b
and v3c where some positions have changed.
The belief underlying this is that there may be VFAT to VFAT variation, GEB
to GEB variation, but that the variation within a single VFAT should be
small and this mechanism just needs to keep them in phase +- a nanosecond
or so (using 78ps tap delays) so there is actually a lot of slosh for
things to be out of time. The big requirement of this system is that the
different output channels of a single VFAT should be consistently timed in
with each other when coming from the VFAT, which I really hope is true, and
that the IODelays work more-or-less correctly within the slack acceptable
by the sampling window (which they should, they are calibrated by the chip).
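To put rough numbers on the slack described above, here is a minimal sketch of the arithmetic, using only the 78 ps tap size and 3.125 ns eye quoted in this message. The helper name `tapsInWindow` is hypothetical and not part of any firmware or software in this repository.

```cpp
#include <cmath>

// Values quoted in the message above.
constexpr double kTapPs = 78.0;    // size of one delay tap, in picoseconds
constexpr double kEyePs = 3125.0;  // width of the eye, in picoseconds

// Number of whole taps that fit in a time window (hypothetical helper).
inline int tapsInWindow(double windowPs) {
    return static_cast<int>(std::floor(windowPs / kTapPs));
}
```

With these numbers, `tapsInWindow(kEyePs)` gives 40 and `tapsInWindow(1000.0)` gives 12, so a tolerance of about a nanosecond corresponds to roughly a dozen taps in either direction.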
Once we can reliably sample the time-multiplexed sbits, it would be
possible to align all of them with a training phase. More specifically,
configure the VFAT as follows: mask all channels except 0&1, 16&17, ... so
the enabled sbits would be 0, 8, ... Set also the THR_ARM_DAC to a very low
value (e.g. 0x1) in order to constantly measure noise. Therefore, on each
sbit differential pair (and SOT also, I think), one would see 10000000. The
signal is then aligned using a simple bitslip. This is also where we see if
there is an inversion in the polarity and correct for it.
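The training step described above can be sketched as follows: given one received 8-bit frame, find how many bitslips bring it back to the 10000000 pattern, and whether it only matches after inverting every bit. This is an illustration only. The struct and function names are hypothetical, and the bit ordering (which bit arrives first) is an assumption.

```cpp
#include <cstdint>

// Result of comparing a received 8-bit frame with the training pattern.
struct Alignment {
    bool found;     // false if the frame is not a rotation of the pattern
    int bitslips;   // number of one-bit left rotations needed to align
    bool inverted;  // true if the frame matches only after inverting every bit
};

// Rotate an 8-bit word left by n positions (one "bitslip" per position).
inline std::uint8_t rotl8(std::uint8_t word, int n) {
    n &= 7;
    return static_cast<std::uint8_t>((word << n) | (word >> (8 - n)));
}

// Try all eight rotations, with normal and inverted polarity.
inline Alignment findAlignment(std::uint8_t received) {
    constexpr std::uint8_t kTraining = 0x80;  // 10000000, as in the text
    for (int n = 0; n < 8; ++n) {
        const std::uint8_t r = rotl8(received, n);
        if (r == kTraining) return {true, n, false};
        if (static_cast<std::uint8_t>(~r) == kTraining) return {true, n, true};
    }
    return {false, 0, false};  // noise or a broken link, not a timing offset
}
```

A frame such as 0x7F (01111111) is reported as aligned but inverted, which is the polarity case mentioned in the last sentence of the paragraph.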
This is basically just what we are trying to do right now with the script
that Brian described, except not as an automatic routine but just something
to derive constants for the firmware. This is a possibility of course, to
have automatic alignment using some calpulses but I wanted to try to get
this working on the boards without co-dependent CTP7 firmware and software
routines that I have no control over. It seemed to work fine but if we run
into problems perhaps we reconsider whether something like this is needed.
Once the training phase is done, the VFAT can return to "normal"
operating mode. The alignment can be continuously checked by looking at the
SOT frame.
I also think that such a solution can easily be ported to GE2/1 & ME0. In
absence of an FPGA on the OH the step 1 should be done by the LpGBT (GBTX
already phase aligns the data if I'm not mistaken). Step 2 should be done
in the backend firmware.
GBT does not phase align data. It has a similar fixed delay, and we do a
hand-scan of phase values to find the window and then fuse a hard-coded
sampling phase into the chip.
As the correct sbit mapping is required for the new TDC with full
granularity, I would not be able to reliably test the new TDC firmware
before the sbit mapping issue is solved. I could try to implement the
previously described solution next week.
S-bit "mapping" only creates a rotation of the S-bits so that 01234567
becomes 70123456. The TDC just uses the OR of the entire VFAT, correct? In
which case you should be able to proceed as is, right?
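A small sketch of the point being made here: rotating the eight S-bits of a pair by one position changes which S-bit is reported, but not the OR of the word. The container type, the function names, and the direction of the rotation are assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>

using SbitWord = std::array<int, 8>;

// Rotate right by one position: {0,1,...,7} becomes {7,0,1,...,6}.
inline SbitWord rotateRightByOne(SbitWord word) {
    std::rotate(word.rbegin(), word.rbegin() + 1, word.rend());
    return word;
}

// OR of all S-bits in the word: true if any S-bit fired.
inline bool anyFired(const SbitWord& word) {
    return std::any_of(word.begin(), word.end(), [](int bit) { return bit != 0; });
}
```

For any word `w`, `anyFired(w)` and `anyFired(rotateRightByOne(w))` agree, which is why a measurement based only on the OR is insensitive to this kind of mis-mapping.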
Fyi, there are several unrelated problems that are sometimes referred to as
"S-bit mapping" but many of them seem to perhaps be unrelated to the
timing/mapping but do fall under the umbrella of problems with S-bits.
In the plots shown previously by Brian, for example, of *GEB v3b*:
None of these problems seem to be what I would expect from timing issues.
Either the VFAT or the OH seems to just be broken in slots 16, 17, 22, 14,
or bad solder joints, etc. Polarity inversion could explain the issue on
VFAT14.
On the *GEB v3c*, you can see perhaps a timing issue in VFAT18, VFAT8, VFAT0,
VFAT11, VFAT3 but all of these problems would not be an issue if you are
just using the OR of the VFAT for timing measurement
VFAT14 has something else very wrong that could not be explained by timing
or inversion.
All the issues with calpulses showing up in the wrong VFAT also should have
nothing to do with mapping or timing and could be indicative of something
like crosstalk (which we know exists, since we see S-bits coming from
disconnected VFATs).
We will be working on the timing question in the next few weeks and should
have an idea soon how well things work, how consistent the parameters are
across time, temperature, etc and hopefully should be able to fix some of
these issues through firmware (but certainly not all of them).
Best wishes,
Andrew
|
Hi Andrew, Thank you very much for your very detailed reply.
I had seen in some slides the possibility for the GBTX to automatically choose the correct phase. On reading the manual more carefully, I see that this method is not resistant to SEUs. Too bad...
Indeed, the current version of the TDC uses the OR of an entire VFAT. This is how we made the first measurement with the v3 electronics (see this elog). You can see that we still have a lot of improvement to do, both on the setup and on the detector configuration. However, the final aim is to measure the time resolution with the full Sbit granularity, that is, by using the "Sbits word" coming from the "Sbits cluster packer". Modifying the TDC module is not difficult, but it is nearly impossible to test it without the proper Sbit mapping. Regarding the rotation of the Sbits, are you sure it is not possible that one Sbit is associated with the wrong BX? If you look at the histogram on slide 7 of this presentation, it looks like there are three peaks. The leftmost one is separated from the main one by roughly 25 ns, that is, 1 BX.
One remark about this plot: I don't know if it is written somewhere, but for the GEBv3c plot posted by Brian, the firmware uses the v3b taps configuration. More precisely, this is the first TDC firmware, based on version 3.1.2B. So that configuration has mixed hardware/firmware. This might explain behavior different from that observed on other GEBv3c boards.
Let me know if I can help you in any way with this issue. I also think we received one long GEB at ULB. Best regards, |
This is both alarmist and also not true. You are able to see which sbits are mapped correctly using the It's possible I didn't understand the conversation above due to ignorance. But it should be explicitly clear that we will not make design choices to the optohybrid firmware just to accommodate this TDC module. If you are interested in working on solving this sbit mis-mapping issue, which is a critical path issue for P5, please use the RPC module approach I've outlined above. Also, since I think @andrewpeck has targeted this for his student, you should discuss with him how to contribute so we don't have two different people trying to solve the same problem (as that would be inefficient). |
Sure, I can mask the unused VFATs. However, there is only one position where it is possible to connect a VFAT on the GEM chamber at ULB, and if the mapping of that VFAT is wrong, masking will not help. Masking VFATs also will not allow testing how the TDC behaves with the full detector: will slow control sustain the acquisition rate? Won't the (small amount of) noise mask the signal, since noise will be picked up from the full detector? ...
Of course, it is not to accommodate the TDC module. The conversation above was about the Sbit mapping issue due to bad Sbit timing parameters in general. Yes, it is best if we collaborate on fixing the issue; that is the meaning of the last sentence of my previous post. |
Yes, the timing is a whole separate issue that will need to be addressed as well. The bx is determined by the alignment of the SoT relative to the 40MHz clock. But right now the 40MHz clock phase is completely arbitrary, so depending on the phase the S-bits will end up split randomly into different bunches. We need to phase shift the 40MHz clock (done on the GBTx) to center the data so that the S-bits that are supposed to be synchronous fall in the same bx. Nobody has ever done this step (you are the first person besides me to even mention it... :( )
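As a toy illustration of the effect described above (not a model of the actual GBTx or OH logic), the sketch below assigns each arrival time to a bunch crossing for a given 40MHz clock phase. The function names and the simple floor-based timing model are assumptions.

```cpp
#include <cmath>
#include <vector>

constexpr double kBxNs = 25.0;  // period of the 40 MHz clock, in nanoseconds

// Bunch crossing index of a hit arriving at tNs when the clock edge is shifted by phaseNs.
inline long bxIndex(double tNs, double phaseNs) {
    return static_cast<long>(std::floor((tNs - phaseNs) / kBxNs));
}

// True if all arrival times land in the same bunch crossing for this phase.
inline bool sameBx(const std::vector<double>& arrivalsNs, double phaseNs) {
    if (arrivalsNs.empty()) return true;
    const long first = bxIndex(arrivalsNs.front(), phaseNs);
    for (double t : arrivalsNs) {
        if (bxIndex(t, phaseNs) != first) return false;
    }
    return true;
}
```

In this model, S-bits arriving at 24.0, 26.0 and 27.5 ns are split across two bunch crossings when the phase is 0 ns, but all fall in the same one when the phase is shifted to 12.5 ns.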
Supposedly, based on the design files, the v3c and v3b should be the same, but this doesn't seem to be the case in reality :( So we need to figure out what it is supposed to be. Naively, to first order, they should be the same to the best of our knowledge, which is why Brian was using the v3b config on v3c electronics. Our student is starting today with getting things set up; hopefully it won't take very long to get a working config for the v3c |
Brief summary of issue
So we have seen that we have an issue with the sbit mapping in V3 electronics. This issue persists when using GEBv3c+OHv3c hardware:
While the situation with complete v3c hardware is improved, it is still not as desired. Additionally, this is just for the short detector, and we will also need a set of parameters for the long detector. Then for GE2/1 there will be 8 sets of parameters, and ME0 will contribute another set. So we need a software routine that can automatically determine the correct set of timing registers.
The registers of interest are:
GEM_AMC.OH.OHX.FPGA.TRIG.CTRL.SOT_INVERT
GEM_AMC.OH.OHX.FPGA.TRIG.TIMING.SOT_TAP_DELAY_VFATY
GEM_AMC.OH.OHX.FPGA.TRIG.TIMING.TAP_DELAY_VFATY_BITZ
GEM_AMC.OH.OHX.FPGA.TRIG.CTRL.VFATY_TU_INVERT
According to @andrewpeck
The 100-pin panasonic connector looks like:
The convention for the trigger unit that Tuomas has explained (@andrewpeck's email above) is shown as:
So GEM_AMC.OH.OHX.FPGA.TRIG.TIMING.TAP_DELAY_VFATY_BITZ follows the hardware.
We already have one tool that checks the mapping:
ctp7_modules/src/calibration_routines.cpp
Line 963 in e1d9d0c
I would be against modifying this tool to try to correct the mapping (since if you modify the 4 registers above incorrectly you can affect not just the Z^th bit but all 8 SBits due to how the OH is expecting them). So what I would propose is the following procedure:
checkSbitMappingWithCalPulseLocal() function should be used to check that the sbit mapping is correct (this is easily done with checkSbitMappingAndRate.py,
correctSBitMappingErrorsLocal(...),
correctSBitMappingErrorsLocal(...),
The correctSBitMappingErrorsLocal(...) would call checkSbitMappingWithCalPulseLocal(), with a small event count, after making modifications so this could eliminate step 5 above.
Types of issue
Expected Behavior
How I expect correctSBitMappingErrors(...) and correctSBitMappingErrorsLocal(...) to function. General flow is shown below.
Unless otherwise noted, the code that will be added to ctp7_modules should be placed in calibration_routines.h and calibration_routines.cc.
The calling function on the DAQ Machine
This is a new development and I'm not sure if the calling function should be created in the legacy xhal branch, or if it should be placed in cmsgemos (this in my eye is a calibration routine so it doesn't really fit in a HwDevice; some input from @mexanick and @jsturdy would be appreciated here).
However general overview should be something like:
N_Mismatches > cutVal the sbit is assumed to have inverted polarity
VFATN is in the table set the key in the following cases:
MappingVFATN and this should set a std::vector<uint32_t> to this key),
InvertedVFATN and this should set a std::vector<uint32_t> to this key),
SOT_INVERT, one integer
SOT_TAP_DELAY_VFATY, 24 integers
TAP_DELAY_VFATY_BITS, 24*8 = 192 integers
VFATY_TU_INVERT, 24 integers
Input parameters should be:
N_Mismatches exceeds the cut value the sbit is assumed to have inverted polarity.
Example table format is something like:
Here any sbit with N_Mismatches beyond 25k can be assumed to have an inverted polarity but this strongly depends on the event count used when checkSbitMappingAndRate.py was called.
Outline of correctSBitMappingErrors(...)
Here we are getting the information from the RPC request, and it falls in two categories:
For the first case (wrong timing) we will construct a std::map<std::string,std::vector<uint32_t> > from the input MappingVFATN keys:
vfatN check if a key "MappingVFATN" exists in the rpc message,
std::vector of mis-mapped sbits from the get_word_array function,
std::map where the key is "MappingVFATN" or just "VFATN" for simplicity
Similarly we should construct a second map (as above) from the "InvertedVFATN" keys.
This should then get the vfatmask using:
ctp7_modules/src/amc.cpp
Line 42 in e1d9d0c
It should then loop over all unmasked vfats and for each iteration it should call the local function and use the constructed maps as input. The local function correctSBitMappingErrorsLocal() should take the following input parameters:
ohN,
vfatN
vfat is in the constructed maps but also is in the vfatmask this should probably raise either an error or a warning (by setting the "error" key or "warning" key in the RPC response), this probably means the hardware isn't configured correctly (e.g. VFATs out of sync and user needs to be told)
std::vector stored in the std::map for the "MappingVFATN" key,
std::vector stored in the std::map for the "InvertedVFATN" key, and
The local function could then return for this vfatN an std::map<std::string, std::vector<uint32_t> > whose keys are:
SOT_TAP_DELAY_VFATY, this is a vector of one element
TAP_DELAY_VFATY_BITS, this is a vector of 8 elements
VFATY_TU_INVERT, this is a vector of one element
These three should then be added to three maps (that should be initialized before calling the loop above) and after all unmasked VFATs are looped over will contain all timing and inverted registers which give the correct configuration, e.g.:
After everything is said and done there should be a read of SOT_INVERT and this should be placed in the RPC response as a data word.
Then the three final maps (map_sotTapDelay, map_vfatTapDelay, map_sotTapDelay) should be looped over (they will all have the same keys so one loop is sufficient) and stored in the RPC response, e.g.:
The function on the DAQ machine now has the correct configuration for this link.
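The sorting of table rows into "MappingVFATN" and "InvertedVFATN" keys, as described for the calling function, could be sketched roughly as below. This is only an illustration under stated assumptions: the `MismatchRow` layout, the function name `classifyMismatches`, and the choice to skip rows with zero mismatches are hypothetical, and the real table produced by checkSbitMappingAndRate.py may contain other columns.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One row of the mismatch table (hypothetical layout).
struct MismatchRow {
    std::uint32_t vfatN;
    std::uint32_t sbit;
    std::uint32_t nMismatches;
};

using SbitMap = std::map<std::string, std::vector<std::uint32_t>>;

struct ClassifiedSbits {
    SbitMap mapping;   // keys "MappingVFATN": sbits assumed to have wrong timing
    SbitMap inverted;  // keys "InvertedVFATN": sbits assumed to have inverted polarity
};

// Sort mis-mapped sbits into the two categories using the cut on N_Mismatches.
inline ClassifiedSbits classifyMismatches(const std::vector<MismatchRow>& rows,
                                          std::uint32_t cutVal) {
    ClassifiedSbits out;
    for (const auto& row : rows) {
        if (row.nMismatches == 0) continue;  // assumption: no mismatches means nothing to fix
        const std::string vfat = std::to_string(row.vfatN);
        if (row.nMismatches > cutVal) {
            out.inverted["InvertedVFAT" + vfat].push_back(row.sbit);
        } else {
            out.mapping["MappingVFAT" + vfat].push_back(row.sbit);
        }
    }
    return out;
}
```

Each resulting vector could then be attached to the RPC request under its key. As noted above, a sensible `cutVal` depends on the event count used when the table was produced.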
Outline of correctSBitMappingErrorsLocal(...)
The local function will then be where the actual "meat" of the algorithm is done. This function should look like:
This function at the end should always read the following registers:
SOT_TAP_DELAY_VFATY,
TAP_DELAY_VFATY_BITZ, and
VFATY_TU_INVERT
This could be done by having a dedicated RPC method in vfat3.h/vfat3.cc (for reading one VFAT) and optohybrid.h/optohybrid.cc (for reading all VFATs) and the one in vfat3.h should be called by correctSBitMappingErrorsLocal.
It should only try to correct the mapping if correctMapping is true.
First we should loop over those members of invertedSBITs and write the corresponding bits in VFATY_TU_INVERT. This should be done by:
24TU_TXD_P<N> and 24TU_TXD_N<N> pair the i^th element of invertedSBITs refers to using the convention @andrewpeck illustrates above.
VFATY_TU_INVERT that corresponds to this 24TU_TXD_P<N> and 24TU_TXD_N<N> pair,
invertedSBITs that will share this pair and all need to be flipped, so once you flip the bit the first time, any other elements of invertedSBITs that correspond to this pair should not cause the bit to be flipped again
invertedSBITs.
After this you should call:
ctp7_modules/src/calibration_routines.cpp
Lines 946 to 963 in e1d9d0c
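The inversion step that precedes this call, where the bit for a pair is flipped at most once even if several entries of invertedSBITs share that pair, could look roughly like the sketch below. It assumes that sbit s belongs to pair s / 8 and that bit N of VFATY_TU_INVERT controls pair N. Both are assumptions to be checked against the convention referred to above, and `flipInvertBits` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

constexpr std::uint32_t kSbitsPerPair = 8;   // assumption: 8 time-multiplexed sbits per pair
constexpr std::uint32_t kSbitsPerVfat = 64;  // assumption: 8 pairs per VFAT

// Returns the new inversion mask for one VFAT (hypothetical helper).
inline std::uint32_t flipInvertBits(std::uint32_t tuInvert,
                                    const std::vector<std::uint32_t>& invertedSBITs) {
    std::set<std::uint32_t> flippedPairs;  // pairs already handled
    for (std::uint32_t sbit : invertedSBITs) {
        assert(sbit < kSbitsPerVfat);
        const std::uint32_t pair = sbit / kSbitsPerPair;
        // insert(...).second is true only the first time this pair is seen
        if (flippedPairs.insert(pair).second) {
            tuInvert ^= (1u << pair);
        }
    }
    return tuInvert;
}
```

The returned value would then be written back to VFATY_TU_INVERT before re-running the mapping check.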
Care should be taken to construct the input arguments properly (see function documentation). Additionally you don't need a lot of events (nevts=10 is probably sufficient). Also using the calpulse in voltageStepPulse mode should be fine (e.g. useCurrentPulse = false). You then should analyze the outData container to see if any of the bits you flipped suffer from mis-mapping. To do this see this example:
For any new mismatches you find you should add these to mismappedSBits, e.g.:
Now here is where the hard part is. For each element of mismappedSBits the delays should be such that:
TU_SOT_P_24_TU_SOT_P_24 arrives first in the FPGA, followed by:
24TU_TXD_P<0>, followed by
24TU_TXD_P<1>, followed by
24TU_TXD_P<7>
Note I've suppressed the negative part of the pair. To ensure this you need to manipulate:
SOT_TAP_DELAY_VFATY, and
TAP_DELAY_VFATY_BITZ
To accomplish this for the element of mismappedSBits. However other elements of mismappedSBits may share the same pair (e.g. 24TU_TXD_P<N> and 24TU_TXD_N<N> as the current element). So you should track which pairs you've already modified to prevent subsequent modification. Additionally, and more importantly, an element in mismappedSBits that is later on in the VFAT may be affected by your modification of an earlier bit. I would propose the following:
mismappedSBits comes from, this determines TAP_DELAY_VFATY_BITZ.
1 to this TAP_DELAY_VFATY_BITZ register, (not sure the size of this register, but you should stop at the max...),
TAP_DELAY_VFATY_BITZ registers where Z_prime > Z also add 1.
checkSbitMappingWithCalPulseLocal(...) with a low event count,
outData and remove any element from mismappedSBits which is now correctly mapped, add an sbit that is now incorrectly mapped, and
Some input here from @andrewpeck is needed to see if the above makes sense, particularly steps 2 & 3. For Step 5 I would suggest to use the Erase-Remove Idiom; you can find examples on stackoverflow.
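The proposed loop could be sketched as follows. This is not the actual implementation: `CheckFn` is a hypothetical stand-in for a low-statistics checkSbitMappingWithCalPulseLocal(...) call plus the analysis of outData, `maxTap` is a parameter because the register size is not known here, and the sbit-to-pair relation (pair = sbit / 8) is an assumption. The sketch also shows the Erase-Remove idiom mentioned for step 5.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <functional>
#include <vector>

constexpr std::uint32_t kPairs = 8;
constexpr std::uint32_t kSbitsPerPair = 8;  // assumption: sbit s sits on pair s / 8

using TapDelays = std::array<std::uint32_t, kPairs>;  // one TAP_DELAY_VFATY_BITZ per pair
// Hypothetical stand-in: returns the sbits that are still mis-mapped for these delays.
using CheckFn = std::function<std::vector<std::uint32_t>(const TapDelays&)>;

// Steps 1-3: add 1 to the delay of the pair holding `sbit` and of all later pairs.
inline bool bumpDelays(TapDelays& delays, std::uint32_t sbit, std::uint32_t maxTap) {
    const std::uint32_t z = sbit / kSbitsPerPair;
    if (delays[z] >= maxTap) return false;  // stop at the register maximum
    for (std::uint32_t zPrime = z; zPrime < kPairs; ++zPrime) {
        if (delays[zPrime] < maxTap) ++delays[zPrime];
    }
    return true;
}

// Steps 4-6: re-check, prune fixed sbits, add newly broken ones, repeat.
inline std::vector<std::uint32_t> correctDelays(TapDelays& delays,
                                                std::vector<std::uint32_t> mismapped,
                                                const CheckFn& check,
                                                std::uint32_t maxTap) {
    while (!mismapped.empty()) {
        if (!bumpDelays(delays, mismapped.front(), maxTap)) break;
        const std::vector<std::uint32_t> stillBad = check(delays);
        // Erase-Remove idiom: drop every sbit that is now correctly mapped.
        mismapped.erase(
            std::remove_if(mismapped.begin(), mismapped.end(),
                           [&](std::uint32_t s) {
                               return std::find(stillBad.begin(), stillBad.end(), s) ==
                                      stillBad.end();
                           }),
            mismapped.end());
        // Add any sbit that became mis-mapped because of the change.
        for (std::uint32_t s : stillBad) {
            if (std::find(mismapped.begin(), mismapped.end(), s) == mismapped.end()) {
                mismapped.push_back(s);
            }
        }
    }
    return mismapped;  // empty on success, otherwise the sbits left unresolved
}
```

Passing the check as a callable keeps the loop testable without hardware; on the CTP7 it would wrap the real calpulse check. Whether incrementing all later pairs is the right policy is exactly the open question raised for steps 2 and 3.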
Then afterward this function should read the following registers:
SOT_TAP_DELAY_VFATY,
TAP_DELAY_VFATY_BITZ, and
VFATY_TU_INVERT
Store these in an std::map<std::string,std::vector<uint32_t> > and return it. The mapping should now be correct.
Current Behavior
You have to do the above by hand using gem_reg.py (bad).
Context (for feature requests)
The sbit mapping is wrong. We need a software solution to correct this for both GE1/1 and future upgrades (GE2/1 & ME0).