Non-trivial slice assignments and tensor manipulation #941
-
So it looks like slicing is what you're looking for. I just knocked up a quick example of adding the elements in a subview of a matrix:

```rust
use ndarray::{arr2, s};

fn main() {
    let mut a = arr2(&[[ 1,  2,  3],
                       [ 4,  5,  6],
                       [ 7,  8,  9],
                       [10, 11, 12]]);
    println!("{}", a);
    // Mutable view of all rows and the first two columns.
    let mut view = a.slice_mut(s![.., 0..2]);
    view.mapv_inplace(|x| x + x);
    println!("{}", a);
}
```

That's for your example where you add boxes and boxes together. For the other one you can do the equivalent slice and then, I believe, something like the sketch below.
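For the NumPy-style `boxes[:, :2] -= boxes[:, 2:] / 2`, here is a minimal sketch (my example, not the original reply's code): `split_at` yields two non-overlapping column halves, so there is no aliasing problem, and `zip_mut_with` performs the in-place update.

```rust
use ndarray::prelude::*;

fn main() {
    // Toy stand-in for an (n, 4) box array.
    let mut boxes = array![[10., 10., 4., 4.],
                           [20., 20., 8., 8.]];
    // Split a mutable view into two non-overlapping column halves, then
    // compute the equivalent of `boxes[:, :2] -= boxes[:, 2:] / 2`.
    let (mut left, right) = boxes.view_mut().split_at(Axis(1), 2);
    left.zip_mut_with(&right, |l, &r| *l -= r / 2.);
    println!("{}", boxes);
}
```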
-
I see, that's what I was trying to do. Thank you very much, I will try it out.
-
These are a couple of implementations for `decode`:

```rust
use ndarray::concatenate;
use ndarray::prelude::*;

pub fn decode(
    loc: ArrayView2<'_, f32>,
    priors: ArrayView2<'_, f32>,
    variances: &[f32],
) -> Array2<f32> {
    let (priors_to, priors_from) = priors.view().split_at(Axis(1), 2);
    let (loc_to, loc_from) = loc.view().split_at(Axis(1), 2);
    let mut boxes_to = variances[0] * &loc_to * &priors_from + &priors_to;
    let mut boxes_from = loc_from.mapv(|x| (x * variances[1]).exp()) * &priors_from;
    boxes_to -= &(&boxes_from / 2.);
    boxes_from += &boxes_to;
    concatenate!(Axis(1), boxes_to, boxes_from)
}

pub fn decode2(
    loc: ArrayView2<'_, f32>,
    priors: ArrayView2<'_, f32>,
    variances: &[f32],
) -> Array2<f32> {
    let mut boxes = Array2::zeros(priors.raw_dim());
    let (priors_to, priors_from) = priors.view().split_at(Axis(1), 2);
    let (loc_to, loc_from) = loc.view().split_at(Axis(1), 2);
    let (boxes_to, boxes_from) = boxes.view_mut().split_at(Axis(1), 2);
    azip!(
        (
            loc_to in loc_to, priors_to in priors_to, boxes_to in boxes_to,
            loc_from in loc_from, priors_from in priors_from, boxes_from in boxes_from,
        )
        {
            *boxes_to = priors_to + variances[0] * loc_to * priors_from;
            *boxes_from = priors_from * (loc_from * variances[1]).exp();
            *boxes_to -= *boxes_from / 2.;
            *boxes_from += *boxes_to;
        }
    );
    boxes
}
```

You could use slicing instead of `split_at`. For the second question, you could use slicing and chunked iteration:

```rust
use itertools::izip;
use ndarray::prelude::*;

pub fn second_example(
    pre: ArrayView2<'_, f32>,
    priors: ArrayView2<'_, f32>,
    variances: &[f32],
) -> Array2<f32> {
    let mut landms = Array2::zeros(pre.raw_dim());
    let (priors_to, priors_from) = priors.view().split_at(Axis(1), 2);
    for (landms_chunk, pre_chunk) in izip!(
        landms.axis_chunks_iter_mut(Axis(1), 2),
        pre.axis_chunks_iter(Axis(1), 2)
    ) {
        azip!(
            (
                priors_to in priors_to, priors_from in priors_from,
                landms_chunk in landms_chunk, pre_chunk in pre_chunk
            )
            *landms_chunk = priors_to + pre_chunk * variances[0] * priors_from
        );
    }
    landms
}
```

Edit: Or, you could do this, which is more concise but creates a temporary allocation in each iteration of the loop:

```rust
use itertools::izip;
use ndarray::prelude::*;

pub fn second_example2(
    pre: ArrayView2<'_, f32>,
    priors: ArrayView2<'_, f32>,
    variances: &[f32],
) -> Array2<f32> {
    let mut landms = Array2::zeros(pre.raw_dim());
    let (priors_to, priors_from) = priors.view().split_at(Axis(1), 2);
    for (mut landms_chunk, pre_chunk) in izip!(
        landms.axis_chunks_iter_mut(Axis(1), 2),
        pre.axis_chunks_iter(Axis(1), 2)
    ) {
        landms_chunk.assign(&(variances[0] * &pre_chunk * &priors_from + &priors_to));
    }
    landms
}
```

I wonder if you'd be happier using arrays with more axes, if that would make sense for your use case; see the sketch below.
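A hypothetical sketch of that "more axes" idea (my code, not from the thread; it assumes `pre` is row-major contiguous and uses the ndarray 0.15-style `into_shape`): view the (n, 10) landmark offsets as (n, 5, 2) so one broadcasted expression replaces the chunked loop.

```rust
use ndarray::prelude::*;

// Hypothetical alternative to `second_example`, assuming `pre` has shape
// (n, 10) and standard (row-major, contiguous) layout.
pub fn second_example_more_axes(
    pre: ArrayView2<'_, f32>,
    priors: ArrayView2<'_, f32>,
    variances: &[f32],
) -> Array2<f32> {
    let n = pre.nrows();
    let (priors_to, priors_from) = priors.view().split_at(Axis(1), 2);
    // Reinterpret the 10 columns as 5 (x, y) pairs.
    let pre3 = pre.into_shape((n, 5, 2)).expect("pre must be contiguous");
    // Insert a length-1 axis so the (n, 2) prior halves broadcast over
    // all five landmark pairs.
    let scaled = &pre3 * variances[0] * &priors_from.insert_axis(Axis(1));
    let landms3 = scaled + &priors_to.insert_axis(Axis(1));
    landms3.into_shape((n, 10)).unwrap()
}
```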
-
Wow... thank you, this explains a lot and has great educational value for me. I will test and try to understand these approaches and get back to you. However, a final test probably won't be possible until I translate all the code. It's not easy to compare with the Python implementation: I have to hardcode values, plus Torch is not using a fixed input size like the onnxruntime I am using in Rust. I am pretty close to gluing everything up; I have a few more numpy postprocessing lines to go. Currently here: https://github.com/biubug6/Pytorch_Retinaface/blob/master/detect.py#L121 So I have two more functions to resolve.
-
So, you want to implement the following lines?

```python
# ignore low scores
inds = np.where(scores > args.confidence_threshold)[0]
boxes = boxes[inds]
landms = landms[inds]
scores = scores[inds]

# keep top-K before NMS
order = scores.argsort()[::-1][:args.top_k]
boxes = boxes[order]
landms = landms[order]
scores = scores[order]
```

What are the shapes of `boxes`, `landms`, and `scores`?
-
Yes, those are the remaining lines to finish the algorithm, plus a few more, but those shouldn't be a problem: pretty straightforward slicing operations. Regarding the shapes, this is the output of the original implementation:

So basically they are ...
-
Pushed the current code, including your implementations: https://github.com/vladimirmujagic/rust-retinaface
-
Here are a couple of ways to do it:

```rust
use ndarray::prelude::*;

pub fn select_high_scores(
    boxes: ArrayView2<'_, f32>,
    landms: ArrayView2<'_, f32>,
    scores: ArrayView1<'_, f32>,
    confidence_threshold: f32,
    top_k: usize,
) -> (Array2<f32>, Array2<f32>, Array1<f32>) {
    let mut ind_above_thresh: Vec<usize> = scores
        .iter()
        .enumerate()
        .filter_map(|(ind, &score)| (score > confidence_threshold).then(|| ind))
        .collect();
    ind_above_thresh.sort_unstable_by(|&ind1, &ind2| {
        // Note the swapped order of `ind1` and `ind2` for sorting in descending order.
        scores[ind2]
            .partial_cmp(&scores[ind1])
            .expect("Score must not be NaN.")
    });
    let top_k_ind = &ind_above_thresh[..top_k];
    (
        boxes.select(Axis(0), top_k_ind),
        landms.select(Axis(0), top_k_ind),
        scores.select(Axis(0), top_k_ind),
    )
}

pub fn select_high_scores2(
    boxes: ArrayView2<'_, f32>,
    landms: ArrayView2<'_, f32>,
    scores: ArrayView1<'_, f32>,
    confidence_threshold: f32,
    top_k: usize,
) -> (Array2<f32>, Array2<f32>, Array1<f32>) {
    let mut above_thresh: Vec<(usize, f32)> = scores
        .iter()
        .copied()
        .enumerate()
        .filter(|&(_ind, score)| score > confidence_threshold)
        .collect();
    above_thresh.sort_unstable_by(|(_, score1), (_, score2)| {
        // Note the swapped order of `score1` and `score2` for sorting in descending order.
        score2.partial_cmp(score1).expect("Score must not be NaN.")
    });
    let top_k_ind: Vec<usize> = above_thresh
        .iter()
        .map(|&(ind, _)| ind)
        .take(top_k)
        .collect();
    (
        boxes.select(Axis(0), &top_k_ind),
        landms.select(Axis(0), &top_k_ind),
        scores.select(Axis(0), &top_k_ind),
    )
}
```

I'm not sure which will be faster. The second approach has an extra allocation, but the sorting should be faster because it doesn't have to randomly index into the `scores` array.
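A hypothetical call site (the names and threshold values here are assumptions matching the flow posted later in this thread):

```rust
// Hypothetical usage; `boxes` and `landmarks` are decoded Array2<f32> values
// and `scores` is an ArrayView1<f32> column of the confidences.
let (s_boxes, s_landms, s_scores) =
    select_high_scores(boxes.view(), landmarks.view(), scores, 0.02, 5000);
```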
-
Added your code to the current flow and tested for correctness by comparing the original Python implementation and its outputs to the Rust implementation. Here is the current postprocessing flow:

```rust
// Postprocessing
// ------------------------------------------------------------------------------------------------------------------------------
let loc = &outputs[0];
println!("loc: {:?}", loc);
let conf = &outputs[1];
let landms = &outputs[2];

let confidence_threshold = 0.02;
let top_k = 5000;
let nms_threshold = 0.4;
let keep_top_k = 750;
let vis_threshold = 0.6;
let scale = array![image_width as f32, image_height as f32, image_width as f32, image_height as f32];
let resize = 1.0;

let priorbox = PriorBox::new((target_width, target_height));
let prior_data = priorbox.forward();
let variances = priorbox.cfg().variances();

let loc_squeezed = loc.slice(s![0, .., ..]).to_owned();
let mut boxes = decode_boxes(loc_squeezed.view(), prior_data.view(), variances);
boxes = boxes * scale;
boxes.mapv_inplace(|v| v / resize);

let conf_squeezed: Array2<f32> = conf.slice(s![0, .., ..]).to_owned();
let scores = conf_squeezed.column(1);

let landms_squeezed = landms.slice(s![0, .., ..]).to_owned();
let mut landmarks = decode_landm(landms_squeezed.view(), prior_data.view(), variances);
let scale1 = array![
    image_width as f32, image_height as f32,
    image_width as f32, image_height as f32,
    image_width as f32, image_height as f32,
    image_width as f32, image_height as f32,
    image_width as f32, image_height as f32
];
landmarks = landmarks * scale1;
landmarks.mapv_inplace(|v| v / resize);

let (s_boxes, s_landms, s_scores) =
    select_high_scores(boxes.view(), landmarks.view(), scores, confidence_threshold, top_k);
// ------------------------------------------------------------------------------------------------------------------------------
```

The code is compiling but produces an unexpected error for the given thresholds:

```rust
let confidence_threshold = 0.02;
let top_k = 5000;
```

Error: ...
So this implies, if I am not mistaken, that the filtering operation

```rust
let mut ind_above_thresh: Vec<usize> = scores
    .iter()
    .enumerate()
    .filter_map(|(ind, &score)| (score > confidence_threshold).then(|| ind))
    .collect();
```

discards too many values, which further implies that the network outputs might be wrong (the delta is too big). This is still not confirmed, and it shouldn't be the case even though the original implementation is using ... Still trying to understand if this can somehow be related to incorrect tensor manipulations.
-
Yeah, if that panic occurs on the line `let top_k_ind = &ind_above_thresh[..top_k];`, it means that fewer than `top_k` scores passed the confidence threshold, so the slice index is out of bounds.
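One possible guard, sketched here as an assumption rather than the thread's confirmed fix: clamp the slice length inside `select_high_scores` so that having fewer than `top_k` candidates doesn't panic.

```rust
// Take at most as many indices as actually passed the threshold.
let take = top_k.min(ind_above_thresh.len());
let top_k_ind = &ind_above_thresh[..take];
```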
-
The course of this issue is just a discussion, I guess? I could move it to Discussions; using Discussions is also a good idea (but I realize they might be less visible).
-
Yeah, it's just a discussion. I didn't realize that it was possible to convert issues to discussions. I'll try it. Edit: That's pretty cool. Thanks for pointing out that feature.
-
I see that you wrote ...
-
Hello,

I am trying to port RetinaFace (face and landmark detection in PyTorch) to Rust and was just wondering if you support operations similar to NumPy's slice assignments. I couldn't find similar functionality in your library to implement operations like:

```python
boxes[:, :2] -= boxes[:, 2:] / 2
```

This is my current implementation, which compiles but is still not tested for correctness and is not optimized.

Also, is it possible to stack multiple tensors in one go, or do I have to go two by two and produce the same results?
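For reference on the stacking question, `ndarray`'s `concatenate!` macro accepts any number of arrays in one call, so pairwise combination shouldn't be necessary; a minimal sketch (my example, not from the thread):

```rust
use ndarray::{concatenate, prelude::*};

fn main() {
    let a = Array2::<f32>::zeros((3, 2));
    let b = Array2::<f32>::ones((3, 2));
    let c = Array2::<f32>::from_elem((3, 2), 2.0);
    // Concatenate all three along the column axis in a single call.
    let stacked = concatenate![Axis(1), a, b, c];
    assert_eq!(stacked.shape(), &[3, 6]);
}
```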