Non trivial slice assignments and tensor manipulation #941

vladimirmujagic · 2021-03-11T12:16:13Z

vladimirmujagic
Mar 11, 2021

Hello,

I am trying to port Retinaface (face and landmark detection in pytorch) to rust and was just wondering if you support operations similar to

def decode(loc, priors, variances):
    """Decode locations from predictions using priors to undo
    the encoding we did for offset regression at train time.
    Args:
        loc (tensor): location predictions for loc layers,
            Shape: [num_priors,4]
        priors (tensor): Prior boxes in center-offset form.
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        decoded bounding box predictions
    """

    boxes = torch.cat((
        priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
        priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])), 1)
    boxes[:, :2] -= boxes[:, 2:] / 2
    boxes[:, 2:] += boxes[:, :2]
    return boxes

I couldn't find similar functionalities in your library to implement operations like boxes[:, :2] -= boxes[:, 2:] / 2

This is my current implementation which compiles but is still not tested for correctness and is not optimized.

pub fn decode(
    loc: ArrayBase<OwnedRepr<f32>, Dim<[usize; 2]>>,
    priors: ArrayBase<OwnedRepr<f32>, Dim<[usize; 2]>>,
    variances: &Vec<f32>
) -> Result<ArrayBase<OwnedRepr<f32>, Dim<[usize; 2]>>, Error>
{
    let priors_to = priors.clone().slice_move(s![.., ..2]);
    let priors_from = priors.slice_move(s![.., 2..]);

    let mut loc_to = loc.clone().slice_move(s![.., ..2]);
    let mut loc_from = loc.slice_move(s![.., 2..]);
    loc_to = loc_to.mapv(|v:f32| v * variances[0]);
    loc_from = loc_from.mapv(|v:f32| (v * variances[1]).exp());
    
    let a =
        priors_to + loc_to * priors_from.clone();
    let b =  
        priors_from * loc_from;
    let mut boxes = stack![Axis(1), a, b];

    let boxes_to = boxes.clone().slice_move(s![.., ..2]);
    let mut boxes_from = boxes.clone().slice_move(s![.., 2..]);
    boxes_from = boxes_from.mapv(|v: f32| v / 2.0);

    for i in 0..boxes.shape()[0] {
        for j in 0..2 {
            boxes[[i, j]] -= boxes_from[[i, j]];
        }
    }

    for i in 0..boxes.shape()[0] {
        for j in 0..2 {
            boxes[[i, j]] += boxes_to[[i, j]];
        }
    }

    Ok(boxes)
}

Also is it possible to stack multiple tensors like:

    landms = torch.cat((priors[:, :2] + pre[:, :2] * variances[0] * priors[:, 2:],
                        priors[:, :2] + pre[:, 2:4] * variances[0] * priors[:, 2:],
                        priors[:, :2] + pre[:, 4:6] * variances[0] * priors[:, 2:],
                        priors[:, :2] + pre[:, 6:8] * variances[0] * priors[:, 2:],
                        priors[:, :2] + pre[:, 8:10] * variances[0] * priors[:, 2:],
                        ), dim=1)

Or i have to go 2 by 2 and produce same results?

xd009642 · 2021-03-11T12:31:02Z

xd009642
Mar 11, 2021

So it looks like slicing is what you're looking for. I just knocked up a quick example of adding the elements in a subview of a matrix.

use ndarray::{arr2, s};

fn main() {
    
    let mut a = arr2(&[[ 1,  2,  3],    
                    [ 4,  5,  6],   
                    [ 7,  8,  9],    
                    [10, 11, 12]]);
    println!("{}", a);
    
    let mut view = a.slice_mut(s![.., 0..2]);
    view.mapv_inplace(|x| x + x);
    println!("{}", a);
}

That's for your example where you add boxes and boxes together. For another one you can do the equivalent slice and then I believe jsomething like boxes.slice_mut(s![0, ..]) += &boxesto.slice(s![0,..]); , though I haven't knocked up an example to try it

0 replies

vladimirmujagic · 2021-03-11T12:40:06Z

vladimirmujagic
Mar 11, 2021
Author

I see i was trying mapv_inplace but couldn't make it to work probably because of my limited rust knowledge.

Thank you very much i will try it out.

0 replies

jturner314 · 2021-03-12T05:36:28Z

jturner314
Mar 12, 2021
Maintainer

These are a couple of implementations for decode. Please test them before using. I've checked that they compile, but not that they're correct.

use ndarray::concatenate;
use ndarray::prelude::*;

pub fn decode(
    loc: ArrayView2<'_, f32>,
    priors: ArrayView2<'_, f32>,
    variances: &[f32],
) -> Array2<f32> {
    let (priors_to, priors_from) = priors.view().split_at(Axis(1), 2);
    let (loc_to, loc_from) = loc.view().split_at(Axis(1), 2);
    let mut boxes_to = variances[0] * &loc_to * &priors_from + &priors_to;
    let mut boxes_from = loc_from.mapv(|x| (x * variances[1]).exp()) * &priors_from;
    boxes_to -= &(&boxes_from / 2.);
    boxes_from += &boxes_to;
    concatenate!(Axis(1), boxes_to, boxes_from)
}

pub fn decode2(
    loc: ArrayView2<'_, f32>,
    priors: ArrayView2<'_, f32>,
    variances: &[f32],
) -> Array2<f32> {
    let mut boxes = Array2::zeros(priors.raw_dim());
    let (priors_to, priors_from) = priors.view().split_at(Axis(1), 2);
    let (loc_to, loc_from) = loc.view().split_at(Axis(1), 2);
    let (boxes_to, boxes_from) = boxes.view_mut().split_at(Axis(1), 2);
    azip!(
        (
            loc_to in loc_to, priors_to in priors_to, boxes_to in boxes_to,
            loc_from in loc_from, priors_from in priors_from, boxes_from in boxes_from,
        )
        {
            *boxes_to = priors_to + variances[0] * loc_to * priors_from;
            *boxes_from = priors_from * (loc_from * variances[1]).exp();
            *boxes_to -= *boxes_from / 2.;
            *boxes_from += *boxes_to;
        }
    );
    boxes
}

You could use slicing instead of .view().split_at() if you find it easier to read. Personally, I find decode2 a little easier to read than decode1. Additionally, it has only a single allocation for the output array, compared to four allocations in decode1. I'd also consider changing the type of variances to [f32; 2] if you know that it's always going to have exactly two elements.

For the second question, you could use slicing and concatenate, although the code does seem somewhat repetitive and creates a lot of temporary allocations. Something like this is a bit nicer, IMO:

use itertools::izip;
use ndarray::prelude::*;

pub fn second_example(
    pre: ArrayView2<'_, f32>,
    priors: ArrayView2<'_, f32>,
    variances: &[f32],
) -> Array2<f32> {
    let mut landms = Array2::zeros(pre.raw_dim());
    let (priors_to, priors_from) = priors.view().split_at(Axis(1), 2);
    for (landms_chunk, pre_chunk) in izip!(
        landms.axis_chunks_iter_mut(Axis(1), 2),
        pre.axis_chunks_iter(Axis(1), 2)
    ) {
        azip!(
            (
                priors_to in priors_to, priors_from in priors_from,
                landms_chunk in landms_chunk, pre_chunk in pre_chunk
            )
            *landms_chunk = priors_to + pre_chunk * variances[0] * priors_from
        );
    }
    landms
}

Edit: Or, you could do this, which is more concise but creates a temporary allocation in each iteration of the for loop:

use itertools::izip;
use ndarray::prelude::*;

pub fn second_example2(
    pre: ArrayView2<'_, f32>,
    priors: ArrayView2<'_, f32>,
    variances: &[f32],
) -> Array2<f32> {
    let mut landms = Array2::zeros(pre.raw_dim());
    let (priors_to, priors_from) = priors.view().split_at(Axis(1), 2);
    for (mut landms_chunk, pre_chunk) in izip!(
        landms.axis_chunks_iter_mut(Axis(1), 2),
        pre.axis_chunks_iter(Axis(1), 2)
    ) {
        landms_chunk.assign(&(variances[0] * &pre_chunk * &priors_from + &priors_to));
    }
    landms
}

I wonder if you'd be happier using arrays with more axes. For example, I wonder if it would make sense for your use-case for pre and landms to be three-dimensional rather than concatenating things along axis 1.

0 replies

vladimirmujagic · 2021-03-12T07:13:45Z

vladimirmujagic
Mar 12, 2021
Author

Wow... thank you this explains a lot and has great educational value for me. I will test and try to understand these approaches and get back to you. However final test probably wont be possible until i translate all the code. Not easy to compare with python implementation i have to hardcode values + torch is not using fixed input size like onnxruntime i am using in rust.

I am pretty close to glue everything up, got a few more numpy postprocessing lines to go.

Currently here: https://github.com/biubug6/Pytorch_Retinaface/blob/master/detect.py#L121

So got two more functions to resolve np.where, np.argsort I guess those are not implemented directly yet, how do you workaround them?

0 replies

jturner314 · 2021-03-12T22:03:47Z

jturner314
Mar 12, 2021
Maintainer

So, you want to implement the following lines?

        # ignore low scores
        inds = np.where(scores > args.confidence_threshold)[0]
        boxes = boxes[inds]
        landms = landms[inds]
        scores = scores[inds]

        # keep top-K before NMS
        order = scores.argsort()[::-1][:args.top_k]
        boxes = boxes[order]
        landms = landms[order]
        scores = scores[order]

What are the shapes of scores, boxes, and landms?

0 replies

vladimirmujagic · 2021-03-13T10:08:19Z

vladimirmujagic
Mar 13, 2021
Author

Yes, those are the remaining lines to finish the algorithm + a few more but those shouldn't be a problem pretty straightforward slicing operations.

Regarding the shapes, this is output of original implementation:

boxes (26240, 4)
landms (26240, 10)
scores (26240,)

So basically they are 2D, 2D, 1D, numbers 4 and 10 are fixed and independent of network input size. Number 26240 depends on the input size, in my case i am using 640x640 onnx model so I have 16800.

0 replies

vladimirmujagic · 2021-03-13T13:47:31Z

vladimirmujagic
Mar 13, 2021
Author

Pushed current code including your implementations: https://github.com/vladimirmujagic/rust-retinaface

0 replies

jturner314 · 2021-03-13T23:10:09Z

jturner314
Mar 13, 2021
Maintainer

Here are a couple of ways to do it:

pub fn select_high_scores(
    boxes: ArrayView2<'_, f32>,
    landms: ArrayView2<'_, f32>,
    scores: ArrayView1<'_, f32>,
    confidence_threshold: f32,
    top_k: usize,
) -> (Array2<f32>, Array2<f32>, Array1<f32>) {
    let mut ind_above_thresh: Vec<usize> = scores
        .iter()
        .enumerate()
        .filter_map(|(ind, &score)| (score > confidence_threshold).then(|| ind))
        .collect();
    ind_above_thresh.sort_unstable_by(|&ind1, &ind2| {
        // Note the swapped order of `ind1` and `ind2` for sorting in descending order.
        scores[ind2]
            .partial_cmp(&scores[ind1])
            .expect("Score must not be NaN.")
    });
    let top_k_ind = &ind_above_thresh[..top_k];
    (
        boxes.select(Axis(0), top_k_ind),
        landms.select(Axis(0), top_k_ind),
        scores.select(Axis(0), top_k_ind),
    )
}

pub fn select_high_scores2(
    boxes: ArrayView2<'_, f32>,
    landms: ArrayView2<'_, f32>,
    scores: ArrayView1<'_, f32>,
    confidence_threshold: f32,
    top_k: usize,
) -> (Array2<f32>, Array2<f32>, Array1<f32>) {
    let mut above_thresh: Vec<(usize, f32)> = scores
        .iter()
        .copied()
        .enumerate()
        .filter(|&(_ind, score)| score > confidence_threshold)
        .collect();
    above_thresh.sort_unstable_by(|(_, score1), (_, score2)| {
        // Note the swapped order of `score1` and `score2` for sorting in descending order.
        score2.partial_cmp(score1).expect("Score must not be NaN.")
    });
    let top_k_ind: Vec<usize> = above_thresh
        .iter()
        .map(|&(ind, _)| ind)
        .take(top_k)
        .collect();
    (
        boxes.select(Axis(0), &top_k_ind),
        landms.select(Axis(0), &top_k_ind),
        scores.select(Axis(0), &top_k_ind),
    )
}

I'm not sure which will be faster. The second approach has an extra allocation, but the sorting should be faster because it doesn't have to randomly index into the scores array. Note also that if you don't need top_k_ind to be sorted, then you could use select_nth_unstable_by instead of sort_unstable_by to avoid a full sort, although the unordered indices may slow down the final .select() calls. If the initial filter + collect is slow, it might be a little faster to call push repeatedly in for_each; I'm not sure.

0 replies

vladimirmujagic · 2021-03-14T11:31:12Z

vladimirmujagic
Mar 14, 2021
Author

Added your code to current flow and tested for correctness by comparing original python implementation and its outputs to rust implementation.

Here is the current postprocessing flow:

    // Postprocessing
    // ------------------------------------------------------------------------------------------------------------------------------
    let loc = &outputs[0];
    println!("loc: {:?}", loc);
    let conf = &outputs[1];
    let landms = &outputs[2];

    let confidence_threshold = 0.02;
    let top_k = 5000;
    let nms_threshold = 0.4;
    let keep_top_k = 750;
    let vis_threhsold = 0.6;

    let scale = array![image_width as f32, image_height as f32, image_width as f32, image_height as f32];
    let resize = 1.0;

    let priorbox = PriorBox::new((target_width, target_height));
    let prior_data = priorbox.forward();
    let variances = priorbox.cfg().variances();

    let loc_squeezed = loc.slice(s![0, .., ..]).to_owned();
    let mut boxes = decode_boxes(loc_squeezed.view(), prior_data.view(), variances);
    boxes = boxes * scale;
    boxes.mapv_inplace(|v| v / resize);

    let conf_squeezed: Array2<f32> = conf.slice(s![0, .., ..]).to_owned();
    let scores = conf_squeezed.column(1);

    let landms_squeezed = landms.slice(s![0, .., ..]).to_owned();
    let mut landmarks = decode_landm(landms_squeezed.view(), prior_data.view(), variances);
    let scale1 = array![
        image_width as f32, image_height as f32,
        image_width as f32, image_height as f32,
        image_width as f32, image_height as f32,
        image_width as f32, image_height as f32,
        image_width as f32, image_height as f32
    ];
    landmarks = landmarks * scale1;
    landmarks.mapv_inplace(|v| v / resize);

    let (s_boxes, s_landms, s_scores) =
        select_high_scores(boxes.view(), landmarks.view(), scores, confidence_threshold, top_k);
    // ------------------------ ------------------------------------------------------------------------------------------------------

Code is compiling but produces unexpected error for given thresholds:

    let confidence_threshold = 0.02;
    let top_k = 5000;

Error:

thread '<unnamed>' panicked at 'range end index 5000 out of range for slice of length 277', src/postprocessing.rs:195:22

So this implies if i am not mistaken that filtering operation:

   let mut ind_above_thresh: Vec<usize> = scores
        .iter()
        .enumerate()
        .filter_map(|(ind, &score)| (score > confidence_threshold).then(|| ind))
        .collect();

discards too many values which further implies that network outputs might be wrong (delta is too big), this is still not confirmed and shouldn't be the case even though original implementation is using pytorch and in rust implementation onnxruntime is used because i am passing exactly the same image as network input and there were no issues when switching between inference engines in python.

Still trying to understand if this can somehow be related to incorrect tensor manipulations

0 replies

jturner314 · 2021-03-14T21:30:23Z

jturner314
Mar 14, 2021
Maintainer

Yeah, if that panic occurs on the line let top_k_ind = &ind_above_thresh[..top_k];, then ind_above_thresh has length 277, which means that 277 scores are above confidence_threshold. It sounds like this is less than what you were expecting. The bug could be in the code I suggested or somewhere else. I'd suggest testing smaller pieces of the code separately to narrow down where the problem is located.

0 replies

bluss · 2021-03-15T18:34:01Z

bluss
Mar 15, 2021
Maintainer

The course of this issue is just a discussion I guess? I could move it to discussions, and using discussions is also a good idea (but I realize they might be less visible).

0 replies

jturner314 · 2021-03-15T19:05:36Z

jturner314
Mar 15, 2021
Maintainer

Yeah, it's just a discussion. I didn't realize that it was possible to convert issues to discussions. I'll try it. Edit: That's pretty cool. Thanks for pointing out that feature.

2 replies

bluss Mar 15, 2021
Maintainer

github has been tweaking the discussion feature a bit since launch. I don't know if the lock topic was yours or due to moving to discussions, but it shouldn't be needed to do that. Issues I've moved before have practically "disappeared" instead of having a lock, at least..

jturner314 Mar 15, 2021
Maintainer

Yeah, GitHub closed and locked the issue as part of converting it into a discussion, I guess to force the conversation to move to the discussion. I'm surprised that there's no way to unlock the issue.

bluss · 2021-03-15T19:24:01Z

bluss
Mar 15, 2021
Maintainer

I see that you wrote ArrayBase<OwnedRepr<f32>, Dim<[usize; 2]>> in your code. There are type aliases for the various combinations, and it's not intended to ever be written out in full like that. Please use one of the equivalent ways to write the same type: Array<f32, Ix2> or Array2<f32>. (and array views work similarly with the ArrayView and ArrayViewMut aliases).

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non trivial slice assignments and tensor manipulation #941

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 13 comments 2 replies

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

Non trivial slice assignments and tensor manipulation #941

vladimirmujagic Mar 11, 2021

Replies: 13 comments · 2 replies

xd009642 Mar 11, 2021

vladimirmujagic Mar 11, 2021 Author

jturner314 Mar 12, 2021 Maintainer

vladimirmujagic Mar 12, 2021 Author

jturner314 Mar 12, 2021 Maintainer

vladimirmujagic Mar 13, 2021 Author

vladimirmujagic Mar 13, 2021 Author

jturner314 Mar 13, 2021 Maintainer

vladimirmujagic Mar 14, 2021 Author

jturner314 Mar 14, 2021 Maintainer

bluss Mar 15, 2021 Maintainer

jturner314 Mar 15, 2021 Maintainer

bluss Mar 15, 2021 Maintainer

jturner314 Mar 15, 2021 Maintainer

bluss Mar 15, 2021 Maintainer

vladimirmujagic
Mar 11, 2021

Replies: 13 comments 2 replies

xd009642
Mar 11, 2021

vladimirmujagic
Mar 11, 2021
Author

jturner314
Mar 12, 2021
Maintainer

vladimirmujagic
Mar 12, 2021
Author

jturner314
Mar 12, 2021
Maintainer

vladimirmujagic
Mar 13, 2021
Author

vladimirmujagic
Mar 13, 2021
Author

jturner314
Mar 13, 2021
Maintainer

vladimirmujagic
Mar 14, 2021
Author

jturner314
Mar 14, 2021
Maintainer

bluss
Mar 15, 2021
Maintainer

jturner314
Mar 15, 2021
Maintainer

bluss Mar 15, 2021
Maintainer

jturner314 Mar 15, 2021
Maintainer

bluss
Mar 15, 2021
Maintainer