Pre-processing of input data #4
Comments
Hello, thank you for your interest. You can also look at these lines in the test.py file: https://github.com/face-analysis/emonet/blob/master/test.py#L35#L51. Hope this helps!
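For anyone who cannot follow the link, here is a minimal sketch of that kind of test-time preparation, assuming only an RGB load, a resize to 256x256, and conversion to a float tensor in [0, 1] with no mean/std normalization (consistent with what is noted later in this thread); the helper name `preprocess_image` is illustrative, not from the repository:

```python
# Minimal sketch (not the repo's exact code): load an RGB image,
# resize to 256x256, and convert to a float tensor in [0, 1].
import torch
from skimage import io
from skimage.transform import resize

def preprocess_image(path):
    # skimage loads images in RGB channel order, matching the training setup.
    image = io.imread(path)                        # uint8, HxWx3, RGB
    image = resize(image, (256, 256))              # float in [0, 1]
    tensor = torch.from_numpy(image).float()       # HxWx3
    tensor = tensor.permute(2, 0, 1).unsqueeze(0)  # 1x3x256x256
    return tensor
```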
Hello @Girish-03
Hello, one issue I can think of is the fact that OpenCV loads images in BGR format, whereas our network was trained using the RGB format (we load images using skimage; see the AffectNet dataloader, get_item function: https://github.com/face-analysis/emonet/blob/master/emonet/data/affecnet.py#L120). Maybe this is the issue... Hope this helps!
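Concretely, the fix suggested above would look something like this; a minimal sketch assuming the frame comes from `cv2.imread` (the file name is a placeholder):

```python
import cv2

# OpenCV reads images in BGR channel order; convert to RGB so the
# input matches the format the network was trained on.
bgr = cv2.imread("face.jpg")                 # HxWx3, BGR, uint8
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)   # HxWx3, RGB, uint8
```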
Hello,
I'm also having an issue validating the network's predictions on stock images. I've tried many variations, including:

- flipping the channels from RGB to BGR
- normalizing the input array
- always resizing the image to 256x256

None of these variations worked; the network still predicts the wrong emotion, valence, and arousal. In the code in the repository there is no input normalization, only a resize transform. Could you please point me to the correct data preparation steps? Thanks
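For reference, the resize-only preparation mentioned above could be written like this in torchvision terms (a sketch, not the repository's exact transform):

```python
from torchvision import transforms

# Resize-only preparation, with no mean/std normalization:
transform = transforms.Compose([
    transforms.ToPILImage(),    # expects an HxWx3 RGB uint8 array
    transforms.Resize((256, 256)),
    transforms.ToTensor(),      # float CHW tensor, scaled to [0, 1]
])
```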
I got the expected results as-is.
nice! |
Good, good!
Hi,
The work is really amazing and the results seem astonishing.
I am a student trying to use this code for one of my research projects. I would like to know if there is a specific preprocessing technique to be used before feeding the images into the network.
For instance, I am detecting the faces in video frames using the OpenCV Caffe-model DNN face detector, cropping them, resizing them to 256x256, and feeding them to the network (a rough sketch of this pipeline follows below). But the valence and arousal values, along with the categorical emotion, that I am getting do not match for many frames. I assume I might be missing some preprocessing of the input frames required by the EmoNet model. Also, please let me know if there is a specific technique to be used for detecting and cropping the faces. Therefore, I am requesting your guidance here.
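For concreteness, here is a minimal sketch of that pipeline, assuming the standard OpenCV res10 SSD Caffe face detector; the file names, the 0.5 confidence threshold, and the crop handling are illustrative assumptions, not details from this repository:

```python
import cv2
import numpy as np

# Assumed file names for the standard OpenCV res10 SSD face detector.
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

frame = cv2.imread("frame.jpg")              # BGR, uint8
h, w = frame.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
                             (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()                   # shape: 1x1xNx7

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence < 0.5:                     # arbitrary threshold
        continue
    # Scale the normalized box back to frame coordinates (clipping to the
    # image bounds is omitted here for brevity).
    box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
    x1, y1, x2, y2 = box.astype(int)
    face = frame[y1:y2, x1:x2]
    face = cv2.resize(face, (256, 256))      # network input size
    face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)  # match training format
    # face is now ready to be converted to a tensor and fed to the model
```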
I performed the estimation and visualization on the same video provided in the paper to compare against your results, but they are not the same. Below are links to the video with the original results (valence/arousal bars and categorical emotions) and with the results from my preprocessing (as explained above).
(The green vertical and blue horizontal bars with the emotion in red text are my results.)
Using the 5-class model:
https://drive.google.com/file/d/1--GW_J3XUDNbo59YOTbLJ-VPWS4-2oey/view?usp=sharing
Using the 8-class model:
https://drive.google.com/file/d/1jJ9Ah7rcoN3aVkLYPq8cDajdRTnwsamU/view?usp=sharing