Active Search is not working effectively #69

Open
seungjuuuuuu opened this issue Jan 18, 2024 · 9 comments

Comments

@seungjuuuuuu

I obtained the KingsCollege 3D scene file from Cambridge Landmarks and followed the steps described in the ACG-Localizer README, but the output of acg_localizer_active_search had large rotation and position errors and reported that 0 images were registered. Why might that be?

@tsattler
Owner

Because the original code is computing the pose together with the intrinsics. You get much more accurate and stable poses when using a calibrated camera.

@seungjuuuuuu
Author

Thank you for your prompt response. Regarding your suggestion to use a calibrated camera, could you please clarify at which stage this is recommended? Is it typically provided during scene modeling in Colmap? My research focuses on testing methodologies using datasets like Cambridge Landmarks and 7Scenes. However, based on my understanding, these datasets do not seem to include camera intrinsic parameters. If I intend to proceed with testing the Active Search method on these datasets, what steps should I take?

@seungjuuuuuu
Author

Sorry to bother you again, but I still can't resolve the issue above.

In the past few days, I used Colmap to convert .bin format files to bundler's output format, and the list.txt file has the following format, for example:
./seq1_frame00016.jpg 0 1665.30750
./seq1_frame00015.jpg 0 1665.65062
./seq1_frame00012.jpg 0 1666.12875
I wonder if what you said about using a calibrated camera is reflected in this step.

Then I followed the steps given by ACG-Localizer. To ensure that a calibrated camera is used, I even generated a list.txt with the following format for the query images:
./frame00001.jpg 0 1676.75098
./frame00002.jpg 0 1676.77588
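As a quick sanity check on such a list file, the entries can be parsed and the focal lengths inspected. The following is only a sketch: it assumes each line is `<image path> <flag> <focal length in pixels>`, as in the examples above, and that a line carrying only an image path has no focal length. The names `ListEntry` and `parse_list_lines` are hypothetical and not part of ACG-Localizer.

```python
from typing import List, NamedTuple, Optional


class ListEntry(NamedTuple):
    image: str                 # image path as written in the list file
    focal_px: Optional[float]  # focal length in pixels, None if absent


def parse_list_lines(lines: List[str]) -> List[ListEntry]:
    """Parse lines of the assumed form '<image> <flag> <focal>'."""
    entries = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Assumption: the focal length, when present, is the third field.
        focal = float(parts[2]) if len(parts) >= 3 else None
        entries.append(ListEntry(parts[0], focal))
    return entries


example = [
    "./frame00001.jpg 0 1676.75098",
    "./frame00002.jpg 0 1676.77588",
]
for entry in parse_list_lines(example):
    print(entry.image, entry.focal_px)
```

Entries whose focal length is `None` would be the ones for which no intrinsics are supplied in the list.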

Although the above steps were executed successfully, it still shows that 0 images have been registered. I even looked at the code of acg_localizer_active_search to check if the function can provide camera intrinsics. But none of the above solutions solved my problem. So I'm asking for your help again and I'd be grateful if you could reply!

@tsattler
Owner

You can get Colmap models for Cambridge and 7Scenes here: https://github.com/cvg/Hierarchical-Localization/tree/master/hloc/pipelines/Cambridge and here https://github.com/cvg/Hierarchical-Localization/tree/master/hloc/pipelines/7Scenes . The intrinsics of the queries should be included there as well.

The original Bundler file specification assumes that the coordinate system of each image follows the Computer Graphics convention (x-axis points to the right, y-axis points upwards, camera is looking down the -z-axis). The format generated by Colmap follows the Computer Vision convention (x-axis points to the right, y-axis points downwards, camera is looking down the +z-axis). I'd assume that this mismatch causes the problems. A description of how to convert between the formats can be found here: https://data.ciirc.cvut.cz/public/projects/2020VisualLocalization/Aachen-Day-Night/README_Aachen-Day-Night.md
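A minimal sketch of such a conversion, under the assumption that both formats store a world-to-camera rotation R and translation t, that Colmap's camera frame has y pointing down and looks along +z, and that Bundler's has y pointing up and looks along -z. In that case the two camera frames differ by a flip of the y and z axes. The function names are hypothetical; the linked README remains the authoritative description.

```python
def colmap_to_bundler(R, t):
    """Flip the camera y and z axes: R_b = S R, t_b = S t, S = diag(1, -1, -1)."""
    signs = (1.0, -1.0, -1.0)
    R_b = [[signs[i] * R[i][j] for j in range(3)] for i in range(3)]
    t_b = [signs[i] * t[i] for i in range(3)]
    return R_b, t_b


def to_camera(R, t, X):
    """Map a world point X into camera coordinates: R X + t."""
    return [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]


def camera_center(R, t):
    """Camera position in world coordinates: -R^T t."""
    return [-sum(R[i][j] * t[i] for i in range(3)) for j in range(3)]


# Illustrative pose (not from any dataset): 90 degree rotation about z.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 2.0, 3.0]
R_b, t_b = colmap_to_bundler(R, t)

X = [0.0, 0.0, 10.0]
print(to_camera(R, t, X))      # positive z: in front of a +z-looking camera
print(to_camera(R_b, t_b, X))  # negative z: in front of a -z-looking camera
```

The flip changes only the camera's local axes, so the camera position -R^T t is identical before and after the conversion, which makes for an easy consistency check.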

I don't remember whether the ACG Localizer code supports calibrated cameras or not. I wrote this more than 11 years ago.

@seungjuuuuuu
Author

Thank you very much for your reply! I will follow your suggestion for further testing.

Additionally, may I ask if you have the test results for Active Search on each query image for every scene in Cambridge Landmarks and 7Scenes? My experiments are specifically aimed at obtaining these test results, rather than focusing on the median error for each scene. If you could share them with me, I would be extremely grateful.

@tsattler
Owner

Poses per image for 7Scenes and 12Scenes can be found in this repository: https://github.com/tsattler/visloc_pseudo_gt_limitations

Poses for Cambridge Landmarks are here: https://drive.google.com/file/d/1xY459_o7XFLAtrhK_i8Kqbn9UZS50pKc/view?usp=sharing
The format in which the poses are given should follow the format of Colmap (qw qx qy qz tx ty tz). I am not sure whether these are exactly the same poses used to compute the statistics in the PixLoc paper, but they should be comparable.

@seungjuuuuuu
Author

When I tried to test the performance of each image, I found that it had a large position error.

For example, seq2/frame00002 in active_search_1_1_markt_paris_10k has the following pose information:
[qw qx qy qz tx ty tz] 0.65112 0.619581 0.303496 -0.31631 -12.3897 5.00842 73.4588

The gt released by the original author has the following pose information:
[X Y Z W P Q R] 65.474678 -35.436835 1.599990 0.650746 0.620279 0.302991 -0.316196

When I solve for the position error between [-12.3897, 5.00842, 73.4588] and [65.474678, -35.436835, 1.599990], I calculate 113.5369. Does this seem to be a problem?

@tsattler
Owner

tx ty tz define the translation, not the position. X Y Z specify the position, not the translation. You cannot directly compare the numbers. You get the estimated camera position of Active Search as -R^T * t, where t = [tx, ty, tz]^T is the translation vector stored in the file and R is the rotation matrix defined by (qw, qx, qy, qz). R^T is the transpose of that matrix, which is also its inverse because R is a rotation.
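A minimal sketch of this computation in plain Python, using the numbers for seq2/frame00002 quoted earlier in this thread. It assumes the quaternion is given as (qw, qx, qy, qz) and uses the standard unit-quaternion-to-rotation-matrix formula; the function names are hypothetical.

```python
import math


def quat_to_rotmat(qw, qx, qy, qz):
    """Rotation matrix of a quaternion (normalized first for safety)."""
    n = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    w, x, y, z = qw / n, qx / n, qy / n, qz / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def camera_center(q, t):
    """Camera position C = -R^T t for a world-to-camera pose (q, t)."""
    R = quat_to_rotmat(*q)
    # (R^T t)_j = sum_i R[i][j] * t[i]
    return [-sum(R[i][j] * t[i] for i in range(3)) for j in range(3)]


# Estimated pose (qw qx qy qz, tx ty tz) and ground-truth position (X Y Z)
# as quoted in this thread.
est_q = (0.65112, 0.619581, 0.303496, -0.31631)
est_t = (-12.3897, 5.00842, 73.4588)
gt_position = (65.474678, -35.436835, 1.599990)

est_position = camera_center(est_q, est_t)
position_error = math.dist(est_position, gt_position)
print(est_position, position_error)
```

With these values the recovered position comes out close to the ground-truth X Y Z, in contrast to the error of more than 113 obtained by comparing t with the position directly.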

@seungjuuuuuu
Author

Thank you so much for your constant replies and help to me!
