Report-1030890.out
---------------------------------------
Begin Slurm Prolog: Dec-08-2024 17:48:22
Job ID: 1030890
User ID: yxu846
Account: scs
Job name: visagent
Partition: ice-gpu
---------------------------------------
2024-12-08 17:48:35.569074: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-12-08 17:48:35.583209: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-08 17:48:35.599341: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-08 17:48:35.604084: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-08 17:48:35.616352: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-12-08 17:48:38.061357: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
A new version of the following files was downloaded from https://huggingface.co/microsoft/Florence-2-large:
- processing_florence2.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/microsoft/Florence-2-large:
- configuration_florence2.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/microsoft/Florence-2-large:
- modeling_florence2.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
/home/hice1/yxu846/.conda/envs/py39/lib/python3.9/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
Registering LOC step
Registering COUNT step
Registering CROP step
Registering CROP_RIGHTOF step
Registering CROP_LEFTOF step
Registering CROP_FRONTOF step
Registering CROP_INFRONTOF step
Registering CROP_INFRONT step
Registering CROP_BEHIND step
Registering CROP_AHEAD step
Registering CROP_BELOW step
Registering CROP_ABOVE step
Registering VQA step
Registering EVAL step
Registering RESULT step
Registering CAP step
Registering RETRIEVE step
Registering RELATIVE_POS step
Registering MERGE step
0%| | 0/1962 [00:00<?, ?it/s]
/home/hice1/yxu846/.conda/envs/py39/lib/python3.9/site-packages/transformers/generation/utils.py:1375: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
warnings.warn(
0%| | 1/1962 [00:10<5:47:15, 10.62s/it]
0%| | 2/1962 [00:13<3:22:13, 6.19s/it]
0%| | 3/1962 [00:16<2:28:29, 4.55s/it]
0%| | 4/1962 [00:17<1:51:34, 3.42s/it]
0%| | 5/1962 [00:19<1:31:44, 2.81s/it]
0%| | 6/1962 [00:22<1:30:42, 2.78s/it]
0%| | 7/1962 [00:25<1:30:16, 2.77s/it]
0%| | 8/1962 [00:27<1:28:33, 2.72s/it]
0%| | 9/1962 [00:31<1:33:59, 2.89s/it]
1%| | 10/1962 [00:33<1:32:17, 2.84s/it]
1%| | 11/1962 [00:35<1:20:38, 2.48s/it]
1%| | 12/1962 [00:37<1:13:27, 2.26s/it]
1%| | 13/1962 [00:39<1:17:57, 2.40s/it]
1%| | 14/1962 [00:42<1:14:35, 2.30s/it]
1%| | 15/1962 [00:44<1:16:08, 2.35s/it]
1%| | 16/1962 [00:47<1:24:54, 2.62s/it]
1%| | 17/1962 [00:50<1:24:38, 2.61s/it]
1%| | 18/1962 [00:51<1:15:36, 2.33s/it]
1%| | 19/1962 [00:55<1:24:54, 2.62s/it]
1%| | 20/1962 [00:57<1:25:44, 2.65s/it]
Does the plate have a different color than the artwork?
reference answer: Yes, the artwork is black and the plate is white.
LOC
CROP
LOC
CROP
VQA
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'plate\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE,object=\'artwork\')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER0=VQA(image=IMAGE0,question=\'What color is the plate?\')\nANSWER1=VQA(image=IMAGE1,question=\'What color is the artwork?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} != {ANSWER1} else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'no'}}
no
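Note on the EVAL step: the program above ends with an expression of the form `'yes' if {ANSWER0} != {ANSWER1} else 'no'`. The log does not show how the agent evaluates it, so the following is only a minimal sketch, assuming each `{NAME}` placeholder is replaced by a Python literal for the stored value before the expression is evaluated. The helper name `eval_step` and the example values are hypothetical.

```python
import re


def eval_step(expr: str, state: dict) -> str:
    """Fill {NAME} placeholders from `state`, then evaluate the expression.

    Assumption: this mirrors the EVAL step only in spirit; the real agent
    code is not shown in the log.
    """
    # repr() turns strings into quoted literals and leaves numbers numeric.
    filled = re.sub(r"\{(\w+)\}", lambda m: repr(state[m.group(1)]), expr)
    # No builtins are exposed; the expressions in this log only use literals.
    return str(eval(filled, {"__builtins__": {}}, {}))


# Hypothetical intermediate answers, not taken from the log.
same = eval_step("'yes' if {ANSWER0} != {ANSWER1} else 'no'",
                 {"ANSWER0": "white", "ANSWER1": "white"})
counted = eval_step("'yes' if {ANSWER0} > 0 else 'no'", {"ANSWER0": 2})
print(same, counted)
```

Under this assumption the comparison is an exact string match, so the result depends on the two VQA answers using identical wording.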
What kind of device is made of the same material as the device above the desk?
reference answer: The computer monitor is made of the same material as the keyboard.
LOC
CROP
LOC
CROP_ABOVE
VQA
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='desk')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object='device')\nIMAGE1=CROP_ABOVE(image=IMAGE0,box=BOX1)\nANSWER0=VQA(image=IMAGE1,question='What material is the device made of?')\nANSWER1=VQA(image=IMAGE0,question='What kind of device is it?')\nFINAL_RESULT=RESULT(var=ANSWER1)", 'answer': 'computer'}}
computer
Is there a teddy bear that is not lying?
reference answer: No, there is a teddy bear but it is lying.
LOC
CROP
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'teddy bear\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question=\'Is the teddy bear lying?\')\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} == \'no\' else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'no'}}
no
Which company are the shorts from?
reference answer: The shorts are from adidas.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='Which company are the shorts from?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'adidas'}}
adidas
Which company are the shoes from?
reference answer: The shoes are from adidas.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='Which company are the shoes from?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'adidas'}}
adidas
Is this a table or a bed?
reference answer: This is a table.
LOC
LOC
COUNT
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'table\')\nBOX1=LOC(image=IMAGE,object=\'bed\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'table\' if {ANSWER0} > 0 else \'bed\' if {ANSWER1} > 0 else \'neither\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'table'}}
table
Do the cabinets hang from the beige wall?
reference answer: Yes, the cabinets hang from the wall.
LOC
CROP
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'beige wall\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'cabinets\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'yes'}}
yes
Is the knife on a napkin?
reference answer: No, the knife is on the table.
LOC
CROP
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'napkin\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'knife\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'yes'}}
yes
Does the necktie have a different color than the shoe?
reference answer: No, both the necktie and the shoe are black.
LOC
CROP
LOC
CROP
VQA
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'necktie\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE,object=\'shoe\')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER0=VQA(image=IMAGE0,question=\'What color is the necktie?\')\nANSWER1=VQA(image=IMAGE1,question=\'What color is the shoe?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} != {ANSWER1} else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
Is the ceiling above a toilet?
reference answer: Yes, the ceiling is above a toilet.
LOC
CROP_BELOW
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'toilet\')\nIMAGE0=CROP_BELOW(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'ceiling\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'no'}}
no
Is this the zoo?
reference answer: No, it is the garden.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='Is this the zoo?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'no'}}
no
Which place is it?
reference answer: It is a street.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='Which place is it?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'restaurant'}}
restaurant
Are there cyclists or men that are riding?
reference answer: Yes, the cyclist is riding.
LOC
LOC
COUNT
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'cyclist\')\nBOX1=LOC(image=IMAGE,object=\'man\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} + {ANSWER1} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
What is the weather like today?
reference answer: It is clear.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='What is the weather like today?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'cloudy'}}
cloudy
Is there a fence in front of the trees?
reference answer: No, there is a boy in front of the trees.
LOC
CROP_FRONT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'trees\')\nIMAGE0=CROP_FRONT(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'fence\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': "Runtime error: 'CROP_FRONT'"}}
Runtime error: 'CROP_FRONT'
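Note on this error: the generated program calls `CROP_FRONT`, but the "Registering ... step" lines at the start of this log list `CROP_FRONTOF`, `CROP_INFRONTOF` and `CROP_INFRONT`, not `CROP_FRONT`. The message format is consistent with a dictionary lookup that raised `KeyError`, although the log does not confirm this. The following is a minimal sketch under that assumption; `registry`, `run_step` and `suggest` are hypothetical names, and the handlers are placeholders.

```python
import difflib

# Step names copied from the "Registering ... step" lines of this log.
REGISTERED_STEPS = [
    "LOC", "COUNT", "CROP", "CROP_RIGHTOF", "CROP_LEFTOF", "CROP_FRONTOF",
    "CROP_INFRONTOF", "CROP_INFRONT", "CROP_BEHIND", "CROP_AHEAD",
    "CROP_BELOW", "CROP_ABOVE", "VQA", "EVAL", "RESULT", "CAP", "RETRIEVE",
    "RELATIVE_POS", "MERGE",
]

# Hypothetical registry: each name maps to a placeholder handler.
registry = {name: (lambda name=name: f"ran {name}") for name in REGISTERED_STEPS}


def run_step(step_name: str) -> str:
    """Look up and run a step; report unknown names like the log does."""
    try:
        return registry[step_name]()
    except KeyError as err:
        # str(KeyError('X')) is "'X'", which matches the quoting in the log.
        return f"Runtime error: {err}"


def suggest(step_name: str) -> list:
    """Return registered step names that are close to an unknown name."""
    return difflib.get_close_matches(step_name, REGISTERED_STEPS, n=3, cutoff=0.6)


print(run_step("CROP_FRONT"))
print(suggest("CROP_FRONT"))
```

A check like `suggest` could be run on a generated program before execution, so that an unregistered step name is caught or mapped to a registered one instead of failing at runtime.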
Are there any ovens or cabinets that are black?
reference answer: Yes, the oven is black.
LOC
LOC
VQA
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'oven\')\nBOX1=LOC(image=IMAGE,object=\'cabinet\')\nANSWER0=VQA(image=IMAGE,question=\'What color is the oven?\')\nANSWER1=VQA(image=IMAGE,question=\'What color is the cabinet?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} == \'black\' or {ANSWER1} == \'black\' else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
Is the cake to the left or to the right of the bowl?
reference answer: The cake is to the left of the bowl.
LOC
CROP
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'bowl\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'cake\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'left\' if {ANSWER0} > 0 else \'right\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'right'}}
right
Are the people of the same gender?
reference answer: Yes, all the people are female.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='Are the people of the same gender?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'yes'}}
yes
What type of device is made of the same material as the remote the man is to the left of?
reference answer: The telephone is made of the same material as the remote.
LOC
CROP_LEFTOF
LOC
CROP
VQA
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='man')\nIMAGE0=CROP_LEFTOF(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object='remote')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER0=VQA(image=IMAGE1,question='What material is the remote made of?')\nANSWER1=VQA(image=IMAGE,question='What type of device is made of {ANSWER0} material?')\nFINAL_RESULT=RESULT(var=ANSWER1)", 'answer': 'remote'}}
remote
Do you see any large couch or desk?
reference answer: No, there is a desk but it is small.
LOC
LOC
COUNT
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'couch\')\nBOX1=LOC(image=IMAGE,object=\'desk\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} > 0 or {ANSWER1} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
Do both the people have the same gender?
reference answer: No, they are both male and female.
LOC
CROP
VQA
VQA
EVAL
RESULT
1%| | 21/1962 [01:00<1:26:20, 2.67s/it]
1%| | 22/1962 [01:03<1:28:44, 2.74s/it]
1%| | 23/1962 [01:06<1:26:51, 2.69s/it]
1%| | 24/1962 [01:09<1:33:05, 2.88s/it]
1%| | 25/1962 [01:11<1:28:56, 2.76s/it]
1%| | 26/1962 [01:14<1:24:28, 2.62s/it]
1%| | 27/1962 [01:16<1:17:27, 2.40s/it]
1%| | 28/1962 [01:18<1:12:42, 2.26s/it]
1%| | 29/1962 [01:20<1:14:24, 2.31s/it]
2%| | 30/1962 [01:22<1:11:02, 2.21s/it]
2%| | 31/1962 [01:25<1:15:55, 2.36s/it]
2%| | 32/1962 [01:28<1:25:39, 2.66s/it]
2%| | 33/1962 [01:31<1:28:16, 2.75s/it]
2%| | 34/1962 [01:35<1:35:47, 2.98s/it]
2%| | 35/1962 [01:36<1:24:15, 2.62s/it]
2%| | 36/1962 [01:39<1:25:56, 2.68s/it]
2%| | 37/1962 [01:42<1:23:58, 2.62s/it]
2%| | 38/1962 [01:45<1:30:49, 2.83s/it]
2%| | 39/1962 [01:47<1:21:06, 2.53s/it]
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'person\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question=\'What is the gender of the first person?\')\nANSWER1=VQA(image=IMAGE0,question=\'What is the gender of the second person?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} == {ANSWER1} else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
Is there any tea in this photograph that is not wet?
reference answer: No, there is tea but it is wet.
LOC
CROP
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'tea\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'wet\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} == 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'yes'}}
yes
Are the curtains made of cloth?
reference answer: Yes, the curtains are made of cloth.
LOC
CROP
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'curtains\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question=\'What material are the curtains made of?\')\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} == \'cloth\' else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'no'}}
no
Who seems to be older, the woman or the girl?
reference answer: The woman is older than the girl.
LOC
CROP
LOC
CROP
VQA
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'woman\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE,object=\'girl\')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER0=VQA(image=IMAGE0,question=\'How old does the woman seem?\')\nANSWER1=VQA(image=IMAGE1,question=\'How old does the girl seem?\')\nANSWER2=EVAL(expr="\'woman\' if {ANSWER0} > {ANSWER1} else \'girl\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'woman'}}
woman
What kind of furniture is cracked?
reference answer: The furniture is a table.
LOC
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='cracked')\nANSWER0=VQA(image=IMAGE,question='What kind of furniture is cracked?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'table'}}
table
Where is the girl that is young looking at?
reference answer: The girl is looking up.
LOC
CROP
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='girl')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question='Where is the girl looking?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'up'}}
up
Are the animals of different species?
reference answer: Yes, they are dogs and birds.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='Are the animals of different species?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'yes'}}
yes
How is the weather, clear or stormy?
reference answer: It is stormy.
VQA
EVAL
RESULT
{'agent': {'program': 'ANSWER0=VQA(image=IMAGE,question=\'How is the weather?\')\nANSWER1=EVAL(expr="\'clear\' if {ANSWER0} == \'clear\' else \'stormy\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'stormy'}}
stormy
Do you see birds there that are not sitting?
reference answer: Yes, there is a bird that is standing.
LOC
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'birds\')\nANSWER0=VQA(image=IMAGE,question=\'Are the birds sitting?\')\nANSWER1=EVAL(expr="\'no\' if {ANSWER0} == \'yes\' else \'yes\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'yes'}}
yes
Is this a white cabinet?
reference answer: No, this is a brown cabinet.
VQA
EVAL
RESULT
{'agent': {'program': 'ANSWER0=VQA(image=IMAGE,question=\'What color is the cabinet?\')\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} == \'white\' else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'no'}}
no
On which side of the image is the male person?
reference answer: The skateboarder is on the left of the image.
LOC
CROP
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'male person\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'RIGHT\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'right\' if {ANSWER0} > 0 else \'left\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'right'}}
right
Does the shirt have the same color as the chair?
reference answer: Yes, both the shirt and the chair are white.
LOC
CROP
LOC
CROP
VQA
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'shirt\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE,object=\'chair\')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER0=VQA(image=IMAGE0,question=\'What color is the shirt?\')\nANSWER1=VQA(image=IMAGE1,question=\'What color is the chair?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} == {ANSWER1} else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'no'}}
no
Are there both a door and a window in this scene?
reference answer: Yes, there are both a window and a door.
LOC
LOC
COUNT
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'door\')\nBOX1=LOC(image=IMAGE,object=\'window\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} > 0 and {ANSWER1} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'no'}}
no
Do the sky and the flag have the same color?
reference answer: Yes, both the sky and the flag are blue.
LOC
CROP
LOC
CROP
VQA
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'sky\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE,object=\'flag\')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER0=VQA(image=IMAGE0,question=\'What color is the sky?\')\nANSWER1=VQA(image=IMAGE1,question=\'What color is the flag?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} == {ANSWER1} else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'no'}}
no
How is the weather?
reference answer: It is partly cloudy.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='How is the weather?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'cloudy'}}
cloudy
Are there both a chair and a cup in the photo?
reference answer: Yes, there are both a cup and a chair.
LOC
LOC
COUNT
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'chair\')\nBOX1=LOC(image=IMAGE,object=\'cup\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} > 0 and {ANSWER1} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
Is the bread on the left side?
reference answer: No, the bread is on the right of the image.
LOC
CROP
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'LEFT\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'bread\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'yes'}}
yes
Are there any chairs or mugs that are black?
reference answer: Yes, the mug is black.
LOC
LOC
VQA
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'chair\')\nBOX1=LOC(image=IMAGE,object=\'mug\')\nANSWER0=VQA(image=IMAGE,question=\'What color is the chair?\')\nANSWER1=VQA(image=IMAGE,question=\'What color is the mug?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} == \'black\' or {ANSWER1} == \'black\' else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
What is the woman doing?
reference answer: The woman is looking down.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='What is the woman doing?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'texting'}}
texting
Do you see any short grass?
reference answer: Yes, there is short grass.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='Do you see any short grass?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'yes'}}
yes
Is the boat in the bottom part or in the top of the picture?
reference answer: The boat is in the bottom of the image.
LOC
CROP
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'BOTTOM\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'boat\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'bottom\' if {ANSWER0} > 0 else \'top\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'bottom'}}
bottom
Are there buckets or sugar packets?
2%| | 40/1962 [01:49<1:14:14, 2.32s/it]
2%| | 41/1962 [01:51<1:16:15, 2.38s/it]
2%| | 42/1962 [01:54<1:22:00, 2.56s/it]
2%| | 43/1962 [01:57<1:25:41, 2.68s/it]
2%| | 44/1962 [01:59<1:21:00, 2.53s/it]
2%| | 45/1962 [02:02<1:20:19, 2.51s/it]
2%| | 46/1962 [02:05<1:27:48, 2.75s/it]
2%| | 47/1962 [02:08<1:29:18, 2.80s/it]
2%| | 48/1962 [02:10<1:21:44, 2.56s/it]
2%| | 49/1962 [02:13<1:29:04, 2.79s/it]
3%| | 50/1962 [02:15<1:19:17, 2.49s/it]
3%| | 51/1962 [02:17<1:18:32, 2.47s/it]
3%| | 52/1962 [02:20<1:15:00, 2.36s/it]
3%| | 53/1962 [02:23<1:23:14, 2.62s/it]
3%| | 54/1962 [02:25<1:20:21, 2.53s/it]
3%| | 55/1962 [02:27<1:15:12, 2.37s/it]
3%| | 56/1962 [02:29<1:14:37, 2.35s/it]
3%| | 57/1962 [02:33<1:23:58, 2.64s/it]
3%| | 58/1962 [02:35<1:15:43, 2.39s/it]
reference answer: No, there are no sugar packets or buckets.
LOC
LOC
COUNT
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'bucket\')\nBOX1=LOC(image=IMAGE,object=\'sugar packet\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} + {ANSWER1} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'no'}}
no
Does the speaker to the left of the other speaker look blue?
reference answer: No, the speaker is silver.
LOC
CROP
LOC
CROP_LEFTOF
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'speaker\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'speaker\')\nIMAGE1=CROP_LEFTOF(image=IMAGE0,box=BOX1)\nANSWER0=VQA(image=IMAGE1,question=\'What color is the speaker?\')\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} == \'blue\' else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'no'}}
no
What's in front of the wall?
reference answer: The tree is in front of the wall.
LOC
CROP_FRONT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='wall')\nIMAGE0=CROP_FRONT(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question='What is in front of the wall?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': "Runtime error: 'CROP_FRONT'"}}
Runtime error: 'CROP_FRONT'
Is there an airplane in the picture that is black?
reference answer: No, there is an airplane but it is white.
LOC
CROP
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'airplane\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question=\'What color is the airplane?\')\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} == \'black\' else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'no'}}
no
What piece of furniture is the rug in front of, a cabinet or a couch?
reference answer: The rug is in front of a cabinet.
LOC
CROP_FRONT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'rug\')\nIMAGE0=CROP_FRONT(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'cabinet\')\nBOX2=LOC(image=IMAGE0,object=\'couch\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=COUNT(box=BOX2)\nANSWER2=EVAL(expr="\'cabinet\' if {ANSWER0} > 0 else \'couch\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': "Runtime error: 'CROP_FRONT'"}}
Runtime error: 'CROP_FRONT'
Are there either helmets or mirrors that are black?
reference answer: Yes, the helmet is black.
LOC
LOC
COUNT
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'helmet\')\nBOX1=LOC(image=IMAGE,object=\'mirror\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} > 0 or {ANSWER1} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
What is this appliance called?
reference answer: This is a toaster.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='What is this appliance called?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'stove'}}
stove
What is the item of furniture that is made of same material as the storage box that is not empty called?
reference answer: The piece of furniture is a desk.
LOC
CROP
VQA
LOC
CROP
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'storage box\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question=\'What material is the storage box made of?\')\nBOX1=LOC(image=IMAGE,object=\'furniture\')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER1=VQA(image=IMAGE1,question=\'What material is the furniture made of?\')\nANSWER2=EVAL(expr="\'furniture\' if {ANSWER0} == {ANSWER1} and {ANSWER0} != \'empty\' else \'none\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'furniture'}}
furniture
How long are the trousers?
reference answer: The trousers are short.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='How long are the trousers?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'short'}}
short
Who is wearing the pants?
reference answer: The woman is wearing pants.
LOC
CROP
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='pants')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question='Who is wearing the pants?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'man'}}
man
Are there any new dishwashers?
reference answer: Yes, there is a new dishwasher.
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'dishwasher\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'yes'}}
yes
Is the shape of the log different than the tree?
reference answer: No, both the log and the tree are round.
LOC
CROP
LOC
CROP
VQA
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'log\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE,object=\'tree\')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER0=VQA(image=IMAGE0,question=\'What shape is the log?\')\nANSWER1=VQA(image=IMAGE1,question=\'What shape is the tree?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} != {ANSWER1} else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'no'}}
no
What is the white clothing item called?
reference answer: The clothing item is a coat.
LOC
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='white clothing item')\nANSWER0=VQA(image=IMAGE,question='What is the white clothing item called?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'tuxedo'}}
tuxedo
What item of furniture is made of plastic?
reference answer: The piece of furniture is a desk.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='What item of furniture is made of plastic?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'table'}}
table
What device is the same color as the lamp?
reference answer: The cellphone is the same color as the lamp.
LOC
CROP
VQA
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='lamp')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question='What color is the lamp?')\nANSWER1=VQA(image=IMAGE,question='What device has the same color as the lamp?')\nFINAL_RESULT=RESULT(var=ANSWER1)", 'answer': 'snowboard'}}
snowboard
What is the vehicle that is made of same material as the pole that is not short called?
reference answer: The vehicle is a van.
LOC
CROP
VQA
LOC
CROP
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'pole\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question=\'What material is the pole made of?\')\nBOX1=LOC(image=IMAGE,object=\'vehicle\')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER1=VQA(image=IMAGE1,question=\'What material is the vehicle made of?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} == {ANSWER1} and {ANSWER0} != \'short\' else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
Is this a little table?
reference answer: No, this is a large table.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='Is this a little table?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'no'}}
no
Is the clock to the right or to the left of the person that the lady is to the left of?
reference answer: The clock is to the right of the person.
LOC
CROP_LEFTOF
LOC
CROP_LEFTOF
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'person\')\nIMAGE0=CROP_LEFTOF(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'lady\')\nIMAGE1=CROP_LEFTOF(image=IMAGE0,box=BOX1)\nBOX2=LOC(image=IMAGE,object=\'clock\')\nANSWER0=COUNT(box=BOX2)\nANSWER1=EVAL(expr="\'right\' if {ANSWER0} > 0 else \'left\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'right'}}
right
Are there both glasses and ties in the image?
reference answer: Yes, there are both a tie and glasses.
LOC
LOC
COUNT
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'glasses\')\nBOX1=LOC(image=IMAGE,object=\'ties\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} > 0 and {ANSWER1} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
Are the trousers wet?
reference answer: No, the trousers are dry.
LOC
CROP
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='trousers')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question='Are the trousers wet?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'no'}}
no
What is the person in front of the chair typing on, a phone or a laptop?
reference answer: The person is typing on a phone.
LOC
CROP_FRONT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'chair\')\nIMAGE0=CROP_FRONT(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'person\')\nIMAGE1=CROP(image=IMAGE0,box=BOX1)\nANSWER0=VQA(image=IMAGE1,question=\'What is the person typing on?\')\nANSWER1=EVAL(expr="\'phone\' if {ANSWER0} == \'phone\' else \'laptop\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': "Runtime error: 'CROP_FRONT'"}}
Runtime error: 'CROP_FRONT'
Is there any pillow that is not red?
reference answer: Yes, there is a white pillow.
LOC
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'pillow\')\nANSWER0=VQA(image=IMAGE,question=\'What color is the pillow?\')\nANSWER1=EVAL(expr="\'yes\' if \'red\' not in {ANSWER0} else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'yes'}}
yes
In which part of the photo is the red candle, the bottom or the top?
reference answer: The candle is in the top of the image.
LOC
CROP
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'TOP\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'red candle\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'top\' if {ANSWER0} > 0 else \'bottom\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'top'}}
top
Who in this image is standing?
reference answer: The woman is standing.
LOC
CROP
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='standing')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question='Who is standing?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'no one'}}
no one
Are the street lights made of the same material as the cars?
reference answer: Yes, both the street lights and the cars are made of metal.
LOC
CROP
LOC
CROP
VQA
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'cars\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE,object=\'street lights\')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER0=VQA(image=IMAGE0,question=\'What material are the cars made of?\')\nANSWER1=VQA(image=IMAGE1,question=\'What material are the street lights made of?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} == {ANSWER1} else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
Is there any coffee to the left of the man that is to the left of the paintings?
reference answer: No, there is wine to the left of the man.
LOC
CROP_LEFTOF
LOC
CROP_LEFTOF
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'paintings\')\nIMAGE0=CROP_LEFTOF(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'man\')\nIMAGE1=CROP_LEFTOF(image=IMAGE0,box=BOX1)\nBOX2=LOC(image=IMAGE1,object=\'coffee\')\nANSWER0=COUNT(box=BOX2)\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'no'}}
no
Do you see calculators that are not made of plastic?
reference answer: No, there is a calculator but it is made of plastic.
LOC
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'calculator\')\nANSWER0=VQA(image=IMAGE,question=\'What material is the calculator made of?\')\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} != \'plastic\' else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'no'}}
no
Is the boy on the right side?
reference answer: No, the boy is on the left of the image.
LOC
CROP
LOC
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'boy\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'RIGHT\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'yes'}}
yes
What is the name of the piece of furniture that is made of the same material as the door?
reference answer: The piece of furniture is a drawer.
LOC
CROP
VQA
LOC
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='door')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question='What material is the door made of?')\nBOX1=LOC(image=IMAGE,object=ANSWER0)\nANSWER1=VQA(image=IMAGE,question='What is the name of the piece of furniture?')\nFINAL_RESULT=RESULT(var=ANSWER1)", 'answer': "Runtime error: name 'ANSWER0' is not defined"}}
Runtime error: name 'ANSWER0' is not defined
What kind of furniture is made of the same material as the frame?
reference answer: The end table is made of the same material as the frame.
LOC
CROP
VQA
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='frame')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question='What material is the frame made of?')\nANSWER1=VQA(image=IMAGE,question='What kind of furniture is made of {ANSWER0} material?')\nFINAL_RESULT=RESULT(var=ANSWER1)", 'answer': 'wood'}}
wood
What is the rounded appliance?
reference answer: The appliance is a toaster.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='What is the rounded appliance?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'stove'}}
stove
What kind of vegetable is to the right of the giraffe?
reference answer: The vegetable is a carrot.
LOC
CROP_RIGHTOF
LOC
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='giraffe')\nIMAGE0=CROP_RIGHTOF(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object='vegetable')\nANSWER0=VQA(image=IMAGE0,question='What kind of vegetable is it?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'carrot'}}
carrot
Which kind of furniture is the door behind of?
reference answer: The door is behind the chair.
LOC
CROP_BEHIND
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='door')\nIMAGE0=CROP_BEHIND(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question='Which kind of furniture is behind the door?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'chairs'}}
chairs
Is this a yellow fire hydrant?
reference answer: Yes, this is a yellow fire hydrant.
LOC
CROP
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'fire hydrant\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question=\'What color is the fire hydrant?\')\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} == \'yellow\' else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'yes'}}
yes
Is the color of the building the same as the color of the SUV?
reference answer: No, the building is red and the SUV is black.
LOC
CROP
LOC
CROP
VQA
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'building\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE,object=\'SUV\')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER0=VQA(image=IMAGE0,question=\'What color is the building?\')\nANSWER1=VQA(image=IMAGE1,question=\'What color is the SUV?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} == {ANSWER1} else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'no'}}
no
Which kind of furniture is it?
reference answer: The piece of furniture is a bed.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='Which kind of furniture is it?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'bed'}}
bed
Is this a train or a bus?
reference answer: This is a bus.
LOC
LOC
COUNT
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'train\')\nBOX1=LOC(image=IMAGE,object=\'bus\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'train\' if {ANSWER0} > 0 else \'bus\' if {ANSWER1} > 0 else \'neither\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'train'}}
train
Are these animals of the same species?
reference answer: No, there are both horses and deer.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='Are these animals of the same species?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'no'}}
no
Is the color of the sidewalk different than the street sign?
reference answer: Yes, the street sign is red and the sidewalk is black.
LOC
CROP
LOC
CROP
VQA
VQA
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'sidewalk\')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE,object=\'street sign\')\nIMAGE1=CROP(image=IMAGE,box=BOX1)\nANSWER0=VQA(image=IMAGE0,question=\'What color is the sidewalk?\')\nANSWER1=VQA(image=IMAGE1,question=\'What color is the street sign?\')\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} != {ANSWER1} else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
Is there an ottoman or a bed in the image?
reference answer: Yes, there is an ottoman.
LOC
LOC
COUNT
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'ottoman\')\nBOX1=LOC(image=IMAGE,object=\'bed\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} + {ANSWER1} > 0 else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'yes'}}
yes
Are there tomatoes or potatoes that are not round?
reference answer: No, there is a potato but it is round.
LOC
LOC
COUNT
COUNT
EVAL
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'tomato\')\nBOX1=LOC(image=IMAGE,object=\'potato\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'yes\' if {ANSWER0} + {ANSWER1} > 0 and else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'Runtime error: invalid syntax (<string>, line 1)'}}
Runtime error: invalid syntax (<string>, line 1)
Which material was used to make that watch, stainless steel or leather?
reference answer: The watch is made of stainless steel.
VQA
EVAL
RESULT
{'agent': {'program': 'ANSWER0=VQA(image=IMAGE,question=\'Which material was used to make that watch?\')\nANSWER1=EVAL(expr="\'stainless steel\' if {ANSWER0} == \'watch\' else \'leather\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'leather'}}
leather
Is this a cabinet or a coffee table?
reference answer: This is a cabinet.
LOC
LOC
COUNT
COUNT
EVAL
RESULT
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'cabinet\')\nBOX1=LOC(image=IMAGE,object=\'coffee table\')\nANSWER0=COUNT(box=BOX0)\nANSWER1=COUNT(box=BOX1)\nANSWER2=EVAL(expr="\'cabinet\' if {ANSWER0} > 0 else \'coffee table\'")\nFINAL_RESULT=RESULT(var=ANSWER2)', 'answer': 'cabinet'}}
cabinet
Are there cones behind the hydrant made of metal?
reference answer: Yes, there is a cone behind the hydrant.
LOC
CROP_BEHIND
LOC
COUNT
EVAL
{'agent': {'program': 'BOX0=LOC(image=IMAGE,object=\'hydrant\')\nIMAGE0=CROP_BEHIND(image=IMAGE,box=BOX0)\nBOX1=LOC(image=IMAGE0,object=\'cones\')\nANSWER0=COUNT(box=BOX1)\nANSWER1=EVAL(expr="\'yes\' if {ANSWER0} > 0 and else \'no\'")\nFINAL_RESULT=RESULT(var=ANSWER1)', 'answer': 'Runtime error: invalid syntax (<string>, line 1)'}}
Runtime error: invalid syntax (<string>, line 1)
What device is the same color as the umbrella?
reference answer: The ipod is the same color as the umbrella.
LOC
CROP
VQA
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='umbrella')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question='What color is the umbrella?')\nANSWER1=VQA(image=IMAGE,question='What device is the same color as the umbrella?')\nFINAL_RESULT=RESULT(var=ANSWER1)", 'answer': 'cell phone'}}
cell phone
Do you think the squash is raw?
reference answer: Yes, the squash is raw.
LOC
CROP
VQA
RESULT
{'agent': {'program': "BOX0=LOC(image=IMAGE,object='squash')\nIMAGE0=CROP(image=IMAGE,box=BOX0)\nANSWER0=VQA(image=IMAGE0,question='Is the squash raw?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'no'}}
no
Is it an outdoors scene?
reference answer: Yes, it is outdoors.
VQA
RESULT
{'agent': {'program': "ANSWER0=VQA(image=IMAGE,question='Is it an outdoors scene?')\nFINAL_RESULT=RESULT(var=ANSWER0)", 'answer': 'yes'}}
yes
Is there a tie that is not black?
reference answer: No, there is a tie but it is black.
LOC