How do I get many participants for a study?

We have used Mechanical Turk. We set up a machine running NEXT (and obtain its URL), then direct workers to that URL. There they answer 50 questions, with no interaction with MTurk. At the end, we ask them to copy and paste their User ID (shown by default at the end of the study) back into MTurk. This lets us verify that they responded.

How do I access all the targets?

targets = butler.targets.get_targetset(butler.exp_uid)
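This call works from any algorithm method that receives a butler, for example inside myAlg.py. A minimal sketch (the use of getQuery here is illustrative, not a complete algorithm):

class myAlg:
    def getQuery(self, butler):
        # get_targetset returns a list of dictionaries, one per target
        targets = butler.targets.get_targetset(butler.exp_uid)
        n = len(targets)
        # ... choose among the n targets ...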

How do I restart a machine I stopped on EC2?

  1. On EC2, restart the machine via Actions > Instance State > Start
  2. Run the next_ec2.py script's docker_login command to log in to your machine
  3. Run export NEXT_BACKEND_GLOBAL_HOST=ec2-...amazonaws.com
  4. Run docker-compose up.

How do I include feature vectors with targets?

There are three options:

  1. NEXT accepts a list of dictionaries as targets. These dictionaries are stored in the butler and are accessible from your algorithm. This requires launching the experiment yourself (and writing any necessary scripts); see the sketch after this list.
  2. You can enforce that feature vectors are passed to your app in initExp via the app's YAML description. This requires developing your own app.
  3. We have also developed a feature for adding feature vectors to the image targets used by the examples in examples/. The example below illustrates adding feature vectors to an existing application, which is the primary use case we have seen empirically.
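
For the first option, here is a minimal sketch of what such a target list might look like when you launch the experiment yourself. The keys other than feature_vector are placeholders; check your app's YAML description for the exact schema it expects.

import numpy as np

# Hypothetical target list for option 1: each dictionary describes one target
# and carries its feature vector alongside whatever fields your app requires.
# Pass this list as the targets in your own launch script.
targets = [{'primary_description': 'image_{}.png'.format(i),  # placeholder field
            'primary_type': 'image',                          # placeholder field
            'feature_vector': np.random.rand(2).tolist()}     # read back by the algorithm later
           for i in range(30)]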

The third option in detail:

Advantages of this approach include using your algorithm with an existing application and framework, and easily comparing your algorithm against other algorithms. A new algorithm can choose whether or not to pay attention to the feature vectors; that choice is up to the developer of that algorithm.

We need to modify the file that launches the experiment on NEXT (e.g., examples/strangefruit/experiment_triplet.py). If we include a key target_features in the experiment dictionary, feature vectors will be added by examples/launch_experiment.py (note: only for images ending in .png or .jpg).

The dictionary we add has filenames as keys and the corresponding feature vectors as values; i.e., it has the form {filename: feature_vector}. For example:

import zipfile
import numpy as np

experiment['primary_type'] = 'image'
target_zip = 'strangefruit30.zip'
experiment['primary_target_file'] = target_zip
experiment['target_features'] = {filename.split('/')[-1]: np.random.rand(2).tolist()  # tolist() because numpy arrays are not JSON serializable
                                 for filename in zipfile.ZipFile(target_zip).namelist()}
# filename.split above removes 'strangefruit/' from 'strangefruit/image.png', which is required
# by launch_experiment.py (used by the examples in next/examples)

Note: this only covers attaching features to targets. It does not cover how to load feature vectors from disk (though np.load or scipy.io.loadmat would work for that).
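
For instance, here is a minimal sketch of loading precomputed features and building the target_features dictionary. It assumes a file features.npy whose rows line up with the sorted image filenames in target_zip from the snippet above; the filename and that alignment convention are assumptions for illustration, not part of NEXT.

import zipfile
import numpy as np
# from scipy.io import loadmat  # alternative: features = loadmat('features.mat')['features']

features = np.load('features.npy')  # assumed shape: (n_targets, d)
filenames = sorted(name.split('/')[-1]
                   for name in zipfile.ZipFile(target_zip).namelist()
                   if name.endswith(('.png', '.jpg')))
experiment['target_features'] = {fname: vec.tolist()  # tolist(): numpy arrays are not JSON serializable
                                 for fname, vec in zip(filenames, features)}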

Then, to access these features in myAlg.py, we include these lines in initExp:

import numpy as np

class myAlg:
    def initExp(self, butler, ...):
        targets = butler.targets.get_targetset(butler.exp_uid)
        feature_matrix = [target['feature_vector'] for target in targets]
        feature_matrix = np.array(feature_matrix)  # shape (n_targets, d)
        # ...

Launching an experiment takes a long time. How do I debug without relaunching it every time?

Do not run docker-compose rm, as it removes your containers (and with them your experiment). If you run docker-compose stop followed by docker-compose start, your experiment will remain (docker-compose stop is also what happens when you press Ctrl-C on a foreground docker-compose up). For more detail, see the wiki page on debugging.
