For lab this week, we focus on creating interactive systems that can detect and respond to events or stimuli in the environment of the Pi, like the Boat Detector we mentioned in lecture. Your observant device could, for example, count items, find objects, recognize an event or continuously monitor a room.
This lab will help you think through the design of observant systems, particularly the corner cases that the algorithms need to be aware of.
- Pull the new Github Repo.
- Install VNC on your laptop if you have not yet done so. This lab requires you to run scripts on your Pi through VNC so that you can see the video stream. Please refer to the prep for Lab 2; the instructions are at the bottom.
- Read about OpenCV, MediaPipe, and TeachableMachines.
- Read Bellotti et al.'s Making Sense of Sensing Systems: Five Questions for Designers and Researchers.
- Raspberry Pi
- Webcam
- Microphone (if you want to have speech or sound input for your design)
- Show pictures and videos of the "sense-making" algorithms you tried.
- Show a video of how you embedded one of these algorithms into your observant system.
- Test and characterize your interactive device. Show faults in the detection and how the system handles them.
Building upon the paper-airplane metaphor (we are getting to know machine learning as a design material), here are the four sections of the lab activity:
A) Play
B) Fold
C) Flight test
D) Reflect
A more traditional method of extracting information from images is provided by OpenCV. The RPi image provided to you comes with an optimized installation that can be accessed through Python. We included four standard OpenCV examples: contour (blob) detection, face detection with Haar cascades, flow detection (a type of keypoint tracking), and standard object detection with the YOLO darknet.
Most examples can be run with a screen (e.g., VNC, ssh -X, or an HDMI monitor) or with just the terminal. The examples are separated into different folders. Each folder contains a HowToUse.md file, which explains how to run the Python example.
The following is a nicer way to see the layout of the openCV-examples folder we have included on your Pi. Instead of ls, the command we will use here is tree, a recursive directory listing command that produces a depth-indented (and colored) listing of files. Install tree first, cd to the openCV-examples folder, and run the command:
pi@ixe00:~ $ sudo apt install tree
...
pi@ixe00:~ $ cd openCV-examples
pi@ixe00:~/openCV-examples $ tree -l
.
├── contours-detection
│ ├── contours.py
│ └── HowToUse.md
├── data
│ ├── slow_traffic_small.mp4
│ └── test.jpg
├── face-detection
│ ├── face-detection.py
│ ├── faces_detected.jpg
│ ├── haarcascade_eye_tree_eyeglasses.xml
│ ├── haarcascade_eye.xml
│ ├── haarcascade_frontalface_alt.xml
│ ├── haarcascade_frontalface_default.xml
│ └── HowToUse.md
├── flow-detection
│ ├── flow.png
│ ├── HowToUse.md
│ └── optical_flow.py
└── object-detection
├── detected_out.jpg
├── detect.py
├── frozen_inference_graph.pb
├── HowToUse.md
└── ssd_mobilenet_v2_coco_2018_03_29.pbtxt
The flow detection might seem random, but consider this recent research that uses optical flow to determine busy-ness in hospital settings to facilitate robot navigation. Note the velocity parameter on page 3 and the mentions of optical flow.
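If you want to play with that idea yourself, here is a minimal sketch of using dense optical flow as a rough "busy-ness" score. This is not the provided optical_flow.py; the webcam index and the Farneback parameters are assumptions:

```python
import cv2
import numpy as np

# Hedged sketch: mean dense-flow magnitude as a rough "busy-ness" score.
cap = cv2.VideoCapture(0)              # assumes the webcam is device 0
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Farneback dense optical flow between consecutive frames
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    busyness = float(np.mean(magnitude))   # higher = more motion in view
    print(f"busy-ness: {busyness:.2f}")
    prev_gray = gray
```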
Now, connect your webcam to your Pi, use VNC to access your Pi, and open the terminal. Use the following commands to try each of the examples we provided (they will not work if you use ssh from your laptop):
pi@ixe00:~$ cd ~/openCV-examples/contours-detection
pi@ixe00:~/openCV-examples/contours-detection $ python contours.py
...
pi@ixe00:~$ cd ~/openCV-examples/face-detection
pi@ixe00:~/openCV-examples/face-detection $ python face-detection.py
...
pi@ixe00:~$ cd ~/openCV-examples/flow-detection
pi@ixe00:~/openCV-examples/flow-detection $ python optical_flow.py 0 window
...
pi@ixe00:~$ cd ~/openCV-examples/object-detection
pi@ixe00:~/openCV-examples/object-detection $ python detect.py
***Try each of the four examples in the openCV-examples folder, include screenshots of your use, and write about one design for each example that might work based on the individual benefits of each algorithm.***
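For orientation, the core of a Haar-cascade face detector is a short loop like the sketch below. This is not the provided face-detection.py; the webcam index is an assumption, and the cascade path assumes you run it from the face-detection folder:

```python
import cv2

# Hedged sketch of a Haar-cascade face detector loop (not the provided face-detection.py).
cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)   # assumes the webcam is device 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)           # needs a display (VNC or HDMI)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```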
A more recent, open-source, and efficient method of extracting information from video streams comes out of Google's MediaPipe, which offers state-of-the-art face, face mesh, hand pose, and body pose detection.
To get started, create a new virtual environment, this time with the --system-site-packages flag:
pi@ixe00:~ $ virtualenv mpipe --system-site-packages
pi@ixe00:~ $ source mpipe/bin/activate
(mpipe) pi@ixe00:~ $
and install the following packages:
...
(mpipe) pi@ixe00:~ $ sudo apt install ffmpeg python3-opencv
(mpipe) pi@ixe00:~ $ sudo apt install libxcb-shm0 libcdio-paranoia-dev libsdl2-2.0-0 libxv1 libtheora0 libva-drm2 libva-x11-2 libvdpau1 libharfbuzz0b libbluray2 libatlas-base-dev libhdf5-103 libgtk-3-0 libdc1394-22 libopenexr23
(mpipe) pi@ixe00:~ $ pip3 install mediapipe-rpi4 pyalsaaudio
Each of the installs will take a while, so please be patient. After successfully installing MediaPipe, connect your webcam to your Pi, use VNC to access your Pi, open the terminal, go to the Lab 5 folder, and run the hand pose detection script we provide (it will not work if you use ssh from your laptop):
(mpipe) pi@ixe00:~ $ cd Interactive-Lab-Hub/Lab\ 5
(mpipe) pi@ixe00:~ Interactive-Lab-Hub/Lab 5 $ python hand_pose.py
Try the two main features of this script: 1) pinching for percentage control, and 2) "Quiet Coyote" for instant percentage setting. Notice how this example uses hardcoded positions and relates those positions to a desired set of events, in hand_pose.py lines 48-53.
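To make that concrete, here is a hedged sketch of the pinch-to-percentage idea using MediaPipe's hand landmarks (indices 4 and 8 are the thumb and index fingertips). This is not the provided hand_pose.py, and the 0.25 normalization constant is an arbitrary assumption:

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
cap = cv2.VideoCapture(0)   # assumes the webcam is device 0

with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = hands.process(rgb)
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            thumb, index = lm[4], lm[8]      # thumb tip and index fingertip
            pinch = ((thumb.x - index.x) ** 2 + (thumb.y - index.y) ** 2) ** 0.5
            # Map the pinch distance to 0-100%; 0.25 is an assumed maximum distance.
            percent = max(0.0, min(1.0, pinch / 0.25)) * 100
            print(f"{percent:.0f}%")
```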
***Consider how you might use this position-based approach to create an interaction, and write how you might use it on either face, hand, or body pose tracking.*** The first thing that comes to mind is using this to watch the hand and read sign language.
(You might also consider how this notion of percentage control with hand tracking might be used in some of the physical UI you may have experimented with in the last lab, for instance in controlling a servo or rotary encoder.)
Google's TeachableMachines might look very simple. However, its simplicity is very useful for experimenting with the capabilities of this technology.
To get started, create and activate a new virtual environment for this exercise, again with the --system-site-packages flag:
pi@ixe00:~ $ virtualenv tmachine --system-site-packages
pi@ixe00:~ $ source tmachine/bin/activate
(tmachine) pi@ixe00:~ $
After activating the virtual environment, install the requisite TensorFlow libraries by running the following lines:
(tmachine) pi@ixe00:~ $ cd Interactive-Lab-Hub/Lab\ 5
(tmachine) pi@ixe00:~ Interactive-Lab-Hub/Lab 5 $ sudo chmod +x ./teachable_machines.sh
(tmachine) pi@ixe00:~ Interactive-Lab-Hub/Lab 5 $ ./teachable_machines.sh
This might take a while to install fully. After installation, connect your webcam to your Pi, use VNC to access your Pi, open the terminal, go to the Lab 5 folder, and run the example script (it will not work if you use ssh from your laptop):
(tmachine) pi@ixe00:~ Interactive-Lab-Hub/Lab 5 $ python tm_ppe_detection.py
(Optionally, you can train your own model, too. First, visit TeachableMachines, select Image Project and Standard model. Second, use the webcam on your computer to train a model: create classes based on what you want the model to classify, try to collect over 50 samples for each class, and consider adding a background class with nothing in view so the model learns what the background looks like. Lastly, preview and iterate, or export your model as a 'Tensorflow' model and select 'Keras'. You will get an '.h5' file and a 'labels.txt' file; the ones used to build the PPE model you ran earlier are included in this lab's 'teachable_machines' folder. You can make your own folder, or replace these files, to build your own classifier.)
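If you do export your own Keras model, loading and running it looks roughly like the sketch below. The 'keras_model.h5' and 'labels.txt' filenames, the 224x224 input size, and the [-1, 1] scaling are assumptions based on Teachable Machines' usual export, so check them against your own files:

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Hedged sketch: classify webcam frames with an exported Teachable Machines model.
model = load_model("keras_model.h5")                 # assumed export filename
labels = [line.strip() for line in open("labels.txt")]

cap = cv2.VideoCapture(0)                            # assumes the webcam is device 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Assumed: the exported image model expects 224x224 RGB inputs scaled to [-1, 1].
    img = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
    x = (img.astype(np.float32) / 127.5) - 1.0
    probs = model.predict(x[np.newaxis, ...])[0]
    print(labels[int(np.argmax(probs))], float(np.max(probs)))
```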
***Whether you make your own model or not, include screenshots of your use of Teachable Machines, and write how you might use this to create your own classifier. Include what different affordances this method brings, compared to the OpenCV or MediaPipe options.***
Don't forget to run deactivate to end the Teachable Machines demo, and to reactivate with source tmachine/bin/activate when you want to use it again.
Additional filtering and analysis can be done on the sensors that were provided in the kit. For example, running a Fast Fourier Transform over the IMU data stream could support a simple activity classifier that distinguishes walking, running, and standing.
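As a rough illustration of that FFT idea, here is a hedged sketch that guesses an activity from the dominant frequency of a buffer of accelerometer magnitudes. The 50 Hz sample rate, the standing threshold, and the 2.5 Hz walking/running cutoff are all assumptions, not measured values:

```python
import numpy as np

def classify_activity(accel_magnitudes, sample_rate_hz=50.0):
    """Very rough activity guess from a buffer of accelerometer magnitudes."""
    samples = np.asarray(accel_magnitudes, dtype=float)
    samples = samples - samples.mean()              # remove the gravity/DC offset
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate_hz)
    dominant = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin
    if samples.std() < 0.05:                        # assumed "barely moving" threshold
        return "standing"
    return "running" if dominant > 2.5 else "walking"   # assumed step-frequency cutoff
```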
Using the accelerometer, try the following:
1. Set up threshold detection: can you identify when a signal goes above certain fixed values?
2. Set up averaging: can you average your signal in N-sample blocks? Can you compute an N-sample running average?
3. Set up peak detection: can you identify when your signal reaches a peak and then goes down?
***Include links to your code here, and put the code for these in your repo; they will come in handy later.*** My code is in project.py in this directory.
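For reference, a minimal sketch of the three checks above (thresholding, N-sample averaging, and peak detection) could look like this. It is not the code in project.py, and the threshold and block size are arbitrary assumptions:

```python
import numpy as np

def above_threshold(samples, threshold=1.5):
    """1) Threshold detection: True wherever the signal exceeds a fixed value."""
    return np.asarray(samples) > threshold

def block_average(samples, n=10):
    """2) Averaging: mean of each consecutive N-sample block."""
    samples = np.asarray(samples)
    trimmed = samples[: len(samples) // n * n]      # drop the incomplete last block
    return trimmed.reshape(-1, n).mean(axis=1)

def running_average(samples, n=10):
    """2b) N-sample running (moving) average."""
    return np.convolve(np.asarray(samples), np.ones(n) / n, mode="valid")

def find_peaks(samples):
    """3) Peak detection: indices where the signal rises and then falls."""
    s = np.asarray(samples)
    return [i for i in range(1, len(s) - 1) if s[i - 1] < s[i] > s[i + 1]]
```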
Pick one of the models you have tried, pick a class of objects, and experiment with prototyping an interaction. This can be as simple as the boat detector earlier. Try out different interaction outputs and inputs.
***Describe and detail the interaction, as well as your experimentation here.*** My interaction builds on the sign language idea I described above, but implements the simplest version of it: the device tells the user whether their hand is open or closed. This type of interaction could feed into AR applications or other devices that need to read the user's hand motions.
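One plausible way to approximate the open/closed check with MediaPipe hand landmarks is sketched below; this is not necessarily how the final code does it, and the "at least three extended fingers" rule is an assumption:

```python
def hand_is_open(hand_landmarks):
    """Rough open/closed test: are the fingertips farther from the wrist than the knuckles?"""
    lm = hand_landmarks.landmark
    wrist = lm[0]                     # landmark 0 is the wrist
    open_fingers = 0
    # (fingertip, knuckle) landmark pairs for index, middle, ring, and pinky fingers
    for tip, knuckle in [(8, 5), (12, 9), (16, 13), (20, 17)]:
        tip_dist = (lm[tip].x - wrist.x) ** 2 + (lm[tip].y - wrist.y) ** 2
        knuckle_dist = (lm[knuckle].x - wrist.x) ** 2 + (lm[knuckle].y - wrist.y) ** 2
        if tip_dist > knuckle_dist:
            open_fingers += 1
    return open_fingers >= 3          # assumed: three or more extended fingers counts as "open"
```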
Now flight test your interactive prototype and note down your observations. For example:
- When does it do what it is supposed to do? - The device records the user and lets them know the position of their hand.
- When does it fail? - It fails when the user turns their hand around, and sometimes it detects other objects as hands.
- When it fails, why does it fail? - When the hand is turning around, the landmarks tracking the fingers are lost for a moment and the hand is recorded as closed.
- Based on the behavior you have seen, what other scenarios could cause problems? - Different levels of light, and detecting other things as hands.
***Think about someone using the system. Describe how you think this will work.***
- Are they aware of the uncertainties in the system? - If they're unfamiliar with the technology, they wouldn't be aware of any errors that could be happening
- How badly would they be impacted by a misclassification? - Not that much; the errors only last a second. But if they needed to use it in the dark, the impact would be more serious.
- How could you change your interactive system to address this? - I could try to create other cases for what the hand might be doing.
- Are there optimizations you can try on your sense-making algorithm? - It is very laggy, so there are likely ways to reduce that lag and make it quicker. I've already made a few reductions that have made a difference, but it still needs work.
Now that you have experimented with one or more of these sense-making systems, characterize their behavior. During the lecture, we mentioned questions that help characterize a material:
- What can you use X for?
- What is a good environment for X?
- What is a bad environment for X?
- When will X break?
- When it breaks how will X break?
- What are other properties/behaviors of X?
- How does X feel?
***Include a short video demonstrating the answers to these questions.*** https://drive.google.com/file/d/1_xk5cueI6udVU3nwlRynZKAfs6_M1Z59/view?usp=sharing
Following exploration and reflection from Part 1, finish building your interactive system, and demonstrate it in use with a video.
From Part 1, the main thing I learned was that I needed to optimize the code. I did that, added one other small thing, and put it into this video: https://drive.google.com/file/d/1OdBlPaoOfTMwpV0XSkfTmw6Br8G2vGlk/view?usp=sharing
***Include a short video demonstrating the finished result.***