Testing Folders #79

Status: Open. Wants to merge 140 commits into base: main.
Commits (140)
a919c44
Update README.md
tul53850 Jan 31, 2024
cd6566a
Camera
tun71427 Feb 27, 2024
5f2c6e2
Add files via upload
kbarbarisi Mar 12, 2024
ec9d06e
initialization
tun71427 Mar 12, 2024
c0e6af9
Merge branch 'zhu' of https://github.com/Capstone-Projects-2024-Sprin…
tun71427 Mar 12, 2024
9a19c4e
Use MediaPipe's Hands module to process video frames and draw hand ke…
tun71427 Mar 12, 2024
b11bd9a
Simple initial prediction gestures
tun71427 Mar 14, 2024
505cfa3
Merge pull request #1 from Capstone-Projects-2024-Spring/zhu
LeeMamori Mar 14, 2024
bc324f6
update Kalman filter and bounding box and test feature move-left gesture
shibatachun Mar 18, 2024
fe9506c
update Kalman filter and bounding box and test feature move-left gesture
shibatachun Mar 18, 2024
bf312de
update Kalman filter and bounding box and test feature move-left gesture
shibatachun Mar 18, 2024
0939be6
deleted env
shibatachun Mar 18, 2024
7e5378d
update left test
shibatachun Mar 18, 2024
3f68ae2
Simple prediction gestures use initial kalman filter
tun71427 Mar 18, 2024
d49d275
Refactor kalman filter function to class
tun71427 Mar 18, 2024
364d3e0
Use Kalman filter to utilize the predicted positions for further proc…
tun71427 Mar 18, 2024
124eb4e
Simple gesture movement sensing
tun71427 Mar 19, 2024
e8a5262
Gesture movement sensing once per second
tun71427 Mar 19, 2024
17d565d
Fixed little bug and change time to 2 sencond
tun71427 Mar 19, 2024
f4e32da
Merge branch 'zhu' of https://github.com/Capstone-Projects-2024-Sprin…
tun71427 Mar 19, 2024
34793f2
Extension of demo
kbarbarisi Mar 19, 2024
b5efa1b
Added coordinates and bounding box
tun71427 Mar 19, 2024
7675d3d
Delete useless parameters
tun71427 Mar 19, 2024
e3dc601
Merge pull request #2 from Capstone-Projects-2024-Spring/zhu
tuk85473 Mar 19, 2024
9c50d52
UI code
tul53850 Mar 21, 2024
c3869c2
cam
tul53850 Mar 21, 2024
68bbe09
fix camera bug
tul53850 Mar 21, 2024
4a2ee86
Merge branch 'main' of https://github.com/Capstone-Projects-2024-Spri…
shibatachun Mar 25, 2024
6d79c7e
Using tkinter to setup the menu UI, and add the event for each button…
shibatachun Mar 25, 2024
ef5c6d3
Merge pull request #3 from Capstone-Projects-2024-Spring/yang
tul53850 Mar 25, 2024
38878cb
Update README.md
tul53850 Mar 25, 2024
3fd4c87
volume up and down
tul53850 Mar 26, 2024
3032e7c
Add a volume function to turn the volume up and down, as well as disp…
tul53850 Mar 26, 2024
c61d967
init
tun71427 Mar 26, 2024
5ab0da8
init
tun71427 Mar 26, 2024
fead16b
init
tun71427 Mar 26, 2024
18a077e
init
tun71427 Mar 26, 2024
7ea0469
init
tun71427 Mar 26, 2024
474da76
init
tun71427 Mar 26, 2024
cc4817f
init
tun71427 Mar 26, 2024
8a90075
init
tun71427 Mar 26, 2024
9288b59
init
tun71427 Mar 26, 2024
babb2cc
init
tun71427 Mar 26, 2024
4355f2c
init
tun71427 Mar 26, 2024
6776bf5
init
tun71427 Mar 26, 2024
1d892aa
dog audio
tun71427 Mar 26, 2024
8aa202d
dog image
tun71427 Mar 26, 2024
2467f6f
helpers function
tun71427 Mar 26, 2024
f5eb724
helpers function
tun71427 Mar 26, 2024
1d5e7e1
imagebind
tun71427 Mar 26, 2024
dbb8333
imagebind
tun71427 Mar 26, 2024
69bfb37
init
tun71427 Mar 26, 2024
7f78fd9
init
tun71427 Mar 26, 2024
afc754d
init
tun71427 Mar 26, 2024
fc25e8e
init
tun71427 Mar 26, 2024
fdd3e69
updated requires
tun71427 Mar 26, 2024
4018146
setup
tun71427 Mar 26, 2024
b693c92
init
tun71427 Mar 26, 2024
2e8740c
init
tun71427 Mar 26, 2024
9ac9f02
transformer function
tun71427 Mar 26, 2024
580b477
init
tun71427 Mar 27, 2024
274a54b
init
tun71427 Mar 27, 2024
60e14ea
init
tun71427 Mar 27, 2024
345f061
Merge pull request #4 from Capstone-Projects-2024-Spring/zhu
LeeMamori Mar 28, 2024
6ba66e1
Stop tracking imagebind_huge.pth
tun71427 Mar 28, 2024
b871727
Added MovementDetector class
tun71427 Mar 28, 2024
20422f1
Added MovementDetector logic to start_capture
tun71427 Mar 28, 2024
7e9c393
Add the launch music app function and automatically search if you ins…
shibatachun Mar 28, 2024
6329c7d
Make kalman_filter to dynamic storage, and max numbers of hands = 2
tun71427 Mar 28, 2024
58a2506
Dynamic storage movement detectors
tun71427 Mar 28, 2024
6b2e96a
Updated Kalman filter bounding box
tun71427 Mar 28, 2024
58c1ea9
inti
tun71427 Mar 28, 2024
a749bd1
Refactor name
tun71427 Mar 28, 2024
a7fedd1
Refactor name
tun71427 Mar 28, 2024
fb7cdf3
Use Imagebind
tun71427 Mar 28, 2024
e99d1e4
Updated format
tun71427 Mar 28, 2024
265bd2e
fixing
shibatachun Mar 28, 2024
a6ce7e2
fixed using official example model for recognition. But without draw …
shibatachun Mar 28, 2024
0b1cdc0
Change time window to 2 seconds
tun71427 Mar 28, 2024
fb16d8e
Added monitoring every two seconds
tun71427 Mar 28, 2024
10953a6
Added recording video
tun71427 Mar 28, 2024
2ee2f51
Added record photos
tun71427 Mar 28, 2024
e40a4d8
Fixed bug
tun71427 Mar 28, 2024
adbdf9c
Fixed a bug!
tun71427 Mar 28, 2024
0594dbf
Merge pull request #5 from Capstone-Projects-2024-Spring/zhu
LeeMamori Mar 28, 2024
79ca3d3
Fixed more bug!!
tun71427 Mar 28, 2024
b1a50b0
Change quit bottom from q to esc
tun71427 Mar 28, 2024
d2ffd47
Fixed moreeeeeeeeeeeeee buuuuuuuuuuuuuuug
tun71427 Mar 31, 2024
16bbaa8
Deleted useless variable
tun71427 Mar 31, 2024
e9e0b2c
Fixed moreeeeeeee bug!
tun71427 Mar 31, 2024
e40f3eb
Fixed more buggggggggg!
tun71427 Mar 31, 2024
5c27c1e
Fixed more buggggggggg!
tun71427 Mar 31, 2024
81314dd
Merge branch 'main' of https://github.com/Capstone-Projects-2024-Spri…
shibatachun Apr 1, 2024
ce3d055
resolve conflict
shibatachun Apr 1, 2024
a103deb
Merge pull request #7 from Capstone-Projects-2024-Spring/yang
tul53850 Apr 1, 2024
0929633
a couple of improvements to the UI
tul53850 Apr 1, 2024
cfe532b
A couple of UI improvements
tul53850 Apr 1, 2024
0400476
recovery the code by Kiana
shibatachun Apr 2, 2024
7cf6766
Merge pull request #8 from Capstone-Projects-2024-Spring/yang
LeeMamori Apr 2, 2024
31e5bd8
root.attributes error fixed
tuj47463 Apr 2, 2024
947c709
added code for simulating mouse to implement add to playlist feature
shibatachun Apr 2, 2024
f8626df
added code for simulating mouse to implement add to playlist feature
shibatachun Apr 2, 2024
25c8e55
Fix the resolution error, now the cursor will cover all the screen
shibatachun Apr 2, 2024
85df7aa
For volume control gesture, set the hotkey control spotify volume ins…
shibatachun Apr 2, 2024
27f6282
Display volume text on screen instead of terminal
tul53850 Apr 2, 2024
9d9de48
Merge branch 'main' into yang
tul53850 Apr 2, 2024
4953f55
Merge pull request #9 from Capstone-Projects-2024-Spring/yang
tul53850 Apr 2, 2024
1807385
Added initial imagebind model and analyze with imagebind
tun71427 Apr 2, 2024
4f94000
Ignore cleanup.py
tun71427 Apr 2, 2024
5348f4a
updated
tun71427 Apr 2, 2024
55c7b3a
cleanup useless picture and video
tun71427 Apr 2, 2024
0939237
Merge pull request #10 from Capstone-Projects-2024-Spring/zhu
tun71427 Apr 2, 2024
ddf0f3c
A test file for imagebind
tun71427 Apr 2, 2024
87f84b6
Updated more formatted filenames
tun71427 Apr 2, 2024
02187e7
ignore pycache
tun71427 Apr 2, 2024
a1e0f64
Merge pull request #11 from Capstone-Projects-2024-Spring/zhu
tun71427 Apr 2, 2024
bdfa655
Add hand detector and process detector
shibatachun Apr 3, 2024
cb2fe80
Gesture detection, modified from Camera.py, init setting, for self le…
shibatachun Apr 3, 2024
5f6b92c
Left Hold features
shibatachun Apr 3, 2024
59197d6
Mouse move related features
shibatachun Apr 3, 2024
be44ed4
Appkit for macOS still WIP
shibatachun Apr 3, 2024
4c56c04
Appkit for macOS still WIP
shibatachun Apr 3, 2024
2b5b0c9
Add halt (open palm) and play feature
shibatachun Apr 3, 2024
8dc5ab6
Merge remote-tracking branch 'origin/yang' into yang
shibatachun Apr 3, 2024
6f3c4e6
Add halt (open palm) and play feature
shibatachun Apr 3, 2024
66177d9
Merge pull request #12 from Capstone-Projects-2024-Spring/yang
LeeMamori Apr 4, 2024
791f493
Updated imagebind
tun71427 Apr 4, 2024
083f93d
Updated
tun71427 Apr 7, 2024
2dc5a88
Added print for highest probability for imagebind
tun71427 Apr 7, 2024
91d97ba
Updated
tun71427 Apr 7, 2024
8f7bcd6
Merged
tun71427 Apr 7, 2024
86d60c8
Added cleanup when exit
tun71427 Apr 7, 2024
93a62af
Recognize hand gestures for Stop and Display Camera
tuk85473 Apr 7, 2024
9f2b2c1
Merge branch 'main' into zhu
tun71427 Apr 8, 2024
8e2afda
Merge pull request #14 from Capstone-Projects-2024-Spring/zhu
tun71427 Apr 8, 2024
a65bf70
exitapp progress #1
tuj47463 Apr 9, 2024
8063afa
Merge pull request #13 from Capstone-Projects-2024-Spring/ashley
kbarbarisi Apr 9, 2024
83a810c
change output gesture test with pointer and pinky up
tul53850 Apr 9, 2024
04ef8ab
Created Folder for Testing and subfolders for types of testing
NathanMcCourt Apr 17, 2024
23b4de1
Slight update to loop recognition
NathanMcCourt Apr 23, 2024
Binary file added .assets/bird_audio.wav
Binary file added .assets/bird_image.jpg
Binary file added .assets/car_audio.wav
Binary file added .assets/car_image.jpg
Binary file added .assets/dog_audio.wav
Binary file added .assets/dog_image.jpg
745 changes: 745 additions & 0 deletions .github/Untitled21-2-2.ipynb

Large diffs are not rendered by default.

1,156 changes: 1,156 additions & 0 deletions .github/Untitled21-2.ipynb

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions .gitignore
@@ -7,3 +7,9 @@
.env.development.local
.env.test.local
.env.production.local

# PTH files
.checkpoints/imagebind_huge.pth

# pycache
__pycache__/
8 changes: 8 additions & 0 deletions .idea/.gitignore
Some generated files are not rendered by default.
6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml
7 changes: 7 additions & 0 deletions .idea/misc.xml
8 changes: 8 additions & 0 deletions .idea/modules.xml
12 changes: 12 additions & 0 deletions .idea/project-waveease.iml
6 changes: 6 additions & 0 deletions .idea/vcs.xml
57 changes: 57 additions & 0 deletions HelloWorld.py
@@ -0,0 +1,57 @@
from imagebind import data
import torch
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

# A test file for imagebind

text_list=["A dog.", "A car", "A bird"]
image_paths=[".assets/dog_image.jpg", ".assets/car_image.jpg", ".assets/bird_image.jpg"]
audio_paths=[".assets/dog_audio.wav", ".assets/car_audio.wav", ".assets/bird_audio.wav"]

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Instantiate model
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

# Load data
inputs = {
    ModalityType.TEXT: data.load_and_transform_text(text_list, device),
    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
}

with torch.no_grad():
    embeddings = model(inputs)

print(
    "Vision x Text: ",
    torch.softmax(embeddings[ModalityType.VISION] @ embeddings[ModalityType.TEXT].T, dim=-1),
)
print(
    "Audio x Text: ",
    torch.softmax(embeddings[ModalityType.AUDIO] @ embeddings[ModalityType.TEXT].T, dim=-1),
)
print(
    "Vision x Audio: ",
    torch.softmax(embeddings[ModalityType.VISION] @ embeddings[ModalityType.AUDIO].T, dim=-1),
)

# Expected output:
#
# Vision x Text:
# tensor([[9.9761e-01, 2.3694e-03, 1.8612e-05],
# [3.3836e-05, 9.9994e-01, 2.4118e-05],
# [4.7997e-05, 1.3496e-02, 9.8646e-01]])
#
# Audio x Text:
# tensor([[1., 0., 0.],
# [0., 1., 0.],
# [0., 0., 1.]])
#
# Vision x Audio:
# tensor([[0.8070, 0.1088, 0.0842],
# [0.1036, 0.7884, 0.1079],
# [0.0018, 0.0022, 0.9960]])
60 changes: 44 additions & 16 deletions README.md
@@ -1,6 +1,6 @@
<div align="center">

-# Project Name
+# WavEase - Python powered Gesture Recognition
[![Report Issue on Jira](https://img.shields.io/badge/Report%20Issues-Jira-0052CC?style=flat&logo=jira-software)](https://temple-cis-projects-in-cs.atlassian.net/jira/software/c/projects/DT/issues)
[![Deploy Docs](https://github.com/ApplebaumIan/tu-cis-4398-docs-template/actions/workflows/deploy.yml/badge.svg)](https://github.com/ApplebaumIan/tu-cis-4398-docs-template/actions/workflows/deploy.yml)
[![Documentation Website Link](https://img.shields.io/badge/-Documentation%20Website-brightgreen)](https://applebaumian.github.io/tu-cis-4398-docs-template/)
@@ -11,52 +11,80 @@

## Keywords

-Section #, as well as any words that quickly give your peers insights into the application like programming language, development platform, type of application, etc.
+Section 004, as well as any words that quickly give your peers insights into the application like programming language, development platform, type of application, etc.

## Project Abstract

-This document proposes a novel application of a text message (SMS or Email) read-out and hands-free call interacted between an Android Smartphone and an infotainment platform (headunit) in a car environment. When a phone receives an SMS or Email, the text message is transferred from the phone to the headunit through a Bluetooth connection. On the headunit, user can control which and when the received SMS or E-mail to be read out through the in-vehicle audio system. The user may press one button on the headunit to activate the hands-free feature to call back the SMS sender.
+This project would create an application that lets users perform hand gestures in front of a sensor, with each gesture mapped to a specific command. For example, a person could have a camera set up for gesture recognition, and the network could be integrated with smart devices to turn a device on or off. Say you just sit down on the couch to watch a movie, but you can’t find the remote. With a gesture recognition system, you can simply make a certain gesture at a camera connected to your TV and the device could turn on. The same could apply to lighting in the house. This project would be done using Python.

## High Level Requirement

-Describe the requirements – i.e., what the product does and how it does it from a user point of viewat a high level.
+The product works by capturing and interpreting physical movements of the user’s hands or body. Those movements are then translated into preset commands or actions. The high-level requirements include sensor data acquisition, data processing, a gesture recognition algorithm, and command generation. From a user’s point of view, you would perform a physical gesture, mapped to a command, in front of the sensor. At a high level, a gesture could be mapped to turning lights on or off. For this project we could start by just printing text to the screen to confirm it is working.
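The gesture-to-command translation described above can be sketched as a simple dispatch table. This is a minimal illustration only: the gesture labels and handler functions here are hypothetical placeholders, not names from the project's code.

```python
# Hypothetical handlers: the real actions would call into a smart-device
# API or, as the text suggests for a first step, just print to the screen.
def lights_on() -> str:
    return "lights on"

def lights_off() -> str:
    return "lights off"

# Preset mapping from recognized gesture labels to commands
COMMANDS = {
    "swipe_right": lights_on,
    "swipe_left": lights_off,
}

def dispatch(gesture: str) -> str:
    """Translate a recognized gesture label into its mapped action."""
    handler = COMMANDS.get(gesture)
    return handler() if handler else "unrecognized gesture"

print(dispatch("swipe_right"))  # lights on
```

Keeping the recognizer and the command map separate like this means new gestures can be added without touching the recognition code.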

## Conceptual Design

-Describe the initial design concept: Hardware/software architecture, programming language, operating system, etc.
+The conceptual design for this project is a laptop with a built-in camera to implement the gesture control system. The programming language would be Python, using the OpenCV, TensorFlow, and NumPy libraries. An open-source dataset for gesture recognition would be found, and the images would be preprocessed by resizing, normalizing, and converting them into a format suitable for model training. A model would be built using a CNN architecture and TensorFlow. Once the model is trained, it could be deployed with OpenCV to capture video frames from a camera, process them, and feed them into the trained model. From this step, the project could go multiple different ways. For a more advanced project, we could link it to smart devices, but to start we could print the intended action to the screen. We will begin by applying this software to an application like Spotify and possibly consider other applications like YouTube and Apple Music.
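A minimal sketch of the resize-and-normalize preprocessing step described above, assuming a hypothetical 64x64 model input size and [0, 1] pixel normalization. In the real pipeline OpenCV (`cv2.VideoCapture` and `cv2.resize`) would supply and resize the frames; plain NumPy striding stands in here only to keep the sketch self-contained.

```python
import numpy as np

def preprocess_frame(frame: np.ndarray, size: int = 64) -> np.ndarray:
    """Downsample and normalize a BGR frame into a model-ready batch of one."""
    h, w, _ = frame.shape
    # Nearest-neighbor downsample by index selection (cv2.resize in practice)
    rows = np.linspace(0, h - 1, size).astype(int)
    cols = np.linspace(0, w - 1, size).astype(int)
    resized = frame[np.ix_(rows, cols)]
    normalized = resized.astype(np.float32) / 255.0  # scale pixels to [0, 1]
    return normalized[np.newaxis, ...]               # add a batch dimension

# A zero-filled array stands in for a frame read from the camera
frame = np.zeros((480, 640, 3), dtype=np.uint8)
batch = preprocess_frame(frame)
print(batch.shape)  # (1, 64, 64, 3)
```

The resulting `(1, 64, 64, 3)` batch matches the shape a Keras/TensorFlow CNN would typically expect for a single image.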

## Background

-The background will contain a more detailed description of the product and a comparison to existing similar projects/products. A literature search should be conducted and the results listed. Proper citation of sources is required. If there are similar open-source products, you should state whether existing source will be used and to what extent. If there are similar closed-source/proprietary products, you should state how the proposed product will be similar and different.
+The idea for this product is to associate specific gestures with predefined commands. For example, a swipe gesture to the right can be associated with turning on the lights, while a swipe gesture to the left can be associated with turning them off. While there is no existing product that does exactly this, researchers at the University of Washington are close to achieving it. Their approach uses Wi-Fi signals, rather than cameras, to detect specific movements (Ma, 2013). This differs from the approach suggested earlier, which uses a laptop camera. A similar product is the Xbox Kinect, which uses cameras to recognize gestures and lets you interact with games on the Xbox (Palangetić, 2014). This is like our proposal because it uses a camera to capture images and lets a user interact with the device. However, it differs in that it does not connect to smart devices or let you control their features.

## Required Resources

-Discuss what you need to develop this project. This includes background information you will need to acquire, hardware resources, and software resources. If these are not part of the standard Computer Science Department lab resources, these must be identified early and discussed with the instructor.
+The required hardware for this project is a laptop with a working camera. Python libraries such as TensorFlow, NumPy, and OpenCV are needed to train the model and to capture images for inference once it is trained. It would be beneficial if the people working on this project had experience with computer vision, convolutional neural network architectures, and the API calls needed to connect to smart devices. While wireless networks would most likely be the preferred approach, experience with IoT devices and connections could also be a route for integrating smart devices.

## Collaborators

[//]: # ( readme: collaborators -start )
<table>
<tr>
<td align="center">
-<a href="https://github.com/ApplebaumIan">
-<img src="https://avatars.githubusercontent.com/u/9451941?v=4" width="100;" alt="ApplebaumIan"/>
+<a href="https://github.com/kbarbarisi">
+<img src="https://avatars.githubusercontent.com/u/73039627?v=4" width="100;" alt="Kianna"/>
<br />
-<sub><b>Ian Tyler Applebaum</b></sub>
+<sub><b>Kianna Barbarisi</b></sub>
</a>
</td>
<td align="center">
-<a href="https://github.com/leekd99">
-<img src="https://avatars.githubusercontent.com/u/32583417?v=4" width="100;" alt="leekd99"/>
+<a href="https://github.com/tul53850">
+<img src="https://avatars.githubusercontent.com/u/111989518?v=4" width="100;" alt="Jason"/>
<br />
-<sub><b>Kyle Dragon Lee</b></sub>
+<sub><b>Jason Hankins</b></sub>
</a>
</td>
<td align="center">
-<a href="https://github.com/thanhnguyen46">
-<img src="https://avatars.githubusercontent.com/u/60533187?v=4" width="100;" alt="thanhnguyen46"/>
+<a href="https://github.com/SarinaCurtis">
+<img src="https://avatars.githubusercontent.com/u/81874704?v=4" width="100;" alt="Sarina"/>
<br />
-<sub><b>Thanh Nguyen</b></sub>
+<sub><b>Sarina Curtis</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/tun71427">
<img src="https://avatars.githubusercontent.com/u/123014326?v=4" width="100;" alt="Yuxuan"/>
<br />
<sub><b>Yuxuan Zhu</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/LeeMamori">
<img src="https://avatars.githubusercontent.com/u/123014841?v=4" width="100;" alt="Yang"/>
<br />
<sub><b>Yang Li</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/tuk85473">
<img src="https://avatars.githubusercontent.com/u/97626755?v=4" width="100;" alt="Ashley"/>
<br />
<sub><b>Ashley Jones</b></sub>
</a>
</td>
</tr>
Binary file added __pycache__/Camera.cpython-311.pyc
Binary file added __pycache__/Camera.cpython-38.pyc
Binary file added __pycache__/utile.cpython-311.pyc
Binary file added __pycache__/utile.cpython-38.pyc