February 22, 2025 10 min to read

Prism Project Architecture

MVP Architecture for IoT Voice Command Application

Overview
Key Requirements & Considerations
High-Level Architecture Diagram
AWS Components
Detailed Flow
1. Arduino WebSocket Connection
2. Frontend REST Request for Voice Commands
Security and Trade-offs
Future Enhancements
Frontend Requirements
Backend Requirements

Overview

This document describes the MVP (Minimum Viable Product) architecture for an IoT voice command application. The goal is to rapidly develop a low-cost, serverless solution that allows users to:

Register and connect their Arduino devices via Wi-Fi (using a UUID).
Send voice commands (up to 10 seconds) through a web frontend.
Have the backend process these commands using a third-party AI service (“Gemini API”).
Return the processed result to both the web frontend and the Arduino in (near) real-time via WebSocket.

Because this is an MVP, minimal security and authentication are implemented. We use Arduino UUID + PIN to authenticate requests. In a production environment, this will be replaced or augmented by more robust solutions (like AWS Cognito or JWT tokens).

All AWS services are placed in a public subnet for simplicity and cost reasons, with the understanding that this might not be optimal for production but is sufficient for our low-traffic MVP. To mitigate certain security risks, we strictly manage IAM policies and ensure S3 is not publicly accessible.

Key Requirements & Considerations

Fast Development
- Prioritize a minimal and straightforward architecture to speed up development.
Low Cost
- Leverage serverless AWS services (Lambda, DynamoDB, S3, API Gateway) within Free Tier thresholds where possible.
Traffic Pattern
- The application does not expect constant traffic. Hence, serverless pay-per-invocation is cost-effective.
Security
- MVP-level security with a simple (Arduino UUID, PIN) check.
- Acknowledged risks: replay attacks, plaintext credentials, client-side storage exploits, and no access controls for device commands.
- Planned upgrade to more robust security (e.g., Cognito) at production scale.
Data Sensitivity
- Some potential PII in the voice recordings/results.
- Use S3 Server-Side Encryption (SSE) and strict IAM policies to protect any saved JSON results.

High-Level Architecture Diagram

Frontend calls REST API with voice data, Arduino UUID, and PIN.
LLM Orchestration Lambda checks DynamoDB to validate (UUID, PIN).
Gemini API processes the audio and returns a JSON result.
The result JSON is stored in S3 (SSE encryption).
LLM Orchestration Lambda fetches the connectionId for that UUID from DynamoDB, then uses the API Gateway WebSocket API to send a success message to the Arduino and a separate message to the frontend.

[Diagram: Changed Project Architecture]

AWS Components

1. DynamoDB

auth table:
- Schema: arduino_uuid (String) as primary key, and pin (Number)
- Stores the credential-like data for a quick authentication check.
- Note: PIN is stored in plaintext (MVP). Because the same (UUID, PIN) pair is sent with every request, it is also vulnerable to replay attacks if intercepted.
websocket table:
- Schema: arduino_uuid (String) as primary key, connection_id (String)
- Maps the Arduino’s UUID to its current WebSocket connection ID.
- Whenever the Arduino reconnects, overwrite the existing connection ID.
device info table (Referenced only if needed by the specific use case, e.g., for RGB color mode):
- Schema: depends on the device type (e.g., device_type as a key, and stored preset color codes, ir_code, etc.)
- Used to look up preset color codes for specific device types.

2. API Gateway

REST API:
- Handles the HTTP(S) calls from the web frontend (voice data, UUID, PIN).
- Integrates with the LLM Orchestration Lambda (proxy integration).
WebSocket API:
- Maintains real-time connections to Arduino devices.
- Allows backend to push messages/commands via connectionId.

3. AWS Lambda (LLM Orchestration Lambda)

Receives voice commands from API Gateway (REST).
Validates (UUID, PIN) via DynamoDB (auth table).
Holds the voice data in memory or in the temporary /tmp folder.
Invokes Gemini API with the audio data:
- Timeout: 10 seconds per request; retries up to 3 times if the call fails or times out.
- On failure, sends a “failure message” back to the frontend and optionally to the Arduino.
Receives the JSON response from Gemini, saves it to S3 (user/{arduino_uuid} folder).
In RGB mode, retrieves the preset color code closest to the requested RGB value and calculates how much R, G, and B should each be incremented or decremented.
Looks up the connection_id for the uuid in the websocket table and calls postToConnection on the WebSocket API to message the Arduino.
Additionally sends a message to the frontend to inform status/progress.
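
The RGB-mode step above can be sketched as follows. This is only an illustration: it assumes presets are (R, G, B) tuples keyed by name and that "closest" means the smallest squared Euclidean distance. The preset names, the distance metric, and the function names are assumptions, not taken from the project.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]


def closest_preset(target: RGB, presets: Dict[str, RGB]) -> str:
    """Return the name of the preset with the smallest squared distance to target."""
    def distance(name: str) -> int:
        return sum((p - t) ** 2 for p, t in zip(presets[name], target))
    return min(presets, key=distance)


def channel_steps(target: RGB, preset: RGB) -> Dict[str, int]:
    """Signed per-channel difference: positive means increment, negative means decrement."""
    return {ch: t - p for ch, t, p in zip(("r", "g", "b"), target, preset)}


# Hypothetical preset table; real values would come from the device info table.
PRESETS = {"red": (255, 0, 0), "green": (0, 255, 0), "warm_white": (255, 200, 150)}

name = closest_preset((250, 190, 160), PRESETS)
steps = channel_steps((250, 190, 160), PRESETS[name])
```

Here the nearest preset is warm_white, and the steps say to lower R by 5, lower G by 10, and raise B by 10. How those steps map to IR button presses depends on the remote and is not covered by this sketch.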

4. Amazon S3

Bucket Structure: user/{arduino_uuid}/...
Server-Side Encryption: SSE-S3 to ensure data at rest is encrypted.
Access: Bucket is not publicly accessible; strict IAM policy allows only the Lambda role to write/read.
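
A minimal sketch of the write path, assuming the Lambda passes in a boto3 S3 client (for example boto3.client("s3")). The bucket name and the object_name argument are hypothetical, since the document does not specify what follows the user/{arduino_uuid}/ prefix.

```python
import json
from typing import Any, Dict


def result_key(arduino_uuid: str, object_name: str) -> str:
    """Build the object key under the per-device prefix user/{arduino_uuid}/."""
    return f"user/{arduino_uuid}/{object_name}"


def save_result(s3_client: Any, bucket: str, arduino_uuid: str,
                object_name: str, result: Dict[str, Any]) -> str:
    """Write the JSON result with SSE-S3 (AES256) encryption and return its key."""
    key = result_key(arduino_uuid, object_name)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(result).encode("utf-8"),
        ContentType="application/json",
        ServerSideEncryption="AES256",  # requests SSE-S3 for this object
    )
    return key
```

Passing the client in as an argument keeps the function easy to test with a stand-in object.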

5. Arduino Device

Connects to Wi-Fi using a known SSID/password (manually configured for MVP).
On startup:
1. Opens a WebSocket connection to the API Gateway (WS URL).
2. Sends its arduino_uuid to register the connection ID in DynamoDB.
Receives real-time messages from the backend.
When it receives the final JSON or command, it checks that the uuid in the message matches its own and then drives the IR transmitter to control an LED strip (or performs another hardware-specific action).

Detailed Main Feature Flow

1. Arduino WebSocket Connection

Arduino boots and joins the school Wi-Fi (manually configured).
Arduino opens a WebSocket connection to wss://{api_gateway_websocket_url}.
Upon connection, it immediately sends its arduino_uuid.
API Gateway (WS) triggers a small Lambda (or integration) that stores the (arduino_uuid, connection_id) pair into the websocket table in DynamoDB.
- If the arduino_uuid already exists, the connection_id is overwritten.
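
The registration step could look roughly like the handler below. It assumes the Arduino sends a JSON body such as {"arduino_uuid": "..."} and that the table argument is a boto3 DynamoDB Table resource for the websocket table; the message format and the function name are assumptions. The connectionId field under requestContext is where API Gateway WebSocket events carry the connection ID.

```python
import json
from typing import Any, Dict


def register_connection(event: Dict[str, Any], table: Any) -> Dict[str, Any]:
    """Store (arduino_uuid, connection_id); put_item overwrites any existing row."""
    connection_id = event["requestContext"]["connectionId"]
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": "invalid JSON"}
    arduino_uuid = body.get("arduino_uuid") if isinstance(body, dict) else None
    if not arduino_uuid:
        return {"statusCode": 400, "body": "arduino_uuid is required"}
    # arduino_uuid is the table's key, so a reconnect replaces the old connection_id.
    table.put_item(Item={"arduino_uuid": arduino_uuid, "connection_id": connection_id})
    return {"statusCode": 200, "body": "registered"}
```

Because put_item replaces an item with the same key, no separate update path is needed for reconnects.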

2. Frontend REST Request for Voice Commands

Prerequisite:
- The user has already entered their Arduino UUID + PIN + Device Type on the frontend.
- The frontend stores these credentials locally (e.g., in localStorage).
- Security Trade-off: Less secure but acceptable for an MVP.
User Action:
- The user records (or selects) an audio file (max 10 seconds).
HTTP Request:
- The frontend sends a POST request to the API Gateway (REST) including:
  - Audio file (binary or base64-encoded)
  - arduino_uuid
  - pin
  - Device Type
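
For illustration, the request body might be assembled as below. Python is used here only to stay consistent with the backend sketches; the real frontend runs in the browser. The field names audio_base64 and device_type, and the choice of JSON with base64-encoded audio, are assumptions.

```python
import base64
import json


def build_voice_request(audio: bytes, arduino_uuid: str, pin: int, device_type: str) -> str:
    """Return a JSON request body carrying the audio as base64 text."""
    return json.dumps({
        "audio_base64": base64.b64encode(audio).decode("ascii"),
        "arduino_uuid": arduino_uuid,
        "pin": pin,
        "device_type": device_type,
    })
```

Base64 inflates the payload by roughly a third, which is worth keeping in mind for 10-second recordings.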

3. Backend Processing

1. Invocation

LLM Orchestration Lambda is invoked with:
- The audio file
- uuid
- pin
- Device Type

2. Authentication

LLM Orchestration Lambda queries the auth table to verify that (uuid, pin) is valid.
- If invalid:
  - Returns a 401 error (or another error) to the frontend.

3. Audio Storage

If authentication is successful:
- Temporarily store the audio either in memory or in the /tmp directory.

4. Audio Processing via Gemini API

Lambda calls the external Gemini API to process the audio:
- Request Timeout: 10 seconds.
- Retry Mechanism: Up to 3 retries if the call fails or times out.
- Failure Handling:
  - If all retries fail, the Lambda sends a “failure message” back to the frontend (and Arduino if needed).
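
The timeout-and-retry behavior can be sketched generically. The example assumes "up to 3 retries" means one initial attempt plus at most three more, and that the caller supplies a function that performs the Gemini request and enforces the timeout it is given; the function names and the linear backoff are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[float], T], timeout_s: float = 10.0,
                      max_retries: int = 3, backoff_s: float = 0.5) -> Optional[T]:
    """Try once, then retry up to max_retries times; return None if every attempt fails."""
    for attempt in range(max_retries + 1):
        try:
            return call(timeout_s)  # the callable must enforce the timeout itself
        except Exception:
            if attempt < max_retries:
                time.sleep(backoff_s * (attempt + 1))  # simple linear backoff
    return None
```

A None result is the signal to send the failure message. Note that four attempts of 10 seconds each can exceed API Gateway's default 29-second integration timeout for REST APIs, so the per-attempt timeout or retry count may need tuning if the frontend waits on the REST response.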

5. Success Handling from Gemini API

Upon receiving a successful JSON response:
1. Save the JSON to S3 at user/{arduino_uuid}/..., with server-side encryption.
2. Check if the command indicates RGB/dynamicMode with RGB selected:
  - If yes, then:
    1. Query the device info table in DynamoDB, keyed by device_type (and id), to look up the IR codes for that device type.
    2. Get all required IR codes (power, red up, etc.).
    3. Prepare these adjustment details for the next step (e.g., pass them along in the final command to Arduino).
3. Retrieve the connection_id from the websocket table by matching the arduino_uuid.
4. Send a WebSocket message (via postToConnection) to the Arduino, indicating success and including either the final JSON or the relevant command details.
5. Send a separate message or HTTP response to the Frontend to update the status (e.g., “Processing complete, AI found the correct command.”).
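
Steps 3 and 4 might be combined as in the sketch below. It assumes table is a boto3 Table resource for the websocket table and ws_client is a boto3 "apigatewaymanagementapi" client created with the WebSocket API's endpoint URL; the message shape (a uuid field merged with the payload) is an assumption.

```python
import json
from typing import Any, Dict


def notify_arduino(table: Any, ws_client: Any, arduino_uuid: str,
                   payload: Dict[str, Any]) -> bool:
    """Look up the device's connection_id and push the payload; False if not connected."""
    item = table.get_item(Key={"arduino_uuid": arduino_uuid}).get("Item")
    if item is None:
        return False  # device never registered a WebSocket connection
    # Including the uuid lets the device verify the message is meant for it.
    message = {"uuid": arduino_uuid, **payload}
    ws_client.post_to_connection(
        ConnectionId=item["connection_id"],
        Data=json.dumps(message).encode("utf-8"),
    )
    return True
```

In practice post_to_connection can raise an error for a stale connection ID, for example after the Arduino has dropped off Wi-Fi, so a real handler would likely catch that and report the device as offline.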

6. Frontend and UI Updates

Frontend:
- Receives the success/failure message from the Lambda (via REST response or an additional WebSocket channel).
- Displays status messages (e.g., “Processing complete…”).
UI:
- May display a “Completed!” notification that auto-hides after a few seconds.

Arduino Device Action

Once the Arduino receives the real-time WebSocket message, it:
- Verifies that the uuid in the message is correct.
- Extracts the relevant command from the JSON (or from the color adjustment data if in RGB mode).
The Arduino triggers the IR transmitter (or equivalent hardware) to control the LED strip or perform the indicated action.
Completion: The process is complete.

MVP only supports the following remote controller due to IR code compatibility

Surprise Me!

Use past data to automatically determine the best light settings for the user, without any user input.
Status: idea in progress.

Security and Trade-offs

Arduino UUID + PIN
- Simple, but vulnerable to replay attacks if intercepted.
- No advanced encryption or token-based mechanism in the MVP.
- In a real environment, this would be replaced with or complemented by Cognito or JWT for robust authentication/authorization.
S3 Access Controls
- Bucket is private, not publicly accessible.
- Server-Side Encryption (SSE) ensures data at rest is encrypted.
IAM Policies
- The Lambda role has minimum required permissions to read/write DynamoDB, put objects to S3, and use postToConnection on API Gateway.
Public Subnet
- All services are in a public subnet for simplicity.
- Long term, a private subnet with NAT Gateway or VPC Endpoints would be more secure.

Future Enhancements

AWS Cognito Integration
- Replace (uuid, pin) with Cognito-based JWT tokens for robust identity and access management.
- Secure the Arduino–backend communication with short-lived tokens.
Hashed PIN or Other Credentials
- Store pin in a hashed format in DynamoDB to limit damage if compromised.
Device Registration Portal
- Allow users to register new Arduino devices in a self-service manner, possibly with a more advanced provisioning flow.
Lifecycle Policies in S3
- Automatically purge older audio/JSON data after X days to reduce storage costs and limit potential exposure of PII.
Private Subnets & VPC
- Move Lambda and other services to private subnets with VPC endpoints for tighter security.
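
As a sketch of the hashed-PIN enhancement listed above, the PIN could be stored as a salted PBKDF2 digest using only the Python standard library. The function names, the 16-byte salt, and the iteration count are illustrative assumptions, not project decisions.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_pin(pin: str, salt: Optional[bytes] = None,
             iterations: int = 100_000) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for the PIN; a fresh random salt is used if none is given."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_pin(pin: str, salt: bytes, expected: bytes,
               iterations: int = 100_000) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_pin(pin, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

The auth table would then store the salt and digest instead of the raw pin. A short numeric PIN remains easy to brute-force even when hashed, so this limits damage from a leaked table rather than replacing the stronger authentication described above.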

Sunho's Dev Blog v3.1.2


Sunho
