Prism Project Architecture
MVP Architecture for IoT Voice Command Application
Table of Contents
- Overview
- Key Requirements & Considerations
- High-Level Architecture Diagram
- AWS Components
- Detailed Flow
- Security and Trade-offs
- Future Enhancements
- Frontend Requirements
- Backend Requirements
Overview
This document describes the MVP (Minimum Viable Product) architecture for an IoT voice command application. The goal is to rapidly develop a low-cost, serverless solution that allows users to:
- Register and connect their Arduino devices via Wi-Fi (using a UUID).
- Send voice commands (up to 10 seconds) through a web frontend.
- Have the backend process these commands using a third-party AI service (“Gemini API”).
- Return the processed result to both the web frontend and the Arduino in (near) real-time via WebSocket.
Because this is an MVP, minimal security and authentication are implemented. We use Arduino UUID + PIN to authenticate requests. In a production environment, this will be replaced or augmented by more robust solutions (like AWS Cognito or JWT tokens).
All AWS services are placed in a public subnet for simplicity and cost reasons, with the understanding that this might not be optimal for production but is sufficient for our low-traffic MVP. To mitigate certain security risks, we strictly manage IAM policies and ensure S3 is not publicly accessible.
Key Requirements & Considerations
- Fast Development
  - Prioritize a minimal and straightforward architecture to speed up development.
- Low Cost
  - Leverage serverless AWS services (Lambda, DynamoDB, S3, API Gateway) within Free Tier thresholds where possible.
- Traffic Pattern
  - The application does not expect constant traffic, so serverless pay-per-invocation pricing is cost-effective.
- Security
  - MVP-level security with a simple `(Arduino UUID, PIN)` check.
  - Acknowledged risks: replay attacks, plaintext credentials, client-side storage exploits, and a lack of access controls for device commands.
  - Planned upgrade to more robust security (e.g., Cognito) at production scale.
- Data Sensitivity
  - The voice recordings and results may contain PII.
  - Use S3 Server-Side Encryption (SSE) and strict IAM policies to protect any saved JSON results.
High-Level Architecture Diagram
- Frontend calls REST API with voice data, Arduino UUID, and PIN.
- LLM Orchestration Lambda checks DynamoDB to validate `(UUID, PIN)`.
- Gemini API processes the audio and returns a JSON result.
- The result JSON is stored in S3 (SSE encryption).
- LLM Orchestration Lambda fetches `connectionId` for that `UUID` from DynamoDB, uses API Gateway WebSocket to send success message to Arduino and also a separate message to the Frontend.
Changed Project Architecture
AWS Components
1. DynamoDB
- auth table:
  - Schema: `arduino_uuid (String)` as primary key, and `pin (Number)`
  - Stores the credential-like data for a quick authentication check.
  - Note: PIN is stored in plaintext (MVP). Vulnerable to replay attacks.
- websocket table:
  - Schema: `arduino_uuid (String)` as primary key, `connection_id (String)`
  - Maps the Arduino’s UUID to its current WebSocket connection ID.
  - Whenever the Arduino reconnects, overwrite the existing connection ID.
- device info table (Referenced only if needed by the specific use case, e.g., for RGB color mode):
  - Schema: depends on the device type (e.g., `device_type` as a key, and stored preset color codes, `ir_code`, etc.)
  - Used to look up preset color codes for specific device types.
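As a rough illustration of the three tables above, the items could be modelled as follows. This is only a sketch: the attribute names `arduino_uuid`, `pin`, `connection_id`, `device_type`, and `ir_code` come from the schemas above, while the `presets` attribute, the shape of `ir_code`, and all example values are assumptions made for illustration.

```python
from typing import Dict, List, TypedDict


class AuthItem(TypedDict):
    arduino_uuid: str  # primary key
    pin: int           # stored in plaintext in the MVP


class WebSocketItem(TypedDict):
    arduino_uuid: str   # primary key
    connection_id: str  # overwritten whenever the Arduino reconnects


class DeviceInfoItem(TypedDict):
    device_type: str               # key
    ir_code: Dict[str, str]        # assumed shape: button name -> IR code
    presets: Dict[str, List[int]]  # hypothetical: preset name -> [R, G, B]


# Example items (all values are made up).
auth_item: AuthItem = {"arduino_uuid": "uuid-1", "pin": 1234}
ws_item: WebSocketItem = {"arduino_uuid": "uuid-1", "connection_id": "conn-A"}
device_item: DeviceInfoItem = {
    "device_type": "led_strip",
    "ir_code": {"power": "0xFF02FD"},
    "presets": {"red": [255, 0, 0]},
}
```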
2. API Gateway
- REST API:
  - Handles the HTTP(S) calls from the web frontend (voice data, UUID, PIN).
  - Integrates with the LLM Orchestration Lambda (proxy integration).
- WebSocket API:
  - Maintains real-time connections to Arduino devices.
  - Allows backend to push messages/commands via `connectionId`.
3. AWS Lambda (LLM Orchestration Lambda)
- Receives voice commands from API Gateway (REST).
- Validates `(UUID, PIN)` via DynamoDB (auth table).
- Stores the voice data in memory or in the temporary `/tmp` folder.
- Invokes Gemini API with the audio data:
  - Timeout: retries up to 3 times if the call fails or times out.
  - On failure, sends a “failure message” back to the frontend and optionally to the Arduino.
- Receives the JSON response from Gemini and saves it to S3 (`user/{arduino_uuid}` folder).
- In RGB mode, retrieves the preset color code closest to the target RGB code and calculates how much R, G, and B must each be incremented or decremented.
- Uses the `uuid -> connection_id` mapping from the `websocket table` to call `postToConnection` on the WebSocket API (message to Arduino).
- Additionally sends a message to the frontend to report status/progress.
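The RGB step above can be sketched as a small pure function. The source does not say how "closest" is measured, so squared Euclidean distance in RGB space is an assumption here, and the `PRESETS` values are hypothetical stand-ins for what would be read from the device info table.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Hypothetical presets; real values would come from the device info table.
PRESETS: Dict[str, RGB] = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "white": (255, 255, 255),
}


def nearest_preset(target: RGB, presets: Dict[str, RGB]) -> str:
    """Return the preset name with the smallest squared distance to target."""
    return min(
        presets,
        key=lambda name: sum((p - t) ** 2 for p, t in zip(presets[name], target)),
    )


def channel_deltas(target: RGB, preset: RGB) -> Dict[str, int]:
    """Positive means increment the channel, negative means decrement it."""
    return {ch: t - p for ch, t, p in zip("RGB", target, preset)}


target = (240, 30, 10)
name = nearest_preset(target, PRESETS)
deltas = channel_deltas(target, PRESETS[name])
print(name, deltas)  # red {'R': -15, 'G': 30, 'B': 10}
```

Starting from the nearest preset keeps the number of increment/decrement IR commands small, which is presumably why the preset lookup comes first.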
4. Amazon S3
- Bucket Structure: `user/{arduino_uuid}/...`
- Server-Side Encryption: SSE-S3 to ensure data at rest is encrypted.
- Access: Bucket is not publicly accessible; a strict IAM policy allows only the Lambda role to read and write.
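A minimal sketch of how the Lambda might assemble the arguments for an S3 upload with SSE-S3. The keyword names match boto3's S3 `put_object` call; the bucket name, the `request_id` parameter, and the `{request_id}.json` file name are assumptions, since the source only specifies the `user/{arduino_uuid}/...` prefix.

```python
import json
from typing import Any, Dict


def build_put_object_kwargs(
    bucket: str, arduino_uuid: str, request_id: str, result: Dict[str, Any]
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's S3 client put_object call."""
    return {
        "Bucket": bucket,
        # The file name under the prefix is an assumption.
        "Key": f"user/{arduino_uuid}/{request_id}.json",
        "Body": json.dumps(result).encode("utf-8"),
        "ContentType": "application/json",
        "ServerSideEncryption": "AES256",  # SSE-S3
    }


kwargs = build_put_object_kwargs(
    "example-bucket", "uuid-1", "req-42", {"command": "power_on"}
)
# With boto3 installed: boto3.client("s3").put_object(**kwargs)
```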
5. Arduino Device
- Connects to Wi-Fi using a known SSID/password (manually configured for MVP).
- On startup:
  - Opens a WebSocket connection to the `API Gateway (WS URL)`.
  - Sends its `arduino_uuid` to register the connection ID in DynamoDB.
- Receives real-time messages from the backend.
- When it receives the final JSON or command, it checks if the `uuid` is correct and performs the IR transmitter logic to control an LED strip (or other hardware-specific action).
Detailed Main Feature Flow
1. Arduino WebSocket Connection
- Arduino boots and joins the school Wi-Fi (manually configured).
- Arduino opens a WebSocket connection to `wss://{api_gateway_websocket_url}`.
- Upon connection, it immediately sends its `arduino_uuid`.
- API Gateway (WS) triggers a small Lambda (or integration) that stores the `(arduino_uuid, connection_id)` pair into the websocket table in DynamoDB.
  - If the `arduino_uuid` already exists, the `connection_id` is overwritten.
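The registration step above could look roughly like the handler below. It assumes the Arduino sends a JSON message of the form `{"arduino_uuid": "..."}` (the message format is not specified in the source) and takes the table as a parameter so it can be exercised without AWS. `InMemoryTable` is a hypothetical stand-in that mimics only the `put_item` method of a boto3 DynamoDB `Table` resource.

```python
import json
from typing import Any, Dict


class InMemoryTable:
    """Stand-in for a boto3 DynamoDB Table resource (put_item only)."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}

    def put_item(self, Item: Dict[str, Any]) -> None:
        # Like DynamoDB, replace any item that has the same primary key.
        self.items[Item["arduino_uuid"]] = Item


def register_handler(event: Dict[str, Any], table: Any) -> Dict[str, Any]:
    """Store (arduino_uuid, connection_id) for a WebSocket message event."""
    connection_id = event["requestContext"]["connectionId"]
    try:
        arduino_uuid = json.loads(event.get("body") or "{}")["arduino_uuid"]
    except (KeyError, TypeError, json.JSONDecodeError):
        return {"statusCode": 400, "body": "arduino_uuid is required"}
    table.put_item(
        Item={"arduino_uuid": arduino_uuid, "connection_id": connection_id}
    )
    return {"statusCode": 200, "body": "registered"}


def make_event(connection_id: str, uuid: str) -> Dict[str, Any]:
    return {
        "requestContext": {"connectionId": connection_id},
        "body": json.dumps({"arduino_uuid": uuid}),
    }


table = InMemoryTable()
register_handler(make_event("conn-A", "uuid-1"), table)
register_handler(make_event("conn-B", "uuid-1"), table)  # reconnect overwrites
```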
2. Frontend REST Request for Voice Commands
- Prerequisite:
  - The user has already entered their Arduino UUID + PIN + Device Type on the frontend.
  - The frontend stores these credentials locally (e.g., in localStorage).
  - Security Trade-off: Less secure but acceptable for an MVP.
- User Action:
  - The user records (or selects) an audio file (max 10 seconds).
- HTTP Request:
  - The frontend sends a `POST` request to the API Gateway (REST) including:
    - Audio file (binary or base64-encoded)
    - `arduino_uuid`
    - `pin`
    - `Device Type`
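One possible shape for this request, seen from the Lambda side, is a JSON body that carries the audio as base64. The sketch below parses such a body from an API Gateway proxy event. The `body` and `isBase64Encoded` event fields are part of the proxy integration format; the JSON field names `audio_b64` and `device_type` are assumptions, not something the source defines.

```python
import base64
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class VoiceRequest:
    arduino_uuid: str
    pin: int
    device_type: str
    audio: bytes


def parse_voice_request(event: Dict[str, Any]) -> VoiceRequest:
    """Parse a proxy event whose body is JSON with base64-encoded audio."""
    raw = event["body"]
    if event.get("isBase64Encoded"):
        raw = base64.b64decode(raw).decode("utf-8")
    payload = json.loads(raw)
    return VoiceRequest(
        arduino_uuid=payload["arduino_uuid"],
        pin=int(payload["pin"]),
        device_type=payload["device_type"],        # assumed field name
        audio=base64.b64decode(payload["audio_b64"]),  # assumed field name
    )


event = {
    "isBase64Encoded": False,
    "body": json.dumps({
        "arduino_uuid": "uuid-1",
        "pin": 1234,
        "device_type": "led_strip",
        "audio_b64": base64.b64encode(b"\x00\x01fake-audio").decode("ascii"),
    }),
}
request = parse_voice_request(event)
```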
3. Backend Processing
1. Invocation
- LLM Orchestration Lambda is invoked with:
  - The audio file
  - `uuid`
  - `pin`
  - `Device Type`
2. Authentication
- LLM Orchestration Lambda queries the `auth table` to verify that `(uuid, pin)` is valid.
  - If invalid:
    - Returns a `401` error (or another error) to the frontend.
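The check above might be implemented along these lines. This is a sketch under assumptions: `InMemoryAuthTable` is a hypothetical stand-in that mimics only `get_item` on a boto3 DynamoDB `Table` resource, and the error body is made up. The constant-time comparison is a small hardening step, not something the source requires.

```python
import hmac
from typing import Any, Dict, Optional


class InMemoryAuthTable:
    """Stand-in for a boto3 DynamoDB Table resource (get_item only)."""

    def __init__(self, items: Dict[str, Dict[str, Any]]) -> None:
        self._items = items

    def get_item(self, Key: Dict[str, Any]) -> Dict[str, Any]:
        item = self._items.get(Key["arduino_uuid"])
        return {"Item": item} if item else {}


def is_authorized(table: Any, arduino_uuid: str, pin: int) -> bool:
    item = table.get_item(Key={"arduino_uuid": arduino_uuid}).get("Item")
    if item is None:
        return False
    # int() also copes with the Decimal that boto3 returns for Number attributes.
    return hmac.compare_digest(str(int(item["pin"])), str(int(pin)))


def authenticate(table: Any, arduino_uuid: str, pin: int) -> Optional[Dict[str, Any]]:
    """Return a 401 proxy response when the pair is invalid, otherwise None."""
    if not is_authorized(table, arduino_uuid, pin):
        return {"statusCode": 401, "body": '{"error": "invalid uuid or pin"}'}
    return None


auth_table = InMemoryAuthTable({"uuid-1": {"arduino_uuid": "uuid-1", "pin": 1234}})
```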
3. Audio Storage
- If authentication is successful:
  - Temporarily store the audio either in memory or in the `/tmp` directory.
4. Audio Processing via Gemini API
- Lambda calls the external Gemini API to process the audio:
  - Request Timeout: 10 seconds.
  - Retry Mechanism: Up to 3 retries if the call fails or times out.
- Failure Handling:
  - If all retries fail, the Lambda sends a “failure message” back to the frontend (and Arduino if needed).
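The timeout and retry policy above can be sketched as a generic wrapper. The actual Gemini request is not shown in the source, so `flaky_gemini_call` is a hypothetical stand-in. Two assumptions to note: "up to 3 retries" is read as one initial attempt plus three retries, and the exponential backoff between attempts is an addition for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[float], T],
    timeout_s: float = 10.0,
    max_retries: int = 3,
    backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(timeout_s); on any exception retry up to max_retries times."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):  # initial attempt + retries
        try:
            return call(timeout_s)
        except Exception as exc:  # includes timeouts raised by the HTTP client
            last_error = exc
            if attempt < max_retries:
                sleep(backoff_s * (2 ** attempt))  # assumed backoff policy
    raise RuntimeError(
        f"Gemini call failed after {max_retries + 1} attempts"
    ) from last_error


attempts = []


def flaky_gemini_call(timeout_s: float) -> dict:
    """Hypothetical stand-in for the real request: fails twice, then succeeds."""
    attempts.append(timeout_s)
    if len(attempts) < 3:
        raise TimeoutError("simulated timeout")
    return {"command": "power_on"}


result = call_with_retries(flaky_gemini_call, sleep=lambda _: None)
```

One design point worth checking: with a 10-second timeout and four attempts, the worst case is about 40 seconds, which is longer than API Gateway's default 29-second integration timeout for REST APIs. The retry budget may need to be tightened, or the failure reported over the WebSocket channel instead of the REST response.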
5. Success Handling from Gemini API
- Upon receiving a successful JSON response:
  - Save the JSON to S3 at `user/{arduino_uuid}/...`, with server-side encryption.
  - Check if the command indicates `RGB`/`dynamicMode` with `RGB` selected:
    - If yes, then:
      - Query the device info table in DynamoDB, using `device_type` and `id` as the key, to get the IR codes for that device type.
      - Get all required IR codes (Power, red code up, etc.).
      - Prepare these adjustment details for the next step (e.g., pass them along in the final command to Arduino).
  - Retrieve the `connection_id` from the websocket table by matching the `arduino_uuid`.
  - Send a WebSocket message (via `postToConnection`) to the Arduino, indicating success and including either the final JSON or the relevant command details.
  - Send a separate message or HTTP response to the Frontend to update the status (e.g., “Processing complete, AI found the correct command.”).
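The lookup-and-push part of this step could be sketched as below. In boto3, the `apigatewaymanagementapi` client exposes `post_to_connection(ConnectionId=..., Data=...)`. Here the client and table are passed in and replaced by hypothetical fakes so the sketch runs without AWS, and the message shape (`uuid`, `status`, `command`) is an assumption.

```python
import json
from typing import Any, Dict, List


class FakeTable:
    """Stand-in for the websocket table (get_item only)."""

    def __init__(self, items: Dict[str, Dict[str, Any]]) -> None:
        self._items = items

    def get_item(self, Key: Dict[str, Any]) -> Dict[str, Any]:
        item = self._items.get(Key["arduino_uuid"])
        return {"Item": item} if item else {}


class FakeWsClient:
    """Records calls instead of contacting API Gateway."""

    def __init__(self) -> None:
        self.sent: List[Dict[str, Any]] = []

    def post_to_connection(self, ConnectionId: str, Data: bytes) -> None:
        self.sent.append({"ConnectionId": ConnectionId, "Data": Data})


def push_result(
    table: Any, ws_client: Any, arduino_uuid: str, command: Dict[str, Any]
) -> bool:
    """Look up the connection for arduino_uuid and push the command to it."""
    item = table.get_item(Key={"arduino_uuid": arduino_uuid}).get("Item")
    if item is None:
        return False  # no registered connection for this device
    message = {"uuid": arduino_uuid, "status": "success", "command": command}
    ws_client.post_to_connection(
        ConnectionId=item["connection_id"],
        Data=json.dumps(message).encode("utf-8"),
    )
    return True


ws_table = FakeTable({"uuid-1": {"arduino_uuid": "uuid-1", "connection_id": "conn-A"}})
client = FakeWsClient()
delivered = push_result(ws_table, client, "uuid-1", {"action": "power_on"})
```

Including the `uuid` in the message lets the Arduino perform the check described in its device action step. With a real client, a stale connection ID would surface as a `GoneException`, which the Lambda would likely want to catch and report to the frontend.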
6. Frontend and UI Updates
- Frontend:
  - Receives the success/failure message from the Lambda (via REST response or an additional WebSocket channel).
  - Displays status messages (e.g., “Processing complete…”).
- UI:
  - May display a “Completed!” notification that auto-hides after a few seconds.
Arduino Device Action
- Once the Arduino receives the real-time WebSocket message, it:
  - Verifies that the `uuid` in the message is correct.
  - Extracts the relevant command from the JSON (or from the color adjustment data if in RGB mode).
- The Arduino triggers the IR transmitter (or equivalent hardware) to control the LED strip or perform the indicated action.
- Completion: The process is complete.
The MVP only supports the following remote controller, due to IR code compatibility:

Surprise Me!
- Use past data to automatically determine the best light settings for the user, without any user input.
- Idea in Progress
Security and Trade-offs
- Arduino UUID + PIN
  - Simple, but vulnerable to replay attacks if intercepted.
  - No advanced encryption or token-based mechanism in the MVP.
  - In a real environment, this would be replaced with or complemented by Cognito or JWT for robust authentication/authorization.
- S3 Access Controls
  - Bucket is private, not publicly accessible.
  - Server-Side Encryption (SSE) ensures data at rest is encrypted.
- IAM Policies
  - The Lambda role has minimum required permissions to read/write DynamoDB, put objects to S3, and use `postToConnection` on API Gateway.
- Public Subnet
  - All services are in a public subnet for simplicity.
  - Long term, a private subnet with NAT Gateway or VPC Endpoints would be more secure.
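To make the IAM bullet concrete, a least-privilege policy for the Lambda role might look roughly like the document below, built here as a Python dict. The action names are real IAM actions; every ARN component in capitals (`REGION`, `ACCOUNT_ID`, `TABLE_NAME`, `BUCKET_NAME`, `API_ID`, `STAGE`) is a placeholder, and the exact set of actions the project needs is an assumption.

```python
import json

# Placeholders: substitute the real region, account, table, bucket, and API.
REGION, ACCOUNT = "REGION", "ACCOUNT_ID"

lambda_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # read/write the DynamoDB tables
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
            "Resource": f"arn:aws:dynamodb:{REGION}:{ACCOUNT}:table/TABLE_NAME",
        },
        {   # put result objects under the user/ prefix only
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::BUCKET_NAME/user/*",
        },
        {   # allow postToConnection on the WebSocket API
            "Effect": "Allow",
            "Action": "execute-api:ManageConnections",
            "Resource": (
                f"arn:aws:execute-api:{REGION}:{ACCOUNT}:"
                "API_ID/STAGE/POST/@connections/*"
            ),
        },
    ],
}

policy_json = json.dumps(lambda_policy, indent=2)
```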
Future Enhancements
- AWS Cognito Integration
  - Replace `(uuid, pin)` with Cognito-based JWT tokens for robust identity and access management.
  - Secure the Arduino–backend communication with short-lived tokens.
- Encrypted PIN or Other Credentials
  - Store `pin` in a hashed format in DynamoDB to limit damage if compromised.
- Device Registration Portal
  - Allow users to register new Arduino devices in a self-service manner, possibly with a more advanced provisioning flow.
- Lifecycle Policies in S3
  - Automatically purge older audio/JSON data after X days to reduce storage costs and limit potential exposure of PII.
- Private Subnets & VPC
  - Move Lambda and other services to private subnets with VPC endpoints for tighter security.
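As one illustration of the hashed PIN enhancement, the sketch below uses salted PBKDF2 from Python's standard library. The choice of PBKDF2, the iteration count, and the idea of storing the salt next to the digest are assumptions; the source only says the PIN should be stored in a hashed format.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune to the Lambda's CPU budget


def hash_pin(pin: str, salt: bytes = b"") -> Tuple[bytes, bytes]:
    """Return (salt, digest); a random 16-byte salt is generated if none is given."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_pin(pin: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_pin(pin, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_pin("1234")
```

Because a short numeric PIN has very little entropy, hashing mainly slows down an attacker who obtains the table; it does not make the PIN hard to guess, which is one more reason for the token-based approach listed first.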