Prism Project Architecture

MVP Architecture for IoT Voice Command Application

Table of Contents

  1. Overview
  2. Key Requirements & Considerations
  3. High-Level Architecture Diagram
  4. AWS Components
  5. Detailed Flow
    1. Arduino WebSocket Connection
    2. Frontend REST Request for Voice Commands
  6. Security and Trade-offs
  7. Future Enhancements
  8. Frontend Requirements
  9. Backend Requirements

Overview

This document describes the MVP (Minimum Viable Product) architecture for an IoT voice command application. The goal is to rapidly develop a low-cost, serverless solution that lets users send voice commands from a web frontend to an Arduino device.

Because this is an MVP, only minimal security and authentication are implemented: requests are authenticated with an Arduino UUID and PIN pair. In a production environment, this will be replaced or augmented by a more robust solution (such as AWS Cognito or JWTs).

For simplicity and cost, all AWS services are placed in a public subnet. This is not ideal for production but is sufficient for a low-traffic MVP. To mitigate the resulting security risks, IAM policies are strictly managed and the S3 bucket is not publicly accessible.


Key Requirements & Considerations

  1. Fast Development

    • Prioritize a minimal and straightforward architecture to speed up development.
  2. Low Cost

    • Leverage serverless AWS services (Lambda, DynamoDB, S3, API Gateway) within Free Tier thresholds where possible.
  3. Traffic Pattern

    • The application does not expect constant traffic. Hence, serverless pay-per-invocation is cost-effective.
  4. Security

    • MVP-level security with a simple (Arduino UUID, PIN) check.
    • Acknowledged risks: replay attacks, plaintext credentials, exploitation of client-side storage, and no access controls for device commands.
    • Planned upgrade to more robust security (e.g., Cognito) at production scale.
  5. Data Sensitivity

    • Some potential PII in the voice recordings/results.
    • Use S3 Server-Side Encryption (SSE) and strict IAM policies to protect any saved JSON results.
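As an illustration of the SSE requirement, here is a minimal sketch of the arguments a Lambda might pass to the boto3 S3 client's put_object call. The bucket name, the key layout (results/<uuid>/<request id>.json) and the helper name are assumptions made for this example and are not defined by this document. "AES256" selects S3-managed server-side encryption (SSE-S3).

```python
import json


def build_result_put_kwargs(bucket: str, arduino_uuid: str, request_id: str, result: dict) -> dict:
    """Build keyword arguments for s3_client.put_object(**kwargs).

    The key layout below is a hypothetical convention, not a project decision.
    """
    return {
        "Bucket": bucket,
        "Key": f"results/{arduino_uuid}/{request_id}.json",
        "Body": json.dumps(result).encode("utf-8"),
        "ContentType": "application/json",
        # Ask S3 to encrypt the object at rest with S3-managed keys (SSE-S3).
        "ServerSideEncryption": "AES256",
    }
```

In a Lambda this would typically be used as s3_client.put_object(**build_result_put_kwargs(...)), where s3_client is a boto3 S3 client.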

High-Level Architecture Diagram

  1. Frontend calls REST API with voice data, Arduino UUID, and PIN.
  2. LLM Orchestration Lambda checks DynamoDB to validate (UUID, PIN).
  3. Gemini API processes the audio and returns a JSON result.
  4. The result JSON is stored in S3 (SSE encryption).
  5. LLM Orchestration Lambda fetches the connectionId for that UUID from DynamoDB, then uses the API Gateway WebSocket to send a success message to the Arduino and a separate message to the Frontend.
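The five steps can be sketched as a single orchestration function. This is only an outline under stated assumptions: every collaborator (PIN check, Gemini call, S3 write, connection lookup, WebSocket push) is injected as a plain callable with a hypothetical name, so no real AWS or Gemini API is shown. The status codes and response bodies are also assumptions, and the message to the Frontend is simplified here to the function's return value.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Deps:
    """Injected collaborators; every name here is a hypothetical stand-in."""
    check_pin: Callable[[str, str], bool]               # step 2: validate (UUID, PIN)
    transcribe: Callable[[bytes], dict]                 # step 3: Gemini API call
    store_result: Callable[[str, dict], str]            # step 4: S3 write, returns key
    get_connection_id: Callable[[str], Optional[str]]   # step 5: DynamoDB lookup
    push: Callable[[str, str], None]                    # step 5: WebSocket send


def handle_voice_command(arduino_uuid: str, pin: str, audio: bytes, deps: Deps) -> dict:
    # Step 2: reject the request before doing any paid work.
    if not deps.check_pin(arduino_uuid, pin):
        return {"statusCode": 401, "body": json.dumps({"error": "invalid uuid or pin"})}

    # Steps 3 and 4: process the audio, then persist the JSON result.
    result = deps.transcribe(audio)
    key = deps.store_result(arduino_uuid, result)

    # Step 5: look up the device's live connection and notify it.
    connection_id = deps.get_connection_id(arduino_uuid)
    if connection_id is None:
        return {"statusCode": 409, "body": json.dumps({"error": "device offline"})}
    deps.push(connection_id, json.dumps({"uuid": arduino_uuid, "result": result}))

    return {"statusCode": 200, "body": json.dumps({"result_key": key})}
```

Injecting the collaborators keeps the flow testable without AWS credentials; in a deployed Lambda each callable would wrap the corresponding service client.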

AWS Components

1. DynamoDB

2. API Gateway

3. AWS Lambda (LLM Orchestration Lambda)

4. Amazon S3

5. Arduino Device


Detailed Main Feature Flow

1. Arduino WebSocket Connection

  1. Arduino boots and joins the school Wi-Fi (manually configured).
  2. Arduino opens a WebSocket connection to wss://{api_gateway_websocket_url}.
  3. Upon connection, it immediately sends its arduino_uuid.
  4. API Gateway (WS) triggers a small Lambda (or integration) that stores the (arduino_uuid, connection_id) pair into the websocket table in DynamoDB.
    • If the arduino_uuid already exists, the connection_id is overwritten.
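A minimal sketch of the small Lambda in step 4 might look like the following. It assumes the Arduino sends its identifier as a JSON body of the form {"arduino_uuid": "..."} and that arduino_uuid is the table's partition key; neither is specified in this document. The table argument is anything exposing put_item(Item=...), as a boto3 DynamoDB Table resource does, and InMemoryTable is a test double written for this example.

```python
import json


class InMemoryTable:
    """Test double mimicking the put_item/get_item shape of a DynamoDB table."""

    def __init__(self):
        self.items = {}

    def put_item(self, Item):
        # Like DynamoDB, writing an item with an existing key replaces it.
        self.items[Item["arduino_uuid"]] = dict(Item)

    def get_item(self, Key):
        item = self.items.get(Key["arduino_uuid"])
        return {"Item": item} if item else {}


def register_connection(event: dict, table) -> dict:
    """Store the (arduino_uuid, connection_id) pair for an incoming message."""
    # API Gateway WebSocket events carry the connection id in requestContext.
    connection_id = event["requestContext"]["connectionId"]
    body = json.loads(event.get("body") or "{}")
    arduino_uuid = body.get("arduino_uuid")
    if not arduino_uuid:
        return {"statusCode": 400, "body": "missing arduino_uuid"}

    # A second registration for the same uuid overwrites the old connection_id.
    table.put_item(Item={"arduino_uuid": arduino_uuid, "connection_id": connection_id})
    return {"statusCode": 200, "body": "registered"}
```

Because the write is keyed on arduino_uuid alone, a device that reconnects simply replaces its stale connection_id, which matches the overwrite behavior described above.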

2. Frontend REST Request for Voice Commands

  1. Prerequisite:

    • The user has already entered their Arduino UUID + PIN + Device Type on the frontend.
    • The frontend stores these credentials locally (e.g., in localStorage).
    • Security Trade-off: Less secure but acceptable for an MVP.
  2. User Action:

    • The user records (or selects) an audio file (max 10 seconds).
  3. HTTP Request:

    • The frontend sends a POST request to the API Gateway (REST) including:
      • Audio file (binary or base64-encoded)
      • arduino_uuid
      • pin
      • Device Type
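To make the request shape concrete, here is a sketch of one possible JSON body for the POST request, using the base64 option. The field names audio_base64 and device_type are assumptions made for this example (only arduino_uuid and pin are named in this document), and the code is written in Python for illustration even though the real frontend would build the body in the browser.

```python
import base64
import json

MAX_AUDIO_SECONDS = 10  # maximum recording length stated in the User Action step


def build_voice_request(audio: bytes, duration_s: float, arduino_uuid: str,
                        pin: str, device_type: str) -> str:
    """Serialize a voice command request as a JSON string."""
    if duration_s > MAX_AUDIO_SECONDS:
        raise ValueError(f"audio longer than {MAX_AUDIO_SECONDS} seconds")
    return json.dumps({
        "audio_base64": base64.b64encode(audio).decode("ascii"),
        "arduino_uuid": arduino_uuid,
        "pin": pin,
        "device_type": device_type,
    })


def parse_voice_request(body: str) -> dict:
    """Backend-side counterpart: decode the JSON body back into its fields."""
    data = json.loads(body)
    return {
        "audio": base64.b64decode(data["audio_base64"]),
        "arduino_uuid": data["arduino_uuid"],
        "pin": data["pin"],
        "device_type": data["device_type"],
    }
```

Base64 inflates the payload by roughly one third, which is worth keeping in mind alongside the 10-second limit when checking API Gateway payload size limits.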

3. Backend Processing

1. Invocation

2. Authentication

3. Audio Storage

4. Audio Processing via Gemini API

5. Success Handling from Gemini API

6. Frontend and UI Updates


Arduino Device Action

  1. Once the Arduino receives the real-time WebSocket message, it:

    • Verifies that the uuid in the message matches its own arduino_uuid.
    • Extracts the relevant command from the JSON (or from the color adjustment data if in RGB mode).
  2. The Arduino triggers the IR transmitter (or equivalent hardware) to control the LED strip or perform the indicated action.

  3. Completion: The process is complete.
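The device-side checks can be sketched as follows. The logic is shown in Python for readability; the actual firmware would implement the equivalent on the Arduino. The message format is an assumption for this example: a JSON object with a uuid field, plus either a command string or, in RGB mode, a color object with r, g and b values from 0 to 255.

```python
import json
from typing import Optional


def extract_command(raw_message: str, my_uuid: str, rgb_mode: bool = False) -> Optional[dict]:
    """Return the action to perform, or None if the message should be ignored."""
    try:
        msg = json.loads(raw_message)
    except json.JSONDecodeError:
        return None

    # Ignore anything that is not addressed to this device.
    if not isinstance(msg, dict) or msg.get("uuid") != my_uuid:
        return None

    if rgb_mode:
        color = msg.get("color")
        if not isinstance(color, dict):
            return None
        try:
            rgb = tuple(int(color[ch]) for ch in ("r", "g", "b"))
        except (KeyError, TypeError, ValueError):
            return None
        if any(not 0 <= value <= 255 for value in rgb):
            return None
        return {"type": "rgb", "value": rgb}

    command = msg.get("command")
    if not isinstance(command, str) or not command:
        return None
    return {"type": "ir", "value": command}
```

Returning None for every malformed or misaddressed message keeps the IR transmitter from firing on unexpected input.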

The MVP supports only the following remote controller, due to IR code compatibility:

[Image: shopping.jpeg]


Surprise Me!


Security and Trade-offs


Future Enhancements

  1. AWS Cognito Integration

    • Replace (uuid, pin) with Cognito-based JWT tokens for robust identity and access management.
    • Secure the Arduino–backend communication with short-lived tokens.
  2. Hashed PIN or Other Credentials

    • Store the pin as a salted hash in DynamoDB to limit the damage if the table is compromised.
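One way to store the pin in hashed form is a salted PBKDF2 digest from the Python standard library, sketched below. The attribute names pin_salt and pin_hash and the iteration count are assumptions for this example, not values chosen by the project.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune to the Lambda's latency budget


def hash_pin(pin: str, salt: bytes = None) -> dict:
    """Return the attributes to store in DynamoDB instead of the plaintext pin."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, ITERATIONS)
    return {"pin_salt": salt.hex(), "pin_hash": digest.hex()}


def verify_pin(pin: str, record: dict) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    salt = bytes.fromhex(record["pin_salt"])
    expected = bytes.fromhex(record["pin_hash"])
    candidate = hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)
```

A short numeric PIN has very little entropy, so a hash mainly prevents casual disclosure; it does not stop a determined offline brute-force attempt against a leaked table.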
  3. Device Registration Portal

    • Allow users to register new Arduino devices in a self-service manner, possibly with a more advanced provisioning flow.
  4. Lifecycle Policies in S3

    • Automatically purge older audio/JSON data after X days to reduce storage costs and limit potential exposure of PII.
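A lifecycle rule of this kind can be expressed as the configuration dictionary accepted by the boto3 S3 client's put_bucket_lifecycle_configuration call. The sketch below only builds that dictionary. The retention period is left as a parameter because this document does not fix a value for X, and the rule ID format and optional prefix are assumptions for this example.

```python
def build_lifecycle_config(expire_after_days: int, prefix: str = "") -> dict:
    """Build an S3 lifecycle configuration that expires objects after N days.

    An empty prefix applies the rule to every object in the bucket.
    """
    if expire_after_days < 1:
        raise ValueError("expire_after_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{expire_after_days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }
```

The result would be passed as the LifecycleConfiguration argument, for example s3_client.put_bucket_lifecycle_configuration(Bucket=bucket_name, LifecycleConfiguration=build_lifecycle_config(days)), where bucket_name and days are supplied by the deployment.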
  5. Private Subnets & VPC

    • Move Lambda and other services to private subnets with VPC endpoints for tighter security.