
A Conceptual Architecture for a Filesystem to SQS Loader


I have an interesting technical situation facing me:

Each file needs to:

Note: SQS is the AWS “Simple Queue Service”. A queue is a specialized data structure that hands data off for processing by other tasks.

My initial thinking to handle this was to use Rust to write a high performance file processor. This appealed to me:

There are, naturally, problems here:

Thinking through all these issues, and learning by chance that the maximum execution time for AWS Lambda serverless functions had increased from 5 minutes to 15 minutes, made me think in terms of a different architecture built around lambdas.

Note: A lambda is a self-contained bit of code that you hand over to AWS to manage on your behalf. Another term for Lambda is “functions as a service”. You don’t have to think about servers, DevOps administration or the like.

Here is what I’m thinking:

  1. Add a network API to the filesystem of JSON files. This could literally be as simple as an NGINX server that lists the files.
  2. A lambda that requests a JSON file for processing per the description above and relies on a Redis dictionary to track files that have already been processed. Two dictionaries would be needed: json_files_processed and json_files_processing. (And, yes, there would need to be a way to expire things from json_files_processing in case a lambda crashes or is terminated; this would be a separate lambda.)
  3. A CloudWatch Scheduler rule that triggers the lambda every 5 minutes
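
To make step 2 a little more concrete, here is a minimal sketch of how a lambda might claim a file using the two dictionaries. It assumes the dictionaries are Redis hashes keyed by filename with a timestamp as the value. The `InMemoryHashes` class is a hypothetical stand-in that mimics only the `hsetnx`, `hset`, `hdel` and `hexists` calls of a Redis client, so that the sketch runs without a server; `claim_file` and `mark_done` are illustrative names of my own.

```python
import time

PROCESSING = "json_files_processing"
PROCESSED = "json_files_processed"


class InMemoryHashes:
    """Hypothetical stand-in for a Redis client (hash commands only)."""

    def __init__(self):
        self._data = {}

    def hsetnx(self, name, key, value):
        # Set the field only if it does not exist yet; 1 if set, else 0.
        fields = self._data.setdefault(name, {})
        if key in fields:
            return 0
        fields[key] = value
        return 1

    def hset(self, name, key, value):
        fields = self._data.setdefault(name, {})
        is_new = key not in fields
        fields[key] = value
        return 1 if is_new else 0

    def hdel(self, name, *keys):
        fields = self._data.get(name, {})
        removed = 0
        for key in keys:
            if key in fields:
                del fields[key]
                removed += 1
        return removed

    def hexists(self, name, key):
        return key in self._data.get(name, {})


def claim_file(store, filename, now=None):
    """Return True if this worker now owns the file, False otherwise."""
    if store.hexists(PROCESSED, filename):
        return False  # already loaded into SQS by an earlier run
    started = time.time() if now is None else now
    # hsetnx is the important part: only one caller can win the claim.
    return store.hsetnx(PROCESSING, filename, started) == 1


def mark_done(store, filename, now=None):
    """Record the file as processed and release the in-flight claim."""
    finished = time.time() if now is None else now
    store.hset(PROCESSED, filename, finished)
    store.hdel(PROCESSING, filename)
```

A lambda would call `claim_file` before touching a file and `mark_done` once the file's contents are safely in SQS. Storing a start time in json_files_processing is what would let a separate job notice claims that were never released.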

Fleshing this out further gives three lambda functions:

There would likely need to be three CloudWatch Scheduler rules, one for each lambda.
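
One of those scheduled lambdas would be the expiry job for json_files_processing mentioned earlier. Here is a rough sketch of its core logic, assuming each entry maps a filename to the time the claim was made (in seconds). A plain Python dict stands in for the Redis dictionary, and the 900 second default is my assumption, chosen to match the 15 minute lambda limit; `expire_stale` is a hypothetical name.

```python
import time


def expire_stale(processing, max_age_seconds=900, now=None):
    """Remove claims older than max_age_seconds and return their filenames.

    `processing` is a dict of filename -> claim time, standing in for the
    json_files_processing dictionary.
    """
    current = time.time() if now is None else now
    stale = [
        name
        for name, started in processing.items()
        if current - float(started) > max_age_seconds
    ]
    for name in stale:
        # Dropping the claim makes the file eligible to be picked up again.
        del processing[name]
    return stale
```

Anything older than the longest possible lambda run cannot still be in flight, so it should be safe to release and retry.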

The sqs_loader would need the ability to terminate itself (exit) if every remaining file is already being processed.
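
Here is a minimal sketch of what that exit logic might look like. Plain Python sets stand in for the two Redis dictionaries (so the claim here is not atomic the way a real Redis call could be), and `run_loader`, `next_unclaimed`, `handle` and `min_ms` are hypothetical names. In a real lambda, `remaining_ms` could be the `get_remaining_time_in_millis` method of the Lambda context object.

```python
def next_unclaimed(listing, processed, processing):
    """Return the first file that is neither done nor in flight, else None."""
    for name in listing:
        if name not in processed and name not in processing:
            return name
    return None


def run_loader(listing, processed, processing, handle, remaining_ms, min_ms=10_000):
    """Process files until none are left to claim or time is nearly up."""
    handled = []
    while remaining_ms() > min_ms:
        name = next_unclaimed(listing, processed, processing)
        if name is None:
            break  # everything is done or in flight: exit early
        processing.add(name)
        handle(name)  # e.g. read the JSON file and send it to SQS
        processing.discard(name)
        processed.add(name)
        handled.append(name)
    return handled
```

The `min_ms` check is there so the loader stops starting new files shortly before the 15 minute limit, rather than being killed partway through one.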