Big Lambda Serverless (BLS) is a framework for running MapReduce jobs on AWS Lambda, powered by the Serverless framework. BLS builds on existing research into Big Data processing on AWS Lambda, like this and this.
To install, execute the following command:
./install_serverless_and_requirements.sh
To use Big Lambda Serverless properly, you must install the latest version of the Serverless framework (at least 1.8.0). Run
npm install -g serverless
to install Serverless globally. This requires npm to be installed.
Serverless helps developers to build apps on AWS Lambda. It is really a great framework supported by AWS.
A future version will probably need additional third-party libraries. To install them, execute
pip install -r requirements.txt -t vendored
or simply run the install script shown above.
If your implementation of Mapper or Reducer needs additional third-party libraries,
just add the package you need to requirements.txt and run the pip command above to
bundle this package for AWS Lambda.
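Packages installed with -t vendored end up in the vendored directory rather than in site-packages, so that directory has to be on the import path before they can be imported. Whether BLS already handles this internally is not stated here; the helper below is only a minimal sketch of the usual pattern, and the function name add_vendored_to_path is made up for illustration.

```python
import os
import sys


def add_vendored_to_path(base_dir, folder="vendored"):
    """Put <base_dir>/<folder> at the front of sys.path if it is not there yet.

    Hypothetical helper, not part of BLS. Returns the directory that was added.
    """
    vendored = os.path.join(os.path.abspath(base_dir), folder)
    if vendored not in sys.path:
        # Insert at the front so bundled packages win over system ones.
        sys.path.insert(0, vendored)
    return vendored
```

A handler module would typically call it once at import time, for example add_vendored_to_path(os.path.dirname(__file__)), before importing any vendored package.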
A nice tutorial about requirements files can be found here.
To make BLS work, you need to fill in your credentials in config.json and local.yml.
Create config.json and local.yml in the root folder, then
carefully read examples/config.json and examples/local.yml for example configurations.
Generally, you need to fill in
- Data and job bucket names and ARNs
- Lambda names and parameters (e.g. RAM and timeout)
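Before deploying, it can help to check that config.json actually has the values filled in. The sketch below is an assumption-heavy illustration: the key names data_bucket and job_bucket are hypothetical placeholders, so substitute the real keys from examples/config.json.

```python
import json

# Hypothetical key names, for illustration only.
# The real ones are listed in examples/config.json.
REQUIRED_KEYS = ("data_bucket", "job_bucket")


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the config dict."""
    return [key for key in required if not config.get(key)]


def load_config(path="config.json"):
    """Load the JSON config and fail early if a required value is missing."""
    with open(path) as handle:
        config = json.load(handle)
    missing = missing_keys(config)
    if missing:
        raise ValueError(path + " is missing values for: " + ", ".join(missing))
    return config
```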
Finally, you will definitely want to create your own mapper and reducer for your tasks.
The whole user-facing API is under the api folder.
You need to write your own mapper and reducer classes.
Your implementations of these classes inherit from the Base class in api/src/base.py.
All internal logic is hidden in the Base class. Your subclasses just need to
- Redefine the global output buffer
- Redefine the handler function, to perform processing of the data
This probably sounds weird, so please check the examples in the examples folder.
Feel free to copy and paste from the examples.
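To give a rough idea of the shape of such subclasses, here is a self-contained word-count sketch. It does not import the real Base class: the Base below is a simplified stand-in written for this illustration, and the attribute name output, the method name handler, and the run helper are all assumptions. The actual interface in api/src/base.py may differ, so treat the examples folder as the reference.

```python
class Base:
    """Simplified stand-in for api/src/base.py (the real interface may differ)."""

    def __init__(self):
        self.output = {}  # assumed name for the output buffer

    def handler(self, item):
        raise NotImplementedError("subclasses redefine handler")

    def run(self, items):
        # Hypothetical driver loop: feed every input item to the handler.
        for item in items:
            self.handler(item)
        return self.output


class WordCountMapper(Base):
    def __init__(self):
        super().__init__()
        self.output = {}  # redefine the output buffer

    def handler(self, line):
        # Count each whitespace-separated word in one input line.
        for word in line.split():
            self.output[word] = self.output.get(word, 0) + 1


class WordCountReducer(Base):
    def handler(self, partial_counts):
        # Merge one mapper's partial counts into the final totals.
        for word, count in partial_counts.items():
            self.output[word] = self.output.get(word, 0) + count
```

The mapper turns lines into partial counts and the reducer sums the partial counts from several mappers, which is the usual MapReduce split.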
To run a job on BLS you need to
- Deploy Lambda to your AWS account using the command sls deploy
- Run python run.py from the root folder
Then drink a cup of coffee and check your S3 job bucket for the result
file (yep, it is named result).
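If you would rather fetch the result from a script than from the S3 console, something like the following sketch should work with a boto3 S3 client. It assumes the result object sits at the top level of the job bucket under the key result and contains UTF-8 text; if BLS writes it under a prefix or in another encoding, adjust accordingly.

```python
def read_result(s3_client, bucket, key="result"):
    """Download the job's result object and return its contents as text.

    s3_client is expected to behave like boto3.client("s3").
    The default key and the UTF-8 decoding are assumptions.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read().decode("utf-8")
```

Usage would look like read_result(boto3.client("s3"), "my-job-bucket"), where my-job-bucket is a placeholder for the job bucket name from your config.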