
AWS Lambda function to Upload data from CSV file in S3 to DynamoDB table

Use Case: Let's assume that there is some process which uploads a CSV file to an S3 bucket. This file contains information about different products and their prices.
We need to create a Lambda function which picks up the file from the S3 bucket as soon as it is uploaded and adds the records/items in the file to a DynamoDB table.

We will use the AWS Lambda service to get this working.

High Level Steps

1. Create an S3 bucket.
2. Create IAM policy and role.
3. Create DynamoDB table.
4. Create Lambda function.
5. Create Lambda Trigger.
6. Enable CloudWatch logging.
7. Monitor the Lambda function execution in CloudWatch.

Detailed Steps:
  • Create an S3 bucket:
           Create an S3 bucket with any name. Follow my previous post for the steps to create an S3 bucket. The bucket name that I have used in this example is "source-bucket104". You can use any bucket name that is not already in use.
  • Create IAM Policy: Create an IAM policy which provides read access to the bucket that you created in the previous step. Follow my previous post for the steps to create an IAM policy and role. You can also use the JSON editor to create the policy; a sample policy document is shown after the role step below.
  • Create IAM Role: Now go to IAM -> Roles and create a new role. Attach the below policies to the role:
    • The policy created in the previous step, which allows reading from the S3 bucket.
    • The existing AWS managed policy AWSLambdaBasicExecutionRole, which gives the Lambda function privileges to write logs to CloudWatch.
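
For reference, a minimal policy document granting read access to the example bucket could look like the sketch below; the bucket name source-bucket104 is the one used in this post, so replace it with your own:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::source-bucket104/*"
        }
    ]
}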

  • Create a table in DynamoDB: There are several ways to create a table in DynamoDB, one of them being the AWS CLI. You can use the below command in the AWS CLI to create a new table:

aws dynamodb create-table \
    --table-name prices_table \
    --attribute-definitions AttributeName=productid,AttributeType=S \
    --key-schema AttributeName=productid,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

        You can also use the AWS Management Console: Services -> Database -> DynamoDB -> Create table.
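
        Once the table has been created, you can optionally confirm from the CLI that it has become active:

aws dynamodb describe-table --table-name prices_table --query "Table.TableStatus"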

 

  • Create Lambda Function:
    • Go to Services -> Compute -> Lambda -> Create function.
    • Choose a name for the function.
    • Select "Python 3.8" (or a similar version) as the runtime.
    • Under "choose the execution role", select the existing role that you created in the previous steps.
    • Click "Create function".
    • Enter the Lambda code provided in the link below into the "Function code" window:
                            Lambda Function to Add Items to Dynamo DB Table


            This function does the following operations (a minimal sketch is shown after this list):
      • Reads a CSV file from the S3 bucket.
      • Gives an error if it is not a .csv file.
      • Reads the file line by line.
      • Skips the first line of the file, assuming that it contains the column header.
      • Splits each line using a comma (,) as the delimiter and stores the column values in different variables.
      • Adds an item to the DynamoDB table "prices_table" and returns the response.
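
The exact code is in the link above; a minimal sketch of such a function, assuming the columns productid, product_name, price, sale_price and code used later in this post, could look like this:

import csv
import urllib.parse

import boto3

# Table name matches the table created earlier with the AWS CLI.
TABLE_NAME = "prices_table"

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table(TABLE_NAME)

def lambda_handler(event, context):
    # The S3 trigger passes the bucket name and object key in the event record.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    # Give an error if the uploaded object is not a .csv file.
    if not key.lower().endswith(".csv"):
        raise ValueError(f"{key} is not a .csv file")

    # Read the object and split it into lines.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    lines = body.splitlines()

    # Skip the header line, split each remaining line on commas,
    # and add one item per line to the table.
    response = None
    for productid, product_name, price, sale_price, code in csv.reader(lines[1:]):
        response = table.put_item(
            Item={
                "productid": productid,
                "product_name": product_name,
                "price": price,
                "sale_price": sale_price,
                "code": code,
            }
        )
    return response
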
  • Create Lambda Trigger: In the Lambda console, click "Add trigger" and create a trigger on the S3 bucket you created in the previous steps for all object create events.
  • Test and Monitor:
    • Now upload a CSV file containing the relevant data to the S3 bucket. I have used a CSV file with the below columns, and the same columns are used inside my function to write to DynamoDB:
                   productid,product_name,price,sale_price,code
    • Go to DynamoDB -> Tables.
    • Click on the table that you created and go to the Items tab. You should see all the records from the CSV file added to the table.
    • To monitor the logs of the Lambda function, go to the CloudWatch service -> Logs -> Log groups -> /aws/lambda/<Lambda Function Name>.
    • If the file records were added successfully to the DynamoDB table, the logs should show the function's invocations completing without errors.
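
For example, assuming a local file named products.csv (a placeholder name) that starts with the header line shown above, the upload and a quick check of the table contents could look like this from the CLI:

aws s3 cp products.csv s3://source-bucket104/

aws dynamodb scan --table-name prices_table

With AWS CLI v2 you can also tail the function's log group while the file is being processed:

aws logs tail /aws/lambda/<Lambda Function Name> --follow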
