
Load records from a CSV file in S3 to an RDS MySQL database using AWS Data Pipeline

In this post we will see how to create a data pipeline in AWS that picks up data from a CSV file in S3 and inserts the records into an RDS MySQL table.

I am using the CSV file below, which contains a list of passengers.

CSV Data stored in the file Passenger.csv
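The screenshot of the file is not reproduced here, so the exact columns may differ; the rest of this post assumes a layout along the lines of the hypothetical sample below (one passenger per row, no header row):

    1,John Smith,35,M,New York
    2,Jane Doe,29,F,Chicago
    3,Raj Kumar,41,M,Dallas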


Upload the Passenger.csv file to an S3 bucket using the AWS CLI
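The upload itself is a single AWS CLI command; the bucket name and key below are placeholders for your own bucket:

    aws s3 cp Passenger.csv s3://my-datapipeline-bucket/input/Passenger.csv
    aws s3 ls s3://my-datapipeline-bucket/input/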


The screenshot below shows the connection to the RDS MySQL instance I have created in AWS and the definition of the table that I have created in the database testdb.
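The actual table definition is in the screenshot; as a sketch only, connecting to the RDS endpoint with the mysql client and creating a passenger table that matches the sample rows above could look like this (the endpoint, user and column definitions are assumptions, adjust them to your own instance):

    mysql -h mydbinstance.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p testdb

    CREATE TABLE passenger (
        passenger_id INT NOT NULL,
        name         VARCHAR(100),
        age          INT,
        gender       CHAR(1),
        city         VARCHAR(100),
        PRIMARY KEY (passenger_id)
    );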




Once we have uploaded the CSV file, we will create the data pipeline. There are two ways to create the pipeline.

  • Using "Import Definition" option under AWS console.
                We can use import definition option while creating the new pipeline. This would need a json file which contains the definition of the pipeline in the json format. You can use my Github link below to download the JSON definition:


  • Using "Edit Architect" option under AWS console.
  1. Create data pipeline using architect.
  2. Add a Copy activity
  3. Define S3 Data node as input and MySQL Data node as output in the Copy Activity
  4. Under S3 Data node add additional field "File Path" and provide the full S3 path of the csv file
  5. Under S3 Data node add additional field "Data Format"
  6. Under newly added Data format node specify CSV as the "Type".
  7. Under MySQL data node add an optional field "Database" -> create new database
  8. In the new database node, provide the details of your RDS MySQL instance. Do remember to specify the region where your RDS instance is located.
  9. Under Configuration node set the "Failure and Rerun Mode" to "Cascade"
  10. Under Copy Activity data node add additional field "Runs On" and create new resource
  11. Under New Resource box provide "EC2Resource", as we will spin up a new EC2 instance to run the copy activity. We would also provide the type of EC2 instance that will be used by this copy activity. In my example, I am giving the value of "t2.micro", which is eligible for free tier.
  12. You can also provide a "worker group" instead of using "runs on". You will have to install aws task runner on an existing EC2 instance of your choice to use it as a worker group. When using this option, the pipeline will not have to wait for the time it takes to spin up the EC2 instance, which is the case in using "runs on"
  13. Save and Activate the Pipeline
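For reference, a minimal pipeline definition covering the objects described above might look roughly like the sketch below. It is illustrative only: the bucket, RDS instance id, credentials, table name and log path are placeholders, and the field names should be verified against the AWS Data Pipeline documentation or against a definition exported from your own pipeline.

    {
      "objects": [
        {
          "id": "Default",
          "name": "Default",
          "scheduleType": "ONDEMAND",
          "failureAndRerunMode": "CASCADE",
          "role": "DataPipelineDefaultRole",
          "resourceRole": "DataPipelineDefaultResourceRole",
          "pipelineLogUri": "s3://my-datapipeline-bucket/logs/"
        },
        {
          "id": "S3InputDataNode",
          "name": "S3InputDataNode",
          "type": "S3DataNode",
          "filePath": "s3://my-datapipeline-bucket/input/Passenger.csv",
          "dataFormat": { "ref": "PassengerCsvFormat" }
        },
        {
          "id": "PassengerCsvFormat",
          "name": "PassengerCsvFormat",
          "type": "CSV"
        },
        {
          "id": "RdsMySqlDatabase",
          "name": "RdsMySqlDatabase",
          "type": "RdsDatabase",
          "rdsInstanceId": "my-rds-instance",
          "region": "us-east-1",
          "databaseName": "testdb",
          "username": "admin",
          "*password": "your-password"
        },
        {
          "id": "MySqlOutputDataNode",
          "name": "MySqlOutputDataNode",
          "type": "SqlDataNode",
          "database": { "ref": "RdsMySqlDatabase" },
          "table": "passenger",
          "insertQuery": "INSERT INTO #{table} VALUES (?, ?, ?, ?, ?);"
        },
        {
          "id": "Ec2Instance",
          "name": "Ec2Instance",
          "type": "Ec2Resource",
          "instanceType": "t2.micro",
          "terminateAfter": "1 Hour"
        },
        {
          "id": "S3ToMySqlCopyActivity",
          "name": "S3ToMySqlCopyActivity",
          "type": "CopyActivity",
          "input": { "ref": "S3InputDataNode" },
          "output": { "ref": "MySqlOutputDataNode" },
          "runsOn": { "ref": "Ec2Instance" }
        }
      ],
      "parameters": []
    }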

Create data pipeline using architect.



Copy activity

S3 DataNode and MySQL DataNode


Run as EC2 Node

CSV Data Node and Configuration Node


RDS Database Node used by MySQL DataNode



Save and Activate the Pipeline
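If you prefer the command line to the console button, the pipeline can also be created, loaded with the definition file and activated using the AWS CLI. The name, definition file and pipeline id below are placeholders:

    aws datapipeline create-pipeline --name S3ToMySQLPipeline --unique-id s3-to-mysql-pipeline
    aws datapipeline put-pipeline-definition --pipeline-id df-EXAMPLE1234567890 --pipeline-definition file://pipeline-definition.json
    aws datapipeline activate-pipeline --pipeline-id df-EXAMPLE1234567890
    aws datapipeline list-runs --pipeline-id df-EXAMPLE1234567890

Once the run finishes successfully, a quick SELECT on the passenger table in testdb should show the rows from Passenger.csv.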




