In this post we will see how to create a data pipeline in AWS that reads data from a CSV file in S3 and inserts the records into an RDS MySQL table.
I am using the CSV file below, which contains a list of passengers.
CSV Data stored in the file Passenger.csv
Upload the Passenger.csv file to an S3 bucket using the AWS CLI
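For reference, the upload is a single `aws s3 cp` command. The bucket name and key prefix below are placeholders, not the actual bucket used in my screenshots:

```bash
# Create a bucket (skip if you already have one) and upload the file.
# "my-pipeline-bucket" and the "input/" prefix are assumed names.
aws s3 mb s3://my-pipeline-bucket
aws s3 cp Passenger.csv s3://my-pipeline-bucket/input/Passenger.csv
```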
In the screenshot below I am connecting to the RDS MySQL instance I have created in AWS, and you can also see the definition of the table that I have created in the database testdb.
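If you want to follow along without the screenshot, you can connect with the MySQL client (for example `mysql -h <your-rds-endpoint> -u <user> -p testdb`) and create a table along these lines. The schema below is only a hypothetical sketch; the actual column list used in this post is the one shown in the screenshot:

```sql
-- Hypothetical passenger table for illustration only;
-- adjust the columns to match the columns in your CSV file.
CREATE TABLE passenger (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100),
  sex  VARCHAR(10),
  age  INT
);
```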
Once we have uploaded the CSV file, we will create the data pipeline. There are two ways to create the pipeline in the AWS console:
- Using the "Import Definition" option.
- Using the "Edit Architect" option.
Here we will build it with the architect; a sketch of the equivalent pipeline definition JSON is shown after the steps below.
- Create the data pipeline using the architect.
- Add a Copy Activity.
- Define an S3 Data node as the input and a MySQL Data node as the output of the Copy Activity.
- Under the S3 Data node, add the additional field "File Path" and provide the full S3 path of the CSV file.
- Under the S3 Data node, add the additional field "Data Format".
- Under the newly added Data Format node, specify CSV as the "Type".
- Under the MySQL Data node, add the optional field "Database" and create a new database.
- In the new database node, provide the details of your RDS MySQL instance. Remember to specify the region where your RDS instance is located.
- Under the Configuration node, set "Failure and Rerun Mode" to "Cascade".
- Under the Copy Activity node, add the additional field "Runs On" and create a new resource.
- In the new resource, choose "EC2Resource", as we will spin up a new EC2 instance to run the Copy Activity. We also provide the EC2 instance type that this Copy Activity will use; in my example I am using "t2.micro", which is eligible for the free tier.
- You can also provide a "Worker Group" instead of "Runs On". To use this option, install AWS Task Runner on an existing EC2 instance of your choice so that it can serve as a worker group (see the example command after this list). With a worker group the pipeline does not have to wait for a new EC2 instance to spin up, as it does with "Runs On".
- Save and Activate the Pipeline
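To give an idea of what these steps produce, here is a rough sketch of the kind of pipeline definition JSON you could also feed to the "Import Definition" option. All IDs, the bucket, the RDS instance ID, the credentials, the table name, and the column list in the insert query are placeholder values, not the actual ones used in this post:

```json
{
  "objects": [
    {
      "id": "Default",
      "name": "Default",
      "scheduleType": "ONDEMAND",
      "failureAndRerunMode": "CASCADE",
      "role": "DataPipelineDefaultRole",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "pipelineLogUri": "s3://my-pipeline-bucket/logs/"
    },
    {
      "id": "CsvFormat",
      "name": "CsvFormat",
      "type": "CSV"
    },
    {
      "id": "S3InputNode",
      "name": "S3InputNode",
      "type": "S3DataNode",
      "filePath": "s3://my-pipeline-bucket/input/Passenger.csv",
      "dataFormat": { "ref": "CsvFormat" }
    },
    {
      "id": "RdsMySqlDatabase",
      "name": "RdsMySqlDatabase",
      "type": "RdsDatabase",
      "rdsInstanceId": "my-rds-instance",
      "databaseName": "testdb",
      "region": "us-east-1",
      "username": "admin",
      "*password": "your-password"
    },
    {
      "id": "MySqlOutputNode",
      "name": "MySqlOutputNode",
      "type": "SqlDataNode",
      "database": { "ref": "RdsMySqlDatabase" },
      "table": "passenger",
      "insertQuery": "INSERT INTO #{table} (name, sex, age) VALUES (?, ?, ?);"
    },
    {
      "id": "Ec2Instance",
      "name": "Ec2Instance",
      "type": "Ec2Resource",
      "instanceType": "t2.micro",
      "terminateAfter": "30 Minutes"
    },
    {
      "id": "S3ToMySqlCopy",
      "name": "S3ToMySqlCopy",
      "type": "CopyActivity",
      "input": { "ref": "S3InputNode" },
      "output": { "ref": "MySqlOutputNode" },
      "runsOn": { "ref": "Ec2Instance" }
    }
  ],
  "parameters": []
}
```

If you go the worker group route instead of "Runs On", Task Runner is started on your own EC2 instance with a command along these lines; the worker group name, credentials file, and log location are assumptions, and you would then set "Worker Group" on the Copy Activity to the same name:

```bash
java -jar TaskRunner-1.0.jar \
    --config ~/credentials.json \
    --workerGroup=my-worker-group \
    --region=us-east-1 \
    --logUri=s3://my-pipeline-bucket/task-runner-logs
```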
Create data pipeline using architect.
S3 Data Node and MySQL Data Node
Run as EC2 Node
CSV Data Node and Configuration Node
RDS Database Node used by the MySQL Data Node
Save and Activate the Pipeline
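If you prefer to do this last step from the command line instead of the console, saving and activating can also be done with the Data Pipeline CLI. The pipeline name, the definition file name, and the df-... pipeline ID (which `create-pipeline` returns) are placeholders:

```bash
# Create an empty pipeline and note the pipeline ID it returns (df-...).
aws datapipeline create-pipeline --name S3ToRdsCopy --unique-id s3-to-rds-copy

# Upload the definition JSON (for example the sketch shown earlier) and activate.
aws datapipeline put-pipeline-definition \
    --pipeline-id df-XXXXXXXXXXXX \
    --pipeline-definition file://pipeline-definition.json
aws datapipeline activate-pipeline --pipeline-id df-XXXXXXXXXXXX

# Check the status of the run.
aws datapipeline list-runs --pipeline-id df-XXXXXXXXXXXX
```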