Skip to main content

Load records from csv file in S3 file to RDS MySQL database using AWS Data Pipeline

 In this post we will see how to create a data pipeline in AWS which picks data from S3 csv file and inserts records in RDS MySQL table. 

I am using below csv file which contains a list of passengers.

CSV Data stored in the file Passenger.csv


Upload Passenger.csv file to S3 bucket using AWS ClI


In below screenshot I am connecting the RDS MySQL instance I have created in AWS and the definition of the table that I have created in the database testdb.




Once we have uploaded the csv file we will create the data pipeline. There are 2 ways to create the pipeline. 

  • Using "Import Definition" option under AWS console.
                We can use import definition option while creating the new pipeline. This would need a json file which contains the definition of the pipeline in the json format. You can use my Github link below to download the JSON definition:


  • Using "Edit Architect" option under AWS console.
  1. Create data pipeline using architect.
  2. Add a Copy activity
  3. Define S3 Data node as input and MySQL Data node as output in the Copy Activity
  4. Under S3 Data node add additional field "File Path" and provide the full S3 path of the csv file
  5. Under S3 Data node add additional field "Data Format"
  6. Under newly added Data format node specify CSV as the "Type".
  7. Under MySQL data node add an optional field "Database" -> create new database
  8. In the new database node, provide the details of your RDS MySQL instance. Do remember to specify the region where your RDS instance is located.
  9. Under Configuration node set the "Failure and Rerun Mode" to "Cascade"
  10. Under Copy Activity data node add additional field "Runs On" and create new resource
  11. Under New Resource box provide "EC2Resource", as we will spin up a new EC2 instance to run the copy activity. We would also provide the type of EC2 instance that will be used by this copy activity. In my example, I am giving the value of "t2.micro", which is eligible for free tier.
  12. You can also provide a "worker group" instead of using "runs on". You will have to install aws task runner on an existing EC2 instance of your choice to use it as a worker group. When using this option, the pipeline will not have to wait for the time it takes to spin up the EC2 instance, which is the case in using "runs on"
  13. Save and Activate the Pipeline

Create data pipeline using architect.



Copy activity

S3 Datanode and MySQL DataNode


Run as EC2 Node

CSV Data Node and Configuration Node


RDS Database Node used by MySQL DataNode



Save and Activate the Pipeline




Comments

  1. This is fairly normal PLA materials but it’s a little narrower than different brands’. This does mean that filament is slightly extra cost-intensive, but rare users will discover that they don’t have to purchase additional filament all that always anyway. This printer helps each easy and superior supplies, though there are Direct CNC limits. You won’t ready to|be capable of|have the power to} print with wood, for example, but much less widespread plastic composites will work simply fine. This printer helps tetherless printing via SD card, but there’s no WiFI functionality. Considering it’s the type of product that may be finest suited to a computer lab, this really isn’t as massive of a problem as it appears at first.

    ReplyDelete

Post a Comment

Popular posts from this blog

Configure Oracle ASM Disks on AIX

Configure Oracle ASM Disks on AIX You can use below steps to configure the new disks for ASM after the raw disks are added to your AIX server by your System/Infrastructure team experts: # /usr/sbin/lsdev -Cc disk The output from this command is similar to the following: hdisk9 Available 02-T1-01 PURE MPIO Drive (Fibre) hdisk10 Available 02-T1-01 PURE MPIO Drive (Fibre) If the new disks are not listed as available, then use the below command to configure the new disks. # /usr/sbin/cfgmgr Enter the following command to identify the device names for the physical disks that you want to use: # /usr/sbin/lspv | grep -i none This command displays information similar to the following for each disk that is not configured in a volume group: hdisk9     0000014652369872   None In the above example hdisk9 is the device name and  0000014652369872  is the physical volume ID (PVID). The disks that you want to use may have a PVID, but they must not belong to a volu...

Adding New Disks to Existing ASM Disk Group

Add Disks to Existing ASM Disk Group In this blog I will show how to add new disks to an existing ASM Disk group. This also contains the steps to perform the migration from existing to the new storage system. In order to add the disk to the ASM disk group, you will first need to configure these disk using the operating system commands. I have provided the steps to configure the disks on AIX system in my blog " Configure Oracle ASM Disks on AIX" Adding New Disks to DATA Disk Group (Storage Migration for DATA Disk Group) Login to your ASM instance $ sqlplus / as sysasm If the name of the new disk is in different format from the existing disk, the modify the asm_diskstring parameter to identify the new disks. In my below example /dev/ora_data* is the format of the existing disks and /dev/new_disk* is the naming format of the newly configured disks. You should not modify this parameter unless the naming format changes. SQL> alter system set asm_diskstring = '/dev/ora_data*...