Technology
- Apache Nifi – comes with a webserver
- Apache Minifi – very lightweight solution
Usage
- Tail CDC database transaction logs and pipes it to the rest of the Apache NIFI cluster
- listens to port and takes data into stream
- transform data by pulling out attribute into meta key
- can write to specificAWS S3 folder object
- can run Ruby, Python n Java within each Node
Hardware Requirements
- minimum AWS T2.small instance type
- needs to have enough disk space to support the largest possible size of each batch of data
Insight
Data scientist does not want to waste time writing to and from Kafka