This is my journey building a data pipeline on an Amazon Elastic MapReduce (EMR) cluster to analyze historical sensor data.
Overview: The objective is to analyze historical sensor data stored on AWS S3 as hourly files. Saving the data hourly made sense from a business standpoint, but Hadoop MapReduce performance suffers when the input consists of many small files (each file typically produces at least one input split, and therefore at least one map task).
To give an idea: a single sensor produces 24 × 365 = 8,760 files per year, and the total number of files grows linearly with the number of sensors.
Example: 10,000 sensors, approx. for 5…
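The file-count arithmetic can be checked with a small Python helper, assuming exactly one file per sensor per hour and 365-day years (leap years ignored, as in the 24 × 365 estimate). `hourly_file_count` is a hypothetical name used only for illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # leap years ignored, matching the 24 x 365 estimate


def hourly_file_count(num_sensors: int, years: int = 1) -> int:
    """Total number of hourly files, assuming one file per sensor per hour."""
    return num_sensors * HOURS_PER_DAY * DAYS_PER_YEAR * years


print(hourly_file_count(1))  # 8760
print(hourly_file_count(10_000))  # 87600000 for one year
```

Because the count is linear in both sensors and years, 10,000 sensors already produce 87.6 million files in a single year under these assumptions.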
Pub-Sub Pattern or Observer Pattern
Event monitoring is a key aspect of many GUI and IoT applications. Publish-subscribe (pub-sub) is a commonly used mechanism in event-driven architecture. At the system level, many event brokers sit between producers and consumers and route messages using the pub-sub pattern.
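To make the broker's role concrete, here is a minimal in-process sketch, assuming simple topic-based routing. `EventBroker`, `subscribe`, `publish` and the topic names are hypothetical. Producers and consumers never reference each other, only a topic; a real event broker would add networking, persistence and delivery guarantees, all omitted here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A consumer is any callable that accepts a message.
Handler = Callable[[str], None]


class EventBroker:
    """Minimal in-memory broker: producers publish to a topic and the broker
    forwards each message to every consumer subscribed to that topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        # .get() avoids creating an empty entry for topics nobody listens to.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of consumers that received the message


broker = EventBroker()
received: List[str] = []
broker.subscribe("sensors/temperature", received.append)
broker.publish("sensors/temperature", "21.5")  # delivered to one consumer
broker.publish("sensors/humidity", "40")  # no subscribers, so it is dropped
print(received)  # ['21.5']
```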
At the software level, the observer pattern is used to achieve the pub-sub mechanism. The observer pattern is a behavioral pattern because it governs how objects interact, for example how a subject notifies and updates its observers when its state changes. …
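A minimal sketch of the observer pattern in Python, assuming a push model in which the subject passes the new value directly to each observer. `Subject`, `Sensor`, `ReadingLog` and their methods are hypothetical names for illustration. Unlike the system-level brokers mentioned above, the subject here holds direct references to its observers.

```python
from typing import List, Optional, Protocol, Tuple


class Observer(Protocol):
    """Any object with a matching update() method can observe a Subject."""

    def update(self, source: str, value: float) -> None: ...


class Subject:
    """Keeps a list of observers and notifies each one when its state changes."""

    def __init__(self) -> None:
        self._observers: List[Observer] = []

    def attach(self, observer: Observer) -> None:
        if observer not in self._observers:
            self._observers.append(observer)

    def detach(self, observer: Observer) -> None:
        self._observers.remove(observer)

    def notify(self, source: str, value: float) -> None:
        # Iterate over a copy so an observer may detach itself inside update().
        for observer in list(self._observers):
            observer.update(source, value)


class Sensor(Subject):
    """Hypothetical concrete subject: a sensor whose reading changes over time."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name
        self.reading: Optional[float] = None

    def set_reading(self, value: float) -> None:
        self.reading = value
        self.notify(self.name, value)  # push the new state to every observer


class ReadingLog:
    """Hypothetical concrete observer: records every reading it is notified about."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, float]] = []

    def update(self, source: str, value: float) -> None:
        self.entries.append((source, value))


sensor = Sensor("sensor-1")
log = ReadingLog()
sensor.attach(log)
sensor.set_reading(21.5)
sensor.set_reading(22.0)
sensor.detach(log)
sensor.set_reading(23.0)  # not recorded: the log is no longer attached
print(log.entries)  # [('sensor-1', 21.5), ('sensor-1', 22.0)]
```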
Strategy Pattern for Implementing ML Algorithms for Data Science Problems in Python
The Strategy pattern is a commonly used design pattern for implementing families of algorithms. It makes an object's behavior dynamic: the algorithm the object uses can be chosen or swapped at runtime.
There are multiple algorithms that can be used for a clustering problem, such as k-means clustering, DBSCAN and hierarchical clustering. The Strategy pattern lets us group these algorithms, encapsulate each one, and use them interchangeably.
Because the algorithms in such a family solve the same problem, they share similar inputs and outputs, which is what makes them interchangeable.
Before going into the Strategy pattern, let's recall the SOLID principles of object-oriented design.
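The interchangeable-clustering idea can be sketched in plain Python so it runs without extra packages. `ClusteringStrategy`, `KMeans1D`, `GapClustering` and `Clusterer` are hypothetical names, and the two concrete strategies are deliberately simplified 1-D toys, not full k-means or hierarchical clustering implementations. In practice each strategy could instead wrap a library estimator such as scikit-learn's `KMeans`, `DBSCAN` or `AgglomerativeClustering`, all of which provide `fit_predict`.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class ClusteringStrategy(ABC):
    """Common interface: take 1-D points, return one cluster label per point."""

    @abstractmethod
    def cluster(self, points: Sequence[float]) -> List[int]:
        ...


class KMeans1D(ClusteringStrategy):
    """Toy 1-D k-means (Lloyd's algorithm) seeded with evenly spaced sorted points."""

    def __init__(self, k: int, iterations: int = 10) -> None:
        self.k = k
        self.iterations = iterations

    @staticmethod
    def _assign(points: Sequence[float], centers: List[float]) -> List[int]:
        # Label each point with the index of its nearest center.
        return [min(range(len(centers)), key=lambda j: abs(p - centers[j])) for p in points]

    def cluster(self, points: Sequence[float]) -> List[int]:
        if not 1 <= self.k <= len(points):
            raise ValueError("k must be between 1 and the number of points")
        ordered = sorted(points)
        step = (len(ordered) - 1) / (self.k - 1) if self.k > 1 else 0
        centers = [ordered[round(i * step)] for i in range(self.k)]
        for _ in range(self.iterations):
            labels = self._assign(points, centers)
            for j in range(self.k):
                members = [p for p, label in zip(points, labels) if label == j]
                if members:  # keep the previous center if a cluster is empty
                    centers[j] = sum(members) / len(members)
        return self._assign(points, centers)


class GapClustering(ClusteringStrategy):
    """Toy single-linkage-style strategy: sort the points and start a new
    cluster wherever the gap between neighbours exceeds max_gap."""

    def __init__(self, max_gap: float) -> None:
        self.max_gap = max_gap

    def cluster(self, points: Sequence[float]) -> List[int]:
        order = sorted(range(len(points)), key=lambda i: points[i])
        labels = [0] * len(points)
        current = 0
        for prev, idx in zip(order, order[1:]):
            if points[idx] - points[prev] > self.max_gap:
                current += 1
            labels[idx] = current
        return labels


class Clusterer:
    """Context: delegates to whichever strategy it holds; swap it at runtime."""

    def __init__(self, strategy: ClusteringStrategy) -> None:
        self.strategy = strategy

    def run(self, points: Sequence[float]) -> List[int]:
        return self.strategy.cluster(points)


data = [1.0, 1.2, 0.8, 10.0, 10.5, 9.7]
clusterer = Clusterer(KMeans1D(k=2))
print(clusterer.run(data))  # [0, 0, 0, 1, 1, 1]
clusterer.strategy = GapClustering(max_gap=2.0)  # same call site, different algorithm
print(clusterer.run(data))  # [0, 0, 0, 1, 1, 1]
```

Because `Clusterer` depends only on the abstract interface, adding a new strategy requires no change to it, which is the open/closed principle from SOLID at work.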