Sunday, May 14, 2017

How to process huge files with Hadoop

When reading huge files we cannot afford to load them into memory all at once. One solution is a BlockingQueue approach.

It's a producer-consumer approach: the producer reads the file and places lines in the queue, blocking when the queue is full. Consumer threads block until the queue is non-empty.

The queue is created through the BlockingQueue interface (for example, an ArrayBlockingQueue with a fixed capacity).
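As a minimal sketch of the blocking behavior (the capacity of 1024 is an arbitrary choice here), put() blocks the producer when the queue is full and take() blocks a consumer while it is empty:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueSetup {
    public static void main(String[] args) throws InterruptedException {
        // Bounded queue: put() blocks producers when full,
        // take() blocks consumers when empty.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        queue.put("first line");
        System.out.println(queue.take()); // prints "first line"
    }
}
```

The bounded capacity is what keeps memory usage flat: the producer can never get more than 1024 lines ahead of the consumers.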

ExecutorService is used to create the producer and consumer thread pools.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService consumer = Executors.newFixedThreadPool(CONSUMERS_COUNT);
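Putting the pieces together, here is one possible end-to-end sketch. The file name huge-file.txt, the pool size, the queue capacity, and the process() helper are all assumptions for illustration; a poison-pill sentinel (one per consumer) is a common way to tell the workers the file is exhausted:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FileProcessor {
    private static final int CONSUMERS_COUNT = 4;          // assumed pool size
    private static final String POISON_PILL = "\u0000EOF"; // sentinel to stop consumers

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

        ExecutorService producer = Executors.newFixedThreadPool(1);
        ExecutorService consumer = Executors.newFixedThreadPool(CONSUMERS_COUNT);

        // Producer: reads the file line by line; put() blocks when the queue is full.
        producer.submit(() -> {
            try (BufferedReader reader = new BufferedReader(new FileReader("huge-file.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    queue.put(line);
                }
                // One poison pill per consumer so every worker terminates.
                for (int i = 0; i < CONSUMERS_COUNT; i++) {
                    queue.put(POISON_PILL);
                }
            } catch (IOException | InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumers: take() blocks while the queue is empty.
        for (int i = 0; i < CONSUMERS_COUNT; i++) {
            consumer.submit(() -> {
                try {
                    String line;
                    while (!(line = queue.take()).equals(POISON_PILL)) {
                        process(line);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        producer.shutdown();
        consumer.shutdown();
        consumer.awaitTermination(1, TimeUnit.HOURS);
    }

    // Hypothetical per-line work; replace with real processing.
    private static void process(String line) {
    }
}
```

Because the queue is bounded, only a small window of the file is in memory at any moment, regardless of how large the file is.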