A large multinational retail chain has sales order data across regions and sales channels for a wide variety of item types. The business team wants to use this data to analyze various aspects of sales, e.g. the top-selling items in a region, the regions with the maximum profit for a certain item type, or whether there is a significant difference in revenue between two item types across regions.
As the data analytics team, use the sales transaction data set of about 100K records to answer the questions below:
- Average unit_price by country for a given item type in a certain year
- Total units_sold by year for a given country and a given item type
- Find the max and min units_sold in any order for each year by country for a given item type. Use a custom partitioner class instead of the default hash-based one.
- What are the top 10 order IDs for a given year by total_profit?
Demonstrate the above analysis running on a Hadoop system using MapReduce code, preferably in Java or Python. You may perform any data preparation steps required before running the MapReduce jobs that answer these questions.
Average unit_price by country for a given item type in a certain year
- mapper file: Average Unit Price Mapper
- reducer file: Average Unit Price Reducer
- Output: Average Unit Price output
- O/p format: (Country, Item_type, Year) Average Unit Price
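The linked mapper and reducer files are not reproduced here. As a hypothetical sketch only, a Hadoop Streaming mapper and reducer for the average unit price could look like the following. It assumes the raw CSV has a header row and the column order Region, Country, Item Type, Sales Channel, Order Priority, Order Date (M/D/YYYY), Order ID, Ship Date, Units Sold, Unit Price, Unit Cost, Total Revenue, Total Cost, Total Profit. The `ITEM_TYPE` and `YEAR` environment variables are illustrative names, not part of the original project.

```python
import csv
import os
import sys
from itertools import groupby

# Assumed 0-based column positions in the raw sales CSV (see lead-in).
COUNTRY, ITEM_TYPE, ORDER_DATE, UNIT_PRICE = 1, 2, 5, 9


def mapper(lines, item_type=None, year=None):
    """Emit 'country|item_type|year<TAB>unit_price', optionally filtered."""
    for row in csv.reader(lines):
        if not row or row[0] == "Region":  # skip blank lines and the header
            continue
        order_year = row[ORDER_DATE].split("/")[-1]  # assumed M/D/YYYY -> YYYY
        if (item_type and row[ITEM_TYPE] != item_type) or (year and order_year != year):
            continue
        yield f"{row[COUNTRY]}|{row[ITEM_TYPE]}|{order_year}\t{row[UNIT_PRICE]}"


def reducer(lines):
    """Average unit price per key; Hadoop sorts mapper output by key first."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = count = 0
        for _, price in group:  # running sum/count instead of a list
            total += float(price)
            count += 1
        country, item, year = key.split("|")
        yield f"({country}, {item}, {year})\t{total / count:.2f}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # argv[1] selects "map" or "reduce"; ITEM_TYPE and YEAR are hypothetical
    # env vars (e.g. set via Hadoop Streaming's -cmdenv) used as filters.
    if sys.argv[1] == "map":
        out = mapper(sys.stdin, os.environ.get("ITEM_TYPE"), os.environ.get("YEAR"))
    else:
        out = reducer(sys.stdin)
    for record in out:
        print(record)
```

Under Hadoop Streaming the same script would be passed as the mapper with the argument `map` and as the reducer with `reduce`. Note that this reducer cannot simply be reused as a combiner: an average of partial averages is not the overall average, so a combiner would have to emit partial sums and counts instead.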
Total units_sold by year for a given country and a given item type
- mapper file: Total Units Sold Mapper
- reducer file: Total Units Sold Reducer
- Output: Total Units Sold output
- O/p format: (Country, Item_type, Year) Units Sold
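The linked files are not reproduced here; the following is a hypothetical Hadoop Streaming sketch in Python for total units sold. It assumes a raw CSV with a header row and the column order Region, Country, Item Type, Sales Channel, Order Priority, Order Date (M/D/YYYY), Order ID, Ship Date, Units Sold, ... , Total Profit. The `COUNTRY` and `ITEM_TYPE` environment variables used as filters are illustrative names only.

```python
import csv
import os
import sys
from itertools import groupby

# Assumed 0-based column positions in the raw sales CSV (see lead-in).
COUNTRY, ITEM_TYPE, ORDER_DATE, UNITS_SOLD = 1, 2, 5, 8


def mapper(lines, country=None, item_type=None):
    """Emit 'country|item_type|year<TAB>units_sold', optionally filtered."""
    for row in csv.reader(lines):
        if not row or row[0] == "Region":  # skip blank lines and the header
            continue
        if (country and row[COUNTRY] != country) or (item_type and row[ITEM_TYPE] != item_type):
            continue
        year = row[ORDER_DATE].split("/")[-1]  # assumed M/D/YYYY -> YYYY
        yield f"{row[COUNTRY]}|{row[ITEM_TYPE]}|{year}\t{row[UNITS_SOLD]}"


def reducer(lines):
    """Sum units per key; relies on Hadoop delivering keys in sorted order."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(units) for _, units in group)
        country, item, year = key.split("|")
        yield f"({country}, {item}, {year})\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # argv[1] selects "map" or "reduce"; COUNTRY and ITEM_TYPE are
    # hypothetical env vars (e.g. set via Hadoop Streaming's -cmdenv).
    if sys.argv[1] == "map":
        out = mapper(sys.stdin, os.environ.get("COUNTRY"), os.environ.get("ITEM_TYPE"))
    else:
        out = reducer(sys.stdin)
    for record in out:
        print(record)
```

Summation is associative, so a combiner could pre-aggregate units per key on each mapper, but it would need to emit the mapper's `key<TAB>value` format rather than the formatted reducer output shown here.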
Find the max and min units_sold in any order for each year by country for a given item type
- mapper file: Min-Max Units Sold Mapper
- reducer file: Min-Max Units Sold Reducer
- Output: Min-Max Units Sold output
- O/p format: (Country, Item_type, Year) (min_units, max_units)
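The linked files are not reproduced here; this is a hypothetical Python sketch of the min/max job together with the logic of a custom, non-hash partitioner. It assumes the same raw CSV layout (header row; Country, Item Type, Order Date as M/D/YYYY and Units Sold at 0-based columns 1, 2, 5 and 8). The `partition` function is an illustrative range partitioner, not the project's actual class: it sends each key to a reducer according to the country's initial letter, whereas Hadoop's default `HashPartitioner` uses the key's hash. `run_local` is a hypothetical helper that simulates map, partition, sort and reduce without a cluster.

```python
import csv
import os
import sys
from itertools import groupby

# Assumed 0-based column positions in the raw sales CSV (see lead-in).
COUNTRY, ITEM_TYPE, ORDER_DATE, UNITS_SOLD = 1, 2, 5, 8


def mapper(lines, item_type=None):
    """Emit 'country|item_type|year<TAB>units_sold' per order."""
    for row in csv.reader(lines):
        if not row or row[0] == "Region":  # skip blank lines and the header
            continue
        if item_type and row[ITEM_TYPE] != item_type:
            continue
        year = row[ORDER_DATE].split("/")[-1]  # assumed M/D/YYYY -> YYYY
        yield f"{row[COUNTRY]}|{row[ITEM_TYPE]}|{year}\t{row[UNITS_SOLD]}"


def partition(key, num_partitions):
    """Range partitioner (not hash-based): bucket keys by the country's
    initial letter so each reducer receives a contiguous A-Z slice."""
    initial = key[:1].upper()
    if not "A" <= initial <= "Z":
        return 0  # keys not starting with a letter go to the first reducer
    return (ord(initial) - ord("A")) * num_partitions // 26


def reducer(lines):
    """Emit (min, max) units sold per key; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        units = [int(v) for _, v in group]
        country, item, year = key.split("|")
        yield f"({country}, {item}, {year})\t({min(units)}, {max(units)})"


def run_local(lines, num_partitions=2, item_type=None):
    """Simulate map -> custom partition -> sort -> reduce without Hadoop."""
    buckets = [[] for _ in range(num_partitions)]
    for record in mapper(lines, item_type):
        key = record.split("\t", 1)[0]
        buckets[partition(key, num_partitions)].append(record)
    return [list(reducer(sorted(bucket))) for bucket in buckets]


if __name__ == "__main__" and len(sys.argv) > 1:
    # "map" or "reduce" selects the role; ITEM_TYPE is a hypothetical env var.
    if sys.argv[1] == "map":
        out = mapper(sys.stdin, os.environ.get("ITEM_TYPE"))
    else:
        out = reducer(sys.stdin)
    for record in out:
        print(record)
```

On a real cluster the partitioner must be a Java class: in a Java job it would extend `org.apache.hadoop.mapreduce.Partitioner` and override `getPartition(key, value, numPartitions)`, registered with `job.setPartitionerClass(...)`, while a Streaming job names such a class with `-partitioner`. Range partitioning keeps each reducer's output alphabetically contiguous, but it can be skewed because country initials are not evenly distributed.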
What are the top 10 order IDs for a given year by total_profit?
- mapper file: Top Orders Mapper
- reducer file: Top Orders Reducer
- Output: Top Orders output
- O/p format: Year [(Profit, OrderID1), (Profit, OrderID2), (Profit, OrderID3), ..., (Profit, OrderID10)]
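The linked files are not reproduced here; the following is a hypothetical Hadoop Streaming sketch in Python for the top 10 orders per year. It assumes a raw CSV with a header row where Order Date (M/D/YYYY), Order ID and Total Profit sit at 0-based columns 5, 6 and 13. The mapper keys each order by year and packs `profit|order_id` into the value; the reducer keeps only the largest profits with `heapq.nlargest`. The `YEAR` environment variable is an illustrative name, and the output only approximates the O/p format above (a tab separates the year from the list).

```python
import csv
import heapq
import os
import sys
from itertools import groupby

# Assumed 0-based column positions in the raw sales CSV (see lead-in).
ORDER_DATE, ORDER_ID, TOTAL_PROFIT = 5, 6, 13
TOP_N = 10


def mapper(lines, year=None):
    """Emit 'year<TAB>profit|order_id', optionally for a single year."""
    for row in csv.reader(lines):
        if not row or row[0] == "Region":  # skip blank lines and the header
            continue
        order_year = row[ORDER_DATE].split("/")[-1]  # assumed M/D/YYYY -> YYYY
        if year and order_year != year:
            continue
        yield f"{order_year}\t{row[TOTAL_PROFIT]}|{row[ORDER_ID]}"


def reducer(lines, n=TOP_N):
    """Emit the n most profitable orders per year, highest first."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for year, group in groupby(pairs, key=lambda kv: kv[0]):
        orders = (value.split("|", 1) for _, value in group)
        # nlargest keeps at most n items, so memory stays small per year.
        top = heapq.nlargest(n, ((float(profit), oid) for profit, oid in orders))
        body = ", ".join(f"({profit:.2f}, {oid})" for profit, oid in top)
        yield f"{year}\t[{body}]"


if __name__ == "__main__" and len(sys.argv) > 1:
    # "map" or "reduce" selects the role; YEAR is a hypothetical env var
    # (e.g. set via Hadoop Streaming's -cmdenv) restricting the mapper.
    if sys.argv[1] == "map":
        out = mapper(sys.stdin, os.environ.get("YEAR"))
    else:
        out = reducer(sys.stdin)
    for record in out:
        print(record)
```

Because profits are converted to `float` before ranking, the ordering is numeric rather than the lexicographic order in which Hadoop sorts the text values.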