Understanding Monoids using real life examples
You will find a lot of articles over the internet which delve deep into the world of monoids, running into scores of pages hoping to simplify the concept. This is an attempt in a very different direction; we will use an easily relatable business problem and try to model it using monoids. The focus will be on intent and benefits that accrue rather than the mathematical intricacies.
Problem Statement
Let’s say, there is a cab aggregator company by the name CoolTaxi, quite similar to the Ubers, Olas and Grab taxis of the world. In their early days, they decide to launch a simple app to enable users to hail taxis, based solely on their current location and destination. Since it’s an MVP the two person company decides to write the minimal amount to code to roll this out.
Introducing filters
So far so good. This works very well in the initial few months of the company’s launch. The company does very well and within a year it has investors knocking on its door. The founders take the bait and decide to forgo some of their shares in lieu of getting funds for investment. The new investors are keen to guide the company and suggest that CoolTaxi offer services to cater to different price points and demographics; they name the new services as Basic, Pool and Premium. The code now needs a bit of enhancement and after the refactoring looks like this
Expanding the problem statement
Let’s expand the problem statement.
- The company is flush with cash from the VCs and is aggressively pursuing new customers by offering discounts.
- To ensure optimal user experience, ride pick up distance and drop time are also factored in.
- Now the company has begun to grow and there is lot of focus on specific segments — Daily Commuters, Airport Goers, Luxury Travellers, Discount Seekers, etc.
- A smart Product Manager decides that he wants to introduce more complex services on top of existing ones to cater to more and more micro segments. He comes up with fancy name like OffersOnWheels and ShareMyCab which are nothing but even more sparser projections using a combination of existing filters.
Suddenly we have a huge bunch of filters driving the ride matching algorithm; each filter takes a list of candidate cabs and returns back a much smaller list of cabs that fulfil the service criteria. We also more abstract filters stacked on top of exiting ones
Pause and Re-think
You can very easily anticipate that the code is going to get quite a bit complex and bloat up. The SmartOccupancyManager and the related classes can potentially run into hundreds of lines if we continue coding with our current approach. We can see there are following design problems.
1. The core of the ride matching algorithm is a large number of small filters that perform candidate selection based on invariants described in the request. For e.g. matchRoutes, matchPromotions, matchRatings, etc
2. A smaller number of composites that evaluate into specific service offerings. For e.g. matchBasicRide, matchPooledRide, matchPremierRide, etc that support Basic, Pool and Premium rides
3. Another smaller number of abstractions that rely on the underlying constructs to help build a more coarse grained products. For e.g. MyDailyCommute, OffersOnWheels, etc, which are built on top of low level filters.
4. There is a need to evolve the abstractions and their respective interpretations independently without any client having to be aware of it. To elaborate further, the business service, SmartOccupancyManager, that seeks to retrieve a collection of cabs based on a product offering, needs to be decoupled from the process that determines how that product offering(an abstraction) needs to be evaluated.
Compositions at the heart of design
There are many ways to build such a system including the object oriented way. Let’s take the functional approach and start building this out step by step
- Let’s start by redesigning each filter to take only one input and produce only one output of the same type, a list of cabs in this case. Mathematically speaking, these filters can be said to exhibit endomorphism. Now, why would be want to do that? So that we can pipe filters one after another, which we cannot if the input and output signatures don’t match.
- To achieve this, all the original methods that “performed” filtering, now take varying inputs but always return a function “that takes as input a collection of cabs and returns a collection of cabs as output”. So just to be clear, for each method, input is (criteria for filtering) and output is a function that performs filtering based on criteria. For e.g. take a look at method matchClass in the code snippet below.
def matchClass(typName: String): List[Cab] = {
cabs.filter(_.typ.equalsIgnoreCase(typName))
}
is now refactored to/* represents a filter function that take a collection of cabs, filters and returns the result */
type Filter = List[Cab] => List[Cab]def matchClass(typName: String): Filter = {
cabs: List[Cab] => cabs.filter(_.typ.equalsIgnoreCase(typName))
}
3. What is the outcome of the exercise so far? If you observe closely, you will realise that the output of any filter can be fed as input to another filter, letting us chain a series of filters one after other at run time if we so wish to. But are we going to settle for this? Not entirely.
matchSeats(2,(matchClass('pool',(matchRoutes(start, end,(cabs)))
4. Now that we have developed the building blocks for most fine grained filtering operations let us proceed further and develop some coarse grained operations on top of them. So, a question arises, how do we compose a top level filter that combines the effects of two or more filters? Quite naturally we need a combiner (operation) that lets us fuse two or more filters into a single composite filter. The client can they invoke this composite filter without knowing how it was constructed and what it was constituted with! For e.g., a client can invoke a convenienceFilter without knowing that the notion of convenience means a composite filer based on distance to pick up and time to drop. For e.g.
pooledCabsFilter = matchRoutesFilter COMBINE matchClassFilterconvenienceFilter = distanceToPickUpFilter COMBINE timeToDropFilter
5. We can write a combiner that, given two such filters, can collapse them into a composite filter with the same signature. This isn’t difficult to imagine. After all, we are merely fusing the business logic split across two filters into one.
//pseudocode
combine(f: Filter, g:Filter): Filter = {
create a new function h that takes as input a list of cabs
applies filter f and produces a partial result r1
applies filter g and produces a complete result r2
returns h
}
//basically h = f * g (mathematically speaking)
Now scala gives you such a combiner already that lets you perform this fusion and its called andThen operator. For e.g. we can create a composite filter for pooledCabs
def matchPool(start:Source, end: Destination)(seats: Int) = matchRoutes(start, end) andThen matchClass("pool") andThen matchSeats(seats)
6. We can notice, rather intuitively, that the process of recursively combining the filters exhibits a property, wherein the way in which the filters or composites are collapsed has no effect on the result, so long as the sequence of operations doesn’t change. This is called as associativity.
For e.g (matchSeats + combine(matchClass + matchRoutes))
is same as
combine(combine(matchSeats + matchClass) + matchRoutes)
7. How is associativity useful?
Imagine you had, a large collection of functions to be reduced using a combiner (+) into a single function
e.g. f1 (+) f2 (+) f3 (+) ..... (+) fnWith associativity,
- for a client that is not aware of the internals of the functions, we get a deterministic behaviour when we combine functions
since ==> f1 (+) (f2 (+) f3) is same as (f1 (+) f2) (+) f3
Otherwise, the client will have to be aware of the internal logic of all the functions and the correct way of combining them!- it makes it possible to parallelise the reduction of such large sets of functions so long as the sequence is maintained while performing recursive combinations.
For e.g. computing sum of List(1,2,...n) can be parallelised by performing additions on partitioned chunks and recursively adding the partial results to arrive at a final result. This is possible due to associativity of the sum operation.
8. So, finally, what are monoids? In our case, the following constitute a monoid
1. A set of filter functions that exhibit endomorphism - a single input and as a single output of same type i.e. set of cabs2. A combiner that fuses two such filters into a single filter while adhering to following rules
- Associativity: the combination of filters is associative
- Existence of an identity filter that when combined with any filter yields the same filter. In our case, imagine a pass through filter which does nothing!
def passThrough: Filter = cabs: List[Cab] => cabs
9. Now that we understand what are monoids, why are they useful? Because they provide composability which is very fundamental to functional way of thinking and building software. With the new understanding we now start constructing the coarse grained functions and even higher level abstractions that we wanted to wrap around these functions. You can see how we have composed the abstraction matchOffersOnWheels from (matchPool and matchPromotionCabs) both of which have been crafted from low level filters such as (matchRoutes, matchClass, matchSeats), etc. We have built a service that offers these higher level apis and hides the primitives from the client.
10. What did we accomplish through this whole exercise? Let me re-iterate again, the client SmartOccupancyManager has absolutely non idea about the internals of abstractions such matchOffersOnWheels or other coarse grained filters. It works with the high level apis offered by CabService. The abstractions and the underlying primitives can evolve independently. We have managed to compose higher level abstractions using lower level primitives. Further more, the process of reducing a set of filters into a single filter can be parallelised though it may not have a lot of practical benefits.
We will now create the client and the full blown code looks like this.
Complete code
I hope this post managed to shed some light on modelling business problems in functional programming using monoids.