A Statistical Approach to Finding Your Optimal Apartment

Jake Haines
Published in Level Up Coding
Jun 11, 2022

Photo by Brandon Griggs on Unsplash

Have you ever been apartment searching with a large, comprehensive list of places to consider? It can get overwhelming, and it is often hard to decide which option is best because you feel ambivalent toward most, if not all, of the apartments on your list (this is especially true in the Bay Area).

I ran into this exact issue in my search, which is why I created a somewhat unconventional method to help point you in the right direction when deciding on a place.

This approach assumes the following:

  • You have a list of places you found via search on platforms like Airbnb, Apartments.com, Zillow, etc.
  • You have the following data on those places:
    - Monthly rent (optionally with any additional fees, such as utilities)
    - Unit floor area (in sq. ft)
    - Rating of your overall feeling about the physical unit (excluding floor area) on a 1 to 5 scale*

* When rating each apartment, consider everything other than floor area: your feelings about safety, management, neighbors, building quality, and so on. Take all of these factors into account. This rating is subjective, which is a limitation of the quantitative approach; however, your decision on a place to live is subjective too. This approach aims to reduce the amount of decision making you have to do.

Method

The method for quantitatively identifying the optimal apartment(s) involves quite a bit of normalization, weighting, and ranking. I did all my calculations in Google Sheets, but I'll walk through the algorithm here.

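Let's define vectors a (floor area), b (rent), c (your rating), and T as follows, for a list of n apartments:

$$a = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}, \quad c = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}, \quad T = \begin{bmatrix} a & b & c \end{bmatrix}$$

The i-th components of a, b, and c are the floor area, rent, and rating of the i-th apartment on your list. T is simply a collection of those vectors: an n × 3 matrix whose columns are a, b, and c, with one row per apartment.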

Next, let us rank the components within each vector in T, collecting the ranks in a matrix R with one row per apartment and one column per parameter.

Some columns of R are defined with rank carrying a -1 exponent. This is not a power; it denotes the inverse rank, which in our case means ranking in descending order rather than ascending order. Mixing the two orders lets every column rank in the same direction: a larger floor area and a higher rating are better, whereas a lower rent is better, so rent must be ranked in the opposite order from area and rating.
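The ranking step can be sketched in Python. This is a minimal sketch, not the author's spreadsheet: the three apartments are hypothetical, ties share the lowest rank (similar to Google Sheets' RANK), and it assumes the convention in which rank 1 is best in every column, so area and rating are ranked in descending order while rent is ranked in ascending order.

```python
def rank(values, descending=False):
    """Rank each value starting at 1; tied values share the lowest (best) rank."""
    ordered = sorted(values, reverse=descending)
    return [ordered.index(v) + 1 for v in values]


# Hypothetical data for three apartments (one entry per apartment).
area = [650, 800, 500]       # a: floor area in sq. ft
rent = [2400, 3100, 2100]    # b: monthly rent in dollars
rating = [4, 3, 5]           # c: personal rating on a 1 to 5 scale

# Assumed convention: rank 1 is the best apartment in every column.
rank_area = rank(area, descending=True)      # largest area -> rank 1
rank_rent = rank(rent)                       # cheapest rent -> rank 1
rank_rating = rank(rating, descending=True)  # highest rating -> rank 1

# R has one row per apartment: (area rank, rent rank, rating rank).
R = [list(row) for row in zip(rank_area, rank_rent, rank_rating)]
print(R)  # [[2, 2, 2], [1, 3, 3], [3, 1, 1]]
```

Flipping which columns use descending order would simply make the highest rank number the best instead; what matters is that all three columns agree.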

With an arbitrary vector u, we define the average function as

$$\operatorname{avg}(u) = \frac{1}{n}\sum_{i=1}^{n} u_i,$$

where n is the number of components in u,

and apply it to each row vector of R. This gives a single column containing, for each apartment, the average of its area, rent, and personal rating ranks.
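Averaging each row of R can be sketched as follows. R here is a hypothetical rank matrix for three apartments (one row per apartment; columns hold the area, rent, and rating ranks), and avg is the arithmetic mean.

```python
def avg(u):
    """Arithmetic mean of the n components of u."""
    return sum(u) / len(u)


# Hypothetical rank matrix: one row per apartment (area, rent, rating ranks).
R = [
    [2, 2, 2],
    [1, 3, 3],
    [3, 1, 1],
]

# One average rank per apartment.
row_averages = [avg(row) for row in R]
print(row_averages)
```

If rank 1 means "best" in every column, a lower average rank points to a stronger apartment overall.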

Next, we normalize the values in a, b, and c according to the normal distribution, using the function f(T), where t denotes each component of T. That is, each t is one of the vectors a, b, or c (the values of one parameter across all apartments), and it is normalized using its own values.
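One common reading of "normalizing according to the normal distribution" is the standard score (z-score): subtract the column's mean and divide by its standard deviation. The sketch below assumes that interpretation and uses the population standard deviation; the original spreadsheet formula is not shown, so treat this f as an assumption. The data are hypothetical.

```python
from statistics import mean, pstdev


def f(t):
    """Assumed normalization: z-score of each component of the column vector t.

    Assumes the column is not constant (standard deviation > 0).
    """
    mu, sigma = mean(t), pstdev(t)
    return [(x - mu) / sigma for x in t]


# Hypothetical columns of T: floor area (a), rent (b), rating (c).
T = {
    "a": [650, 800, 500],
    "b": [2400, 3100, 2100],
    "c": [4, 3, 5],
}

# Normalize each column independently, giving f(T) column by column.
f_T = {name: f(column) for name, column in T.items()}
for name, column in f_T.items():
    print(name, [round(z, 3) for z in column])
```

If the spreadsheet instead used the normal CDF (for example, Google Sheets' NORMDIST), the order of values within each column would be the same, because the CDF is increasing.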

Using the normalized data and discrete weights, we can calculate the average of the weighted normalized values for each apartment. Assign each weight according to how important the corresponding parameter is to you; for example, if the floor area of the apartment is less important, you might set weight 1 of a to 0.5. Regardless of the magnitude you choose for rent (b), its weight should always be negative, because a higher rent is worse, not better. If you are willing to spend more on an apartment, rent simply matters less to you, so give its weight a smaller magnitude instead.

Note: weight 1 of a is the discrete weight assigned to a, and likewise for weight 1 of b and weight 1 of c. Since f(T) is technically a matrix, we can reference its elements directly.
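Here is a minimal sketch of the average of weighted normalized values, assuming it is the mean of each apartment's three weighted, normalized values. The normalized rows and the weights are hypothetical: area gets 0.5 (less important, as in the example above), rent gets a negative weight, and rating gets 1.

```python
def weighted_average(normalized_row, weights):
    """Mean of one apartment's normalized values after multiplying by their weights."""
    weighted = [w * z for w, z in zip(weights, normalized_row)]
    return sum(weighted) / len(weighted)


# Hypothetical rows of f(T): (area, rent, rating) normalized values per apartment.
f_T_rows = [
    [0.0, -0.318, 0.0],
    [1.225, 1.352, -1.225],
    [-1.225, -1.034, 1.225],
]

# Hypothetical weights (weight 1 of a, b, c): rent is always negative.
weights = [0.5, -1.0, 1.0]

scores = [weighted_average(row, weights) for row in f_T_rows]
print([round(s, 3) for s in scores])  # [0.106, -0.655, 0.549]
```

Because rent's weight is negative, the expensive second apartment is pulled down, while the cheap, well-rated third apartment comes out on top.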

It is also important that your weights bring the values as close to the same decimal place as possible. For example, if your values of b are in the thousands and your values of a are in the tens, either set the magnitude of b's weight to 0.01 (keeping it negative) or set the weight of a to 100; the same goes for c. To reiterate, the weighted values in every "column" should fall near or within the same decimal place as those in the other columns. If you skip this step, your normalized values later on will be heavily skewed, and so will any averages built from them.
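One way to pick such weights is a power-of-ten factor based on each column's largest magnitude. The helper below is hypothetical (it does not come from the original article) and returns only the magnitude of the weight; you still apply the sign yourself, keeping rent negative.

```python
import math


def scale_weight(values, target_exponent=1):
    """Power-of-ten weight that moves the largest |value| into the 10**target_exponent range.

    Assumes at least one value is nonzero.
    """
    largest = max(abs(v) for v in values)
    current_exponent = math.floor(math.log10(largest))
    exponent = target_exponent - current_exponent
    # Integer powers avoid floating-point error; 1 / 100 rounds exactly to 0.01.
    return 10 ** exponent if exponent >= 0 else 1 / 10 ** (-exponent)


rent = [2400, 3100, 2100]  # b: values in the thousands
area = [65, 80, 50]        # a: values in the tens, as in the example above

print(scale_weight(rent))                     # 0.01 (use -0.01 for rent)
print(scale_weight(area, target_exponent=3))  # 100
```

The two calls mirror the two options described above: shrink rent down to the tens, or scale area up to the thousands.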

We define a measure score1 as follows, with the weighting rules from above applying here as well:

See the note about weights under the definition of the average of weighted normalized values.

Now, using the previous derivations, we can calculate a vector, score2, that assigns each apartment a single score derived from all three original parameters: area, rent, and personal rating. The same weighting rules apply here as well.

Then we normalize the components of that derived score vector with the function f from earlier, this time written in terms of a single arbitrary vector u.
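Assuming f is the standard score (z-score), which is an interpretation since the original formula is not shown, it can be written for a single vector u with n components as:

$$f(u)_i = \frac{u_i - \bar{u}}{\sigma(u)}, \qquad \bar{u} = \frac{1}{n}\sum_{j=1}^{n} u_j, \qquad \sigma(u) = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(u_j - \bar{u}\right)^2}$$

Substituting score2 for u gives each apartment's normalized score. The formula requires σ(u) ≠ 0, which holds as long as not every apartment has the same score.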

Lastly, we use the normalized values of score2 to derive a vector containing the final ranking of apartments:
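The final step can be sketched as ranking the normalized score2 values in descending order, so the apartment with the highest score gets rank 1. The scores below are hypothetical, and the descending order is an assumption consistent with rent carrying a negative weight (a higher score means a better apartment).

```python
def final_ranking(normalized_scores):
    """Rank apartments by score: the highest score gets rank 1.

    Ties keep their original list order (Python's sort is stable).
    """
    order = sorted(range(len(normalized_scores)),
                   key=lambda i: normalized_scores[i], reverse=True)
    ranks = [0] * len(normalized_scores)
    for position, apartment in enumerate(order, start=1):
        ranks[apartment] = position
    return ranks


# Hypothetical normalized score2 values, one per apartment.
normalized_score2 = [0.3, -1.2, 0.9]
print(final_ranking(normalized_score2))  # [2, 3, 1]
```

Under the z-score assumption, f is strictly increasing, so this order is the same as ranking score2 before normalization.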

Conclusion

The resulting ranking should give you a good idea of which apartments to prioritize, whatever prioritizing means for you. Maybe it's picking which apartments to tour in person, or choosing which ones are worth checking daily for available listings. How you use the rankings is up to you, but keep in mind that they are based on the weights you assigned, and therefore on how important you consider area, rent, and personal rating to be.

I hope you enjoyed this nifty math article and can make use of it! Let me know what you think, and feel free to reach out if you have any questions.
