How to Prepare Real Estate Data for AI


How to Get Your Real Estate Data AI-Ready

Unlock the power of AI by mastering your real estate data preparation.

In today’s data-driven world, businesses want to leverage analytics and AI to drive decision-making and innovation. This is especially true within the financial services sector, which has seen change driven by regulation and competition. At SkenarioLabs, we increasingly see how this impacts our area of expertise – real estate.

Banks and lenders face a myriad of challenges concerning their mortgage books, commercial lending collaterals and securitised debt products. There is a drive to understand the buildings which underpin them – be that for carbon accounting, climate resilience, green financing or risk management.

However, the journey to meaningful insights begins long before data is analysed. It starts at the very foundation of data handling, encompassing how we collect, transform and store it. While the allure of Artificial Intelligence and Machine Learning often steals the spotlight, the transformation stage is where significant value is added – ensuring that the subsequent analytics are not just feasible but also impactful.

Understanding the Data Value Chain in Real Estate

A Data Value Chain is a sequence of processes and activities that transform raw data into meaningful insights, emphasising the creation, management, and utilisation of data throughout its lifecycle to generate value for an organisation.

A basic outline can be seen below:

  • Data Collection: Data is acquired or created by something or someone. Often, a lot of the data we use does not come from customers; we enrich it with other sources.
  • Data Transformation: Data is then combined and transformed into a useful form. The amount of effort required to achieve this tends to be underestimated. This involves data validation and structuring.
  • Data Value Add: Data is then analysed using a combination of technologies (e.g. algorithms, ML, AI, etc.). This tends to be the stage which attracts the most excitement.
  • Data Impact: Data is then used by humans and/or machines to make decisions that impact the issue that people are trying to solve. It is crucial to be mindful of usability as a key outcome.

Each stage of the data value chain builds upon the previous one. Inefficiencies or errors in one stage can affect the entire process, making it crucial to ensure quality and integrity at every phase to derive maximum value from the data.

In future blogs, we will look into the other stages, but today’s focus is on Transformation, as it is commonly misunderstood and/or left as an afterthought.

How Can Organisations Overcome Data Quality Challenges in Real Estate?

Most organisations find themselves saddled with data that is incomplete, inaccurate, and scattered across disparate structures. Such data complexities are more than mere nuisances—they are formidable barriers to deploying advanced analytics and AI applications that can revolutionise business operations. This is where the art of data transformation becomes critical.

Many of the addresses we see, from clients of all shapes and sizes, do not point to a single asset in a way that machines can interpret reliably and repeatably.

For example, a bank may have a portfolio of asset-backed commercial loans. An example of a typical address we might see is:

| Collateral ID | Street Address | Town/City | Postcode | County | Outstanding Loan | Property Value | Property Type |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1704 | 21-25 & 31 High Street | Huddersfield | HD1 2NE | West Yorkshire | £1,304,401 | £1,600,000 | Mixed Residential & Commercial |

*Please note that the data presented above is not real and is intended solely for illustrative purposes.

A human would reasonably assume that this refers to 21, 23, 25 and 31 High Street and likely includes multiple units within each building.

An AI system focused mainly on analysis (such as valuation) would not consistently find every asset. Even a geo-location model would struggle to parse an address like this and is likely to under-capture, as below:

| Street Address | Town/City | Postcode |
| --- | --- | --- |
| 21 High Street | Huddersfield | HD1 2NE |
| 25 High Street | Huddersfield | HD1 2NE |
| 31 High Street | Huddersfield | HD1 2NE |

Alternatively, they might over-capture:

| Street Address | Town/City | Postcode |
| --- | --- | --- |
| 21 High Street | Huddersfield | HD1 2NE |
| 22 High Street | Huddersfield | HD1 2NE |
| 23 High Street | Huddersfield | HD1 2NE |
| 24 High Street | Huddersfield | HD1 2NE |
| 25 High Street | Huddersfield | HD1 2NE |
| 31 High Street | Huddersfield | HD1 2NE |

When dealing with large datasets of tens or hundreds of thousands of assets, this is a total blocker to any practical use of AI. Combined with other common and more complex address formats, such as “Unit 1/ 5 Trading Estate, Town”, “Ben’s Flowers, City”, or even “All Property South of Main Road, Village”, the input data creates a “garbage in/garbage out” effect. Once this has happened, it is almost impossible to sweep through every record line-by-line to find all the errors.
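To make the under-capture and over-capture problem concrete, here is a minimal Python sketch of three ways a program might read “21-25 & 31 High Street”. The function names and the three modes are hypothetical, and the `same_parity` mode assumes the common UK convention that odd and even numbers sit on opposite sides of a street. It illustrates the ambiguity only and is not a description of any production parser.

```python
import re

def parse_numbers(street_address):
    """Split '21-25 & 31 High Street' into number tokens and the street name."""
    match = re.match(r"^([\d\s\-&,]+?)\s+([A-Za-z].*)$", street_address.strip())
    if match is None:
        raise ValueError(f"no leading house numbers in {street_address!r}")
    tokens = [t.strip() for t in re.split(r"[&,]", match.group(1)) if t.strip()]
    return tokens, match.group(2)

def expand(tokens, mode):
    """Turn number tokens into house numbers under one reading of a range."""
    numbers = []
    for token in tokens:
        if "-" not in token:
            numbers.append(int(token))
            continue
        start, end = (int(part) for part in token.split("-"))
        if mode == "endpoints":        # under-capture: only the two ends
            numbers.extend([start, end])
        elif mode == "every":          # over-capture: every integer in between
            numbers.extend(range(start, end + 1))
        elif mode == "same_parity":    # assumption: one side of the street only
            numbers.extend(range(start, end + 1, 2))
        else:
            raise ValueError(f"unknown mode {mode!r}")
    return numbers

tokens, street = parse_numbers("21-25 & 31 High Street")
for mode in ("endpoints", "every", "same_parity"):
    print(mode, [f"{n} {street}" for n in expand(tokens, mode)])
```

Only the `same_parity` reading matches what a human would assume here, and even that is a guess until it is checked against other data sources.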

Why is Data Transformation Critical within Real Estate?

Data transformation refines data through merging, cleaning, and structuring. At SkenarioLabs, we cross-validate datasets against one another to enhance reliability and ensure alignment with the so-called “ground truth” data and industry benchmarks.

By rigorously comparing and contrasting data sets, we can identify inconsistencies and rectify them, thereby elevating the quality of the data at hand. Our highly focused approach to Machine Learning spots and adapts to patterns within specific data types. By contrast, “general AI” approaches, which aim to simulate a human’s ability to handle a wide variety of “general knowledge” tasks, tend not to go in-depth on technical issues. This is something you might see if you ask ChatGPT to help you with a maths problem!

In the case of the example data above, our approach would be to first parse the data with Natural Language Processing, which interprets the address information before it is analysed. Next, the data is cross-checked against other available sources, including energy performance, postal or taxation data. In this example, if 21, 23, 25 and 31 High Street are all commercial properties, we would know from the Property Type that we should also find some residential units at those addresses. As such, we can ensure that our algorithm looks for addresses which make sense in that context.

In this case, we might find:

| Street Address | Town/City | Postcode | Property Type |
| --- | --- | --- | --- |
| 21 High Street | Huddersfield | HD1 2NE | Retail |
| 21a High Street | Huddersfield | HD1 2NE | Flat |
| 21b High Street | Huddersfield | HD1 2NE | Flat |
| 23 High Street | Huddersfield | HD1 2NE | Office |
| 25 High Street | Huddersfield | HD1 2NE | Retail |
| 25a High Street | Huddersfield | HD1 2NE | Flat |
| 31 High Street | Huddersfield | HD1 2NE | Office |

*Please note that the data presented above is not real and is intended solely for illustrative purposes.
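The cross-checking step that leads to a result like the one above could be sketched as follows. This is a simplified illustration: a plain lookup stands in for the NLP parsing and matching, and the `REGISTER` list, `find_units` function and `RESIDENTIAL_USES` set are hypothetical names with made-up entries that mirror the illustrative table.

```python
import re

# Hypothetical reference register, e.g. assembled from postal or EPC data.
# All entries are illustrative and mirror the made-up table above.
REGISTER = [
    ("21", "High Street", "Retail"),
    ("21a", "High Street", "Flat"),
    ("21b", "High Street", "Flat"),
    ("22", "High Street", "Retail"),  # even side, not part of the collateral
    ("23", "High Street", "Office"),
    ("25", "High Street", "Retail"),
    ("25a", "High Street", "Flat"),
    ("31", "High Street", "Office"),
]
RESIDENTIAL_USES = {"Flat"}

def base_number(label):
    """Return the numeric part of a house label, so '21a' becomes 21."""
    return int(re.match(r"\d+", label).group())

def find_units(candidate_numbers, street, declared_type):
    """Collect every registered unit at the candidate numbers, then check
    that the mix of uses agrees with the client's declared property type."""
    units = [
        (label, use)
        for label, name, use in REGISTER
        if name == street and base_number(label) in candidate_numbers
    ]
    has_residential = any(use in RESIDENTIAL_USES for _, use in units)
    has_commercial = any(use not in RESIDENTIAL_USES for _, use in units)
    consistent = (
        ("Residential" not in declared_type or has_residential)
        and ("Commercial" not in declared_type or has_commercial)
    )
    return units, consistent

units, consistent = find_units(
    {21, 23, 25, 31}, "High Street", "Mixed Residential & Commercial"
)
for label, use in units:
    print(f"{label} High Street: {use}")
print("consistent with declared type:", consistent)
```

If the search had stopped at the four main street numbers, no flats would have been found and the consistency check would have failed, which is the signal to keep looking for sub-units.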

We can then link to other datasets via Unique Property Reference Numbers (UPRNs), building polygons, Energy Performance Certificates and other real estate data depending on the client’s use case. This might be EPC score and floor area for carbon accounting, energy modelling and cost data for green financing, or local sales data for valuation.
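As a rough illustration of linking by identifier, the sketch below joins matched units to EPC-style attributes on a shared key. The identifiers are placeholders rather than real UPRNs, and the field names (`epc_score`, `floor_area_m2`) and values are assumptions made for this example.

```python
# Placeholder identifiers and values, invented for illustration only;
# real UPRNs are numeric and would come from an address register.
matched_units = {
    "uprn-0001": "21 High Street",
    "uprn-0002": "21a High Street",
    "uprn-0003": "23 High Street",
}
epc_records = {
    "uprn-0001": {"epc_score": 62, "floor_area_m2": 140.0},
    "uprn-0002": {"epc_score": 71, "floor_area_m2": 55.0},
}

def link_by_uprn(units, attributes):
    """Join units to attribute records on the shared identifier and
    report which addresses have no record to link to."""
    linked, unmatched = [], []
    for uprn, address in units.items():
        record = attributes.get(uprn)
        if record is None:
            unmatched.append(address)
        else:
            linked.append({"uprn": uprn, "address": address, **record})
    return linked, unmatched

linked, unmatched = link_by_uprn(matched_units, epc_records)
total_floor_area = sum(row["floor_area_m2"] for row in linked)
print("linked:", [row["address"] for row in linked])
print("no EPC record:", unmatched)
print("floor area covered (m2):", total_floor_area)
```

Keeping the unmatched addresses visible, rather than silently dropping them, makes gaps in coverage explicit before any carbon or valuation figure is calculated.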

With this additional linked data, we also look at cross-validation steps, such as:

  • Does the valuation of the properties we have found roughly align with the client’s provided number?
    • If not, is it an error on our side or a risk to flag to the client?
  • If we look at the primary addresses linked to the building polygons, do they match the address we have identified?
    • Does it match the size and expected location?
    • If not, this will then re-route the search to find properties which might fit better.

N.B. – building polygons show the exact shape, size and location of a building.
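The two checks above might be sketched as below. The tolerance thresholds, per-unit values, polygon coordinates and field names are all hypothetical; only the £1,600,000 property value comes from the illustrative collateral record.

```python
def valuation_check(client_value, unit_values, tolerance=0.20):
    """Compare the summed unit values with the client's figure.
    The 20% tolerance is an arbitrary illustrative threshold."""
    modelled = sum(unit_values)
    relative_gap = abs(modelled - client_value) / client_value
    return {
        "modelled": modelled,
        "relative_gap": relative_gap,
        "aligned": relative_gap <= tolerance,
    }

def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as (x, y) pairs in metres."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_check(polygon, identified_address, expected_footprint_m2, tolerance=0.25):
    """Check that the polygon's primary address and footprint both agree
    with the property we believe we have identified."""
    address_ok = polygon["primary_address"].lower() == identified_address.lower()
    area = polygon_area(polygon["vertices"])
    size_ok = abs(area - expected_footprint_m2) / expected_footprint_m2 <= tolerance
    return address_ok and size_ok

# Hypothetical per-unit values for the seven units found at the example address.
unit_values = [300_000, 150_000, 140_000, 350_000, 280_000, 160_000, 250_000]
print(valuation_check(1_600_000, unit_values))

# Hypothetical 10 m x 12 m building outline.
polygon = {
    "primary_address": "21 High Street",
    "vertices": [(0, 0), (10, 0), (10, 12), (0, 12)],
}
print(polygon_check(polygon, "21 High Street", expected_footprint_m2=110.0))
```

A failed check does not say which side is wrong. It marks the record either for a re-routed search or for a risk flag to the client.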

This emphasis on accuracy in the earlier parts of the value chain ensures that our clients are equipped with clean, organised data that feeds seamlessly into AI algorithms and analytics tools. In turn, this unlocks new opportunities and insights that were previously obscured by data noise. This transformation stage is crucial: it’s where data starts to build its narrative, setting the stage for all the advanced analytics to play out effectively.

As we look towards a future where data becomes even more integral to business strategy, the importance of cleansing and transforming data cannot be overstated. By ensuring high-quality data input, we not only improve the output but also enhance the entire data value chain.

If you have big ambitions, but need better data to back them up – get in touch with us!
