There was a requirement to gather house sales and rental data for a certain area of the UK from sites such as Zoopla and Rightmove. No documentation could be found for Rightmove’s APIs. However, viewing the page source and inspecting the network calls Rightmove makes to populate its pages showed that a publicly accessible API endpoint (https://www.rightmove.co.uk/api/_search) is used to retrieve the data.
When first seeing this, the initial thought was that this was going to be easy: parameterize the URL, loop through a list of postcodes, and call the endpoint for each one. However, closer inspection of the URL showed that the postcodes had been swapped out for Rightmove’s own location identifier, passed in the locationIdentifier query parameter (for example, POSTCODE%5E1149959).
Retrieving Rightmove’s identifier for a postcode
Initially, it was thought that the search bar on Rightmove’s site would return the location identifier in its response. However, entering a postcode and inspecting the network traffic showed this was not the case. The typeahead request splits the postcode into two-character chunks separated by forward slashes, and only the region’s location identifier is returned.
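To make the chunking concrete, here is a minimal sketch in Python (the original solution is a .NET application, so this is illustrative only). It reproduces only the splitting described above. Dropping whitespace and upper-casing before splitting are assumptions, and the function name typeahead_path is hypothetical.

```python
def typeahead_path(postcode: str, chunk_size: int = 2) -> str:
    """Split a postcode into fixed-size chunks joined by forward slashes."""
    # Assumption: whitespace is removed and letters are upper-cased first.
    compact = "".join(postcode.split()).upper()
    chunks = [compact[i:i + chunk_size] for i in range(0, len(compact), chunk_size)]
    return "/".join(chunks)


print(typeahead_path("SW1A 1AA"))  # SW/1A/1A/A
```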
The full code solution can be found here.
I’m not sure this is the most elegant solution, but one way of solving the problem, based on tools used in the past, was to use Selenium to simulate a user performing the following actions:
- Going to the Rightmove site
- Searching for a Postcode
- Taking the URL from the address bar and pulling out Rightmove’s location identifier for the searched postcode
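The last step can be sketched in Python as follows (the original solution is written in .NET, so this is not the author's code). It assumes the URL in the address bar carries the identifier in a locationIdentifier query parameter, as the API URL does. The function name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_location_identifier(url: str) -> Optional[str]:
    """Return the decoded locationIdentifier query parameter, or None if absent."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also decodes %5E to ^
    values = query.get("locationIdentifier")
    return values[0] if values else None


example = "https://www.rightmove.co.uk/api/_search?locationIdentifier=POSTCODE%5E1149959&radius=0.5"
print(extract_location_identifier(example))  # POSTCODE^1149959
```

With Selenium's Python bindings, the input would be the browser's current URL (driver.current_url) once the search results page has loaded.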
Initially, the application was written as a single console application that read a CSV file of postcodes, iterated over each postcode performing the above steps, and stored the postcode and location identifier in a Dictionary<string, string>. It quickly became clear that this process was very slow, so the application was split into two console applications:
PostcodePopulator.Console is a simple console application that reads a CSV containing postcodes for London and pushes each postcode onto a queue within RabbitMQ.
PostcodeProcessor.Console is a console application that listens to the RabbitMQ queue, and uses Selenium to perform the steps mentioned above, storing the postcode, Rightmove location identifier, and the processing status in a table in SQL Server.
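The split into a populator and a processor can be sketched in Python with standard-library stand-ins. This is not the author's .NET code: queue.Queue stands in for RabbitMQ, an in-memory SQLite database stands in for SQL Server, and the lookup callable stands in for the Selenium steps. The column names, status values, and the identifier in the fake lookup are all made up for illustration.

```python
import csv
import io
import queue
import sqlite3


def populate(csv_text: str, work: queue.Queue) -> int:
    """Producer: push every non-empty postcode from the CSV onto the queue."""
    count = 0
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0].strip():
            work.put(row[0].strip())
            count += 1
    return count


def process(work: queue.Queue, db: sqlite3.Connection, lookup) -> None:
    """Consumer: drain the queue, storing postcode, identifier and status."""
    while True:
        try:
            postcode = work.get_nowait()
        except queue.Empty:
            break
        try:
            identifier = lookup(postcode)  # stand-in for the Selenium steps
            status = "Success" if identifier else "NotFound"
        except Exception:
            identifier, status = None, "Failed"
        db.execute(
            "INSERT INTO PostcodeLocationMapper VALUES (?, ?, ?)",
            (postcode, identifier, status),
        )
    db.commit()


db = sqlite3.connect(":memory:")
# Hypothetical columns; only the table name comes from the original solution.
db.execute(
    "CREATE TABLE PostcodeLocationMapper "
    "(Postcode TEXT, LocationIdentifier TEXT, Status TEXT)"
)
work = queue.Queue()
populate("E1 6AN\nSW1A 1AA\n", work)
fake_lookup = {"E1 6AN": "POSTCODE^0000001"}.get  # made-up identifier
process(work, db, fake_lookup)
rows = db.execute("SELECT * FROM PostcodeLocationMapper ORDER BY rowid").fetchall()
print(rows)
```

Because the consumer only depends on the queue and the database, several consumers can work through the same queue at once, which is what makes the two-application design faster than the single loop.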
Running the solution
There are two options for running the application. The first is to run the console applications using dotnet run. For this you’ll need RabbitMQ and SQL Server running, and you’ll need to update the appSettings.json file with the RabbitMQ address and the SQL Server connection string.
The second option is to use Docker. Dockerfiles are included for both console applications and for the database (src\PostcodeProcessor\PostcodeProcessor.Console\Dockerfile.database.dockerfile), along with docker-compose and docker-compose.override files for spinning up the application containers, the RabbitMQ (with management) container, and the SQL Server container. The SQL Server container includes an initialisation script that creates the database and the table in which the PostcodeProcessor.Console application stores its results. The PostcodeProcessor.Console Dockerfile also shows how to download all the dependencies required to run Selenium inside a container.
As mentioned earlier, processing each postcode is quite slow. To speed this up locally, run the docker-compose command with the scale option. The command below runs ten instances of the PostcodeProcessor.Console application.
docker-compose up --scale postcodeprocessor-console=10
To view the results, log in to SQL Server using the connection string details provided in the appSettings or docker-compose.override files, and run a select on the [RightmoveDemo].[dbo].[PostcodeLocationMapper] table. If you used the docker-compose approach to run the application, you should be able to log in to the SQL Server Docker container using the address 127.0.0.1,5433, username sa and password Pass@word.
This should give you everything you need to call the Rightmove API endpoint with the location identifiers for the postcodes you are interested in, e.g.:
https://www.rightmove.co.uk/api/_search?locationIdentifier=POSTCODE%5E1149959&numberOfPropertiesPerPage=24&radius=0.5&sortType=2&index=0&includeSSTC=false&viewType=LIST&channel=BUY&areaSizeUnit=sqft&currencyCode=GBP&isFetching=false&viewport=
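Building that URL for each stored identifier can be sketched in Python as below (again, not part of the original .NET solution). The parameter names and values are copied from the example URL. The function name and the choice of which values to expose as arguments are assumptions, and values other than those in the example are not confirmed by the source.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.rightmove.co.uk/api/_search"


def build_search_url(location_identifier: str, channel: str = "BUY",
                     radius: float = 0.5, index: int = 0) -> str:
    """Build a search URL for one decoded identifier such as POSTCODE^1149959."""
    params = {
        "locationIdentifier": location_identifier,  # urlencode turns ^ into %5E
        "numberOfPropertiesPerPage": 24,
        "radius": radius,
        "sortType": 2,
        "index": index,
        "includeSSTC": "false",
        "viewType": "LIST",
        "channel": channel,
        "areaSizeUnit": "sqft",
        "currencyCode": "GBP",
        "isFetching": "false",
        "viewport": "",
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("POSTCODE^1149959"))
```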