We’re excited to share that the Open Repair Alliance has published our first joined-up set of open repair data. We have mapped and combined data from each of our partners into one aggregated dataset of nearly 30,000 records of repairs undertaken at community events.
It is all open data and available for download now.
We encourage anyone who is interested to have a look and to share your feedback and discoveries with us.
Read on to learn why and how we’re collecting and combining this data, and what our upcoming plans are.
Why are we collecting and combining repair data?
We know that citizen data on environmental issues can have direct influence at the policy level. In light of this, the Open Repair Alliance came together over two years ago to collaborate on an open repair data standard for sharing our data. All partners now use tools that record repair data electronically. By mapping this data to a common format, we can pool our repair data together and look for patterns and trends to help inform policy.
As an example, our first dive into the newly combined data took place during an event at Fixfest 2019 in Berlin, where around 20 volunteers investigated why the computers brought to community repair events had failed. Since then, ORA partner The Restart Project has extended this work into an online microtask for categorising faults. Find out more about FaultCat.
We expect there to be an opportunity to influence regulations of design for repairability for laptop computers in 2020, with open repair data playing a role in that.
Where does the repair data come from?
Publication of the dataset was made possible by contributions from ORA partners, all of whom are collecting repair data at their community repair events – mostly in Germany, Holland, the UK and the US. Along with the combined set, more details on all of the individual data sources can be found on the data downloads page.
How did we combine the data?
We went through a number of steps of translation, cleaning, reformatting and mapping of each of the partner datasets to enable the data to be combined.
The two main fields requiring mapping are product category and repair status. Each partner currently collects these using its own list of categories and statuses. For the next version of the Open Repair Data Standard (ORDS), we are investigating a set of product categories and statuses that can meet the needs of our desired uses of the data.
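The mapping step described above can be sketched roughly as follows. This is only an illustration of the idea, not the tooling we actually used: the partner names, the source labels and the target category and status values are all made up for the example, and unmapped values simply fall back to "Misc" or "Unknown".

```python
# Hypothetical lookup tables. The partner names, source labels and target
# values are invented for illustration; they are not the actual ORDS lists.
CATEGORY_MAP = {
    "partner_a": {"laptop": "Laptop", "notebook": "Laptop", "kettle": "Kettle"},
    "partner_b": {"computer (portable)": "Laptop", "wasserkocher": "Kettle"},
}
STATUS_MAP = {
    "partner_a": {"fixed": "Fixed", "not fixed": "End of life"},
    "partner_b": {"repariert": "Fixed", "nicht repariert": "End of life"},
}


def map_value(mapping, partner, raw_value, fallback):
    """Look up a partner-specific label; return the fallback when unmapped."""
    key = (raw_value or "").strip().lower()
    return mapping.get(partner, {}).get(key, fallback)


def to_common_record(partner, record):
    """Return a copy of the record with category and status mapped."""
    mapped = dict(record)
    mapped["product_category"] = map_value(
        CATEGORY_MAP, partner, record.get("product_category"), "Misc"
    )
    mapped["repair_status"] = map_value(
        STATUS_MAP, partner, record.get("repair_status"), "Unknown"
    )
    return mapped


if __name__ == "__main__":
    raw = {"product_category": "Wasserkocher", "repair_status": "Repariert"}
    print(to_common_record("partner_b", raw))
```

Keeping the lookup tables separate from the code means a partner could review and correct their own mappings without touching the script, which is one reason a table-driven approach may suit this kind of work.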
This initial work has provided a great starting point for ongoing data aggregation. It has highlighted a number of areas requiring further work in order to enable regular aggregation, and areas to improve the quality and usefulness of the shared data.
Changes to the Open Repair Data Standard
This initial aggregation has provided further insight into changes required for the next version of the Open Repair Data Standard. In short, we’ve concluded that we should add to ORDS a number of fields, including those related to language. Additionally, we need to undertake more analysis on the product category and repair status fields in order to produce consistent mapping.
More details on these suggested changes can be found here: Open Repair Data Aggregation for FixFest 2019.
Support for publishing repair data regularly
Partners are collecting repair data from their community repair events all the time. We want to combine these and publish regularly updated Open Repair data downloads.
Our initial work involved a lot of manual mapping, and showed us that we will need improved tool support to publish on a regular basis. This could mean partner members adapting their existing data recording tools to export directly to the ORDS format. Alternatively, we could produce tools that can map to ORDS from various formats.
Improving data quality
By combining the data, we have seen that the quality of the data collected by partners needs to be improved. While some of the core information is abundant and is useful for reporting purposes (e.g. date, product category, repair status), some other fields are much less consistent (e.g. problem, model, year_of_manufacture).
| field = value | % of total |
|---|---|
| repair_status = “Unknown” | 4.06% |
| product_category = “Misc” | 10.17% |
| problem = “” (empty/blank) | 19.92% |
| brand = “Unknown” | 55.85% |
| model = “Unknown” | 72.35% |
| year_of_manufacture = “????” | 92.32% |
This is not surprising, given the nature of where we collect data: busy, community events where the main focus is the fixing itself. That said, work on improving data quality is an ongoing focus for us. Partners are investigating improved data collection tools. We are also working on increasing overall community engagement and understanding of repair data by making the data easily accessible. Online microtasking tools such as The Restart Project’s FaultCat are one example of this.
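For anyone who wants to run a similar check on the downloaded data, percentages like those in the table above could be computed along these lines. This is a minimal sketch: it assumes the records have been loaded as dictionaries keyed by field name, and the function name and sample records are invented for the example.

```python
# Placeholder values as listed in the table above, keyed by field name.
PLACEHOLDERS = {
    "repair_status": "Unknown",
    "product_category": "Misc",
    "problem": "",
    "brand": "Unknown",
    "model": "Unknown",
    "year_of_manufacture": "????",
}


def placeholder_share(records, placeholders=PLACEHOLDERS):
    """Return, per field, the percentage of records holding the placeholder.

    A missing or None field is treated as an empty string, so it only
    counts towards fields whose placeholder is the empty string.
    """
    total = len(records)
    shares = {}
    for field, placeholder in placeholders.items():
        count = sum(
            1 for r in records if (r.get(field) or "").strip() == placeholder
        )
        shares[field] = round(100 * count / total, 2) if total else 0.0
    return shares


if __name__ == "__main__":
    # Two invented records, purely to show the shape of the input.
    sample = [
        {"repair_status": "Fixed", "product_category": "Misc", "problem": "",
         "brand": "Unknown", "model": "Unknown", "year_of_manufacture": "????"},
        {"repair_status": "Unknown", "product_category": "Laptop",
         "problem": "will not boot", "brand": "Acme", "model": "Unknown",
         "year_of_manufacture": "2012"},
    ]
    for field, pct in placeholder_share(sample).items():
        print(f"{field}: {pct}%")
```

With the real download, the records could be read with Python's `csv.DictReader` and passed straight to the function, provided the column names match the field names used here.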
As we analyse the repair data further, we aim to publish more reports on our findings to the Insights page.
We encourage members of the repair community to get involved in this, too. So why not download the data and let us know what you find?