A response to CityLab article: Should Libraries Be the Keepers of Their Cities’ Public Data?

Libraries have lots of reasons to be excited about the possibility of hosting open data in their communities. Libraries are trusted stewards of information, and librarians have considerable expertise in managing data and protecting privacy. Libraries are leading many digital inclusion and data literacy efforts in their communities. As the priorities of elected officials change, having a third-party entity like a library host data from government organizations and other partners can also help to sustain community data initiatives. Librarians also have the skills to provide data management and data hosting services to organizations that may not otherwise have internal capacity to treat their data as an asset. There are many reasons why libraries hosting open data seems like a natural fit.

So… your library is thinking about hosting open data.

Great! We’d love to see more libraries playing this role. At the same time, there’s a lot to consider. There is great work that has explored some of these issues already – see Temple University Library’s project report: Future Proofing Civic Data.

Here in Pittsburgh, members of the Civic Switchboard team have gained some unique insight from our involvement with the Western Pennsylvania Regional Data Center (WPRDC), a regional open data intermediary. WPRDC manages a data repository, provides services to both data publishers and data users, and builds tools that help people in our community extract value from open civic data. We’ve learned there’s a lot more to think through than just the literal hosting of data.

We’ve developed a number of questions that we think libraries should ask themselves when thinking about hosting open civic data from others in their community.

1. Why is your library doing this work?

Your library’s motivation for taking on this work will shape the decisions you make about funding, technology, staffing, and the underlying structure of your efforts. Whatever action you take, it’s likely to involve multiple people from your library working together over a period of time. For these reasons, we encourage you to record your goals and motivations and share them within your institution and even with the broader community.

Need some example motivations? Check out the Building Libraries into Civic Data Partnerships and Library Role Typology sections of the Civic Switchboard Guide.

2. What are the data needs of people in your community?

It’s vital that your library have a sense of the data needs of people in your community. This will help identify the features that are important when it comes to creating a data infrastructure, identifying training needs, designing support services, and even uncovering the types of data priorities that can give focus to your work. The best way to learn about these needs is through direct engagement with people in your community. Consider interviews, focus groups, surveys, documenting data and technical assistance requests, and even observing people as they interact with data and technology in your library. You will learn more about data and statistical literacies, technical skills, types of devices and software used, ways people access the internet, and the overall personal and community context for this work. In the Civic Switchboard guide, we’ll be adding resources for librarians looking to learn about the data needs of people in their community.

3. What does your community’s civic data ecosystem look like?

We often use the phrase “civic data ecosystems” to mean the people and organizations connected with data in a local region, as well as the infrastructure that supports this work. The Civic Switchboard toolkit provides resources for identifying and characterizing the relationships between different actors in your civic data ecosystem. There are several different approaches for mapping your local ecosystem, and we’ve used the techniques in our Civic Switchboard workshops in 2018 and also in our work in Pittsburgh. The map and mapping process are great ways to frame conversations about your local ecosystem. The map can expose gaps and opportunities in your local ecosystem, and can inform the investment decisions your library and other actors might make in your local civic data infrastructure.

4. How will your library support data publishers?

Here we assume that your library is interested in hosting data from the broader community as part of a federated structure. When considering data publishing partners, take stock of the trust you’ve been able to build with them, their capacity to produce accurate, formatted data, and their ability to provide automated data feeds. Also consider what your library may need to do to help your partners prepare data for publication, including assembling metadata and data dictionaries. You will also need to develop data sharing agreements that outline rights and responsibilities of each party in the data sharing relationship.

If that seems like a lot (you’re right - it’s a huge undertaking!), dipping a toe in the water by sharing available non-sensitive public information about your community can be a great way to build interest. Consider existing sources like the U.S. Census Bureau, or data about the library itself; these could be shared through downloadable files posted to the library’s website. An incremental approach can help your institution build connections in your civic data ecosystem, learn more about community needs, and build staff capacity to work with data while minimizing up-front expenses.
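For the data dictionaries mentioned above, a first draft can often be generated from the file itself and then handed to the publisher to complete. The following is a minimal sketch in Python, not a prescribed workflow: the function names, the type labels, and the sample columns are all hypothetical, and the type guessing is deliberately crude.

```python
import csv
import io


def infer_type(values):
    """Guess a simple type label from a column's string values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    # Try the strictest type first; fall back to text if nothing fits.
    for label, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "text"


def draft_data_dictionary(csv_text):
    """Build one draft entry per column; descriptions are left blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [(row[name] or "") for row in rows]
        entries.append({
            "field": name,
            "type": infer_type(values),
            "example": next((v for v in values if v.strip()), ""),
            "description": "",  # to be written by the data publisher
        })
    return entries


# Hypothetical sample data about library branches.
SAMPLE = "branch,visits,avg_hours\nMain,1200,9.5\nEast,,8\n"

for entry in draft_data_dictionary(SAMPLE):
    print(entry)
```

The blank description field is the point of the exercise: the script can only report what a column looks like, and the publisher still has to explain what it means.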

5. How will your library mitigate risk?

Is your library willing to take on the risk of sensitive data accidentally being made available through your data sharing initiative? Your library will need policy and workflows that minimize these risks.

Your library can minimize risk by asking publishers not to share sensitive data with your organization, lessening the possibility of releasing the data in the open data portal. You’ll also likely want an internal publishing workflow requiring review of new datasets prior to publication to ensure nothing harmful is published. Document these policies and practices through a data management plan.
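Part of a pre-publication review can be automated as a first pass before a person looks at the file. The sketch below is one possible approach in Python, under clear assumptions: the column-name hints and the two value patterns are illustrative only, they will miss many kinds of sensitive data and raise some false alarms, and the function name is hypothetical. It is meant to support a human reviewer, not replace one.

```python
import csv
import io
import re

# Illustrative hints only; a real policy would define its own list.
SENSITIVE_NAME_HINTS = ("ssn", "social", "dob", "birth", "email", "phone", "address")

# Illustrative patterns only; these are far from exhaustive.
VALUE_PATTERNS = {
    "possible SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "possible email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def screen_csv(csv_text):
    """Return a sorted list of (column, reason) flags for a human reviewer."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flags = set()
    # Flag columns whose names hint at personal data.
    for name in reader.fieldnames or []:
        if any(hint in name.lower() for hint in SENSITIVE_NAME_HINTS):
            flags.add((name, "column name suggests personal data"))
    # Flag columns containing values that match a known pattern.
    for row in reader:
        for name, value in row.items():
            if not isinstance(value, str):
                continue  # skip missing or overflow cells
            for reason, pattern in VALUE_PATTERNS.items():
                if pattern.search(value):
                    flags.add((name, reason))
    return sorted(flags)


# Hypothetical sample file with two problems hidden in it.
SAMPLE = (
    "permit_id,contact,notes\n"
    "1,jane@example.org,ok\n"
    "2,,SSN 123-45-6789 on file\n"
)

for column, reason in screen_csv(SAMPLE):
    print(f"{column}: {reason}")
```

A dataset that produces no flags has not been cleared for release; it has only passed the easiest checks.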

Another form of risk mitigation is to help publishers improve their internal data management processes. Consider establishing a set of guidelines for your publishing partners to reduce the risk of improper data sharing. Some organizations may already have these and describe them as “data governance frameworks”; they are used to improve the quality of information as an organizational asset, improve data security, protect sensitive information, adhere to regulations and policies, and offer guidance in assessing the risks of sharing data.

6. What is your library’s capacity to manage data infrastructure?

If your library has the ability to manage data repository software, then you can play a direct role in managing an open data portal. If library staff can also manipulate data and use APIs, then you can go further and establish automated publishing processes that keep the data users see up to date. If your library can’t do this in-house, that shouldn’t stop you from hosting data, but you will need to rely on a combination of vendors, consultants, and staff at partner organizations to provide technical capacity.

Alternatively, your library could build an initial infrastructure using existing server infrastructure and simple protocols like FTP, as the City of Albuquerque, New Mexico has done to support its open data program.

Or, your library might build a prototype program using cloud-based data storage services like Dropbox, Box, or Google Drive to host data.

Taking an incremental route may be a great way to test the waters and learn more about your community’s data users without (or before) making a major investment in technology infrastructure. If your library decides to take this incremental approach, you will need to manage the expectations of data users who may be looking for real-time data and more robust user interfaces. You’ll also need to be aware of the risk of locking in your data management processes around what was meant to be temporary infrastructure.

7. What support is your library willing to put behind a data sharing initiative?

Data hosting requires dedicated resources, and if you want to institutionalize a role for your library, you’ll eventually need buy-in from your library’s leadership. (We have shared some strategies for building buy-in in our guide.)

People, more than the data sharing platform, are the most important part of a community’s data sharing infrastructure. When you’re thinking about resources, remember people as well as technology. You don’t have to do all of this at once, but consider what people or positions your library may need to fill roles like these:

  • building and strengthening relationships with data users and publishers
  • fundraising
  • transforming data
  • managing a data repository
  • developing data publishing policies and governance
  • building digital tools to help people access data more easily
  • training data publishers and data users
  • providing technical assistance to publishers and users
  • organizing community engagement activities
  • maintaining long-term accessibility to data

Your library or partners in other organizations (including other libraries) may already be providing some of these services, so you may not need to secure new funding to get your effort started. Mapping your ecosystem will be very helpful in identifying service gaps and uncovering collaboration opportunities.

8. How will your library select your data sharing platform?

Your library doesn’t need to have answers to all of the previous questions before establishing your local data sharing partnership, but we strongly encourage you to work through them before making any substantial investments in technology. Making premature investments could lock you into a product that does not meet user needs, or doesn’t allow your initiative to evolve with the needs of your community.

When your library is ready to consider an investment in technology, we encourage you to use a deliberative, inclusive process to select the software platform that will power your community’s data sharing infrastructure. A software selection advisory committee can include data users, potential publishing partners, software experts, potential funders, and other people within your ecosystem. Your library can work with the committee to develop criteria to guide the evaluation of options and determine how to work within the library’s software procurement processes. The committee should consider installation and startup costs, operating costs, vendor and technology reliability, usability, and fit.
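One way a committee might turn criteria like these into a comparison is a simple weighted score. The Python sketch below is an illustration under assumptions, not the process we used: the weights, the 1-to-5 rating scale, and the platform names and ratings are all made up, and a committee would set its own.

```python
# Hypothetical weights; a committee would agree on its own.
CRITERIA_WEIGHTS = {
    "installation and startup costs": 0.15,
    "operating costs": 0.25,
    "vendor and technology reliability": 0.20,
    "usability": 0.25,
    "fit": 0.15,
}


def weighted_score(ratings, weights=CRITERIA_WEIGHTS):
    """Combine 1-to-5 ratings into a single weighted average."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


def rank_platforms(all_ratings, weights=CRITERIA_WEIGHTS):
    """Return (platform, score) pairs, highest score first."""
    scores = {name: weighted_score(r, weights) for name, r in all_ratings.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up ratings for two made-up options (5 is best).
RATINGS = {
    "Platform A": {
        "installation and startup costs": 3,
        "operating costs": 4,
        "vendor and technology reliability": 4,
        "usability": 5,
        "fit": 3,
    },
    "Platform B": {
        "installation and startup costs": 5,
        "operating costs": 2,
        "vendor and technology reliability": 3,
        "usability": 3,
        "fit": 4,
    },
}

for name, score in rank_platforms(RATINGS):
    print(f"{name}: {score:.2f}")
```

The number is a conversation aid rather than a verdict. If the ranking surprises the committee, that is usually a sign that the weights or the criteria need another look.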

Learn about the process we used here in Pittsburgh.

To recap:

If you’ve made it this far, you must be really excited about the possibility of your library being involved in building a local data sharing infrastructure!

  • Your initiative will be most successful if it takes into account the needs and aspirations of your data users, the structure of your civic data ecosystem, and your library’s motivation and capacity to be involved in civic data.
  • Taking an incremental approach to building your data sharing initiative can be a great way to build partnerships and answer the questions you’ll need to answer before your library or community invests in a data infrastructure.

Check out the Civic Switchboard Guide for more on building libraries into civic data partnerships.
We’d love your feedback - write to us at civic-switchboard@pitt.edu or start a discussion on the Civic Data Operators group.