What is Data Management?
Data Management is the process of controlling the information generated during a research project, including the storage, access and preservation of data throughout the research life cycle and beyond. Any research project will involve some level of data management; the outcome of the research depends in part on how well this data is managed.
Many federal funding agencies are now starting to require formalized data management plans. Regardless of funding, a written plan should be in place for all research projects, and shared with all key personnel involved with the project.
Effective Data Management Practices include:
- Designating the responsibilities of every individual involved in the study,
- Determining how data will be stored and backed up, including long-term archiving,
- Implementing the data management plan, and
- Deciding how data will be dealt with through each modification of the study.
General Roles and Responsibilities for Data Management
- Principal Investigator (PI): The primary owner of the data. The PI is responsible for identifying an information custodian, developing a written data management plan, enacting processes necessary to confirm compliance with the plan, and for ensuring data is retained and shared according to sponsor and university requirements.
- Colleges/Departments/Centers: Responsible for providing the necessary resources for data management, addressing related information security issues and ensuring investigator compliance with data management requirements and university policy, such as MAPP 10.05.03, Data Classification and Protection, and MAPP 10.03.06, College/Division Responsibilities for Information Technology Resources.
- Division of Research (DOR): Responsible for developing a campus-wide policy for data management and for reviewing related compliance concerns, particularly with regard to federal grant requirements and sponsored project agreements. A DOR Data Management policy is currently under development.
Data Management Tools and Best Practices
Developing a data management plan does not have to be complicated. The DMP Tool website has numerous templates by discipline for creating data management plans.
External Resources for Data Management Plans:
Research data are a valuable resource, usually requiring much time and money to be produced. Many data have a significant value beyond usage for the original research.
In addition, data sharing:
- is required by government funding agencies (e.g. NSF, NIH, NASA) and some publishers
- allows data to be used to answer new questions, which promotes the possibilities of new inventions and discoveries
- maximizes transparency and accountability
- promotes the community of sciences and collaboration, which allows science to be more open
- makes your papers more useful and citable by other scientists
How to share your data:
- Deposit in an appropriate data repository or archive
- Post online via a project or institutional web site
- Submit data to a journal to accompany an article
Federal Agency Data Management and Sharing resources:
- NSF: Dissemination and Sharing of Research Results
- NIH Data Sharing Policy
- NIH Data Sharing Policy and Implementation Guidance
- CDC: Standards to Facilitate Data Sharing and Use of Surveillance Data for Public Health Action
A Word about Metadata
The word "metadata" means "data about data." It gives context to your research data by providing descriptive detail about it. It articulates a context for objects of interest -- "resources" such as MP3 files, library books, or satellite images -- in the form of "resource descriptions." It encompasses the following:
- names, labels and descriptions for variables, records and their values
- explanation of codes and classification schemes used
- codes of, and reasons for, missing values
- derived data created after collection, with code, algorithm or command file used to create them
- weighting and grossing variables created and how they should be used
- data listing with descriptions for cases, individuals or items studied, for example for logging qualitative interviews
For further information on metadata and an example of a metadata schema, see the Dublin Core metadata schema.
See also a selection of general and discipline-specific metadata standards.
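As a rough illustration of the items listed above, the following Python sketch builds a small metadata record that pairs a few Dublin Core elements with variable-level documentation. The element names come from the Dublin Core Metadata Element Set; the helper `build_record`, the study title, the creator, and the variables are hypothetical and exist only for this example.

```python
import json

# The fifteen element names of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def build_record(**fields):
    """Return a dict of Dublin Core fields, rejecting unknown element names.

    Hypothetical helper for illustration; not part of any library.
    """
    unknown = set(fields) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    return {f"dc:{name}": value for name, value in sorted(fields.items())}


# Made-up study description (all values are placeholders).
record = build_record(
    title="Example interview study",
    creator="Example, A.",
    date="2020-01-01",
    format="text/csv",
)

# Variable-level documentation: names, labels, and codes for missing values.
variables = [
    {"name": "age", "label": "Age at interview (years)",
     "missing_code": -9, "missing_reason": "Not answered"},
    {"name": "region", "label": "Region of residence",
     "codes": {"1": "North", "2": "South"},
     "missing_code": -9, "missing_reason": "Not answered"},
]

print(json.dumps({"record": record, "variables": variables}, indent=2))
```

Storing a file like this next to the data is one simple way to keep the descriptions, codes, and missing-value conventions with the dataset; a discipline-specific standard may call for a different structure.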
What Information Technology Security resources are available on campus?
The UH Data Classification and Protection Policy is MAPP 10.05.03 (PDF).
Members of the IT Security Team are available before, during, and after the conduct of your research project to advise investigators and departments/colleges/centers on proper measures to protect valuable data.
The ideal time to consult with IT Security is while you are writing your proposal, so that these protections are put in place and accounted for when building the research budget.
Best Practices for Data Storage/Archiving
- Data must be archived in a controlled, secure environment in a way that safeguards the primary data, observations, or recordings. The archive must be accessible by scholars analyzing the data, and available to collaborators or others who have rights of access. Primary research data should be stored securely for sufficient time following publication, analysis, or termination of the project. The number of years that data should be retained varies from field to field and may depend on the nature of the data and the research.
- Sustainable data management is crucial to the value of research and to ensuring continued scholarship. Data storage typically involves an access copy, for day-to-day use, and an archival copy, kept for preservation and back-up purposes. The importance of backing up data cannot be overemphasized, because natural disasters and breakdowns in systems and software cannot be predicted. Back up your data early and often.
- Choosing data formats and software depends mostly on the preference of the researcher but can often be dictated by discipline-specific standards and customs. While ensuring the long-term usability and sustainability of data requires attention to standard and interchangeable software, there are also Preferred Formats (from the UK Data Archive) for data creation and preservation.
- For more information about selecting data formats and software with respect to sustainability, see "Sustainable Data Formats" (University of Wisconsin-Madison).
- Long-Term Data Storage: Close attention to storage, back-up, security, and sustainability of your data means you lessen the risks of compromising their quality and accessibility over the long term. Issues related to storage include considering how rapidly data are expected to increase over the lifetime of the research project. Part of answering this question involves determining whether data will be collected in automated ways, which potentially steps up the scale of data collection, or whether staff on the project will be gathering data themselves (e.g., via inputting in a database, or a lab notebook). Options for short-term storage include hard disk drives and portable media (e.g., DVDs and CDs).
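The access copy and archival copy described above can be kept in step with a checksum comparison. The Python sketch below is a minimal example using only the standard library; the function names `sha256_of` and `archive_copy`, the file name, and the directory layout are assumptions made for illustration and do not describe any UH system or required procedure.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_copy(access_path, archive_dir):
    """Copy the access copy into archive_dir and verify the copy by checksum.

    Hypothetical helper: raises OSError if the two files differ.
    """
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / Path(access_path).name
    shutil.copy2(access_path, target)  # copy2 also preserves timestamps
    if sha256_of(access_path) != sha256_of(target):
        raise OSError(f"Checksum mismatch after copying {access_path}")
    return target


# Demonstration in a temporary directory with a made-up data file.
with tempfile.TemporaryDirectory() as workdir:
    access = Path(workdir) / "survey_results.csv"
    access.write_text("id,age\n1,34\n2,51\n")
    archived = archive_copy(access, Path(workdir) / "archive")
    print("Archived to:", archived.name)
    print("Checksums match:", sha256_of(access) == sha256_of(archived))
```

Recording the checksum alongside the archival copy also makes it possible to re-check the file later and detect silent corruption. A real archive would normally sit on separate, secured storage rather than in the same directory tree as the access copy.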
What is UH’s Policy on Data Retention?
The policy on access to and retention of data can be found on the Division of Research website.