Providing a brief overview of the data to be collected helps reviewers to understand the scope, nature and scale of the data that will be generated.
Example: This project will generate data describing political party organizational development in 15 countries.
What types of data are included?
All research data, formally defined by the U.S. Office of Management and Budget (1999) as "the recorded factual material commonly accepted in the scientific community as necessary to validate research findings."
Example: Examples of data include party membership size, gender balance of legislative delegations, national party income and expenditure patterns, and rules for selecting legislative candidates and party leaders.
Describe how you plan to create this data or capture it using software or equipment.
Example: Data will be uploaded into a spreadsheet from field notes and seismic measurement equipment.
Only answer this question if you will be using existing data.
Example: The data for this project will be drawn from existing data that are difficult to obtain through standard channels.
2 What is a metadata standard?
A simple metadata schema contains fields for TITLE, CREATOR, DESCRIPTION, DATE, REPOSITORY, SOURCE, USE & REPRODUCTION. It is highly advisable to include more information than this in your metadata schema. See the industry standards below.
Examples of existing metadata standards:
- Dublin Core Metadata Initiative – Metadata Basics.
- Metadata Research Center (MRC)
- Biodiversity Information Standards TDWG (a list of several metadata standards for science research data)
What metadata standard does the UH Libraries use?
The library uses Dublin Core. See the list above for the link to the Dublin Core website. It is perfectly acceptable to create your own metadata schema if you do not feel that any existing schema meets your needs.
A file format is indicated by the extension after the file name, e.g. filename.jpg, filename.txt, filename.xlsx.
Example: The file formats that this project will generate are as follows: .txt, .docx, .xls, etc.
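If the project files already exist, the list of formats can be gathered automatically rather than by hand. The following is a minimal sketch in Python; the function name `count_formats` and the sample file names are hypothetical and serve only as an illustration.

```python
from collections import Counter
from pathlib import PurePath


def count_formats(file_names):
    """Count files per extension, ignoring letter case.

    Files without an extension are grouped under "(none)".
    """
    return Counter(PurePath(name).suffix.lower() or "(none)" for name in file_names)


# Hypothetical file names, used only to illustrate the idea.
formats = count_formats(["notes.txt", "survey.TXT", "budget.xlsx", "README"])
for extension, count in sorted(formats.items()):
    print(extension, count)
```

The resulting list of extensions can be pasted into the answer to this question.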
Please describe which particular fields or elements of metadata you intend to capture.
Example 1: The use of the Dublin Core metadata will fit with the types of data to be collected because it includes metadata fields important to the researcher and future users.
Example 2: No industry-standard schema fits with this research. One of our tasks will be to establish standards for the metadata tags.
Describe the software or tools that will be used to capture the metadata. There are several programs available to capture or create metadata, including Excel, Access, and others.
Example 1: Metadata will be captured and created at the time of initial data creation. It will be formatted in an open source code that will allow for ease of transformation if needed.
Example 2: To allow for interoperability in the future with other data systems and applications, the data sources will be described in full and stored using well-defined domain models.
Example: Documentation of data will be written in the Dublin Core metadata standard, which is a standard for digital objects. The following fields will be kept (if applicable): TITLE, CREATOR, SUBJECT, FUNDERS, RIGHTS, DATES, SOURCES, LIST OF FILE NAMES, FILE FORMATS, VERSIONS, CHECKSUMS. Optional fields may be used if necessary. These include: ACCESS INFORMATION, LANGUAGE, LOCATION, METHODOLOGY, DATA PROCESSING, FILE STRUCTURE, VARIABLE LIST, CODE LISTS.
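A record with the fields listed in the example above could be assembled as follows. This is only a sketch under stated assumptions: the lowercase field keys, the function names, the choice of SHA-256 for checksums, and the sample file are all hypothetical, and the output is a plain JSON structure, not an official Dublin Core serialization.

```python
import hashlib
import json

# Core fields taken from the example above. The lowercase keys are an
# assumed naming convention, not part of any official schema.
CORE_FIELDS = ["title", "creator", "subject", "funders", "rights", "dates",
               "sources", "file_names", "file_formats", "versions", "checksums"]


def sha256_checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def build_record(files: dict, **fields) -> dict:
    """Build a metadata record for a set of files.

    `files` maps each file name to its raw bytes. Core fields that are not
    supplied are stored as None so that gaps stay visible.
    """
    record = {name: fields.get(name) for name in CORE_FIELDS}
    record["file_names"] = sorted(files)
    record["file_formats"] = sorted(
        {name.rsplit(".", 1)[-1] for name in files if "." in name}
    )
    record["checksums"] = {name: sha256_checksum(data) for name, data in files.items()}
    return record


# Hypothetical dataset, used only for illustration.
record = build_record(
    {"party_membership.csv": b"country,year,members\n"},
    title="Party membership size (hypothetical dataset)",
    creator="Example PI",
)
print(json.dumps(record, indent=2))
```

Storing a checksum for each file lets a repository or a later user confirm that the file has not changed since deposit.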
3 Why are access and sharing policies important?
Sharing data is an important tool for advancing science and maximizing research investment.
3.1 How can data be made available?
Digital data can be made available in a variety of ways and places. Placing your data in multiple places is preferred. Some places to consider are: the university institutional repository (contact Michele Reilly for upload); a national data center; publication in a widely available scientific journal, book, or website; or the institutional archives that are standard for a particular discipline (e.g., IRIS for seismological data, UNAVCO for GPS data). To see a list of data storage options, visit the Data Management Plan Research Guide.
Example: The reports generated by this project will be made available through the University of Houston Libraries Institutional Repository (UHLIR) and a website maintained by the PI.
3.2 When do you need to make your data available?
Most NSF directorates mandate that data should be made available as soon as possible. There is some leeway depending on the directorate. For the Division of Earth Sciences: as soon as possible but no later than two (2) years after the data were collected. For continuing observations or for long-term (multi-year) projects, data are to be made public annually. See National Science Foundation for exceptional circumstances extensions.
Example 1: The data produced by the project will be made available immediately following the grant end date.
Example 2: The data generated by this project will be made available after a three (3) year embargo period to ensure privacy of participants.
Example 3: As a Division of Earth Sciences awardee, the embargo period for release of our data will be no longer than two (2) years from completion of the project.
Describe how the data will be made available to others. Please include your reasoning for charging any fees for use of the data.
Example: The data will be accessed via the World Wide Web through a URL and through the UHLIR. There will be no fee for this access.
State whether or not you will wish to retain rights to use the data before opening it up to a wider audience. Please include your reasoning for retaining these rights.
Example: The PI will not need to retain the right to use the data before opening it up to a wider audience.
3.5 Provisions for collected data.
Briefly explain the provisions for the collected data if the PI or Co-PI were to leave the institution. Please include your reasoning. This information is required for the Engineering Directorate.
Example: Should the PI leave the institution, access to the collected data will be transferred to another qualified individual within the department.
Indicate whether your data is covered by copyright and state the owner. If you own the copyright to the dataset, will you be licensing it?
Example: The dataset is not covered by copyright. No action need be taken to license the dataset.
Indicate whether your data needs to be de-personalized and how you plan to resolve ethical and privacy issues.
Example 1: Because some of the data is of a sensitive nature, it is expected that the data will be de-identified before deposit into an open access database.
Example 2: This project has no data that will describe individuals. None of the data will require any sort of confidentiality.
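As an illustration of de-identification before deposit, the sketch below drops direct identifiers and replaces a participant ID with a salted hash. The column names, the `deidentify` function, and the salt value are hypothetical. This approach is an assumption for illustration only; it does not by itself guarantee anonymity or compliance with any regulation or IRB protocol.

```python
import hashlib

# Hypothetical names of columns that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def deidentify(row: dict, salt: str) -> dict:
    """Return a copy of `row` without direct identifiers.

    The participant ID, if present, is replaced by a shortened salted
    SHA-256 digest so that records from the same person can still be linked.
    """
    cleaned = {key: value for key, value in row.items() if key not in DIRECT_IDENTIFIERS}
    if "participant_id" in cleaned:
        raw = (salt + str(cleaned["participant_id"])).encode("utf-8")
        cleaned["participant_id"] = hashlib.sha256(raw).hexdigest()[:12]
    return cleaned


# Hypothetical survey row, used only for illustration.
row = {"participant_id": 17, "name": "A. Person", "email": "a@example.org", "response": 4}
print(deidentify(row, salt="project-secret"))
```

The salt should be kept private and stored separately from the deposited data; otherwise the original IDs could be recovered by hashing candidate values.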
3.8 HIPAA and Data Protection Act 1998
Indicate whether or not your data falls under either of these two acts. For more information, visit the HIPAA FAQ or the UK Legislation page.
If your data includes personal information, how will you be protecting this data? Only answer this question if you answered "yes" to question 3.8 above.
Example: Personal data will be redacted before allowing access.
IRB stands for Institutional Review Board. Each university has its own Division of Research and IRB protocols. UH researchers can visit the Division of Research site for more information. If you are not from UH, please check with your home institution.
Example: The IRB Protocol requires that personal data not be included in final summary of the project. This information will be scrubbed from the final summary.
Indicate with a yes or no whether restrictions will need to be placed on the data.
Briefly describe the reasons why this data cannot be shared or re-used. Only answer this question if you answered "yes" to question 4.1 above.
Example: Initially the Database will be the intellectual property of the Database editor, who is the PI for this grant. The research team that is developing the Database will develop a joint management plan as part of its work in establishing the Database, probably delegating oversight of the Database to the editorial board of the journal Party Politics.
Please explain who the likely users of this data are.
Example: The information in this Database will be used by students and scholars of representative democracy, and by policy makers and practitioners in democracy-promoting organizations.
The following Directorates require sharing information: Division of Earth Sciences, Directorate of Mathematical & Physical Sciences, and Division of Astronomical Sciences. Please see your directorate for more information.
5 Why is archiving and preservation of access important?
Digital data needs to be actively managed in order to ensure long-term access and preservation. As technologies change it is important to preserve and protect shared scientific heritage against loss from technology evolution.
The NIH guidelines state that:
NIH recognizes that it takes time and money to prepare data for sharing. Thus, applicants can request funds for data sharing and archiving in their grant application. (See also the section on What to Include in an NIH Application.) Investigators who incorporate data sharing in the initial design of the study may more readily and economically establish adequate procedures for protecting the identities of participants and share a useful dataset with appropriate documentation.
The NSF Social Sciences Directorate says about funding: Any costs should be explained in the Budget Justification pages.
Archives, Repositories, and Data Centers
Indicate the name of the storage repository, data center, or archives where you plan to store your data. See the Data Storage Options Research Guide for more information.
Library Example: Data will be deposited into the University of Houston Institutional Repository (IR) hosted by the Texas Digital Library (TDL), which will provide secondary access and long-term storage. The PI will be responsible for maintaining the data, applying proposed metadata, and uploading the data with associated metadata per the data management plan. The library will provide storage and monitor the preservation of those materials for the life of the data.
Industry Example: The IRIS DMC was consulted and agrees to act as the long-term archive and dissemination facility for data collected as part of this proposal. It is the intention of the PIs to use the IRIS DMC to meet our obligation for data management of the time series data collected in this experiment. The PI will be responsible for maintaining the data, applying proposed metadata, uploading the data with associated metadata per the data management plan, and monitoring the preservation of those materials for the life of the data.
5.2 How long do you have to retain your data?
Each Directorate has a different retention policy. Please see your directorate for more information.
Example: Per the Division of Earth Sciences: "For those programs in which selected principal investigators have initial periods of exclusive data use, data should be made openly available as soon as possible, but no later than two (2) years after the data were collected."
Describe the preservation plan for your data. This information should be available from your chosen data storage option.
Library Example: Preservation and backup will be the responsibility of TDL and the University of Houston Libraries IR manager, using a combination of the TDL Preservation Network and cloud storage through TDL's Amazon S3. TDL maintains duplicates of all applications and databases on TDL hardware housed at the UT Austin data center. Data stored in TDL-hosted services will continue to be preserved in the TDL Preservation Network through the TDL partnership with the Texas Advanced Computing Center (TACC).
Each Directorate has a different minimum retention policy. Please see your directorate for more information. Your institution may also require minimum retention periods.
Example: The data will be retained for a minimum of 3 years per institutional and Engineering Directorate requirements.
Permalink: http://info.lib.uh.edu/p/dmp-faqs/definitions