Albert S. Cook Library

Data

Sources for finding data

How to use DMPTool.org

Who can use the tool?

DMPTool is free for anyone to create data management plans. As a user, you can:

  • Create your own plans.
  • Co-author a plan with collaborators.
  • If you are a researcher from one of the participating institutions, you can log in using your institutional credentials. You may then be presented with institution-specific guidance and have the option to get feedback from local data experts.

Sign in to DMPTool.org

Anyone affiliated with Towson University can sign in to DMPTool.org with a university email address (ending in "@towson.edu"). Simply go to DMPTool.org and enter your university email address.

On the TU sign-in page that follows, click the "" button to sign in.

"My Dashboard" is the default page after you sign in successfully.

Overview of My dashboard

When you log in, you will be directed to “My dashboard.” From here, you can create, edit, share, download, copy, or remove any of your plans. You will also see plans that have been shared with you by others.

If others at your institution/organization have chosen to share their plans internally, you will see a second table of organizational plans. This allows you to download a PDF and view their plans as samples or discover new research data. Additional samples are available in the list of public plans.

How do I create a data management plan?

Start a plan

Click the “Create plan” button on My dashboard or the top menu to create a plan. This will take you to a wizard that helps you select the appropriate template:

 

  1. Enter a title for your research project. If applying for funding, use the project title as it appears in the proposal.
  2. Select the primary research organization. This field will be pre-populated if you are associated with a participating institution/organization. You have the option to clear the field and select another organization from the list. You will be presented with institution-specific templates and guidance based on your selection. You can also check the box “No organization is associated with this plan.”
  3. Select the primary funding organization. If you must include a data management plan as part of a grant proposal, select your funder from the list. You may be presented with a secondary dropdown menu if your funder has different requirements for specific programs (e.g., NSF, DOE). See the complete list of funder requirements supported by DMPTool. If your funder is not on the list or you are not applying for a grant, check the box for “No funder associated with this plan”; this selection will provide you with a generic template.

 

If you are just testing the tool or taking a course on data management, check the box “Mock project for test, practice, or educational purposes.” Marking your plans as a test will be reflected in usage statistics and prevent public or organizational sharing; this allows other users to find actual sample plans more easily.

Once you have made your selections, click “Create plan.”

You can also copy an existing plan (from the Actions menu next to the plan on My Dashboard) and update it for a new research project and grant proposal.

What are the different features of the DMPTool?

The tabbed interface allows you to navigate through different functions when editing your plan.

  • Project details: Includes basic administrative details. The right-hand side of the page is where you can select up to 6 organizations to view additional guidance as you write your plan. The more information you provide here, the more valuable your plan will be to you and others in the future (e.g., for data reuse and proper attribution).
  • Collaborators: Here, you list the project’s Principal Investigator(s) and those responsible for data management. The DMP Collaborators section is where you can invite specific people to read, edit, or administer your plan. Invitees will receive an email notification that they have access to this plan.
  • Write Plan: There may be more than one tab if your funder or institution asks different sets of questions at different stages, e.g., a grant application and post-award. Guidance and comments are displayed in the right-hand panel beside each question. If you need more guidance or find it is too much, you can make adjustments on the “Project details” tab.
  • Research Outputs: Generate a list of your anticipated research output(s) that are described in your DMP. You can create as many entries as needed.
  • Download: This allows you to download your plan in various formats. You can adjust the formatting (font type, size, and margins) for PDF files, which may be helpful if working to page limits (e.g., NSF and NIH data management plans are limited to 2 pages).
  • Finalize: This tab has two important functions.
    1. You can set your plan visibility. By default, all new and test plans will be set to Private visibility. Public and Organizational visibility are intended for finished plans. You must answer at least 50% of the questions to enable these options.
    2. Register your plan and add it to ORCID. This is where you can generate a DMP ID for your data management plan. To register your plan you must have completed the following:
      • answered at least 50% of questions
      • identified your funder
      • linked your DMPTool account to your ORCID via your Third-party applications page
      • confirmed that the plan is not a mock project for testing, practice, or educational purposes

 

Share plans

You can share your plan with colleagues in the Collaborators tab. Input the email address(es) of any collaborators you would like to invite to read or edit your plan. Set their permissions via the radio buttons and click "Add collaborator." Adjust permissions or remove collaborators at any time via the drop-down options in the table.

The "Finalize" tab is where you can set your plan visibility.

  • Private: restricted to you and your collaborators.
  • Organizational: anyone at your organization can view your plan.
  • Public: anyone can view your plan in the public plans list.

By default, all new and test plans will be set to Private visibility. Public and Organizational visibility are intended for finished plans. You must answer at least 50% of the questions to enable these options.

How do I get help from someone at my institution?

After logging in, you will find an email address and URL for help at the top of the page.

There may also be an option to request feedback on your plan. This is available when research support staff at your institution have enabled the service. Click “Request feedback” and your local administrators will be alerted to your request. Their comments will be visible in the “Comments” field adjacent to each question. You will receive an email notification when an administrator provides feedback.

Introduction

What is a data management plan?

A data management plan is a formal document that outlines what you will do with your data during and after a research project. Most researchers collect data with some form of plan in mind, but it's often inadequately documented and incompletely thought out. Many data management issues can be handled easily or avoided entirely by planning ahead. With the right process and framework it doesn't take too long and can pay off enormously in the long run.

Who requires a plan?

In February of 2013, the White House Office of Science and Technology Policy (OSTP) issued a memorandum directing Federal agencies that provide significant research funding to develop a plan to expand public access to research. Among other requirements, the plans must:
"Ensure that all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers develop data management plans, as appropriate, describing how they will provide for long-term preservation of, and access to, scientific data in digital formats resulting from federally funded research, or explaining why long-term preservation and access cannot be justified"

Most US Federally funded grants and many private foundations require some form of a data management plan. In the United States, most data management plans are 2-page documents submitted as part of the funding proposal process.

We can help

We have been working with internal and external partners to make data management plan development less complicated. By getting to know your research and data, we can match your specific needs with data management best practices in your field to develop a data management plan that works for you. If you do this work at the beginning of your research process, you will have a far easier time following through and complying with funding agency and publisher requirements.

We recommend that those applying for funding from US Federal agencies, such as the NSF or NIH, use the DMPTool. The DMPTool provides guidance for many of the Federal agencies’ requirements, along with links to additional resources, services, and help.


Types of Data

Research projects generate and collect countless varieties of data. To formulate a data management plan, it's useful to categorize your data in four ways: by source, format, stability, and volume.

What's the source of the data?

Although data comes from many different sources, it can be grouped into four main categories. The category (or categories) your data comes from will affect the choices you make throughout your data management plan.

Observational

  • Captured in real-time, typically outside the lab
  • Usually irreplaceable and therefore the most important to safeguard
  • Examples: Sensor readings, telemetry, survey results, images

 

Experimental

  • Typically generated in the lab or under controlled conditions
  • Often reproducible, but can be expensive or time-consuming
  • Examples: gene sequences, chromatograms, magnetic field readings

 

Simulation

  • Machine generated from test models
  • Likely to be reproducible if the model and inputs are preserved
  • Examples: climate models, economic models

 

Derived / Compiled

  • Generated from existing datasets
  • Reproducible, but can be very expensive and time-consuming
  • Examples: text and data mining, compiled database, 3D models

 

What's the form of the data?

Data can come in many forms, including:

  • Text: field or laboratory notes, survey responses
  • Numeric: tables, counts, measurements
  • Audiovisual: images, sound recordings, video
  • Models, computer code
  • Discipline-specific: FITS in astronomy, CIF in chemistry
  • Instrument-specific: equipment outputs

 

How stable is the data?

Data can also be fixed or changing over the course of the project (and perhaps beyond the project's end). Do the data ever change? Do they grow? Is previously recorded data subject to correction? Will you need to keep track of data versions? With respect to time, the common categories of dataset are:

  • Fixed datasets: never change after being collected or generated
  • Growing datasets: new data may be added, but the old data is never changed or deleted
  • Revisable datasets: new data may be added, and old data may be changed or deleted

 

The answer to this question affects how you organize the data as well as the level of versioning you will need to undertake. Keeping track of rapidly changing datasets can be a challenge so it is imperative that you begin with a plan to carry you through the entire data management process.

How much data will the project produce?

For instance, image data typically requires a lot of storage space, so you'll want to decide whether to retain all your images (and, if not, how you will decide which to discard) and where such large data can be housed. Be sure to know your archiving organization's capacity for storage and backups.

To avoid being under-prepared, estimate the growth rate of your data. Some questions to consider are:

  • Are you manually collecting and recording data?
  • Are you using observational instruments and computers to collect data?
  • Is your data collection highly iterative?
  • How much data will you accumulate every month or every 90 days?
  • How much data do you anticipate collecting and generating by the end of your project?
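A rough estimate like this can be worked out in a few lines. In the sketch below, the collection rate, project duration, and number of copies are all hypothetical placeholders; substitute figures from your own instruments and protocols:

```python
# Back-of-the-envelope data volume estimate; all figures are hypothetical.
gb_per_week = 2.5   # instrument output per week
weeks = 52          # planned collection period
copies = 3          # original plus two backup copies

raw_gb = gb_per_week * weeks
total_gb = raw_gb * copies

print(f"Raw data: {raw_gb:.0f} GB")        # 130 GB
print(f"With backups: {total_gb:.0f} GB")  # 390 GB
```

Revisit the estimate once real data starts arriving; early projections are often optimistic.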

File Formats

The file format you choose for your data is a primary factor in someone else's ability to access it in the future. Think carefully about what file format will be best to manage, share, and preserve your data. Technology continually changes and all contemporary hardware and software should be expected to become obsolete. Consider how your data will be read if the software used to produce it becomes unavailable. Although any file format you choose today may become unreadable in the future, some formats are more likely to be readable than others.

Formats likely to be accessible in the future are:

  • Non-proprietary
  • Open, with documented standards
  • In common usage by the research community
  • Using standard character encodings (e.g., ASCII, UTF-8)
  • Uncompressed (space permitting)

Examples of preferred format choices:

  • Image: JPEG, JPEG-2000, PNG, TIFF
  • Text: plain text (TXT), HTML, XML, PDF/A
  • Audio: AIFF, WAVE
  • Containers: TAR, GZIP, ZIP
  • Databases: prefer XML or CSV to native binary formats

If you find it necessary or convenient to work with data in a proprietary/discouraged file format, do so, but consider saving your work in a more archival format when you are finished.

For more information on recommended formats, see the UK Data Service guidance on recommended formats.

Tabular data

Tabular data warrants special mention because it is so common across disciplines, mostly as Excel spreadsheets. If you do your analysis in Excel, you should use the "Save As..." command to export your work to .csv format when you are done. Your spreadsheets will be easier to understand and to export if you follow best practices when you set them up, such as:

  • Don't put more than one table on a worksheet
  • Include a header row with understandable title for each column
  • Create charts on new sheets; don't embed them in the worksheet with the data
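The same practices apply if you generate tabular data programmatically rather than in Excel. A minimal sketch using Python's standard csv module (the file name and column names are illustrative):

```python
import csv

# One table per file, with a single header row of descriptive column names.
rows = [
    {"site_id": "A01", "date": "2024-03-15", "temperature_c": 12.4},
    {"site_id": "A02", "date": "2024-03-15", "temperature_c": 11.9},
]

with open("temperature_readings.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["site_id", "date", "temperature_c"])
    writer.writeheader()   # understandable title for each column
    writer.writerows(rows)
```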

 

Other risks to accessibility

  • Encrypted data may be effectively lost if it was encrypted with a key that has been lost (e.g., a forgotten password). For this reason, encrypted data representations are strongly discouraged.
  • Data that is legally encumbered may also be considered lost. So may data bound by ambiguous or unknown access and archiving rights, because the cost of clarifying the rights situation is often prohibitive. See data rights and licensing for guidance.

Organizing Files

Basic Directory and File Naming Conventions

These are rough guidelines to follow to help manage your data files in case you don't already have your own internal conventions. When organizing files, the top-level directory/folder should include:

  • Project title
  • Unique identifier (Guidance on persistent external identifiers is available below)
  • Date (yyyy or yyyy.mm.dd)

 

The sub-directory structure should have clear, documented naming conventions. Separate files or directories could apply, for example, to each run of an experiment, each version of a dataset, and/or each person in the group.

  • Reserve the 3-letter file extension for the file format, such as .txt, .pdf, or .csv.
  • Identify the activity or project in the file name.
  • Identify separate versions of files and datasets using file or directory naming conventions. It can quickly become difficult to identify the "correct" version of a file.
  • Record all changes to a file no matter how small. Discard obsolete versions after making backups.
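As an illustration of these conventions, a file name can be assembled programmatically. The project name, activity, and version scheme below are hypothetical; the date format follows the yyyy.mm.dd convention recommended above:

```python
from datetime import date

def make_filename(project: str, activity: str, version: int, ext: str,
                  when: date) -> str:
    """Build a name like 'coralsurvey_sitemap_2024.03.15_v02.csv'."""
    stamp = when.strftime("%Y.%m.%d")   # yyyy.mm.dd, as recommended
    return f"{project}_{activity}_{stamp}_v{version:02d}.{ext}"

print(make_filename("coralsurvey", "sitemap", 2, "csv", date(2024, 3, 15)))
# coralsurvey_sitemap_2024.03.15_v02.csv
```

Generating names with one function, rather than typing them by hand, keeps the convention consistent across the whole project.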

File Renaming

Tools to help you:


Metadata: Data Documentation

Why document data?

Clear and detailed documentation is essential for data to be understood, interpreted, and used. Data documentation describes the content, formats, and internal relationships of your data in detail and will enable other researchers to find, use, and properly cite your data.

Begin to document your data at the very beginning of your research project and continue throughout the project. Doing so will make the process much easier. If you have to construct the documentation at the end of the project, the process will be painful and important details will have been lost or forgotten. Don't wait to document your data!

What to document?

Research Project Documentation

  • Rationale and context for data collection
  • Data collection methods
  • Structure and organization of data files
  • Data sources used (see citing data)
  • Data validation and quality assurance
  • Transformations of data from the sanitized data through analysis
  • Information on confidentiality, access and use conditions

 

Dataset documentation

  • Variable names and descriptions
  • Explanation of codes and classification schemes used
  • Algorithms used to transform data (may include computer code)
  • File format and software (including version) used

 

How will you document your data?

Data documentation is commonly called metadata – "data about data". Researchers can document their data according to various metadata standards. Some metadata standards are designed for the purpose of documenting the contents of files, others for documenting the technical characteristics of files, and yet others for expressing relationships between files within a set of data. If you want to be able to share or publish your data, the DataCite metadata standard is of particular significance.

Below are some general aspects of your data that you should document, regardless of your discipline. At a minimum, store this documentation in a "readme.txt" file, or the equivalent, with the data itself.

General Overview

  • Title: Name of the dataset or research project that produced it
  • Creator: Names and addresses of the organizations or people who created the data; preferred format for personal names is surname first (e.g., Smith, Jane)
  • Identifier: Unique number used to identify the data, even if it is just an internal project reference number
  • Date: Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, such as maintenance cycle, update schedule; preferred format is yyyy-mm-dd, or yyyy.mm.dd-yyyy.mm.dd for a range
  • Method: How the data were generated, listing equipment and software used (including model and version numbers), formulae, algorithms, experimental protocols, and other things one might include in a lab notebook
  • Processing: How the data have been altered or processed (e.g., normalized)
  • Source: Citations to data derived from other sources, including details of where the source data is held and how it was accessed
  • Funder: Organizations or agencies who funded the research

Content Description

  • Subject: Keywords or phrases describing the subject or content of the data
  • Place: All applicable physical locations
  • Language: All languages used in the dataset
  • Variable list: All variables in the data files, where applicable
  • Code list: Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. "999 indicates a missing value in the data")

Technical Description

  • File inventory: All files associated with the project, including extensions (e.g. "NWPalaceTR.WRL", "stone.mov")
  • File formats: Formats of the data, e.g., FITS, SPSS, HTML, JPEG, etc.
  • File structure: Organization of the data file(s) and layout of the variables, where applicable
  • Version: Unique date/time stamp and identifier for each version
  • Checksum: A digest value computed for each file that can be used to detect changes; if a recomputed digest differs from the stored digest, the file must have changed
  • Necessary software: Names of any special-purpose software packages required to create, view, analyze, or otherwise use the data
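For example, a file's checksum can be computed with Python's standard hashlib module. SHA-256 is one common algorithm choice; reading in chunks keeps memory use low for large data files:

```python
import hashlib

def file_checksum(path: str) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Store the digest alongside the file; recompute it later to detect changes.
```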

Access

  • Rights: Any known intellectual property rights, statutory rights, licenses, or restrictions on use of the data
  • Access information: Where and how your data can be accessed by other researchers
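Pulling these elements together, a minimal readme.txt might look like the following sketch. All values are placeholders; include whichever fields apply to your project:

```
Title:       Example Lake Survey Dataset
Creator:     Smith, Jane (Example University)
Identifier:  internal-project-0042
Date:        2024-01-01 - 2024-12-31
Method:      Handheld sonde readings; Model X, firmware v1.2
Processing:  Temperatures normalized to degrees Celsius
Subject:     limnology; water temperature
Code list:   999 indicates a missing value
Files:       readings.csv, sites.csv
Rights:      CC BY 4.0
Access:      Deposited in the university repository
```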

Persistent Identifiers

If you want to be able to share or cite your dataset, you'll want to assign a public persistent unique identifier to it. There are a variety of public identifier schemes, but common properties of good schemes are that they are:

  • Actionable (you can "click" on them in a web browser)
  • Globally unique across the internet
  • Persistent for at least the life of your data

 

Here are some identifier schemes:

  • ARK (Archival Resource Key) – a URL with extra features allowing you to ask for descriptive and archival metadata and to recognize certain kinds of relationships between identifiers. ARKs are used by memory organizations such as libraries, archives, and museums. They are resolved at "https://n2t.net". Resolution depends on HTTP redirection and can be managed through an API or a user interface.
  • DOI (Digital Object Identifier) – an identifier that becomes actionable when embedded in a URL. DOIs are very popular in academic journal publishing. They are resolved at "https://doi.org". Resolution depends on HTTP redirection and the Handle identifier protocol, and can be managed through an API or a user interface.
  • Handle – an identifier that becomes actionable when embedded in a URL. Handles are resolved at "http://www.handle.net/". Resolution depends on HTTP redirection and the Handle protocol, and can be managed through an API or a user interface.
  • InChI (IUPAC International Chemical Identifier) – a non-actionable identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data compilations.
  • LSID (Life Sciences Identifier) – a kind of URN that identifies biologically significant resources, including species names, concepts, occurrences, genes or proteins, and data objects that encode information about them. Like other URNs, it becomes actionable when embedded in a URL.
  • NCBI (National Center for Biotechnology Information) Accession – a non-actionable number in use by NCBI.
  • PURL (Persistent Uniform Resource Locator) – a URL that is always redirected through a hostname (often purl.org). Resolution depends on HTTP redirection and can be managed through an API or a user interface.
  • URL (Uniform Resource Locator) – the typical "address" of web content. It is a kind of URI (Uniform Resource Identifier) that begins with "http://" and consists of a string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using the HTTP protocol. Well-managed URL redirection can make URLs as persistent as any identifier. Resolution depends on HTTP redirection and can be managed through an API or a user interface.
  • URN (Uniform Resource Name) – an identifier that becomes actionable when embedded in a URL. Resolution depends on HTTP redirection and the DDDS protocol, and can be managed through an API or a user interface. A browser plug-in can save you from typing a hostname in front of it.

Security and Storage

Data Security

Data security is the protection of data from unauthorized access, use, change, disclosure, and destruction. Make sure your data is safe in regards to:

  • Network security
    • Keep confidential data off the Internet
    • In extreme cases, put sensitive materials on computers not connected to the internet
  • Physical security
    • Restrict access to buildings and rooms where computers or media are kept
    • Only let trusted individuals troubleshoot computer problems
  • Computer systems and files
    • Keep virus protection up to date
    • Don't send confidential data via e-mail or FTP (or, if you must, use encryption)
    • Set passwords on files and computers
    • React with skepticism to phone calls and emails that claim to be from your institution's IT department

 

Encryption and Compression

Unencrypted data will be more easily read by you and others in the future, but you may need to encrypt sensitive data.

  • Use mainstream encryption tools (e.g., PGP)
  • Don't rely on third-party encryption alone
  • Keep passwords and keys on paper (2 copies)

 

Uncompressed data will also be easier to read in the future, but you may need to compress files to conserve disk space.

  • Use a mainstream compression tool (e.g., ZIP, GZIP, TAR)
  • Limit compression to the 3rd backup copy
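A gzip-compressed TAR archive can also be produced with Python's standard tarfile module. In this sketch the directory and file names are illustrative, and a small example directory is created so the snippet is self-contained:

```python
import os
import tarfile

# Make a small example directory to archive (contents are illustrative).
os.makedirs("raw_data", exist_ok=True)
with open("raw_data/readings.csv", "w") as f:
    f.write("site_id,temperature_c\nA01,12.4\n")

# Create a gzip-compressed TAR archive of the directory.
with tarfile.open("data_backup.tar.gz", "w:gz") as tar:
    tar.add("raw_data")

# Reopen and list contents to confirm the archive is readable.
with tarfile.open("data_backup.tar.gz", "r:gz") as tar:
    print(tar.getnames())
```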

 

Backups and storage

Making regular backups is an integral part of data management. You can back up data to your personal computer, external hard drives, or departmental or university servers. Software that makes backups for you automatically can simplify this process considerably. The UK Data Archive provides additional guidelines on data storage, backup, and security.

Backup Your Data

  • Good practice is to have three copies in at least two locations (e.g. original + external/local backup + external/remote backup)
  • Geographically distribute your local and remote copies to reduce risk of calamity at the same location (power outage, flood, fire, etc.)

Test your backup system

  • To be sure that your backup system is working, periodically retrieve your data files and confirm that you can read them. You should do this when you initially set up the system and on a regular schedule thereafter.
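A basic periodic check is to compare each backup copy against the original byte for byte. A sketch using Python's standard filecmp module, with hypothetical file names (in practice, the two paths would point at your working copy and a backup copy):

```python
import filecmp

# Write an "original" and its backup copy (contents are illustrative).
with open("original.dat", "wb") as f:
    f.write(b"survey readings 2024")
with open("backup.dat", "wb") as f:
    f.write(b"survey readings 2024")

# shallow=False compares file contents byte for byte, not just metadata.
if filecmp.cmp("original.dat", "backup.dat", shallow=False):
    print("backup verified: contents identical")
else:
    print("WARNING: backup differs from original")
```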

Other data preservation considerations

Who is responsible for managing and controlling the data?

  • Who controls the data (e.g., the PI, a student, your lab, your university, your funder)? Before you spend a lot of time figuring out how to store the data, to share it, to name it, etc. you should make sure you have the authority to do so.

For what or whom are the data intended?

  • Who is your intended audience for the data? How do you expect they will use the data? The answer to these questions will help inform structuring and distributing the data.

How long should the data be retained?

  • Is there any requirement that the data be retained? If so, for how long? 3-5 years, 10-20 years, permanently? Not all data need to be retained, and some data required to be retained need not be retained indefinitely. Have a good understanding of your obligation for the data's retention.

Beyond any externally imposed requirements, think about the long-term usefulness of the data. If the data is from an experiment that you anticipate will be repeatable more quickly, inexpensively, and accurately as technology progresses, you may want to store it for a relatively brief period. If the data consists of observations made outside the laboratory that can never be repeated, you may wish to store it indefinitely.

Sharing and Archiving

Why share your data?

 

  • Required by publishers (e.g., Cell, Nature, Science)
  • Required by government funding agencies (e.g., NIH, NSF)
  • Allows data to be used to answer new questions
  • Makes research more open
  • Makes your papers more useful and citable by other researchers

 

Considerations when preparing to share data

  • File Formats for Long Term Access: The file format in which you keep your data is a primary factor in one's ability to use your data in the future. Plan for both hardware and software obsolescence. See file formats and organization for details on long-term storage formats.
  • Don't Forget the Documentation: Document your research and data so others can interpret the data. Begin to document your data at the very beginning of your research project and continue throughout the project. See data documentation and metadata for details.
  • Ownership and Privacy: Make sure that you have considered the implications of sharing data in terms of copyright, IP ownership, and subject confidentiality. See copyright and confidentiality for details.

What does it mean to share data? (adapted from the Support Your Data project)

Sharing data means making your data available so that they can be accessed and used—by yourself or by others—in the future. Here are three factors to consider when sharing data.

Format

Data should be shared in a usable format. This may mean sharing raw data instead of prepared data (or vice versa) or ensuring that data is saved in common or open file formats.

Completeness

Remember that notes, documentation, and other information about your data are part of your data. To ensure that your shared data is useful, make sure these elements are included.

Location

When choosing a method for sharing your data, consider how other researchers will find and use it. The storage options you use to save your data as you work on it will probably be different than the options you use to share it, especially over the longer term.

Requirements and how to meet them

Many research funders, publishers, institutions, and research communities have formal expectations about how data should be shared.

Things to think about

 

  • Though it is very likely that you’ll share your data only at the conclusion of a research project, data sharing should be incorporated into your data management practices from the beginning.
  • Data sharing is about showing your work. Though many current data sharing requirements focus on the data underlying journal articles and other scholarly works, you should be prepared to share all of your data. All of it has potential value.
  • There are limits on how data containing sensitive or personally-identifying information can be shared, but you should be prepared to share enough information about your work so that others can evaluate, potentially replicate, and otherwise make use of what you’ve done.

 

Finding a data repository

You should select a repository or archive for your data based on the long-term security offered and the ease of discovery and access by colleagues in your field. There are two common types of repository to look for:

  • Discipline-specific: accepts data in a particular field or of a particular type (e.g., GenBank accepts nucleotide sequence data)
  • Generalist: accepts data of any type produced within the institution that maintains it (e.g., Dryad)

 

A searchable and browsable list of repositories can be found in online repository registries.

Citing Data

Citing data is important in order to:

  • Give the data producer appropriate credit
  • Allow easier access to the data for repurposing or reuse
  • Enable readers to verify your results

 

Citation Elements

A dataset should be cited formally in an article's reference list, not just informally in the text. Many data repositories and publishers provide explicit instructions for citing their contents. If no citation information is provided, you can still construct a citation following generally agreed-upon guidelines from sources such as the Force 11 Joint Declaration of Data Citation Principles and the current DataCite Metadata Schema.

Core elements

  • There are 5 core elements usually included in a dataset citation, with additional elements added as appropriate.
    • Creator(s) – may be individuals or organizations
    • Title
    • Publication year when the dataset was released (may be different from the Access date)
    • Publisher – the data center, archive, or repository
    • Identifier – a unique public identifier (e.g., an ARK or DOI)
  • Creator names in non-Roman scripts should be transliterated using the ALA-LC Romanization Tables.
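As an illustration, the five core elements can be assembled into a citation string following the common "Creator (Year): Title. Publisher. Identifier" pattern; the example values come from the figshare entry under "Example citations":

```python
# Assemble a dataset citation from the five core elements.
citation = "{creator} ({year}): {title}. {publisher}. {identifier}".format(
    creator="Kumar, Sujai",
    year=2012,
    title="20 Nematode Proteomes",
    publisher="figshare",
    identifier="https://doi.org/10.6084/m9.figshare.96035.v2",
)
print(citation)
# Kumar, Sujai (2012): 20 Nematode Proteomes. figshare. https://doi.org/10.6084/m9.figshare.96035.v2
```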

Common additional elements

  • Although the core elements are sufficient in the simplest case – citation to the entirety of a static dataset – additional elements may be needed if you wish to cite a dynamic dataset or a subset of a larger dataset.
    • Version of the dataset analyzed in the citing paper
    • Access date when the data was accessed for analysis in the citing paper
    • Subset of the dataset analyzed (e.g., a range of dates or record numbers, a list of variables)
    • Verifier that the dataset or subset accessed by a reader is identical to the one analyzed by the author (e.g., a checksum)
    • Location of the dataset on the internet, needed if the identifier is not "actionable" (convertible to a web address)
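
A checksum verifier can be computed with an ordinary cryptographic hash. The sketch below uses SHA-256 from Python's standard library; the file name is hypothetical, and repositories may publish checksums using other algorithms (e.g., MD5).

```python
# Illustrative sketch: computing a checksum "verifier" for a dataset
# file with SHA-256, reading in chunks so large datasets fit in memory.
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# A reader who downloads the cited dataset can recompute the checksum
# and compare it with the value published in the citation, e.g.:
#   file_checksum("dataset.csv") == published_checksum
```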

Example citations

  • Kumar, Sujai (2012): 20 Nematode Proteomes. figshare. https://doi.org/10.6084/m9.figshare.96035.v2 (Accessed 2016-09-06).
  • Morran LT, Parrish II RC, Gelarden IA, Lively CM (2012) Data from: Temporal dynamics of outcrossing and host mortality rates in host-pathogen experimental coevolution. Dryad Digital Repository. https://doi.org/10.5061/dryad.c3gh6
  • Donna Strahan. "08-B-1 from Jordan/Petra Great Temple/Upper Temenos/Trench 94/Locus 41". (2009) In Petra Great Temple Excavations. Martha Sharp Joukowsky (Ed.) Releases: 2009-10-26. Open Context. https://opencontext.org/subjects/30C3F340-5D14-497A-B9D0-7A0DA2C019F1 ARK (Archive): http://n2t.net/ark:/28722/k2125xk7p
  • OECD (2008), Social Expenditures aggregates, OECD Social Expenditure Statistics (database). https://doi.org/10.1787/000530172303 (Accessed on 2008-12-02).
  • Denhard, Michael (2009): dphase_mpeps: MicroPEPS LAF-Ensemble run by DWD for the MAP D-PHASE project. World Data Center for Climate. https://doi.org/10.1594/WDCC/dphase_mpeps
  • Manoug, J L (1882): Useful data on the rise of the Nile. Alexandria : Printing-Office V Penasson. http://n2t.net/ark:/13960/t44q88124

Sharing data that you produced/collected yourself

  • Much data is not copyrightable in the United States because facts are not copyrightable. However, a presentation of data (such as a chart or table) may be.
  • Data can be licensed. Some data providers apply licenses that limit how the data can be used, for example to protect the privacy of study participants or to guide downstream uses of the data (e.g., requiring attribution or forbidding for-profit use).
  • If you want to promote sharing and unlimited use of your data, you can make your data available under a Creative Commons CC0 public domain dedication to make your wishes explicit.

Sharing data that you have collected from other sources

  • You may or may not have the right to do so, depending on whether the data were accessed under a license with terms of use.
  • Most databases to which the UC Libraries subscribe are licensed and prohibit redistribution of data outside of UC.

UC researchers who are uncertain about their rights to disseminate data can consult their campus Office of General Counsel. Note: laws about data vary outside the U.S.

For a general discussion about publishing your data, applicable to many disciplines, see the ICPSR Guide to Social Science Data Preparation and Archiving.

Confidentiality and Ethical Concerns

It is vital to maintain the confidentiality of research subjects both as an ethical matter and to ensure continuing participation in research. Researchers need to understand and manage tensions between confidentiality requirements and the potential benefits of archiving and publishing the data.

  • Evaluate the anonymity of your data. Consider to what extent your data contains direct or indirect identifiers that could be combined with other public information to identify research participants.
  • Obtain a confidentiality review. A benefit of depositing your data with ICPSR is that their staff offer a disclosure review service to check your data for confidential information.
  • Comply with UC regulations. Researchers concerned about confidentiality issues with their data should consult the UC policy for Protection of Human Subjects in Research.
  • Comply with regulations for health research set forth in the Health Insurance Portability and Accountability Act (HIPAA).

To ethically share confidential data, you may be able to:

  • Gain informed consent for data sharing (e.g., deposit in a repository or archive)
  • Anonymize the data by removing identifying information. Be aware, however, that any dataset that contains enough information to be useful will always present some risk.
  • Restrict the use of your data. ICPSR provides a sample Restricted Data Use Contract and Restricted-Use Data Management Guidance.
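
A common first pass at anonymization is dropping direct-identifier fields from tabular records before sharing. The sketch below is illustrative; the column names are hypothetical, and, as noted above, indirect identifiers (e.g., birth date combined with ZIP code) may still allow re-identification and need separate review.

```python
# Illustrative sketch: removing direct-identifier columns from records
# before sharing. Column names are hypothetical; this does NOT address
# indirect identifiers, which require separate disclosure review.

DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}

def drop_direct_identifiers(records):
    """Return copies of the records with direct-identifier fields removed."""
    return [
        {k: v for k, v in row.items() if k.lower() not in DIRECT_IDENTIFIERS}
        for row in records
    ]

rows = [{"name": "A. Smith", "age": "34", "email": "a@example.com", "score": "7"}]
print(drop_direct_identifiers(rows))
# → [{'age': '34', 'score': '7'}]
```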