Enclave Data Application

Last Updated: June 28, 2016 at 5:58 pm EST

Table of Contents

  1. Eligibility
  2. Description of Data Levels
  3. Access to Data
  4. Submission Process and Deadlines
  5. Use / Sharing

1. Eligibility

Principal Investigators

Only faculty members or post-doctoral researchers affiliated with a university or research institution may apply for access to Enclave Data and serve as the Principal Investigator. Post-doctoral researchers who are working in a lab must have a letter of support from the lab director. Research may only be for non-commercial, non-derivative use. Furthermore, access to Enclave Data is limited to researchers whose projects have been vetted by a university or independent IRB.

All applications will be scrutinized for four criteria: privacy, security, ethical considerations, and value to people in crisis.

Students

Graduate students may gain access to the Enclave Data under the supervision of a Principal Investigator. The Principal Investigator and institution will ensure that the student meets all conditions of the agreement. Approved graduate students must sign the Supplemental Agreement with Research Staff.

Multiple Institutions

Multiple institutions working together on a research project should submit one application. This application should identify one or two Principal Investigators who will serve as the primary contacts for the entire research team. Within the application, a separate Enclave Data Use Agreement must be completed for each institution whose employees or students are involved in the research.

Who is Not Eligible

Individuals not associated with a university or research institution may not have access to the Enclave Data or to output derived from these data and may not be a part of a research team submitting an application for the data. In addition, individuals associated with the following organizations and groups may not have access even if associated with a university or research institution:

  • Law enforcement
  • Undergraduate students
  • Self-employed individuals

These policies will be reviewed periodically.

2. Description of Data Levels

Data are shared on a need-to-know basis. Some projects require data that have a higher likelihood of containing personally identifiable information (PII). To protect our texters’ identities, such projects will undergo increased scrutiny prior to approval.

Crisis Text Line offers three levels of Enclave Data to researchers corresponding to the data’s risk of containing a texter’s PII, either through direct or indirect identification. Each level includes data from previous levels.

Successful applications will both specify the level requested and describe the specific variables needed for the research project.
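The cumulative structure of the levels described above can be sketched in a few lines of Python. This is only an illustration: the names `DataLevel` and `accessible_levels` are hypothetical and are not part of any Crisis Text Line tooling.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """The three Enclave Data levels, ordered from lowest to highest PII risk."""

    CONVERSATION = 1  # Level A
    ACTOR = 2         # Level B
    MESSAGE = 3       # Level C


def accessible_levels(approved: DataLevel) -> list[DataLevel]:
    """Return the approved level plus every level that precedes it."""
    return [level for level in DataLevel if level <= approved]


if __name__ == "__main__":
    # A team approved for Level C (Message) also sees Levels A and B.
    for level in accessible_levels(DataLevel.MESSAGE):
        print(level.name)
```

Ordering the levels as integers makes the "each level includes the previous ones" rule a simple comparison.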

A. Conversation Level

This dataset will allow researchers to explore attributes of a conversation, defined as a text exchange occurring between one texter and one or more Crisis Counselors. (On average, a conversation contains ~40 messages exchanged; i.e., 20 in each direction.)

This dataset includes attributes of conversations (n > 200,000). This Level does not include attributes of individual messages or the actors (texters, Crisis Counselors) involved in the conversation. For Conversation Level, researchers may request data as far back as August 2013, when the Crisis Text Line service began.

Questions you might like to ask of Conversation Level data:

Example variables:

B. Actor Level

Overview

This dataset will allow researchers to track actors (texters, Crisis Counselors) across multiple conversations. Many texters use Crisis Text Line more than once; the average texter uses the service 2.4 times; 5% of texters use the service 7 or more times.

A texter is defined as a unique phone number sending a text message to the Crisis Text Line service (the phone number will not be included in the data). Currently, Crisis Text Line is only accessible from U.S. phone numbers. This dataset includes all of the data from Level A, as well as (1) an anonymous ID associated with each texter (n > 100,000), (2) an anonymous ID associated with each Crisis Counselor (n > 1,500), and (3) some message metadata (n > 13,000,000).

For Actor Level, researchers may request data as far back as August 2013, when the Crisis Text Line service began.
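As a rough illustration of what tracking texters across conversations could look like, the sketch below counts conversations per anonymous texter ID. The rows, the ID formats, and the function names are all made up for the example; the actual variables and schema are not described in this document.

```python
from collections import Counter

# Hypothetical rows of (conversation_id, anonymous_texter_id).
# Real column names and ID formats are assumptions, not the actual schema.
conversations = [
    ("c1", "t-001"),
    ("c2", "t-002"),
    ("c3", "t-001"),
    ("c4", "t-003"),
    ("c5", "t-001"),
]


def conversations_per_texter(rows):
    """Count how many conversations each anonymous texter ID appears in."""
    return Counter(texter_id for _, texter_id in rows)


def repeat_texters(rows, min_conversations=2):
    """Return the texter IDs that appear in at least `min_conversations` conversations."""
    counts = conversations_per_texter(rows)
    return sorted(t for t, n in counts.items() if n >= min_conversations)


counts = conversations_per_texter(conversations)
mean_uses = sum(counts.values()) / len(counts)  # average conversations per texter
```

The same grouping would apply to Crisis Counselor IDs, since both kinds of actor are identified only by an anonymous ID.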

Questions you might like to ask of these data:

Example variables (in addition to Conversation Level):

C. Message Level

Overview

This dataset will allow researchers to see details of message content (n > 13,000,000), scrubbed for personally identifiable information (PII). This will include both texter and Crisis Counselor message content.

Message Level data will be retained by Crisis Text Line for 7 years, including message content and texter phone number. You may request data as far back as August 2013, when the service began.

Questions you might like to ask of these data:

Example variables (in addition to Actor Level):

3. Access to Data

How You Will Access Data

We are not currently allowing researchers to download their own copy of the data; we are only making the data available by secure VPN access. Currently, we are not able to offer data via secure transfer (SFTP). If your application is approved, you will also have access to all levels of data below the level you were approved for (e.g., Level C access includes Levels A and B).

Skills Required to Access Data

To access the data, it’s important to have someone on your research team with proficiency in the following tools: (1) Git, (2) Bash, and (3) command-line utilities for your preferred language (C, Python, Java, etc.).

Where You’ll Be Able to Access from

Researchers will be authorized to access the CTL Enclave Data from specific computer(s) (in tech terms, an IP whitelist) at a specific point of access. Computers and locations should be (1) private and (2) secure. Only those with explicitly approved access to the dataset should be able to see your screen while you are working with the data. (That is, working in a public lab, an open office, or a Starbucks is not an option.) Generally, computers should be in locked or restricted-access rooms, and computer screens should be visible only to those working on the project, not to passersby. Screen protectors are recommended.
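For readers unfamiliar with the term, an IP whitelist check can be sketched as follows. This is a generic illustration using Python's standard `ipaddress` module, not a description of how Crisis Text Line configures its VPN; the addresses come from ranges reserved for documentation, and `ALLOWED_NETWORKS` and `is_allowed` are hypothetical names.

```python
import ipaddress

# Hypothetical allow list, using address ranges reserved for documentation.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.10/32"),    # a single approved workstation
    ipaddress.ip_network("198.51.100.0/28"),  # a small approved lab subnet
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any network on the allow list."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.10", "192.0.2.11", "198.51.100.5", "203.0.113.7"):
        print(ip, "allowed" if is_allowed(ip) else "blocked")
```

In practice this means a connection attempt from a computer that was not listed in the application would be refused, even with valid credentials.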

What You Will Have Access To

We are setting up a virtual machine (VM) for every research team (one instance per team). That is, the copy of the database you will receive access to will be exclusively for your team for the duration of your research project. VMs will be hosted on-site at Crisis Text Line, not through Amazon Web Services (AWS) or any other cloud-based solution. You’ll be able to save and build on your database instance.

Who Will Have Access

Research projects with multiple universities or research institutes involved may submit one application, but each university/institute must submit a signed Data Use Agreement, completed after the project is approved.

How Long You’ll Have Access

Access for a research project will be for as long as the project requires; the default expectation is one year, with the option to request longer based on your needs. You’ll also be able to add your custom algorithms, which no one else will have access to. This will be done through an approval process.

How Much Access Will Cost

The cost of the VMs will be covered by Crisis Text Line for research projects completed before August 2017. Afterwards, researchers may be asked to cover the cost of their VM.

4. Submission Process and Deadlines

Deadlines

Application materials are due by 11:59pm EST on the Application Due Date.

Application Due Date      Notifications of Decision
April 1, 2016             June 1
July 8, 2016              Sept 1
Oct 1, 2016               Dec 1
Jan 1, 2017               Mar 1

Submission Process

Criteria For Approval

All applications will be scrutinized for four criteria: privacy, security, ethical considerations, and value to people in crisis.

Given Crisis Text Line’s limited resources, not all applications that pass security, privacy, and ethical requirements can be approved. Successful applications must indicate strong potential value to helping people in crisis. Research questions must focus on understanding texters and crisis, not on understanding Crisis Counselors. Four or five applications will be approved per round for the first two application rounds.

5. Use / Sharing

No Commercial Use

Data, derivatives, research, algorithms, and all other outputs are strictly for non-commercial use. No commercial uses will be approved.

Sharing Research

All derivatives of the data and output must be shared on a repository on our site within 3 months after completing your research project with Crisis Text Line.

Media

Crisis Text Line must be able to approve all research papers prior to submission AND publication, assessing ONLY for risk of PII. Crisis Text Line must also approve media that references the research or Crisis Text Line. Crisis Text Line will not require media approval once the paper is published. Crisis Text Line must be mentioned in all relevant research, cited as “Crisis Text Line, Inc.”

Crisis Text Line may request the organization not be mentioned by name in the article.

Screenshots and Notes

No screenshots. No notes. No napkin scratchpads. The data must stay on the screen. You play a critical role in protecting our texters’ data. We know you know, but we’ve heard about things: table shells with numbers, screenshots attached to emails, scratchpads left on a desk that end up in the recycling bin.

If data got out, here’s what could happen: if a texter is identified, a tweet or Facebook post by a malicious user, from an anonymous account, could tag that texter and attach a description of their personal health challenges for all of their friends to see; or what about a quote from that texter’s moment of greatest need, for all the world to see? What’s said in a moment of crisis to a confidant should stay private.

A breach may terminate access to the data for your research and may result in the termination of access for all researchers at your university or institution.

Output

All output from your VM must be requested; it will be reviewed by staff at Crisis Text Line. We will only vet output for risk of PII exposure. Requested output should be something close to what you would publish in your paper. It may not contain any direct or indirect PII. Our reviewer(s) will treat any output requests as if the content would be published in the NY Times the next day; i.e., would we, our texters, and mental health advocates feel comfortable that we were releasing this information to the public?

SLA (Service Level Agreement)

If you request a new package to analyze the data, your request will be reviewed and the tool will be installed within 24 hours, counting only business days (e.g., an approved package requested at 3pm EST on Friday will be installed by 3pm EST on the following Monday). The same SLA holds true for federal holidays, which are not counted as business days.
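One way to read the 24-hour rule above is sketched below: weekends and holidays do not count, so a request made on a business day is due at the same clock time on the next business day. This is an interpretation, not an official calculator. The `HOLIDAYS` set is a hypothetical example, times are treated as naive local (Eastern) times, and the handling of requests made on a non-business day is an assumption.

```python
from datetime import date, datetime, time, timedelta

# Hypothetical holiday list; this document does not enumerate the holidays.
HOLIDAYS = {date(2016, 7, 4)}


def is_business_day(day: date) -> bool:
    """Monday to Friday, excluding any listed holiday."""
    return day.weekday() < 5 and day not in HOLIDAYS


def install_deadline(requested: datetime) -> datetime:
    """Return the latest install time under the 24-business-hour reading."""
    start = requested
    # Assumption: a request made on a non-business day starts its clock
    # at midnight on the next business day.
    if not is_business_day(start.date()):
        day = start.date() + timedelta(days=1)
        while not is_business_day(day):
            day += timedelta(days=1)
        start = datetime.combine(day, time.min)
    # Add 24 hours, then skip over any weekend days or holidays.
    deadline = start + timedelta(days=1)
    while not is_business_day(deadline.date()):
        deadline += timedelta(days=1)
    return deadline


if __name__ == "__main__":
    # Friday 3pm request: due the following Monday at 3pm.
    print(install_deadline(datetime(2016, 6, 24, 15, 0)))
```

With the example holiday above, a request made at 3pm on Friday, July 1, 2016 would be due at 3pm on Tuesday, July 5, because Monday, July 4 is skipped.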

The cost of any tools requested will need to be covered by the researcher.

For details about how we use and protect data at CTL, see our Privacy Policy.