Slots: 2
Deadlines
Internal Deadline: Contact ORIF.
LOI: May 5, 2022
External Deadline: June 13, 2022, 5pm PT
Award Information
Award Type: Grant / Cooperative Agreement
Estimated Number of Awards: Depends on number of meritorious applications
Anticipated Award Amount: $300,000 per year
Who May Serve as PI: Individuals with the skills, knowledge, and resources necessary to carry out the proposed research as a Principal Investigator (PI) are invited to work with their organizations to develop an application. Individuals from underrepresented groups as well as individuals with disabilities are always encouraged to apply.
Link to Award: https://science.osti.gov/grants/FOAs/-/media/grants/pdf/foas/2022/SC_FOA_0002725.pdf
Process for Limited Submissions
PIs must submit their application as a Limited Submission through the Office of Research Application Portal: https://orif.usc.edu/oor-portal/.
Materials to submit include:
- (1) Single Page Proposal Summary (0.5” margins; single-spaced; font type: Arial, Helvetica, or Georgia typeface; font size: 11 pt). Page limit includes references and illustrations. Pages that exceed the 1-page limit will be excluded from review.
- (2) CV (5 pages maximum)
Note: The portal requires information about the PIs and Co-PIs in addition to department and contact information, including the 10-digit USC ID#, Gender, and Ethnicity. Please have this material prepared before beginning this application.
Purpose
Modern scientific computing relies on processing a deluge of data coming from both experiments and simulations, with even relatively modest scientific activities generating petabytes of data. Planned upgrades of experimental facilities in the foreseeable future, combined with the increased computing capabilities of DOE’s exascale supercomputers and other state-of-the-art computing capabilities coming online over the next few years, promise to compound the many challenges in storing and managing data such that it can be effectively used to fuel scientific discovery [2-12].
Traditional large-scale scientific data management has relied on the use of file formats optimized for simple access patterns on parallel, distributed file systems. These files have tended to be metadata poor and complicated to access, lacking flexible indexing for efficient searching, where enabling new kinds of analysis often requires writing new, low-level code [2-5]. Scientific workflows have also become increasingly complicated, integrating both simulation and the analysis of data from experiments, exploiting advanced machine-learning techniques [4,8-10], and requiring distributed, multi-stage processing [5-7]. Additionally, significant opportunities exist to enhance trust and aid scientific reproducibility by enhancing our ability to record data provenance and verify data integrity. Fortunately, through a combination of past scientific-data-management investments and leveraging the growing ecosystem of big-data and database technologies, scientific endeavors have made significant improvements in their data management and use. While the ever-increasing scale of scientific data threatens that progress, new “smart” storage and networking technologies that provide embedded computational capabilities; novel methods for indexing, representing, and distributing data; and advanced techniques for interfacing with data management systems and integrating into programming environments promise significant breakthroughs. Moreover, new techniques for scientific data management can help integrate data into large scientific-data and computational ecosystems that embody the FAIR principles of Findability, Accessibility, Interoperability, and Reuse, thereby enabling collaborative, responsive science at yet-unprecedented scales [2-5].
Priority Research Directions
As highlighted by the recent ASCR Workshop on the Management and Storage of Scientific Data [2,3], building on the outcomes of prior community activities, including Storage Systems and I/O: Organizing, Storing, and Accessing Data for Scientific Discovery [5] and the Office of Science Roundtable on Data for AI [4], and aligned with needs highlighted by interagency planning [11,12], important priority research directions are:
- “High-productivity interfaces for accessing scientific data efficiently” – Innovative interfaces to data-management capabilities allowing for flexible, high-performance access to large data sets, potentially federated across different kinds of memory, edge devices, and repositories, capturing relevant usage statistics, provenance, and other metadata.
- “Understanding the behavior of complex data management systems in DOE science” – Understanding how the behavior of users, application and system algorithms, and hardware can be combined and exploited to improve performance and resilience of scientific-data-management systems, recognizing that the relevant behaviors can change over time.
- “Rich metadata and provenance collection, management, search, and access” – Innovative methods for collecting and managing provenance and other metadata to support FAIR principles, resilience, and scientific reproducibility and discovery.
- “Reinventing data services for new applications, devices, and architectures” – Innovative methods to design scientific-data-management services for state-of-the-art storage and networking devices, including those providing computational capabilities.
Each pre-application and application must address, as its primary focus, one or more of these priority research directions. As specified in Section IV B and Section IV D, each pre-application and application must explicitly list the priority research direction(s) primarily motivating the proposed work.
Note that this FOA places requirements on the Data-Management Plan (DMP) appendix in Section IV D that supplement the standard requirements found in Section VIII.
Visit our Institutionally Limited Submission webpage for more updates and other announcements.