What can I do with the GEMS Platform?
The GEMS Platform lets you securely share data and analytic workflows with whomever you choose. Your data, your choice! You can use GEMS as a discovery platform, where datasets may be revealed via metadata alone, giving discoverers the option to request access to the data or workflow from the provider. Once you have one or more datasets in the platform, you can link columns across them via the ontological terms they have in common. You can also perform sophisticated analytics in the GEMS Jupyter environment, which runs in containers spun up in a cloud environment at the Minnesota Supercomputing Institute (MSI).
What analytic environments are available?
GEMS bundles three primary analytic environments, accessible via the ANALYZE link on the main page. The first, JupyterHub, has become a staple for data scientists. GEMS offers a full-featured version of Jupyter and lets users customize their R and Python programming experience by installing whatever software libraries they require. Even better, it comes pre-loaded with the latest spatial libraries, including GDAL and rgdal, which tend to be difficult to build. Second, GEMS provides RStudio, a staple environment for R programmers. Company usage of RStudio requires purchasing a separate license. Finally, GEMS includes a full desktop environment for Linux aficionados that allows users to directly run specialized visualization software such as pedigree viewers or 3D molecular visualization tools.
Which languages are supported?
Python, R, Bash shell, Spark, Scala.
How can I link the platform to other GEMS services?
Within Jupyter, users can open a notebook in Python or R and follow the tutorial to learn how to embed calls to APIs from GEMS Exchange. Similarly, API calls from GEMS Sensing can also be embedded in your notebook. Finally, the GEMS Platform is the anchor point for instruction for many of the courses within GEMS Learning.
Is my data secure on GEMS?
Yes, it is. All GEMS staff members, including those at the MSI data center who maintain GEMS, have been fully trained on HIPAA privacy protocols. GEMS policies have also been reviewed by MSI’s Research Security and Compliance Analyst. All GEMS staff with access to private data sign an internal non-disclosure agreement designed by UMN’s Office of the General Counsel. All operations within the platform are isolated in encrypted containers that no other user can access (hence no snooping). Data entering or leaving these containers is also encrypted to prevent leaks. All identifiable data residing in the platform are legally deemed private and non-public by the State of Minnesota, and hence are not subject to FOIA requests. Finally, we provide users with various anonymization tools to de-identify their own datasets prior to sharing. These include removal of named identifiers, fuzzing and jittering of geo-coordinates, and geospatial aggregation.
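To make the geo-coordinate techniques concrete, here is a minimal Python sketch of jittering and grid aggregation that could be run in a notebook. It is not the GEMS anonymization tool itself: the function names, the 500 m default offset, the 0.1 degree cell size, and the flat meters-per-degree approximation are all assumptions chosen for illustration.

```python
import math
import random

METERS_PER_DEGREE = 111_320.0  # rough length of one degree of latitude (assumption)

def jitter_coordinate(lat, lon, max_offset_m=500.0, rng=None):
    """Displace a point by a random offset of at most max_offset_m meters."""
    rng = rng or random.Random()
    # Random direction; the square root spreads points evenly over the disc
    angle = rng.uniform(0.0, 2.0 * math.pi)
    distance = max_offset_m * math.sqrt(rng.random())
    # Convert the metric offset to degrees (adequate for small offsets away from the poles)
    dlat = distance * math.cos(angle) / METERS_PER_DEGREE
    dlon = distance * math.sin(angle) / (METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

def aggregate_to_grid(lat, lon, cell_deg=0.1):
    """Replace a point with the center of the grid cell that contains it."""
    def snap(value):
        return (math.floor(value / cell_deg) + 0.5) * cell_deg
    return snap(lat), snap(lon)

# Example: an illustrative field location
fuzzy_lat, fuzzy_lon = jitter_coordinate(44.97, -93.23, rng=random.Random(42))
cell_lat, cell_lon = aggregate_to_grid(44.97, -93.23)
```

Jittering keeps each record as a separate point while hiding its exact location; aggregation gives every point in a cell the same coordinates, which is coarser but harder to reverse.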
How are FAIR(ER) data standards supported?
GEMS data principles
(F)indable, (A)ccessible, (I)nteroperable, (R)eusable and, importantly, (E)thical and (R)eproducible
- (F) How do I manage metadata?
GEMS data is Findable through our ElasticSearch-enabled metadata parser. The data provider is presented with several sets of metadata forms, accessed via the SHARE link on the main webpage. The main metadata tab allows the provider to register information about the data product itself, conforming to Dublin Core standards. Additionally, domain-specific metadata tabs allow providers to specify details pertaining to Genetics, Environment, Management and Socioeconomic aspects.
- (F) How do I filter products to find what I want?
The graphical user interface (GUI) under EXPLORE allows users to filter their searches by date, provider, geographic extent, and keywords.
- (A) How can I get data into and out of the Platform?
The platform is highly Accessible. Small (< 2 GB) datasets can be uploaded from and downloaded to your computer with a simple button in the GUI for each task. For transferring large datasets, we recommend Globus. Files multiple terabytes in size can be transferred easily and robustly via Globus, even in areas with poor network connectivity. Finally, those who know UNIX can use the secure copy command to transfer data into and out of GEMS from a terminal in Jupyter.
- (A) How do I share data?
Put simply, “Your data, your choice.” The data provider is in complete control of who gets access to their data, and even of who can see that a dataset exists. The GEMS GUI provides easy controls on each data product. By default, all data products are private to the provider. You can make your data totally open to all. Alternatively, any data pooling arrangement in between is possible: you can grant any set of individuals or teams any of six levels of access.
- (I) How can I assign ontology terms to column headers in my files?
GEMS has a standardization tool that automatically matches column headers of CSV and tab-delimited files to a set of agronomically relevant, internationally accepted ontologies. A user can review the automatic assignments and change any selected term. The collection of ontology terms is written to a JSON metadata file that accompanies the original input file.
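The general idea of header-to-term matching can be sketched in a few lines of Python. This is only an illustration and not the GEMS standardization tool: the vocabulary, the `EX:` term IDs, the fuzzy-matching approach (Python's standard `difflib`), the 0.6 cutoff, and the JSON layout are all hypothetical.

```python
import csv
import difflib
import io
import json

# Hypothetical vocabulary: term label -> made-up term ID (not a real ontology)
EXAMPLE_TERMS = {
    "plant height": "EX:0001",
    "grain yield": "EX:0002",
    "planting date": "EX:0003",
}

def normalize(header):
    """Lower-case a header and turn underscores into spaces before matching."""
    return header.strip().lower().replace("_", " ")

def suggest_terms(headers, terms=EXAMPLE_TERMS, cutoff=0.6):
    """Suggest the closest term for each header, or None if nothing is close."""
    suggestions = {}
    for header in headers:
        match = difflib.get_close_matches(normalize(header), list(terms), n=1, cutoff=cutoff)
        suggestions[header] = (
            {"label": match[0], "term_id": terms[match[0]]} if match else None
        )
    return suggestions

def read_headers(text, delimiter=","):
    """Return the first row of a delimited file as the list of column headers."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))

sample = "Plant_Height,grain_yield,plot_id\n102,5.4,A1\n"
mapping = suggest_terms(read_headers(sample))
# Serialized form of the kind of sidecar metadata that could accompany the input file
metadata_json = json.dumps(mapping, indent=2)
print(metadata_json)
```

Headers with no close match (here `plot_id`) map to `null` in the JSON, which leaves room for a person to pick a term by hand, much as the review step described above allows.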
- (I) How can I get a new ontology loaded into GEMS?
GEMS currently supports the following ontologies: TBD. An ontology is only useful if it has community-wide acceptance and use. Hence GEMS has a policy of not loading ad hoc, unreviewed ontologies. GEMS has a strong partnership with international ontology curators and can recommend individuals to contact about creating a new ontology. A user who wants a specific community-accepted, published ontology (in either OWL or RDF format) added can contact TBD to have it evaluated for possible incorporation into GEMS.
- (R) How can I share a workflow?
A Jupyter notebook can be placed in a folder with its accompanying files (e.g., metadata, images, input files) and shared as a data product just like any other dataset. Once registered, it can be made discoverable at the discretion of the registrant.
- (E) How can I include data use agreements for my products?
The GEMS Platform allows data providers to enter, in a box within the metadata entry form, whatever use constraints the users of that product must adhere to. When another user attempts to download the product, they are prompted to accept the data use agreement via an explicit click-through. Non-acceptance denies access.
- (R) How can I version my data products?
Initially, every data product is assigned version 1. When users modify the metadata of the product, the version number is increased. Also, if a user adds a product that is meant to be a new version of an existing product, they can specify this in the metadata editor, and a new version number is assigned (one higher than the current highest version for that product). All versions of a product are linked.
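The numbering rules above can be modeled in a short Python sketch. This is not how GEMS stores versions internally: the class and method names are hypothetical, and the sketch assumes that a metadata edit increases the version by exactly one.

```python
from dataclasses import dataclass, field

@dataclass
class ProductLineage:
    """All linked versions of one data product (hypothetical structure)."""
    name: str
    # Every product starts at version 1
    versions: list = field(default_factory=lambda: [1])

    def _next_version(self):
        # One higher than the current highest version for this product
        new_version = max(self.versions) + 1
        self.versions.append(new_version)
        return new_version

    def on_metadata_edit(self):
        """Modifying the product's metadata increases the version number."""
        return self._next_version()

    def on_new_version_registered(self):
        """Registering a product as a new version of this one does the same."""
        return self._next_version()

lineage = ProductLineage("example_product")
lineage.on_metadata_edit()           # version 2
lineage.on_new_version_registered()  # version 3
```

Keeping every version number in one list per product mirrors the statement that all versions of a product are linked.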