CKG Builder
create_user.py
- create_user_node(driver, data)
Creates a graph database node for the new user and adds the respective properties to the node.
- Parameters
driver (py2neo driver) – py2neo driver, which provides the connection to the neo4j graph database.
data (Series) – pandas Series with new user identifier and required user information (see set_arguments()).
- create_user_from_command_line(args, expiration)
Creates a new user in the graph database, and the corresponding node, from a terminal window (command line), and adds the new user's information to the users Excel and import files. Arguments as in set_arguments().
- Parameters
args (any object with a __dict__ attribute) – object containing all the parameters necessary to create a user (‘username’, ‘name’, ‘email’, ‘secondary_email’, ‘phone_number’ and ‘affiliation’).
expiration (int) – number of days the user is given access.
Note
This function can be used directly with python create_user_from_command_line.py -u username -n user_name -e email -s secondary_email -p phone_number -a affiliation
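The command line in the note above could be parsed along the following lines. This is a minimal sketch, not the module's own parser: the short flags come from the note, while the long option names, the choice to leave every option optional, and the helper expiration_date() are assumptions made for illustration.

```python
import argparse
from datetime import date, timedelta


def build_parser():
    """Parser for the flags listed in the note (long names are assumed)."""
    parser = argparse.ArgumentParser(description="Create a new user (sketch)")
    parser.add_argument("-u", "--username", default=None)
    parser.add_argument("-n", "--name", default=None)
    parser.add_argument("-e", "--email", default=None)
    parser.add_argument("-s", "--secondary_email", default=None)
    parser.add_argument("-p", "--phone_number", default=None)
    parser.add_argument("-a", "--affiliation", default=None)
    return parser


def expiration_date(expiration, today=None):
    """Hypothetical helper: turn 'number of days of access' into a date."""
    today = today or date.today()
    return today + timedelta(days=expiration)


# argparse.Namespace has a __dict__, so it fits the documented 'args' type.
args = build_parser().parse_args(
    ["-u", "jdoe", "-n", "Jane Doe", "-e", "jane@example.org", "-a", "Example Lab"]
)
user = vars(args)  # plain dict of the parsed user properties
```

Here vars(args) yields a dictionary keyed by the same property names listed for args, with None for any option that was not supplied.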
- create_user_from_file(filepath, expiration)
Creates new users in the graph database, and their corresponding nodes, from an Excel file. Each row in the file must be a user, and the columns must follow the set_arguments() fields.
Note
This function can be used directly with python create_user_from_file.py -f path_to_file
- create_user(data, output_file, expiration=365)
Creates a new user in the graph database, and the corresponding node, through the following steps:
Checks if a user with given properties already exists in the database. If not:
Generates new user identifier
Creates new local user (access to graph database)
Creates new user node
Saves data to users.tsv
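The steps above can be sketched with in-memory stand-ins. This is only an illustration under stated assumptions: create_user_sketch, the FIELDS column list, the "U" + counter identifier scheme, and matching on username or email are all hypothetical, and a plain list replaces the graph database.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical column layout for the tab-delimited users file.
FIELDS = ["ID", "username", "name", "email", "expiration_date"]


def create_user_sketch(data, existing_users, output, expiration=365, today=None):
    """Follow the documented steps; a list stands in for the database."""
    # Check whether a user with the given properties already exists.
    for user in existing_users:
        if user["username"] == data["username"] or user["email"] == data["email"]:
            return None
    # Generate a new user identifier (assumed scheme).
    user_id = "U{}".format(len(existing_users) + 1)
    # Create the local user and the user node (here: append a record).
    today = today or date.today()
    expires = (today + timedelta(days=expiration)).isoformat()
    record = dict(data, ID=user_id, expiration_date=expires)
    existing_users.append(record)
    # Save the data as one tab-delimited row.
    writer = csv.DictWriter(
        output, fieldnames=FIELDS, delimiter="\t",
        extrasaction="ignore", lineterminator="\n",
    )
    writer.writerow(record)
    return user_id


users = []
out = io.StringIO()  # stands in for users.tsv
new_user = {"username": "jdoe", "name": "Jane", "email": "j@example.org"}
first = create_user_sketch(new_user, users, out, today=date(2021, 1, 1))
second = create_user_sketch(new_user, users, out, today=date(2021, 1, 1))
```

The second call returns None because the first call already registered a user with the same properties, so nothing further is written.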
importer.py
Generates all the import files: Ontologies, Databases and Experiments. The module is responsible for generating all the CSV files that will be loaded into the graph database and also updates a stats object (HDF table) with the number of entities and relationships from each dataset imported. A new stats object is created the first time a full import is run.
- ontologiesImport(importDirectory, ontologies=None, download=True, import_type='partial')
Generates all the entities and relationships from the provided ontologies. If the ontologies list is not provided, then all the ontologies listed in the configuration will be imported (full_import). This function also updates the stats object with numbers from the imported ontologies.
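The fallback described above (no list given means everything listed in the configuration is imported) can be sketched as follows. This is a hedged illustration rather than the module's code: resolve_sources, the placeholder names, the "full"/"partial" labels it returns, and the check for unknown names are assumptions.

```python
def resolve_sources(requested, configured):
    """Return (sources to import, import type) for an optional request list."""
    if requested is None:
        # Nothing requested: import everything listed in the configuration.
        return list(configured), "full"
    # Assumed safeguard: reject names that the configuration does not list.
    unknown = [name for name in requested if name not in configured]
    if unknown:
        raise ValueError("Not in configuration: {}".format(", ".join(unknown)))
    return list(requested), "partial"


configured = ["ontology_a", "ontology_b", "ontology_c"]  # placeholder names
everything = resolve_sources(None, configured)
subset = resolve_sources(["ontology_b"], configured)
```

Using None as the default, rather than an empty list, keeps "nothing was requested" distinct from "an empty selection was requested".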
- databasesImport(importDirectory, databases=None, n_jobs=1, download=True, import_type='partial')
Generates all the entities and relationships from the provided databases. If the databases list is not provided, then all the databases listed in the configuration will be imported (full_import). This function also updates the stats object with numbers from the imported databases.
- experimentsImport(projects=None, n_jobs=1, import_type='partial')
Generates all the entities and relationships from the specified projects. If the projects list is not provided, then all the projects in the experiments directory will be imported (full_import). Calls function experimentImport.
- experimentImport(importDirectory, experimentsDirectory, project)
Generates all the entities and relationships from the specified project. Called from function experimentsImport.
- usersImport(importDirectory, import_type='partial')
Generates User entities from an Excel file and grants new users access to the database. This function also writes the relevant information to a tab-delimited file in the import directory.
- fullImport(download=True, n_jobs=4)
Calls the different importer functions: ontologies, databases and experiments. The first step is to check whether the stats object exists and to create it otherwise. Calls setupStats.
- generateStatsDataFrame(stats)
Generates a dataframe with the stats from each import.
- Parameters
stats (list) – a list with statistics collected from each importer function.
- Returns
Pandas dataframe with the collected statistics.
- setupStats(import_type)
Creates a stats object that will hold the statistics collected from each import.
- createEmptyStats(statsCols, statsFile, statsName)
Creates an HDFStore object with an empty dataframe that has the collected stats columns.
loader.py
Populates the graph database with all the files generated by the importer.py module: Ontologies, Databases and Experiments. The module loads all the entities and relationships defined in the importer files. It calls Cypher queries defined in the cypher.py module. Further, it generates an HDF object with the number of entities and relationships loaded for each Database, Ontology and Experiment. This module also generates a compressed backup file of all the loaded files.
There are two types of updates:
Full: all the entities and relationships in the graph database are populated
Partial: only the specified entities and relationships are loaded
The compressed files for each type of update are named accordingly and saved in the archive/ folder in data/.
- load_into_database(driver, queries, requester)
Runs the provided queries in the graph database using a py2neo driver.
- updateDB(driver, imports=None, specific=[])
Populates the graph database with information for each Database, Ontology or Experiment specified in imports. If imports is not defined, the function populates the entire graph database based on the graph variable defined in the grapher_config.py module. This function also updates the graph stats object with numbers from the loaded entities and relationships.
- Parameters
driver (py2neo driver) – py2neo driver, which provides the connection to the neo4j graph database.
imports (list) – a list of entities to be loaded into the graph.
- fullUpdate()
Main method that controls the population of the graph database. First, it gets a connection to the database (driver) and then initiates the update of the entire database, getting all the graph entities to update from the configuration. Once the graph database has been populated, the imports folder in data/ is compressed and archived in the archive/ folder so that a backup of the import files is kept (full).
- partialUpdate(imports, specific=[])
Method that controls the update of the graph database with the specified entities and relationships. First, it gets a connection to the database (driver) and then initiates the update of the specified graph entities. Once the graph database has been populated, the data files uploaded to the graph are compressed and archived in the archive/ folder (partial).
- Parameters
imports (list) – list of entities to update
- archiveImportDirectory(archive_type='full')
Creates the compressed backup of the imports folder, containing either the whole folder (full update) or only the files uploaded (partial update). The folder or files are compressed into a gzipped tarball file and stored in the archive/ folder defined in the configuration.
- Parameters
archive_type (str) – whether it is a full update or a partial update.
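The archiving behaviour described above could look roughly like this with the standard tarfile module. It is a sketch under assumptions: archive_imports, its extra parameters (import_dir, archive_dir, files) and the archive file naming are hypothetical, since the documented function reads its folders from the configuration.

```python
import os
import tarfile
import tempfile
from datetime import datetime


def archive_imports(import_dir, archive_dir, archive_type="full", files=None):
    """Write a gzipped tarball of the whole folder or of selected files."""
    os.makedirs(archive_dir, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    # Assumed naming scheme: the update type is part of the file name.
    name = "imports_{}_{}.tar.gz".format(archive_type, stamp)
    archive_path = os.path.join(archive_dir, name)
    with tarfile.open(archive_path, "w:gz") as tar:
        if archive_type == "full":
            # Full update: archive the entire imports folder.
            base = os.path.basename(os.path.normpath(import_dir))
            tar.add(import_dir, arcname=base)
        else:
            # Partial update: archive only the files that were uploaded.
            for filename in files or []:
                tar.add(os.path.join(import_dir, filename), arcname=filename)
    return archive_path


def demo():
    """Build a throwaway imports folder and list both archive contents."""
    with tempfile.TemporaryDirectory() as tmp:
        import_dir = os.path.join(tmp, "imports")
        os.makedirs(import_dir)
        for filename in ("a.tsv", "b.tsv"):
            with open(os.path.join(import_dir, filename), "w") as handle:
                handle.write("x\n")
        archive_dir = os.path.join(tmp, "archive")
        full = archive_imports(import_dir, archive_dir, "full")
        partial = archive_imports(import_dir, archive_dir, "partial", ["a.tsv"])
        with tarfile.open(full) as tar:
            full_names = sorted(tar.getnames())
        with tarfile.open(partial) as tar:
            partial_names = tar.getnames()
    return full_names, partial_names
```

Keeping the archive folder outside the imports folder avoids a full archive trying to include itself.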
builder.py
Builds the database in two main steps:
Imports all the data from ontologies, databases and experiments
Loads these data into the database
The module can perform a full update, executing both steps for all the ontologies, databases and experiments, or a partial update. A partial update can execute step 1 or step 2 for specific data.
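The two-step flow can be sketched as a small dispatcher. This is an illustration only: build, its parameters, and the "import"/"load" step labels are hypothetical, and the two callables stand in for the importer.py and loader.py entry points.

```python
def build(import_step, load_step, build_type="full", step=None, data=None):
    """Run both steps (full) or a single chosen step for specific data."""
    if build_type == "full":
        import_step(None)  # step 1: generate import files for everything
        load_step(None)    # step 2: load them into the database
        return ["import", "load"]
    if build_type == "partial":
        if step not in ("import", "load"):
            raise ValueError("a partial build needs step='import' or 'load'")
        chosen = import_step if step == "import" else load_step
        chosen(data)       # only the selected step, only the given data
        return [step]
    raise ValueError("unknown build type: {}".format(build_type))


calls = []  # records which stand-in step ran and with what data


def fake_import(data):
    calls.append(("import", data))


def fake_load(data):
    calls.append(("load", data))


executed_full = build(fake_import, fake_load)
executed_partial = build(fake_import, fake_load, "partial", "load", ["example"])
```

Passing the steps in as callables keeps the sketch independent of a running database.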