Experimental Data Parsers

clinicalParser.py

parser(projectId)
project_parser(projectId, config, directory, separator)
experimental_design_parser(projectId, config, directory)
clinical_parser(projectId, config, directory, separator)
parse_dataset(projectId, configuration, dataDir, key='project')

Parses the clinical data of the subjects in a project.
Input: URI of the clinical data file. Format: subjects as rows, clinical variables as columns.
Output: pandas DataFrame in the same format as the input, with the clinical variables mapped to the right ontology (defined in config), e.g. type = -40 -> SNOMED CT.
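The mapping step described above can be sketched as follows. This is a minimal illustration, not the parser's actual implementation: the function name map_clinical_variables, the CONFIG keys, the column names, and the term TERM_0001 are all hypothetical. Only the pairing of type -40 with SNOMED CT is taken from the docstring.

```python
import io

import pandas as pd

# Hypothetical configuration; the real keys and values live in the parser's config.
CONFIG = {
    "separator": "\t",
    "index_col": "subject id",
    # Type code -> ontology. Only -40 -> SNOMED CT comes from the docstring.
    "ontology_by_type": {-40: "SNOMED CT"},
    # Clinical variable -> (type code, term). TERM_0001 is a placeholder, not a real code.
    "variables": {"body mass index": (-40, "TERM_0001")},
}


def map_clinical_variables(source, config):
    """Read a subjects-by-variables table and rename the mapped variable columns."""
    data = pd.read_csv(source, sep=config["separator"], index_col=config["index_col"])
    renamed = {}
    for column in data.columns:
        if column in config["variables"]:
            type_code, term = config["variables"][column]
            renamed[column] = f'{config["ontology_by_type"][type_code]}:{term}'
    # Columns without a mapping keep their original names.
    return data.rename(columns=renamed)


# Small in-memory table standing in for the clinical data file.
EXAMPLE = "subject id\tbody mass index\tnotes\nS1\t22.5\tfasting\nS2\t27.1\tnon-fasting\n"
mapped = map_clinical_variables(io.StringIO(EXAMPLE), CONFIG)
print(mapped)
```

The output keeps subjects as rows and clinical variables as columns, so downstream code can treat it like the input table with relabelled columns.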

extract_project_info(project_data)
extract_responsible_rels(project_data, separator='|')
extract_participant_rels(project_data, separator='|')
extract_project_tissue_rels(project_data, separator='|')
extract_project_disease_rels(project_data, separator='|')
extract_project_intervention_rels(project_data, separator='|')
extract_project_rels(project_data, separator='|')
extract_timepoints(project_data, separator='|')
extract_project_subject_rels(projectId, design_data)
extract_subject_identifiers(design_data)
extract_biosample_identifiers(design_data)
extract_analytical_sample_identifiers(design_data)
extract_biological_sample_subject_rels(design_data)
extract_biological_sample_analytical_sample_rels(design_data)
extract_biological_samples_info(clinical_data)
extract_analytical_samples_info(clinical_data)
extract_biosample_analytical_sample_relationship_attributes(clinical_data)
extract_biological_sample_timepoint_rels(clinical_data)
extract_biological_sample_tissue_rels(clinical_data)
extract_subject_disease_rels(clinical_data, separator='|')
extract_subject_intervention_rels(clinical_data, separator='|')
extract_biological_sample_group_rels(clinical_data)
extract_biological_sample_clinical_variables_rels(clinical_data)

proteomicsParser.py

parser(projectId, directory=None)
parse_from_directory(projectId, directory, configuration=None)
parser_from_file(file_path, configuration, data_type, is_standard=True)
get_configuration(processing_tool, data_type)
update_configuration(data_type, processing_tool, value_col='LFQ intensity', columns=[], drop_cols=[], filters=None, new_config={})
parse_dataset(filepath, configuration)
parse_standard_dataset(file_path, configuration)
check_columns(data, req_columns, generated_columns)
check_minimum_configuration(configuration)
load_dataset(uri, configuration)

Gets the molecular data from a proteomics experiment.
Input: URI of the processed file produced by MQ (MaxQuant).
Output: pandas DataFrame containing the columns defined in config.py, with the filters defined there applied.
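A minimal sketch of loading a processed file with configured columns and filters is shown below. It is not the actual load_dataset code: the function name load_filtered, the CONFIG keys, and the example rows are hypothetical. It assumes a tab-separated MaxQuant-style table in which filter columns such as Reverse or Potential contaminant mark a row with "+", and it reuses 'LFQ intensity' (the default value_col in update_configuration) as the prefix of the value columns.

```python
import io

import pandas as pd

# Hypothetical configuration; the real one is defined in config.py.
CONFIG = {
    "separator": "\t",
    "columns": ["Majority protein IDs"],
    "value_col": "LFQ intensity",
    "filters": ["Reverse", "Potential contaminant"],
}


def load_filtered(source, config):
    """Drop rows flagged by any filter column, then keep the configured columns."""
    data = pd.read_csv(source, sep=config["separator"])
    for column in config["filters"]:
        if column in data.columns:
            # Assumption: a "+" in a filter column marks a row to discard.
            data = data[data[column] != "+"]
    value_cols = [c for c in data.columns if c.startswith(config["value_col"])]
    return data[config["columns"] + value_cols].reset_index(drop=True)


# Small in-memory table standing in for the processed file.
ROWS = [
    ["Majority protein IDs", "Reverse", "Potential contaminant",
     "LFQ intensity A", "LFQ intensity B"],
    ["P1", "", "", "100", "200"],
    ["P2", "+", "", "50", "60"],
    ["CON__P3", "", "+", "10", "20"],
    ["P4", "", "", "300", "400"],
]
EXAMPLE = "\n".join("\t".join(row) for row in ROWS) + "\n"
proteins = load_filtered(io.StringIO(EXAMPLE), CONFIG)
print(proteins)
```

Filtering before column selection matters here, because the filter columns themselves are usually not part of the columns kept in the output.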

remove_contaminant_tag(column, tag='CON__')
expand_groups(data, configuration)
extract_modification_protein_rels(data, configuration)
extract_protein_modification_subject_rels(data, configuration)
extract_protein_protein_modification_rels(data, configuration)
extract_peptide_protein_modification_rels(data, configuration)
extract_protein_modifications_rels(data, configuration)
extract_protein_modifications_modification_rels(data, configuration)
extract_peptides(data, configuration)
extract_peptide_subject_rels(data, configuration)
extract_peptide_protein_rels(data, configuration)
extract_protein_subject_rels(data, configuration)
get_value_cols(data, configuration)
extract_subject_replicates_from_regex(data, regex)
extract_subject_replicates(data, value_cols)
extract_attributes(data, attributes)
merge_regex_attributes(data, attributes, index, regexCols)
merge_col_attributes(data, attributes, index)
calculate_median_replicates(data, log='log2')
update_groups(data, groups)
get_dataset_configuration(processing_format, data_type)

wesParser.py

parser(projectId)
parseWESDataset(projectId, configuration, dataDir)
loadWESDataset(uri, configuration)

Gets the molecular data from a Whole Exome Sequencing (WES) experiment.
Input: URI of the processed file resulting from the WES analysis pipeline, that is, the Annovar-annotated VCF file from Mutect (sampleID_mutect_annovar.vcf).
Output: pandas DataFrame containing the columns defined in config.py, with the filters defined there applied.
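One way to read such an annotated VCF into a DataFrame is sketched below. It is an illustration under stated assumptions, not the loadWESDataset implementation: the function name load_vcf_table, the CONFIG keys, and the two example records are hypothetical, and the sketch takes the file content as text rather than a URI so that it stays self-contained. It relies only on the standard VCF layout, where meta lines start with "##" and the header line starts with "#CHROM".

```python
import io

import pandas as pd

# Hypothetical configuration; the real columns and filters are defined in config.py.
CONFIG = {
    "columns": ["CHROM", "POS", "REF", "ALT", "INFO"],
    # Column -> accepted values. Keeping only PASS records is an assumption.
    "filters": {"FILTER": ["PASS"]},
}


def load_vcf_table(text, config):
    """Parse VCF text into a DataFrame, apply the filters, keep the columns."""
    # Drop the "##" meta lines but keep the "#CHROM" header line.
    body = "".join(
        line for line in text.splitlines(keepends=True) if not line.startswith("##")
    )
    data = pd.read_csv(io.StringIO(body), sep="\t", dtype=str)
    data = data.rename(columns={"#CHROM": "CHROM"})
    for column, allowed in config["filters"].items():
        data = data[data[column].isin(allowed)]
    return data[config["columns"]].reset_index(drop=True)


# Tiny made-up VCF standing in for sampleID_mutect_annovar.vcf.
HEADER = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
RECORDS = [
    ["1", "12345", ".", "A", "G", ".", "PASS", "Func.refGene=exonic"],
    ["2", "67890", ".", "C", "T", ".", "q10", "Func.refGene=intronic"],
]
EXAMPLE = "##fileformat=VCFv4.2\n" + "\n".join(
    "\t".join(fields) for fields in [HEADER] + RECORDS
) + "\n"
variants = load_vcf_table(EXAMPLE, CONFIG)
print(variants)
```

Reading every field as a string avoids type surprises with chromosome names such as "X"; numeric columns like POS can be converted afterwards if needed.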

extractWESRelationships(data, configuration)