MLNanoShaper
Documentation for MLNanoShaper. MLNanoShaper is a machine learning algorithm that computes the surface of proteins. There are multiple ways to interface with the software:
- As the Julia modules MLNanoShaper and MLNanoShaperRunner.
- As a CLI command mlnanoshaper in ~/.julia/bin.
- As a training script script/training.bash that runs multiple training runs. Requires parallel.
- For inference only: as a .so object.
MLNanoShaper.AccumulatorLogger
MLNanoShaper.AuxiliaryParameters
MLNanoShaper.LossType
MLNanoShaper.TrainingData
MLNanoShaper.TrainingParameters
MLNanoShaperRunner.AnnotedKDTree
MLNanoShaperRunner.ConcatenatedBatch
MLNanoShaperRunner.ModelInput
MLNanoShaperRunner.Option
MLNanoShaper._train
MLNanoShaper._train
MLNanoShaper.categorical_loss
MLNanoShaper.comonicon_install
MLNanoShaper.comonicon_install_path
MLNanoShaper.continus_loss
MLNanoShaper.generate_data
MLNanoShaper.generate_data_points
MLNanoShaper.implicit_surface
MLNanoShaper.load_data_pdb
MLNanoShaper.load_data_pqr
MLNanoShaper.train
MLNanoShaperRunner.batched_sum
MLNanoShaperRunner.distance
MLNanoShaperRunner.eval_model
MLNanoShaperRunner.evaluate_model
MLNanoShaperRunner.load_atoms
MLNanoShaperRunner.load_model
MLNanoShaperRunner.signed_distance
MLNanoShaperRunner.tiny_angular_dense
MLNanoShaper.AccumulatorLogger
— Type accumulator(processing, logger)
A processing logger that transforms logs over multiple batches. Can be used to smooth numerical data before logging to TensorBoardLogger.
MLNanoShaper.AuxiliaryParameters
— Type AuxiliaryParameters
The variables that do not influence the outcome of the training run. These include nb_epoch.
MLNanoShaper.LossType
— Type abstract type LossType end
LossType is an interface for defining loss functions.
Implementation
- getlossfn(::LossType)::Function : the associated loss function
- metrictype(::Type{<:LossType})::Type : the type of metrics returned by the loss function
- getlosstype(::StaticSymbol)::LossType : the function generating the LossType
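The interface above can be sketched with local stand-in names (this mirrors the contract without depending on the package; `MyLoss` and `my_loss` are illustrative, not part of MLNanoShaper):

```julia
# Self-contained sketch mirroring the LossType interface with local names.
abstract type LossType end

getlossfn(::LossType) = error("unimplemented")
metrictype(::Type{<:LossType}) = error("unimplemented")

# A hypothetical concrete loss type and its associated loss function.
struct MyLoss <: LossType end
my_loss(d_pred, d_real) = abs(d_pred - d_real)   # illustrative stand-in

getlossfn(::MyLoss) = my_loss
metrictype(::Type{MyLoss}) = Float64   # metric returned alongside the loss
```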
MLNanoShaper.TrainingData
— Type Training information used in model training.
Fields
- atoms: the set of atoms used as model input
- skin: the surface generated by NanoShaper
MLNanoShaper.TrainingParameters
— Type TrainingParameters
The training parameters used in model training. Default values are in the param file. The training is deterministic: these values are hashed to identify a training run.
MLNanoShaper._train
— Method _train(training_parameters::TrainingParameters, directories::AuxiliaryParameters)
Train the model given TrainingParameters and AuxiliaryParameters.
MLNanoShaper._train
— Method _train((train_data, test_data), training_states; nb_epoch)
Train the model on the data for nb_epoch epochs.
MLNanoShaper.categorical_loss
— Method categorical_loss(model, ps, st, (; point, atoms, d_real))
The loss function used in training. Returns the KL divergence between the true probability and the empirical probability, and the error against the expected distance as a metric.
MLNanoShaper.comonicon_install
— Method comonicon_install(; kwargs...)
Install the CLI manually. This will use the default configuration in Comonicon.toml, if it exists. For more detailed reference, please refer to the Comonicon documentation.
MLNanoShaper.comonicon_install_path
— Method comonicon_install_path(; [yes=false])
Add the PATH and FPATH entries to your shell configuration file. You can use comonicon_install_path(; yes=true) to skip the interactive prompt. For more detailed reference, please refer to the Comonicon documentation.
MLNanoShaper.continus_loss
— Method continus_loss(model, ps, st, (; point, atoms, d_real))
The loss function used in training. Compares the predicted (squared) distance with $\frac{1 + \tanh(d)}{2}$ and returns the error against the expected distance as a metric.
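As a minimal sketch, the target that the prediction is compared against maps a signed distance d to a value in (0, 1) (any rescaling by ref_distance is assumed to happen elsewhere):

```julia
# Target in (0, 1) derived from a signed distance d, per the formula above:
# points deep inside map toward 1, points far outside toward 0 (or vice versa,
# depending on the sign convention for d).
surface_target(d) = (1 + tanh(d)) / 2
```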
MLNanoShaper.generate_data
— Method generate_data()
Generate data from the parameter files in param/ by downloading the PDB files and running NanoShaper.
MLNanoShaper.generate_data_points
— Method generate_data_points(
    preprocessing::Lux.AbstractExplicitLayer, points::AbstractVector{<:Point3},
    (; atoms, skin)::TreeTrainingData{Float32}, (; ref_distance)::TrainingParameters)
Generate the data points for a set of positions points on one protein.
MLNanoShaper.implicit_surface
— Method implicit_surface(atoms::AnnotedKDTree{Sphere{T}, :center, Point3{T}},
    model::Lux.StatefulLuxLayer, (; cutoff_radius, step)) where {T}
Create a mesh from the isosurface of the function `pos -> model(atoms, pos)` using the marching cubes algorithm with step size `step`.
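A hedged usage sketch, not a tested invocation: `atoms` and `model` are placeholders assumed to have been constructed elsewhere, and the final argument is the named tuple from the signature above.

```julia
# Hedged usage sketch: `atoms` (an AnnotedKDTree) and `model` (a
# Lux.StatefulLuxLayer) are placeholders built elsewhere.
mesh = MLNanoShaper.implicit_surface(atoms, model,
    (; cutoff_radius = 3.0f0, step = 0.5f0))
```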
MLNanoShaper.load_data_pdb
— Method load_data_pdb(T, name::String)
Load a TrainingData{T} from the current directory. You should have a pdb file and an off file named name in the current directory.
MLNanoShaper.load_data_pqr
— Method load_data_pqr(T, name::String)
Load a TrainingData{T} from the current directory. You should have a pqr file and an off file named name in the current directory.
MLNanoShaper.train
— Function train [options] [flags]
Train a model.
Intro
Train a model that can reconstruct a protein surface using machine learning. Default values of parameters are specified in the param/param.toml file. To override a parameter, use the corresponding option.
Options
- --nb-epoch <Int>: the number of epochs to compute.
- --model, -m <String>: the model name. Can be anakin.
- --nb-data-points <Int>: the number of proteins in the dataset to use.
- --name, -n <String>: the name of the training run.
- --cutoff-radius, -c <Float32>: the cutoff_radius used in training.
- --ref-distance <Float32>: the reference distance (in Å) used to rescale the distance to the surface in the loss.
- --learning-rate, -l <Float64>: the learning rate used by the model in training.
- --loss <String>: the loss function, one of "categorical" or "continuous".
Flags
- --gpu, -g: whether to run training on the GPU. Currently does nothing.
MLNanoShaperRunner.Option
— Type state
The global state manipulated by the C interface. To use, you must first load the weights using load_model and the input atoms using load_atoms. Then you can call eval_model to get the field at a given point.
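The workflow above can be sketched from Julia by calling into the shared-library build via ccall. The library filename and the CSphere field layout are assumptions, so treat this as illustrative only:

```julia
# Hedged sketch: driving the C interface of the .so build via Libdl/ccall.
# The library name "libmlnanoshaperrunner.so" and the CSphere layout are
# assumptions, not taken from the package.
using Libdl

struct CSphere   # assumed layout: center coordinates and radius
    x::Cfloat; y::Cfloat; z::Cfloat; r::Cfloat
end

lib = dlopen("libmlnanoshaperrunner.so")

# 1. load the weights (0 means OK, per the load_model status codes)
status = ccall(dlsym(lib, :load_model), Cint, (Cstring,), "/absolute/path/to/model")
@assert status == 0

# 2. load the input atoms (0 means OK, per the load_atoms status codes)
spheres = [CSphere(0f0, 0f0, 0f0, 1.5f0)]
status = ccall(dlsym(lib, :load_atoms), Cint, (Ptr{CSphere}, Cint),
    spheres, length(spheres))
@assert status == 0

# 3. evaluate the field at a point
field = ccall(dlsym(lib, :eval_model), Cfloat, (Cfloat, Cfloat, Cfloat),
    0f0, 0f0, 0f0)
```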
MLNanoShaperRunner.AnnotedKDTree
— Type AnnotedKDTree(data::StructVector, property::StaticSymbol)
Fields
- data::StructVector
- tree::KDTree
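A hedged construction sketch: the Sphere field names follow GeometryBasics, and `static` comes from Static.jl; this is not a tested invocation.

```julia
# Hedged sketch: building an AnnotedKDTree over atom centers.
using StructArrays: StructVector
using GeometryBasics: Sphere, Point3f
using Static: static
using MLNanoShaperRunner: AnnotedKDTree

atoms = StructVector([Sphere(Point3f(rand(3)...), 1.5f0) for _ in 1:10])
tree = AnnotedKDTree(atoms, static(:center))   # index atoms by their :center
```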
MLNanoShaperRunner.ConcatenatedBatch
— Type ConcatenatedBatch
Represents a vector of arrays of sizes (a..., bn), where bn is the variable dimension of the batch. You can access a view of an individual array with get_slice.
MLNanoShaperRunner.ModelInput
— Type ModelInput
Input of the model.
Fields
- point::Point3, the position of the input
- atoms::StructVector{Sphere}, the atoms in the neighborhood
MLNanoShaperRunner.batched_sum
— Method batched_sum(b::AbstractMatrix, nb_elements::AbstractVector)
Compute the sum of a ConcatenatedBatch with ndim = 2. The first dim is the feature dimension; the second dim is the batch dim.
Given b of size (n, m) and nb_elements of size (k,), the output has size (n, k).
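A minimal reference implementation of this shape contract, assuming nb_elements holds the per-batch slice lengths along the second dimension (so sum(nb_elements) == m):

```julia
# Reference sketch: sum each variable-length slice of a concatenated batch.
# Assumes nb_elements[k] is the number of columns belonging to batch element k.
function batched_sum_ref(b::AbstractMatrix, nb_elements::AbstractVector{<:Integer})
    out = zeros(eltype(b), size(b, 1), length(nb_elements))
    stop = 0
    for (k, len) in enumerate(nb_elements)
        start = stop + 1
        stop += len
        out[:, k] .= vec(sum(view(b, :, start:stop); dims = 2))
    end
    out
end

# Two batch elements of lengths 2 and 1 concatenated into a (2, 3) matrix:
batched_sum_ref([1.0 2.0 3.0; 4.0 5.0 6.0], [2, 1])
```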
MLNanoShaperRunner.distance
— Method distance(x::GeometryBasics.Mesh, y::KDTree)
Return the Hausdorff distance between the mesh coordinates and the points in the tree.
MLNanoShaperRunner.eval_model
— Function eval_model(x::Float32, y::Float32, z::Float32)::Float32
Evaluate the model at coordinates (x, y, z).
MLNanoShaperRunner.evaluate_model
— Method evaluate_model(
    model::Lux.StatefulLuxLayer, x::Point3f, atoms::AnnotedKDTree; cutoff_radius, default_value = -0.0f0)
Evaluate the model at a single point.
This function handles the case where the point is too far from the atoms: default_value is returned and the model is not run.
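The far-point logic described above can be sketched in a self-contained way (this mirrors the behavior with plain tuples and a stand-in model, not the package's actual implementation):

```julia
# Self-contained sketch of the cutoff logic: if no atom center lies within
# cutoff_radius of x, return default_value without running the model.
using LinearAlgebra: norm

function evaluate_with_cutoff(model, x, centers; cutoff_radius, default_value = -0.0f0)
    any(c -> norm(c .- x) <= cutoff_radius, centers) || return default_value
    model(x)
end

centers = [(0.0, 0.0, 0.0)]
evaluate_with_cutoff(x -> 1.0, (10.0, 0.0, 0.0), centers; cutoff_radius = 3.0)
```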
MLNanoShaperRunner.load_atoms
— Function load_atoms(start::Ptr{CSphere}, length::Cint)::Cint
Load the atoms into the Julia model. start is a pointer to the start of the array of CSphere and length is the length of the array.
Return an error status:
- 0: OK
- 1: data could not be read
- 2: unknown error
MLNanoShaperRunner.load_model
— Function load_model(path::String)::Cint
Load the model from a MLNanoShaperRunner.SerializedModel serialized state at absolute path path.
Return an error status:
- 0: OK
- 1: file not found
- 2: file could not be deserialized properly
- 3: unknown error
MLNanoShaperRunner.signed_distance
— Method signed_distance(p::Point3, mesh::RegionMesh)::Number
Return the signed distance between the point p and the mesh.
MLNanoShaperRunner.tiny_angular_dense
— Method tiny_angular_dense(; categorical=false, van_der_waals_channel=false, kargs...)
tiny_angular_dense is a function that generates a Lux model.