MLNanoShaper
Documentation for MLNanoShaper. MLNanoShaper is a machine learning algorithm that computes the surface of proteins. There are multiple ways to interface with the software:
- As the Julia modules MLNanoShaper and MLNanoShaperRunner.
- As a CLI command mlnanoshaper in ~/.julia/bin.
- As a training script script/training.bash that runs multiple training runs. Requires parallel.
- For inference only: as a .so object.
MLNanoShaper.AccumulatorLogger
MLNanoShaper.AuxiliaryParameters
MLNanoShaper.LossType
MLNanoShaper.TrainingData
MLNanoShaper.TrainingParameters
MLNanoShaperRunner.AnnotedKDTree
MLNanoShaperRunner.ConcatenatedBatch
MLNanoShaperRunner.ModelInput
MLNanoShaperRunner.Option
MLNanoShaper._train
MLNanoShaper._train
MLNanoShaper.categorical_loss
MLNanoShaper.comonicon_install
MLNanoShaper.comonicon_install_path
MLNanoShaper.continus_loss
MLNanoShaper.generate_data
MLNanoShaper.generate_data_points
MLNanoShaper.implicit_surface
MLNanoShaper.load_data_pdb
MLNanoShaper.load_data_pqr
MLNanoShaper.train
MLNanoShaperRunner.batched_sum
MLNanoShaperRunner.distance
MLNanoShaperRunner.eval_model
MLNanoShaperRunner.evaluate_model
MLNanoShaperRunner.load_atoms
MLNanoShaperRunner.load_model
MLNanoShaperRunner.signed_distance
MLNanoShaperRunner.tiny_angular_dense
MLNanoShaper.AccumulatorLogger
— Type accumulator(processing, logger)
A processing logger that transforms logs over multiple batches. Can be used to smooth numerical data before logging to TensorBoardLogger.
MLNanoShaper.AuxiliaryParameters
— Type AuxiliaryParameters
The variables that do not influence the outcome of the training run. These include nb_epoch.
MLNanoShaper.LossType
— Type abstract type LossType end
LossType is an interface for defining loss functions.
Implementation
- getlossfn(::LossType)::Function : the associated loss function
- metrictype(::Type{<:LossType})::Type : the type of metrics returned by the loss function
- getlosstype(::StaticSymbol)::LossType : the function generating the LossType
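The interface above can be sketched with local stand-in names (this mirrors the contract without depending on the package; `MyLoss` and `my_loss` are illustrative, not part of MLNanoShaper):

```julia
# Self-contained sketch mirroring the LossType interface with local names.
abstract type LossType end

getlossfn(::LossType) = error("unimplemented")
metrictype(::Type{<:LossType}) = error("unimplemented")

# A hypothetical concrete loss type and its associated loss function.
struct MyLoss <: LossType end
my_loss(d_pred, d_real) = abs(d_pred - d_real)   # illustrative stand-in

getlossfn(::MyLoss) = my_loss
metrictype(::Type{MyLoss}) = Float64   # metric returned alongside the loss
```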
MLNanoShaper.TrainingData
— Type Training information used in model training.
Fields
- atoms: the set of atoms used as model input
- skin: the surface generated by NanoShaper
MLNanoShaper.TrainingParameters
— Type TrainingParameters
The training parameters used in model training. Default values are in the param file. The training is deterministic: these values are hashed to identify a training run.
MLNanoShaper._train
— Method _train(training_parameters::TrainingParameters, directories::AuxiliaryParameters)
Train the model given TrainingParameters and AuxiliaryParameters.
MLNanoShaper._train
— Method _train((train_data, test_data), training_states; nb_epoch)
Train the model on the data for nb_epoch epochs.
MLNanoShaper.categorical_loss
— Method categorical_loss(model, ps, st, (; point, atoms, d_real))
The loss function used in training. Returns the KL divergence between the true probability and the empirical probability, and the error against the expected distance as a metric.
MLNanoShaper.comonicon_install
— Method comonicon_install(; kwargs...)
Install the CLI manually. This will use the default configuration in Comonicon.toml, if it exists. For more detailed reference, please refer to the Comonicon documentation.
MLNanoShaper.comonicon_install_path
— Method comonicon_install_path(; [yes=false])
Add the PATH and FPATH entries to your shell configuration file. You can use comonicon_install_path(; yes=true) to skip the interactive prompt. For more detailed reference, please refer to the Comonicon documentation.
MLNanoShaper.continus_loss
— Method continus_loss(model, ps, st, (; point, atoms, d_real))
The loss function used in training. Compares the predicted (squared) distance with $\frac{1 + \tanh(d)}{2}$ and returns the error against the expected distance as a metric.
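As a minimal sketch, the target that the prediction is compared against maps a signed distance d to a value in (0, 1) (any rescaling by ref_distance is assumed to happen elsewhere):

```julia
# Target in (0, 1) derived from a signed distance d, per the formula above:
# points deep inside map toward 1, points far outside toward 0 (or vice versa,
# depending on the sign convention for d).
surface_target(d) = (1 + tanh(d)) / 2
```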
MLNanoShaper.generate_data
— Method generate_data()
Generate data from the parameter files in param/ by downloading the PDB files and running NanoShaper.
MLNanoShaper.generate_data_points
— Method generate_data_points(
    preprocessing::Lux.AbstractExplicitLayer, points::AbstractVector{<:Point3},
    (; atoms, skin)::TreeTrainingData{Float32}, (; ref_distance)::TrainingParameters)
Generate the data points for a set of positions points on one protein.
MLNanoShaper.implicit_surface
— Method implicit_surface(atoms::AnnotedKDTree{Sphere{T}, :center, Point3{T}},
    model::Lux.StatefulLuxLayer, (; cutoff_radius, step)) where {T}
Create a mesh from the isosurface of the function `pos -> model(atoms, pos)` using the marching cubes algorithm with step size `step`.
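A hedged usage sketch, not a tested invocation: `atoms` and `model` are placeholders assumed to have been constructed elsewhere, and the final argument is the named tuple from the signature above.

```julia
# Hedged usage sketch: `atoms` (an AnnotedKDTree) and `model` (a
# Lux.StatefulLuxLayer) are placeholders built elsewhere.
mesh = MLNanoShaper.implicit_surface(atoms, model,
    (; cutoff_radius = 3.0f0, step = 0.5f0))
```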
MLNanoShaper.load_data_pdb
— Method load_data_pdb(T, name::String)
Load a TrainingData{T} from the current directory. You should have a pdb file and an off file named name in the current directory.
MLNanoShaper.load_data_pqr
— Method load_data_pqr(T, name::String)
Load a TrainingData{T} from the current directory. You should have a pqr file and an off file named name in the current directory.
MLNanoShaper.train
— Function train [options] [flags]
Train a model.
Intro
Train a model that can reconstruct a protein surface using machine learning. Default values of parameters are specified in the param/param.toml file. To override a parameter, use the corresponding option.
Options
- --nb-epoch <Int>: the number of epochs to compute.
- --model, -m <String>: the model name. Can be anakin.
- --nb-data-points <Int>: the number of proteins in the dataset to use.
- --name, -n <String>: the name of the training run.
- --cutoff-radius, -c <Float32>: the cutoff_radius used in training.
- --ref-distance <Float32>: the reference distance (in Å) used to rescale the distance to the surface in the loss.
- --learning-rate, -l <Float64>: the learning rate used by the model in training.
- --loss <String>: the loss function, one of "categorical" or "continuous".
Flags
- --gpu, -g: whether to run training on the GPU. Currently does nothing.
MLNanoShaperRunner.Option
— Type state
The global state manipulated by the C interface. To use, you must first load the weights using load_model and the input atoms using load_atoms. Then you can call eval_model to get the field at a given point.
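The workflow above can be sketched from Julia by calling into the shared-library build via ccall. The library filename and the CSphere field layout are assumptions, so treat this as illustrative only:

```julia
# Hedged sketch: driving the C interface of the .so build via Libdl/ccall.
# The library name "libmlnanoshaperrunner.so" and the CSphere layout are
# assumptions, not taken from the package.
using Libdl

struct CSphere   # assumed layout: center coordinates and radius
    x::Cfloat; y::Cfloat; z::Cfloat; r::Cfloat
end

lib = dlopen("libmlnanoshaperrunner.so")

# 1. load the weights (0 means OK, per the load_model status codes)
status = ccall(dlsym(lib, :load_model), Cint, (Cstring,), "/absolute/path/to/model")
@assert status == 0

# 2. load the input atoms (0 means OK, per the load_atoms status codes)
spheres = [CSphere(0f0, 0f0, 0f0, 1.5f0)]
status = ccall(dlsym(lib, :load_atoms), Cint, (Ptr{CSphere}, Cint),
    spheres, length(spheres))
@assert status == 0

# 3. evaluate the field at a point
field = ccall(dlsym(lib, :eval_model), Cfloat, (Cfloat, Cfloat, Cfloat),
    0f0, 0f0, 0f0)
```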
MLNanoShaperRunner.AnnotedKDTree
— Type AnnotedKDTree(data::StructVector, property::StaticSymbol)
Fields
- data::StructVector
- tree::KDTree
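A hedged construction sketch: the Sphere field names follow GeometryBasics, and `static` comes from Static.jl; this is not a tested invocation.

```julia
# Hedged sketch: building an AnnotedKDTree over atom centers.
using StructArrays: StructVector
using GeometryBasics: Sphere, Point3f
using Static: static
using MLNanoShaperRunner: AnnotedKDTree

atoms = StructVector([Sphere(Point3f(rand(3)...), 1.5f0) for _ in 1:10])
tree = AnnotedKDTree(atoms, static(:center))   # index atoms by their :center
```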
MLNanoShaperRunner.ConcatenatedBatch
— Type ConcatenatedBatch
Represents a vector of arrays of sizes (a..., bn), where bn is the variable dimension of the batch. You can access a view of an individual array with get_slice.
MLNanoShaperRunner.ModelInput
— Type ModelInput
Input of the model.
Fields
- point::Point3, the position of the input
- atoms::StructVector{Sphere}, the atoms in the neighborhood
MLNanoShaperRunner.batched_sum
— Method batched_sum(b::AbstractMatrix, nb_elements::AbstractVector)
Compute the sum of a ConcatenatedBatch with ndim = 2. The first dim is the feature dimension; the second dim is the batch dim.
Given b of size (n, m) and nb_elements of size (k,), the output has size (n, k).
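A minimal reference implementation of this shape contract, assuming nb_elements holds the per-batch slice lengths along the second dimension (so sum(nb_elements) == m):

```julia
# Reference sketch: sum each variable-length slice of a concatenated batch.
# Assumes nb_elements[k] is the number of columns belonging to batch element k.
function batched_sum_ref(b::AbstractMatrix, nb_elements::AbstractVector{<:Integer})
    out = zeros(eltype(b), size(b, 1), length(nb_elements))
    stop = 0
    for (k, len) in enumerate(nb_elements)
        start = stop + 1
        stop += len
        out[:, k] .= vec(sum(view(b, :, start:stop); dims = 2))
    end
    out
end

# Two batch elements of lengths 2 and 1 concatenated into a (2, 3) matrix:
batched_sum_ref([1.0 2.0 3.0; 4.0 5.0 6.0], [2, 1])
```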
MLNanoShaperRunner.distance
— Method distance(x::GeometryBasics.Mesh, y::KDTree)
Return the Hausdorff distance between the mesh coordinates and the points in the tree.
MLNanoShaperRunner.eval_model
— Function eval_model(x::Float32, y::Float32, z::Float32)::Float32
Evaluate the model at coordinates (x, y, z).
MLNanoShaperRunner.evaluate_model
— Method evaluate_model(
    model::Lux.StatefulLuxLayer, x::Point3f, atoms::AnnotedKDTree; cutoff_radius, default_value = -0.0f0)
Evaluate the model at a single point.
This function handles the case where the point is too far from the atoms: default_value is returned and the model is not run.
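The far-point logic described above can be sketched in a self-contained way (this mirrors the behavior with plain tuples and a stand-in model, not the package's actual implementation):

```julia
# Self-contained sketch of the cutoff logic: if no atom center lies within
# cutoff_radius of x, return default_value without running the model.
using LinearAlgebra: norm

function evaluate_with_cutoff(model, x, centers; cutoff_radius, default_value = -0.0f0)
    any(c -> norm(c .- x) <= cutoff_radius, centers) || return default_value
    model(x)
end

centers = [(0.0, 0.0, 0.0)]
evaluate_with_cutoff(x -> 1.0, (10.0, 0.0, 0.0), centers; cutoff_radius = 3.0)
```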
MLNanoShaperRunner.load_atoms
— Function load_atoms(start::Ptr{CSphere}, length::Cint)::Cint
Load the atoms into the Julia model. start is a pointer to the start of the array of CSphere and length is the length of the array.
Return an error status:
- 0: OK
- 1: data could not be read
- 2: unknown error
MLNanoShaperRunner.load_model
— Function load_model(path::String)::Cint
Load the model from a MLNanoShaperRunner.SerializedModel serialized state at absolute path path.
Return an error status:
- 0: OK
- 1: file not found
- 2: file could not be deserialized properly
- 3: unknown error
MLNanoShaperRunner.signed_distance
— Method signed_distance(p::Point3, mesh::RegionMesh)::Number
Return the signed distance between the point p and the mesh.
MLNanoShaperRunner.tiny_angular_dense
— Method tiny_angular_dense(; categorical=false, van_der_waals_channel=false, kargs...)
tiny_angular_dense is a function that generates a Lux model.