The MIG and MIAJ units merged on 1 January 2015. They now form the new MaIAGE unit, whose website is available at the following URL: http://maiage.jouy.inra.fr.

Software

The methods developed by the MIG Unit are made available on-line to the wider community of biologists, bioanalysts and bioinformaticians through the MIG software listed below. Because of the computing power they require and the volume of data they process, these applications are not necessarily suited to running on users' workstations.

In addition, several projects to set up Web Services are under development in the unit; the most advanced of these forms the framework of AGMIAL.

Here is a list of our main software:

  • AGMIAL is an integrated system for bacterial genome annotation. It is currently used at INRA to annotate newly sequenced bacterial genomes (Lactobacillus bulgaricus, Lactobacillus sakei and Flavobacterium psychrophilum) and to re-annotate Lactococcus lactis, Enterococcus faecalis and Enterococcus faecium.
  • AlvisAE (Alvis Annotation Editor) is an on-line annotation editor for the collaborative editing and visualisation of annotations of entities, relations and groups. It includes a workflow for managing annotation campaigns. The annotations of text entities are defined in an ontology that can be revised in parallel. AlvisAE also includes a tool for detecting and resolving annotation conflicts.
    Part of this work has been funded by the European project Alvis and the French project Quaero. See Bossy et al., LAW VI 2012 for more details.
  • AlvisIR (Alvis Information Retrieval) is an on-line generic semantic search engine; only a few hours are needed to create a new instance for a given document collection and ontology. A user query expressed with ontology concepts retrieves all documents that contain those concepts, whether as specific concepts or as synonyms. The AlvisIR semantic search engine also handles relation queries. For example, search on biotopes of microorganisms. Part of this work has been funded by the European project Alvis and the French project Quaero.
  • Alvis NLP/ML is a pipeline for the semantic annotation of text documents. It integrates Natural Language Processing (NLP) tools for sentence and word segmentation, named-entity recognition, term analysis, semantic typing and relation extraction. These tools rely on resources such as terminologies or ontologies to adapt to the application domain. Alvis NLP/ML contains several tools for the (semi-)automatic acquisition of these resources using Machine Learning (ML) techniques. New components can easily be integrated into the pipeline. Part of this work has been funded by the European project Alvis and the French project Quaero. (See Nedellec et al., in Handbook on Ontologies, 2009, for an overview.)
  • AnovArray is a set of SAS subroutines for analysing microarray- and macroarray-type expression data. It quantifies the sources of biological and technical variation and detects genes that are differentially expressed between several conditions. The statistical methods used are analysis of variance (ANOVA) and the False Discovery Rate (FDR) method, which computes probabilities adjusted for multiple hypothesis testing.
  • BasyLiCA is a user-friendly open-source interface and database dedicated to the automatic storage and standardized processing of Live Cell Array data.
  • Beluga is a platform we developed for indexing scientific literature, in order to extract associations between features over periods of time. Beluga offers several modules based on indexing documents by the following features: references, authors, terms, countries, keywords, sources and institutions. Diachronic analysis of the corpus makes it possible to describe the topic structure of the documents through the underlying network of co-evolution between authors and terminology. To that end, it uses learning processes, scoring and data visualization. (download)
  • BioYaTeA is an extension of the YaTeA term extractor that deals with prepositional attachments and adjectival participles. It extracts terms from documents in French and in English. Its distribution includes post-filtering of irrelevant terms. It is publicly available as a CPAN module.
    Part of this work has been funded by the European project Alvis and the French project Quaero. See (Golik et al., CiCLING'2013) for more details.
  • Dynamocell visualizes a metabolic pathway together with its enzymatic and genetic regulation. It can also integrate the major available tools used for the analysis of metabolic networks (contact Vincent Fromion).
  • ESAP (Extended Simulated Analysis Process) is a program for predicting loop conformation in proteins. It is based on a Monte-Carlo technique in the space of dihedral angles (download).
  • FADO (Favored or Avoided Distances between Occurrences) detects favored or avoided distances between occurrences of two motifs along sequences. It is available upon request (contact Sophie Schbath).
  • GOR IV is a program for predicting the secondary structure of proteins. Three states are considered: alpha helix (H), beta strand (b) and aperiodic structures (C). The program is based on statistical considerations derived from information theory. It does not use multiple alignments. It achieves a Q3 of 65% (download).
  • GOR V is derived from GOR IV by introducing information from multiple amino acid sequence alignments obtained with PSI-BLAST (Altschul et al., Nucl. Acids Res. 25, 3389, 1997). Its prediction accuracy (Q3) reaches 73.5%.
  • hmmtiling implements the approach presented in our paper "Transcriptional landscape estimation from tiling array data using a model of signal shift and drift" (Nicolas et al., Bioinformatics, 2009). It takes as input the log intensities measured along the genome and outputs an estimated transcriptional landscape with a prediction of the breakpoints (typically promoters and terminators).
  • ISLAND is a program that simulates the progress of a genome mapping project using the anchoring method. In particular, it provides the average number of contigs obtained, their average length and the average proportion of the genome they cover, as a function of genome length, the number of clones and anchors, and clone length.
  • KAKSI is a program for assigning protein secondary structure. The assignment into alpha helices (H), beta strands (b), turns (T) and coils (c) is based on characteristic distances between alpha carbons and on phi-psi angle values. The program also computes the curvature of the main chain (download).
  • MuGeN (Multi-Genome Navigator) is an interactive tool for exploring several annotated genomes supplemented with the results of in silico analyses. It can also run in batch mode to generate images in various formats, which means that it can be integrated into websites to display annotated physical maps. MuGeN is listed on the FreshMeat and Bioinformatics.Org portals.
  • OSS-HMM (Optimal Secondary Structure prediction Hidden Markov Model) is software for secondary structure prediction (three states: alpha helix, H; beta strand, b; and coil, C) based on a hidden Markov model formalism. Used with a single sequence, it achieves a Q3 of 68.8%; used with a multiple sequence alignment, it achieves a Q3 of 75.5%. This tool can also be used to generate protein sequences with a given secondary structure pattern (download).
  • PCM (Pairwise Correlation Method) is a Matlab program for partitioning a co-occurrence matrix. It is used in DOMIRE ("DOMain Identification from Recurrence" in proteins); see Tai CH, Sam V, Gibrat JF, Garnier J, Munson PJ and Lee BK. Protein domain assignment from the recurrence of locally similar structures. PROTEINS: Structure, Function, and Bioinformatics, 2011; 79:853–866 (download).
  • RBA_B168 1.0 computes the optimal resource distribution that maximizes biomass formation, with respect to the extracellular medium, for the Gram-positive model bacterium Bacillus subtilis.
  • R'HOM (Research of HOMogeneous regions in DNA sequences) is software that uses hidden Markov chain models to segment DNA sequences into homogeneous regions. R'HOM makes it possible to estimate a more realistic model of DNA sequence composition than a homogeneous Markov chain model and then to segment the sequence under this model. It has been used in particular to look for horizontal transfers in B. subtilis and to estimate models designed to calculate the significance of word counts. R'HOM has been developed in cooperation with the Laboratoire Statistique et Génome in Evry. It is free.
  • R'MES is a set of C++ programs devoted to the detection of motifs with an exceptional frequency in sequences (DNA, protein or other). It is freely available with a user guide and an online manual. R'MES has a companion tool, RMESPlot, which is available at http://mulcyber.toulouse.inra.fr/projects/rmesplot and provides a graphical user interface for visualizing the results generated by R'MES. It comes with its own user guide.
  • SHOW (Structure Homogeneities Watcher) is an adaptation of R'HOM that makes it possible to define a complex hidden Markov chain model flexibly and then to use this model in various ways through segmentation (forward-backward, Viterbi), estimation (EM) and simulation algorithms. So far, SHOW has mainly been used to predict bacterial genes, but it has also been used for other purposes such as splice site detection in humans. In the future, it should make it easier to develop models for studying numerous biological problems. SHOW has been developed in collaboration with the Laboratoire Statistique et Génome in Evry.
  • SIMPA (SIMilar Peptide Analysis) is a program for predicting the secondary structure of proteins. Three states are considered: alpha helix (H), beta strand (b) and aperiodic structures (C). The program is based on the nearest-neighbour approach. It achieves a Q3 of 67% (download).
  • SMF (Symmetric Matrix Factorization) is a Matlab program for partitioning a co-occurrence matrix. It is used in DOMIRE ("DOMain Identification from Recurrence" in proteins); see Tai CH, Sam V, Gibrat JF, Garnier J, Munson PJ and Lee BK. Protein domain assignment from the recurrence of locally similar structures. PROTEINS: Structure, Function, and Bioinformatics, 2011; 79:853–866 (download).
  • svcR is an R package that takes a numerical matrix as input and computes clusters using a support vector clustering (SVC) method. We have implemented an original 2D-grid labeling approach to speed up cluster extraction. In this sense, svcR can be seen as an efficient cluster extraction method when clusters are separable in a 2D map. We also showed that this SVC approach, using a Jaccard-radial basis kernel, can help to classify a set of terms into ontological classes and to define regular expression rules for extracting information from documents. The case study concerns a set of terms and documents about developmental and molecular biology (download).
  • SVD (Singular Vector Decomposition) is a Matlab program for partitioning a co-occurrence matrix. It is used in DOMIRE ("DOMain Identification from Recurrence" in proteins); see Tai CH, Sam V, Gibrat JF, Garnier J, Munson PJ and Lee BK. Protein domain assignment from the recurrence of locally similar structures. PROTEINS: Structure, Function, and Bioinformatics, 2011; 79:853–866 (download).
  • treemm is dedicated to the unsupervised clustering of bacterial promoter sequences. It is based on the modelling of distinct classes of bipartite motifs designed to represent the binding sites of different Sigma factors. It accounts for the non-random distribution of such motifs across a tree that summarizes the correlation between promoter activity profiles. The approach was described in our paper "Condition-Dependent Transcriptome Reveals High-Level Regulatory Architecture in Bacillus subtilis" (Nicolas et al., Science, 2012).
  • TyDI (Terminology Design Interface) is a collaborative tool for the manual validation and structuring of terms, either originating from terminologies or extracted from a training corpus of text documents. It is used on the output of term extractor programs (such as BioYaTeA), which identify candidate terms (e.g. compound nouns). With TyDI, a user can validate candidate terms and specify synonymy/hyperonymy relations. These annotations can then be exported in several formats and used in other natural language processing tools. Part of this work has been funded by the French project Quaero. See Golik et al., EKAW 2010 for more details.
  • VAST (Vector Alignment Search Tool) is a program for comparing protein 3D structures (download).
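
Several of the secondary structure predictors listed above (GOR IV, GOR V, OSS-HMM, SIMPA) report their accuracy as a Q3 score. As a reading aid, Q3 is usually understood as the percentage of residues whose predicted state, among the three states H, b and C, matches the observed state. The sketch below is only an illustration of that definition; the function name q3_score and the example strings are hypothetical and are not part of any MIG program.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the Q3 score (in percent) of a three-state prediction.

    Both arguments are strings with one state letter per residue,
    using the letters from the list above: H (alpha helix),
    b (beta strand) and C (aperiodic structure / coil).
    This helper is a hypothetical illustration, not MIG code.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Made-up 10-residue example: 8 of 10 states agree.
    observed = "HHHHCCbbbC"
    predicted = "HHHCCCbbCC"
    print(f"Q3 = {q3_score(predicted, observed):.1f}%")  # prints "Q3 = 80.0%"
```

On this scale, the 65% reported for GOR IV means that roughly two residues out of three are assigned to the correct state.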