StructSVM-ADMM: ADMM Software for Sparse Structural SVMs with augmented L1 regularizers

This page describes the software StructSVM-ADMM, an implementation of ADMM for training sparse structural SVMs with augmented L1 regularizers.

ADMM for Structural SVMs with augmented L1 regularizers

Various mining tasks involve the prediction of highly specialized and structured objects such as graphs, trees and sequences. Structural SVMs have become a powerful framework for learning to predict such structured objects efficiently. Structural SVMs with augmented L1 regularizers help in obtaining sparse models. Sparse models aid feature identification and extraction, and also reduce the complexity of computationally intensive inference procedures.

Given a sample of structured inputs and outputs {(xi,yi)}i=1,2,…,n ⊂ X × Y, structural SVMs with augmented L1 regularizers solve the following optimization problem:

minw,ξ  β1 ||w||1 + (β2/2) (||w||2)² + C Σi=1,…,n ξi

s.t.  wT Δfi(y) ≥ Li(y) − ξi,  ∀ i, ∀ y ∈ Y.

Δfi(y)=f(xi,yi) - f(xi,y) is the difference feature vector and Li(y)=L(yi,y) is a suitable function capturing the dissimilarity between the two outputs yi and y.

When β1=1 and β2=0, we recover the L1 regularized structural SVM. When both β1 and β2 are non-zero, we obtain the elastic net regularized structural SVM problem.

The designed ADMM algorithm solves a sequence of three sub-problems: one makes use of the sequential dual method (SDM) developed for training L2 regularized structural SVMs, while the other two have easy-to-compute closed-form solutions.
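As a rough, hypothetical sketch (not the actual implementation), one outer iteration of such a scheme could look as follows, assuming a consensus split w = z for the L1 term; the callable solve_l2_structsvm stands in for the SDM solver and is an assumption of this sketch:

    import numpy as np

    def soft_threshold(v, t):
        # Elementwise soft-thresholding: closed-form solution of the L1 sub-problem.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def admm_sparse_structsvm(solve_l2_structsvm, dim, beta1, rho, max_iter=100, tol=1e-4):
        # Sketch of the outer ADMM loop under a consensus split w = z.
        # solve_l2_structsvm(center) is a user-supplied callable (e.g. the SDM solver)
        # returning the minimizer of the L2-regularized structural SVM sub-problem
        # whose quadratic penalty is centred at `center`.
        z = np.zeros(dim)
        u = np.zeros(dim)                         # scaled dual variable
        for _ in range(max_iter):
            # (1) w-update: an L2-regularized structural SVM sub-problem, solved by SDM.
            w = solve_l2_structsvm(z - u)
            # (2) z-update: closed-form soft-thresholding handles the L1 term.
            z_new = soft_threshold(w + u, beta1 / rho)
            # (3) dual update: closed form.
            u = u + w - z_new
            if np.linalg.norm(z_new - z) <= tol * max(1.0, np.linalg.norm(z)):
                z = z_new
                break
            z = z_new
        return z                                  # sparse weight vector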

Structural SVMs are used in various applications such as sequence labeling, parse-tree classification and image segmentation.

[Figures: sequence labeling and parse-tree illustrations]

Download

You can download the current version of StructSVM-ADMM from the link struct_svm_admm.zip.

This software is available free only for non-commercial purposes. Please cite the paper mentioned below if you intend to use this program.

How to Cite

Please cite the following paper:

ADMM for Training Sparse Structural SVMs with Augmented L1 regularizers.
P. Balamurugan, Anusha Posinasetty, Shirish Shevade.
In SIAM International Conference on Data Mining, 2016.
Software available at http://clweb.csa.iisc.ernet.in/anu4iisc/structsvm_admm.html

The BibTeX entry for the paper is:

@inproceedings{StructSVMADMM,
title = {ADMM for Training Sparse Structural SVMs with Augmented $\ell_1$ regularizers},
author = {P. Balamurugan and Anusha Posinasetty and Shirish Shevade},
booktitle = {Proceedings of the Sixteenth SIAM International Conference on Data Mining},
year = {2016},
note = {Software available at \url{http://clweb.csa.iisc.ernet.in/anu4iisc/structsvm_admm.html}}
}

Acknowledgment

The implementation uses the SVMstruct API for sequence tagging, available at http://www.cs.cornell.edu/People/tj/svm_light/svm_hmm.html.

The following files have been borrowed from the SVMstruct implementation, with minor modifications:

1. svm_common.h

2. svm_common.c

3. svm_learn.h

4. svm_struct_api.c

5. svm_struct_api.h

6. svm_struct_api_types.h

How To Use

  1. Unzip the archive struct_svm_admm.zip using the command

    unzip struct_svm_admm.zip

    under some path /USER_PATH (this depends on your machine).

  2. Under the path /USER_PATH, a directory named struct_svm_admm is created. The full path of this directory is /USER_PATH/struct_svm_admm/.

  3. Open a shell window.

  4. Change directory using the following command on the shell prompt

    cd /USER_PATH/struct_svm_admm/

  5. Type the following commands on the prompt

    make clean

    make

  6. The executables structsvm_sdm_learn and structsvm_sdm_classify are created under the directory /USER_PATH/struct_svm_admm/.

  7. Type the following in the shell prompt to export PATH variable:

    export PATH=$PATH:/USER_PATH/struct_svm_admm/

  8. Running the program:

    For learning/training phase, type the following in the shell prompt

    ./structsvm_sdm_learn [options] train_file model_file test_file

    Note

    1. The options are described below. If you do not provide any options, the default parameters are used.
    2. train_file, model_file and test_file must be specified with their full paths.
    3. model_file is an optional second argument; if no model file is specified, the model is written to a default file named structsvm_sdm_struct_model in the struct_svm_admm directory.
    4. test_file is an optional third argument; if no test file is specified, no testing is done. Note that the test file cannot be given as the second argument.

    To know about learn/train options, please type

    structsvm_sdm_learn -?

    The available options are given below:

    General Options:

        -?   → help

        -v [0..3]   → verbosity level (default 1)

    Learning Options:

        -c float    → C: trade-off between training error and margin (default 0.1)

        -l [1]       → Loss function to use. (default 1)

                         1: zero/one loss per token (Hamming Loss)

                        (user could incorporate zero/one loss per sequence)

        -w {0,1,2,3,4,5,6,7}     → choice of learning algorithm (default 1):

    • 0: baseline SDM algorithm without heuristics for L2 regularized struct SVM
    • 1: SDM Algorithm with heuristics for L2 regularized struct SVM
    • 2: L1 ADMM
    • 3: Elastic Net ADMM
    • 4: L1 SGD
    • 5: Elastic Net using Sequential Alternating Proximal method
    • 6: Elastic Net using SGD
    • 7: L1 Cutting plane method (CPLEX)

    SDM Optimization Options:

        -e float    → max Psi tol: tolerance for SDM algorithm termination (default 0.25)

        -k [1..]    → kappa: number of initial GetMaxY iterations (default 10)

        -n [1..]    → non-GetMaxY cap: interleave GetMaxY iterations with this many non-GetMaxY iterations (default 5)

        -M [1..]    → maximum number of SDM iterations (default 100000)

    ADMM Method Specific Options:

        -r float   → ADMM penalty parameter rho (default 10)

        -# float   → Max ADMM iterations (default 100000)

    Elastic Net Method Specific Options:

        -1 float   → regularization parameter for the L1 term, beta_1 (default 0.9)

        -2 float   → regularization parameter for the L2 term, beta_2 (default 0.1)

    SGD Specific Options:

        -3 float    → eta0: learning rate parameter for SGD (default 0.9)

    SMO Optimization Options:

        -s float    → Max SMO tolerance: terminate SMO on achieving this tolerance (default 0.15)

        -$ [1..]    → Max no of SMO iterations (default 100000)
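    For example, to train the Elastic Net ADMM solver with C = 1, beta_1 = 0.9, beta_2 = 0.1 and ADMM penalty rho = 10 on a training file, one could type:

    ./structsvm_sdm_learn -c 1 -w 3 -1 0.9 -2 0.1 -r 10 train_file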

Running ADMM on example Data

The struct_svm_admm directory contains a sub-folder data in which two sample data files sampledata and OCR200 are given.

You can try running L1 ADMM on this data by typing the following command:

./structsvm_sdm_learn -c 1 -w 2 data/sampledata

You should get a primal objective value ≈ 123.10923143.

The performance and sparsity of the model obtained on the sample train data using L1 ADMM are given in the graphs below:

[Figures: training performance and sparsity of the L1 ADMM model on sampledata]

You can try running Elastic Net ADMM on the sampledata by typing the following command:

./structsvm_sdm_learn -c 1 -w 3 data/sampledata

You should get a primal objective value ≈ 119.94880210.

The performance and sparsity of the model obtained on the sample train data using Elastic Net ADMM are given in the graphs below:

[Figures: training performance and sparsity of the Elastic Net ADMM model on sampledata]

You can try running similar experiments on the OCR200 data by typing the following commands:

./structsvm_sdm_learn -c 1 -w 2 data/OCR200

./structsvm_sdm_learn -c 1 -w 3 data/OCR200

For L1 ADMM you should get a primal objective value ≈ 430.93739342.

For Elastic Net ADMM you should get a primal objective value ≈ 409.27678099.

The performance and sparsity of the model obtained on OCR200 using L1 ADMM and Elastic Net ADMM are given in the graphs below:

[Figures: training performance and sparsity of the L1 ADMM and Elastic Net ADMM models on OCR200]

Description of feature vector used in the implementation

For the sequence labeling application, the input is made of parts x=(x1,x2,…,xT) and the output is made of corresponding parts y=(y1,y2,…,yT). The feature vector f(x,y) for the pair (x,y) is constructed from first-order and second-order features. The first-order features are obtained by placing the token features xt in the block corresponding to the label yt. The second-order features are obtained from the label pair (yt−1, yt) at every stage t=1,2,…,T.
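As a rough illustration (not the exact code used in the software), the joint feature vector for a label set of size K and token features of dimension d could be assembled as follows; the concrete layout (one emission block per label followed by K x K transition counts) is an assumption of this sketch:

    import numpy as np

    def joint_feature_map(x_tokens, y_labels, num_labels):
        # x_tokens: list of T token feature vectors, each of dimension d
        # y_labels: list of T label indices in {0, ..., num_labels - 1}
        # Returns a vector of length num_labels * d + num_labels * num_labels.
        d = len(x_tokens[0])
        emission = np.zeros((num_labels, d))             # first-order features
        transition = np.zeros((num_labels, num_labels))  # second-order features
        prev = None
        for x_t, y_t in zip(x_tokens, y_labels):
            emission[y_t] += x_t                         # place x_t in the block of label y_t
            if prev is not None:
                transition[prev, y_t] += 1.0             # count the label pair (y_{t-1}, y_t)
            prev = y_t
        return np.concatenate([emission.ravel(), transition.ravel()])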

Input File Format

A). The input file should be in the SVMstruct format.

Example:
4 qid:1 5:-1 345:0.3 2101:0.89
6 qid:1 1:-1.5 5:0.2 39:1.98
3 qid:2 567:-0.24 689:0.8 792:0.21 
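For concreteness, a minimal (hypothetical) reader for such lines might look as follows; each line carries a label, a qid grouping the tokens of one sequence, and sparse index:value pairs:

    def parse_svmstruct_line(line):
        # Parse one line of the form 'label qid:N idx:val idx:val ...'.
        fields = line.split()
        label = int(fields[0])
        qid = int(fields[1].split(':')[1])     # sequence identifier
        features = {}
        for item in fields[2:]:
            idx, val = item.split(':')
            features[int(idx)] = float(val)    # sparse feature index -> value
        return label, qid, features

    # Example: parse_svmstruct_line('4 qid:1 5:-1 345:0.3 2101:0.89')
    # returns (4, 1, {5: -1.0, 345: 0.3, 2101: 0.89})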

In case your data is in the CoNLL text chunking format, please go to B) below.

If the input file format is different from these formats, then the reader module has to be changed.

All other modules which assume this format should also be changed.

B). CoNLL text chunking data has the following form:

But CC O
we PRP B-NP
certainly RB O
like IN O
what WP B-NP
we PRP B-NP
've VBP O
seen VBN O
so RB O
far RB O
. . O
" " O 

This data has a column format (3 columns), and a suitable template file is required to extract features from this data.

The template file contains template rules of the following form:

U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-1,0]/%x[0,0]
U06:%x[0,0]/%x[1,0]
U10:%x[-2,1]
U11:%x[-1,1]
U12:%x[0,1]
U13:%x[1,1]
U14:%x[2,1]
U15:%x[-2,1]/%x[-1,1]
U16:%x[-1,1]/%x[0,1]
U17:%x[0,1]/%x[1,1]
U18:%x[1,1]/%x[2,1]
U20:%x[-2,1]/%x[-1,1]/%x[0,1]
U21:%x[-1,1]/%x[0,1]/%x[1,1]
U22:%x[0,1]/%x[1,1]/%x[2,1]
B
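Following the usual %x[row,col] convention (row is a position relative to the current token, col is a column index, and the final "B" line denotes a bigram template), a hypothetical routine that expands one such rule might look like this:

    import re

    def expand_rule(rule, rows, t):
        # Expand one template rule, e.g. 'U00:%x[-2,0]', at token position t.
        # rows: list of token rows, each a list of column strings, e.g. ['But', 'CC', 'O'].
        name, pattern = rule.split(':', 1)

        def repl(m):
            r, c = int(m.group(1)), int(m.group(2))
            pos = t + r
            if 0 <= pos < len(rows):
                return rows[pos][c]
            return '_B%+d_' % r                # padding outside the sentence boundary

        return name + ':' + re.sub(r'%x\[(-?\d+),(\d+)\]', repl, pattern)

    # Example: with the CoNLL sentence above and t = 2 (the token 'certainly'),
    # expand_rule('U00:%x[-2,0]', rows, 2) yields 'U00:But'.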

Similar templates are available for other biomedical datasets such as BioCreative and BioNLP, whose data is also in the column format.

For such datasets, please use the utility conll2svmlightconverter by typing the following in the shell prompt:

conll2svmlightconverter template_file train_file test_file validation_file

This will extract the features in SVMstruct format into a new file, whose name is displayed in the shell prompt.

To know more about this utility, just type conll2svmlightconverter in the shell prompt.

Once you get the features in SVMstruct format, you can start training as described in Step 8.

The conll2svmlightconverter tool is modelled after the feature generation procedure in CRFsgd by Léon Bottou.