This page describes the software StructSVM-ADMM, an implementation of ADMM for training sparse structural SVMs with augmented L1 regularizers.
Various mining tasks involve predicting highly specialized and structured objects such as graphs, trees and sequences. Structural SVMs have become a powerful alternative for learning and efficiently predicting such structured objects. Structural SVMs with augmented L1 regularizers help in obtaining sparse models. Sparse models aid feature identification and extraction, and also help reduce the complexity of computationally intensive inference procedures.
Given a sample of structured inputs and outputs {(x_i, y_i)}, i = 1, 2, …, n, from X × Y, structural SVMs with augmented L1 regularizers solve the optimization problem:

min_{w,ξ}  β₁‖w‖₁ + (β₂/2)‖w‖₂² + C Σ_{i=1}^{n} ξ_i
s.t.  wᵀ Δf_i(y) ≥ L_i(y) − ξ_i,  ∀ i, ∀ y ∈ Y.

Here Δf_i(y) = f(x_i, y_i) − f(x_i, y) is the difference feature vector and L_i(y) = L(y_i, y) is a suitable function capturing the dissimilarity between the two outputs y_i and y.
When β₁ = 1 and β₂ = 0 we recover the L1-regularized structural SVM. When β₁ and β₂ are both non-zero we recover the elastic-net-regularized structural SVM problem.
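As a quick illustration of these two settings, the sketch below evaluates the penalty term β₁‖w‖₁ + (β₂/2)‖w‖₂² for a given weight vector (the function name and values are assumptions for illustration, not part of the software):

```python
import numpy as np

def augmented_l1_penalty(w, beta1, beta2):
    # beta1 * ||w||_1 + (beta2 / 2) * ||w||_2^2
    return beta1 * np.abs(w).sum() + 0.5 * beta2 * np.dot(w, w)

w = np.array([0.5, -0.2, 0.0, 1.0])
augmented_l1_penalty(w, 1.0, 0.0)  # beta2 = 0: pure L1 regularizer
augmented_l1_penalty(w, 0.9, 0.1)  # both non-zero: elastic net regularizer
```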
The designed ADMM solves a sequence of three subproblems: one makes use of the sequential dual method (SDM) developed for training L2-regularized structural SVMs, while the other two have easy-to-compute closed-form solutions.
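For intuition, in such splittings the L1 subproblem typically reduces to the well-known soft-thresholding operator. The sketch below shows that closed form as an illustration of the general technique; it is not the software's actual code:

```python
import numpy as np

def soft_threshold(v, t):
    # Closed-form minimizer of  t * ||w||_1 + 0.5 * ||w - v||_2^2,
    # the prox step that typically handles the L1 term inside ADMM.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

soft_threshold(np.array([1.5, -0.3, 0.05]), 0.5)  # entries below the threshold become exactly 0
```

This shrink-to-zero behaviour is what produces the sparse models mentioned above.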
Structural SVMs are used in various applications such as sequence labeling, parse-tree classification and image segmentation.
You can download the current version of StructSVM-ADMM for sparse structural SVMs from the link struct_svm_admm.zip
This software is available free of charge for non-commercial purposes only. If you use this program, please cite the following paper:
ADMM for Training Sparse Structural SVMs with Augmented L1 regularizers.
P.Balamurugan, Anusha Posinasetty, Shirish Shevade.
In SIAM International Conference on Data Mining, 2016.
Software available at http://clweb.csa.iisc.ernet.in/anu4iisc/structsvm_admm.html
The bibtex for the paper is:

@inproceedings{StructSVMADMM,
  title     = {ADMM for Training Sparse Structural SVMs with Augmented $\ell_1$ Regularizers},
  author    = {P. Balamurugan and Anusha Posinasetty and Shirish Shevade},
  booktitle = {Proceedings of the Sixteenth SIAM International Conference on Data Mining},
  year      = {2016},
  note      = {Software available at \url{http://clweb.csa.iisc.ernet.in/anu4iisc/structsvm_admm.html}}
}
The implementation uses the SVMStruct API for sequence tagging, available at http://www.cs.cornell.edu/
The following files have been borrowed from the SVMStruct implementation, with minor modifications:
1. svm_common.h
2. svm_common.c
3. svm_learn.h
4. svm_struct_api.c
5. svm_struct_api.h
6. svm_struct_api_types.h
Please follow the given steps to use the software.
Unzip the archive struct_svm_admm.zip using the command
unzip struct_svm_admm.zip
under some path /USER_PATH (this depends on your machine).
Under the path /USER_PATH, a directory named struct_svm_admm is
created. The full path of this directory is /USER_PATH/struct_svm_admm/.
Open a shell window.
Change directory using the following command at the shell prompt
cd /USER_PATH/struct_svm_admm/
Type the following commands at the prompt
make clean
make
The executables structsvm_sdm_learn and structsvm_sdm_classify are
created under the directory /USER_PATH/struct_svm_admm/
Type the following at the shell prompt to export the PATH variable:
export PATH=$PATH:/USER_PATH/struct_
Running the program:
For the learning/training phase, type the following at the shell prompt
./structsvm_sdm_learn [options] train_file model_file test_file
Note
To see the available learn/train options, please type
structsvm_sdm_learn -?
The options available are given below:
General Options:
  -? → help
  -v [0..3] → verbosity level (default 1)
Learning Options:
  -c float → C: trade-off between training error and margin (default 0.1)
  -l [1] → loss function to use (default 1)
       1: zero/one loss per token (Hamming loss)
          (the user could incorporate zero/one loss per sequence)
  -w {0,1,2,3,4,5,6,7} → choice of learning algorithm (default 1)
SDM Optimization Options:
  -e float → max Psi tolerance for SDM algorithm termination (default 0.25)
  -k [1..] → kappa: number of first GetMaxY iterations (default 10)
  -n [1..] → non-GetMaxY cap: interleave GetMaxY iterations with this many non-GetMaxY iterations (default 5)
  -M [1..] → max number of SDM iterations (default 100000)
ADMM Method Specific Options:
  -r float → ADMM penalty parameter rho (default 10)
  -# float → max ADMM iterations (default 100000)
Elastic Net Method Specific Options:
  -1 float → regularization parameter for the l1 term, beta_1 (default 0.9)
  -2 float → regularization parameter for the l2 term, beta_2 (default 0.1)
SGD Specific Options:
  -3 float → eta0, learning rate parameter for SGD (default 0.9)
SMO Optimization Options:
  -s float → max SMO tolerance: terminate SMO on achieving this tolerance (default 0.15)
  -$ [1..] → max number of SMO iterations (default 100000)
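The per-token zero/one loss selected by -l 1 is the Hamming loss. A minimal sketch of what it computes (an assumed helper for illustration, not code from the package):

```python
def hamming_loss(y_true, y_pred):
    # Zero/one loss per token: the number of positions where the
    # predicted label sequence disagrees with the true one.
    return sum(1 for a, b in zip(y_true, y_pred) if a != b)

hamming_loss(["B-NP", "O", "O", "B-VP"], ["B-NP", "O", "B-NP", "B-VP"])  # -> 1
```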
The struct_svm_admm directory contains a sub-folder data in which two sample data files sampledata and OCR200 are given.
You can try running L1 ADMM on this data by typing the following command:
./structsvm_sdm_learn -c 1 -w 2 data/sampledata
You should get a primal objective value ≈ 123.10923143.
The performance and sparsity of the model obtained on the sample training data using L1 ADMM are shown in the graphs below:
You can try running Elastic Net ADMM on the sampledata by typing the following command:
./structsvm_sdm_learn -c 1 -w 3 data/sampledata
You should get a primal objective value ≈ 119.94880210.
The performance and sparsity of the model obtained on the sample training data using Elastic Net ADMM are shown in the graphs below:
You can try running similar experiments on the OCR200 by typing the following commands:
./structsvm_sdm_learn -c 1 -w 2 data/OCR200
./structsvm_sdm_learn -c 1 -w 3 data/OCR200
For L1 ADMM you should get a primal objective value ≈ 430.93739342.
For Elastic Net ADMM you should get a primal objective value ≈ 409.27678099.
The performance and sparsity of the model obtained on OCR200 using L1 ADMM and Elastic Net ADMM are given in the graphs below:
For the sequence labeling application, the input is made of parts x = (x1, x2, …, xT) and the output is made of corresponding parts y = (y1, y2, …, yT). The feature vector f(x, y) for the pair (x, y) is constructed from first-order and second-order features. The first-order features are obtained by placing the token features xt in the block corresponding to label yt. The second-order features are obtained by considering the assignment to the label pair (yt−1, yt) at every stage t = 1, 2, …, T.
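A minimal sketch of such a joint feature map, assuming dense per-token features and integer label indices (the block layout here is illustrative; the software's exact layout may differ):

```python
import numpy as np

def joint_feature(xs, ys, num_labels):
    # xs: T x d array of token features; ys: length-T list of label indices.
    xs = np.asarray(xs, dtype=float)
    d = xs.shape[1]
    first = np.zeros((num_labels, d))            # first-order: x_t added to the y_t-th block
    second = np.zeros((num_labels, num_labels))  # second-order: counts of label pairs (y_{t-1}, y_t)
    for t, y in enumerate(ys):
        first[y] += xs[t]
        if t > 0:
            second[ys[t - 1], y] += 1.0
    return np.concatenate([first.ravel(), second.ravel()])

f = joint_feature([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]], [0, 1, 0], num_labels=2)
```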
A). The input file format should be the SVMStruct format, for example:
4 qid:1 5:-1 345:0.3 2101:0.89
6 qid:1 1:-1.5 5:0.2 39:1.98
3 qid:2 567:-0.24 689:0.8 792:0.21
If your data is in the CoNLL text chunking format, please go to Step B.
If the input file format is different from these formats, then the reader module has to be changed, along with all other modules that assume this format.
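For reference, a line of this format can be parsed along the following lines (a hypothetical helper, not part of the package; a replacement reader module would need to produce equivalent information):

```python
def parse_svmstruct_line(line):
    # "label qid:Q idx:val idx:val ..." -> (label, qid, {idx: val})
    parts = line.split()
    label = int(parts[0])
    qid = int(parts[1].split(":", 1)[1])
    features = {int(k): float(v) for k, v in (p.split(":", 1) for p in parts[2:])}
    return label, qid, features

parse_svmstruct_line("4 qid:1 5:-1 345:0.3 2101:0.89")
```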
B). CoNLL text chunking data has the following form:
But CC O
we PRP B-NP
certainly RB O
like IN O
what WP B-NP
we PRP B-NP
've VBP O
seen VBN O
so RB O
far RB O
. . O
" " O
This data has a column format (number of columns = 3), and a suitable template file is required to extract features from it.
The template file contains template rules of the following form:
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-1,0]/%x[0,0]
U06:%x[0,0]/%x[1,0]
U10:%x[-2,1]
U11:%x[-1,1]
U12:%x[0,1]
U13:%x[1,1]
U14:%x[2,1]
U15:%x[-2,1]/%x[-1,1]
U16:%x[-1,1]/%x[0,1]
U17:%x[0,1]/%x[1,1]
U18:%x[1,1]/%x[2,1]
U20:%x[-2,1]/%x[-1,1]/%x[0,1]
U21:%x[-1,1]/%x[0,1]/%x[1,1]
U22:%x[0,1]/%x[1,1]/%x[2,1]
B
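To make the %x[row,col] notation concrete, the sketch below expands one unigram rule at a token position, in the style of CRF++-like feature generation (the boundary-marker convention here is an assumption, not necessarily what conll2svmlightconverter emits):

```python
import re

def apply_template(rule, rows, t):
    # rule: e.g. "U02:%x[0,0]"; rows: token rows as lists of column strings; t: token index.
    # %x[off,col] picks column `col` of the token at position t + off.
    name, pattern = rule.split(":", 1)
    def pick(m):
        off, col = int(m.group(1)), int(m.group(2))
        i = t + off
        if 0 <= i < len(rows):
            return rows[i][col]
        return "_B%+d" % off  # out-of-range positions map to a boundary marker (assumed convention)
    return name + ":" + re.sub(r"%x\[(-?\d+),(\d+)\]", pick, pattern)

rows = [["But", "CC", "O"], ["we", "PRP", "B-NP"], ["certainly", "RB", "O"]]
apply_template("U05:%x[-1,0]/%x[0,0]", rows, 1)  # -> "U05:But/we"
```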
Similar templates are available for other bio-datasets such as BioCreative and BioNLP, whose data is also in the column format.
For such datasets, please use the utility conll2svmlightconverter by typing the following in the shell prompt:
conll2svmlightconverter template_file train_file test_file validation_file
This will extract the features in the SVMStruct format into a new file, whose name is displayed at the shell prompt.
To know more about this utility, just type conll2svmlightconverter at the shell prompt.
Once you have the features in the SVMStruct format, you can start training as described in Step 8.
The conll2svmlightconverter tool is modelled on the feature generation procedure in CRFSGD by Leon Bottou.