Solved – Data anonymization software

Is anyone aware of good data anonymization software, or perhaps an R package that does data anonymization? I'm not expecting uncrackable anonymization; I just want to make re-identification difficult.

Best Answer

The Cornell Anonymization Toolkit is open source. Their research page has links to associated publications.

Related Solutions

Solved – Examples of wrapping open source machine learning software in PMML

The pmml package for R (used by Rattle, which is mentioned in highBandWidth's answer) provides a fairly transparent look at how to turn a model into PMML output.

The pmml package reference manual gives an example of building a linear model for the iris data set and then producing PMML:

> library("pmml")
> (iris.lm <- lm(Sepal.Length ~ ., data=iris))
> pmml(iris.lm)

This will produce the following PMML:

<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-3_2 http://www.dmg.org/v3-2/pmml-3-2.xsd">
 <Header copyright="Copyright (c) 2011 user" description="Linear Regression Model">
  <Extension name="user" value="user" extender="Rattle/PMML"/>
  <Application name="Rattle/PMML" version="1.2.27"/>
  <Timestamp>2011-08-27 23:17:42</Timestamp>
 </Header>
 <DataDictionary numberOfFields="5">
  <DataField name="Sepal.Length" optype="continuous" dataType="double"/>
  <DataField name="Sepal.Width" optype="continuous" dataType="double"/>
  <DataField name="Petal.Length" optype="continuous" dataType="double"/>
  <DataField name="Petal.Width" optype="continuous" dataType="double"/>
  <DataField name="Species" optype="categorical" dataType="string">
   <Value value="setosa"/>
   <Value value="versicolor"/>
   <Value value="virginica"/>
  </DataField>
 </DataDictionary>
 <RegressionModel modelName="Linear_Regression_Model" functionName="regression" algorithmName="least squares" targetFieldName="Sepal.Length">
  <MiningSchema>
   <MiningField name="Sepal.Length" usageType="predicted"/>
   <MiningField name="Sepal.Width" usageType="active"/>
   <MiningField name="Petal.Length" usageType="active"/>
   <MiningField name="Petal.Width" usageType="active"/>
   <MiningField name="Species" usageType="active"/>
  </MiningSchema>
  <RegressionTable intercept="2.17126629215507">
   <NumericPredictor name="Sepal.Width" exponent="1" coefficient="0.495888938388551"/>
   <NumericPredictor name="Petal.Length" exponent="1" coefficient="0.829243912234806"/>
   <NumericPredictor name="Petal.Width" exponent="1" coefficient="-0.315155173326474"/>
   <CategoricalPredictor name="Species" value="setosa" coefficient="0"/>
   <CategoricalPredictor name="Species" value="versicolor" coefficient="-0.72356195778073"/>
   <CategoricalPredictor name="Species" value="virginica" coefficient="-1.02349781449083"/>
  </RegressionTable>
 </RegressionModel>
</PMML>
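To see that the RegressionTable above carries everything needed to score a record, here is a minimal sketch in base R that recomputes one prediction by hand from the intercept and coefficients and compares it with predict(). The choice of row 51 and the variable names by_hand, from_r, and species_term are illustrative only, and the sketch assumes R's default treatment contrasts, under which the reference level (setosa) contributes 0.

```r
# Refit the model from the example above (iris ships with base R).
iris.lm <- lm(Sepal.Length ~ ., data = iris)
b <- coef(iris.lm)

# Pick one record to score by hand; row 51 is a versicolor flower.
row <- iris[51, ]

# Look up the coefficient for this record's species level.
# The reference level (setosa) has no entry in coef(); it corresponds to the
# CategoricalPredictor with coefficient="0" in the PMML.
species_name <- paste0("Species", as.character(row$Species))
species_term <- if (species_name %in% names(b)) b[[species_name]] else 0

# Intercept + numeric predictors + categorical predictor.
by_hand <- b[["(Intercept)"]] +
  b[["Sepal.Width"]]  * row$Sepal.Width +
  b[["Petal.Length"]] * row$Petal.Length +
  b[["Petal.Width"]]  * row$Petal.Width +
  species_term

# Compare against R's own prediction for the same record.
from_r <- unname(predict(iris.lm, newdata = row))
all.equal(by_hand, from_r)
```

A PMML consumer would perform the same arithmetic, reading the numbers from the XML instead of from the R model object.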

Source Code

The relevant source code for this linear model is in the pmml.R and pmml.lm.R files of the pmml package. As with any PMML producer, the code reads the model parameters (here stored in iris.lm) and then builds the XML nodes from them.

The code in pmml.lm.R is straightforward: it builds up the PMML node by node.
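As a rough illustration of the node-by-node idea, the sketch below assembles part of a RegressionTable from the fitted coefficients using plain strings in base R. This is not the pmml package's code: xml_element is a hypothetical helper, the list of numeric predictors is hard-coded, attribute values are not XML-escaped, and the categorical predictors are left out for brevity.

```r
iris.lm <- lm(Sepal.Length ~ ., data = iris)
b <- coef(iris.lm)

# Hypothetical helper: format one empty XML element from a tag name and a
# named character vector of attributes (no escaping of special characters).
xml_element <- function(tag, attrs) {
  attr_text <- paste0(names(attrs), '="', attrs, '"', collapse = " ")
  sprintf("<%s %s/>", tag, attr_text)
}

# One NumericPredictor node per numeric input (names hard-coded here).
numeric_names <- c("Sepal.Width", "Petal.Length", "Petal.Width")
predictor_nodes <- vapply(numeric_names, function(nm) {
  xml_element("NumericPredictor",
              c(name = nm, exponent = "1",
                coefficient = format(b[[nm]], digits = 15)))
}, character(1), USE.NAMES = FALSE)

# Wrap the child nodes in the parent RegressionTable element.
regression_table <- c(
  sprintf('<RegressionTable intercept="%s">',
          format(b[["(Intercept)"]], digits = 15)),
  paste0(" ", predictor_nodes),
  "</RegressionTable>"
)
cat(regression_table, sep = "\n")
```

A real producer would use an XML library to handle escaping and nesting rather than pasting strings together.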

Below are some of the queries on the fitted model object that pmml.lm.R uses (indirectly):

> terms <- attributes(iris.lm$terms)
> terms$dataClasses
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
   "numeric"    "numeric"    "numeric"    "numeric"     "factor" 
> iris.lm$xlevels
$Species
[1] "setosa"     "versicolor" "virginica" 

> iris.lm$coefficients
      (Intercept)       Sepal.Width      Petal.Length       Petal.Width Speciesversicolor  Speciesvirginica 
        2.1712663         0.4958889         0.8292439        -0.3151552        -0.7235620        -1.0234978
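The three queries above are enough to separate the coefficients into the NumericPredictor and CategoricalPredictor entries seen in the PMML. The following is a sketch of one way to do that in base R, not the package's implementation; the names numeric_coefs and categorical_coefs are made up for illustration, and it assumes default treatment contrasts and a model without interaction terms.

```r
iris.lm <- lm(Sepal.Length ~ ., data = iris)
b <- coef(iris.lm)
classes <- attr(iris.lm$terms, "dataClasses")

# The response is also listed in dataClasses; drop it from the predictors.
response <- names(classes)[attr(iris.lm$terms, "response")]
numeric_vars <- setdiff(names(classes)[classes == "numeric"], response)
numeric_coefs <- b[numeric_vars]

# For each factor, coefficient names are "<variable><level>". The reference
# level has no coefficient, so the lookup yields NA, which is set to 0 here
# (matching coefficient="0" for setosa in the PMML).
categorical_coefs <- lapply(names(iris.lm$xlevels), function(v) {
  lv <- iris.lm$xlevels[[v]]
  vals <- b[paste0(v, lv)]
  vals[is.na(vals)] <- 0
  setNames(unname(vals), lv)
})
names(categorical_coefs) <- names(iris.lm$xlevels)

numeric_coefs
categorical_coefs
```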
