Google Earth Engine – Difference Between GEE and Sklearn Random Forest Output

google-earth-engine, random-forest

I am training a random forest in GEE to predict canopy cover (see here for an example). The RF is implemented as:

var rf_model = ee.Classifier.randomForest(5).train(to_TrainAll, target, bands);

The mean of my predicted output is too low (expected ~20%, predicted ~8%), so I exported the training data and fit the scikit-learn implementation of RF, which returned a more realistic value. The training data is available here.

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv('/Users/phil/Google_Drive/training_data_100.csv')
bands = ['B5','B7','B10', 'B11_mean','B1_variance3','B2_variance3','B8_variance3',
         'B8A_variance3','B12_variance3','B2_variance5','B8_variance5','B9_variance5',
         'B10_variance5', 'ndvi','ndvi_stdDev_5','ndvi_temporal_variance',
         'slope','aspect','precipitation','tavg_min','tavg_max']

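output = []

# Repeat a random ~95/5 train/test split 10 times and record the mean prediction on the held-out rows
for _ in range(10):
    train = np.random.random(size=len(df)) < .95
    X_train = df.loc[train, bands].values    # predictor bands
    y_train = df.loc[train, 'cc'].values     # canopy cover target, kept 1-D
    rf = RandomForestRegressor(n_estimators=5).fit(X_train, y_train)
    output.append(rf.predict(df.loc[~train, bands].values).mean())

print('actual: {:.2f} predicted: {:.2f}'.format(df.cc.mean(), np.mean(output)))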

actual: 20.15 predicted: 20.37
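One reason the scikit-learn result tracks the observed mean is that RandomForestRegressor averages the predictions of its individual trees. The sketch below checks that on synthetic data; the three "bands" and the skewed 0-100 target are made up for illustration and are not the exported training CSV.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in for the training table: 3 predictor "bands"
# and a right-skewed canopy-cover target clipped to 0-100.
X = rng.random((500, 3))
y = np.clip(100 * X[:, 0] ** 3 + rng.normal(0, 5, 500), 0, 100)

rf = RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y)

# Collect each tree's prediction, then compare their average with the forest's output.
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
forest_pred = rf.predict(X)

# The forest prediction equals the arithmetic mean of the tree predictions.
print(np.allclose(per_tree.mean(axis=0), forest_pred))
```

Because averaging preserves the overall level of the target, the mean of the forest's predictions stays close to the mean of the observed values.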

Also, if I run

ee.Classifier.cart().train(to_TrainAll, target, bands);

I get a more realistic value. What am I doing wrong?

Best Answer

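It may help to set the output mode to "REGRESSION" with setOutputMode, as done in https://doi.org/10.3390/rs10081167. By default a GEE classifier runs in CLASSIFICATION mode, so canopy cover is treated as a set of discrete class labels and the forest returns the majority vote of its trees rather than the mean of their predictions. With a skewed target this likely pulls predictions toward the most common (low) values:

var rf_model = ee.Classifier.randomForest(5).setOutputMode('REGRESSION').train(to_TrainAll, target, bands);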
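To illustrate why the output mode changes the result, here is a toy sketch of the two ways per-tree outputs can be combined: a majority vote (as in classification) versus an arithmetic mean (as in regression). The function name, the tie-breaking rule, and the five tree predictions are hypothetical; this does not reproduce GEE's internal implementation.

```python
import numpy as np

def combine_trees(tree_preds, output_mode="CLASSIFICATION"):
    """Illustrative only: combine per-tree outputs for one pixel.

    CLASSIFICATION -> majority vote (most frequent value; ties go to the
                      smallest value, a choice made for this sketch)
    REGRESSION     -> arithmetic mean of the tree predictions
    """
    tree_preds = np.asarray(tree_preds, dtype=float)
    if output_mode == "REGRESSION":
        return float(tree_preds.mean())
    # np.unique returns sorted values, so argmax picks the smallest value on ties
    values, counts = np.unique(tree_preds, return_counts=True)
    return float(values[np.argmax(counts)])

# Hypothetical canopy-cover predictions (%) from 5 trees for a single pixel.
votes = [0, 0, 10, 40, 60]
print(combine_trees(votes, "CLASSIFICATION"))  # 0.0
print(combine_trees(votes, "REGRESSION"))      # 22.0
```

When low values dominate the training data, the vote keeps landing on them, which is consistent with a mean prediction well below the observed mean.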
