Welcome toVigges Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.8k views
in Technique[技术] by (71.8m points)

python - Scikit-learn's LabelBinarizer vs. OneHotEncoder

What is the difference between the two? It seems that both create new columns, which their number is equal to the number of unique categories in the feature. Then they assign 0 and 1 to data points depending on what category they are in.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

A simple example which encodes an array using LabelEncoder, OneHotEncoder, LabelBinarizer is shown below.

I see that OneHotEncoder needs data in integer encoded form first to convert into its respective encoding which is not required in the case of LabelBinarizer.

from numpy import array
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelBinarizer

# define example
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 
'warm', 'hot']
values = array(data)
print "Data: ", values
# integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print "Label Encoder:" ,integer_encoded

# onehot encode
onehot_encoder = OneHotEncoder(sparse=False)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
print "OneHot Encoder:", onehot_encoded

#Binary encode
lb = LabelBinarizer()
print "Label Binarizer:", lb.fit_transform(values)

enter image description here

Another good link which explains the OneHotEncoder is: Explain onehotencoder using python

There may be other valid differences between the two which experts can probably explain.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to Vigges Developer Community for programmer and developer-Open, Learning and Share
...