In this post I will show how to combine features from natural language processing with traditional features (metadata) in a single Keras model. This is done with multiple inputs, and the technique is not limited to this specific context.
Recently I showed an example in which chats were classified into two categories using traditional techniques (a stacked ensemble). In another post I showed how to improve on this approach with deep neural networks, but used only the text data. It seems useful to add the metadata to the Keras model and combine NLP and non-NLP features.
The easiest solution would be to add our metadata as additional special embeddings. In this case we need to transform our data into categorical features, because an embedding can only be present or absent. This works if we increase our vocabulary size by the number of additional features and treat them as additional words.
Example: Our dictionary has 100 words and we have 10 additional features. In this case we add 10 additional words to the dictionary. The sequence of embeddings now always starts with the metadata features, so we must increase our sequence length by 10.
There are several drawbacks to this solution. We can only use categorical features, not continuous values. Even more importantly, the embeddings must be learned by the model, and our vector space mixes NLP and metadata.
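The example above could be sketched roughly as follows. This is only an illustration under assumptions that are not part of the original setup: the helper `prepend_meta_tokens` is hypothetical, the metadata is assumed to be already binarized (1 = feature present), word ids are assumed to lie in 0..99, and id 0 is assumed to be the padding index.

```python
import numpy as np

vocab_size = 100  # size of the word dictionary, as in the example
n_meta = 10       # number of binary metadata features


def prepend_meta_tokens(token_ids, meta_flags, vocab_size, pad_id=0):
    """Hypothetical helper: prefix each sequence with one slot per feature.

    Slot j holds the extra token id vocab_size + j when feature j is
    present, and pad_id (assumed to be the padding index) otherwise.
    """
    meta_flags = np.asarray(meta_flags, dtype=bool)
    extra_ids = vocab_size + np.arange(meta_flags.shape[1])
    meta_tokens = np.where(meta_flags, extra_ids, pad_id)
    return np.concatenate([meta_tokens, np.asarray(token_ids)], axis=1)


# Two toy chats of length 3 (already converted to word ids, 0 = padding).
token_ids = np.array([[5, 17, 42],
                      [8, 23, 0]])

# First chat has features 0 and 3 set, second chat has feature 9 set.
meta_flags = np.zeros((2, n_meta), dtype=int)
meta_flags[0, [0, 3]] = 1
meta_flags[1, 9] = 1

combined = prepend_meta_tokens(token_ids, meta_flags, vocab_size)
print(combined.shape)  # (2, 13): sequence length grew by n_meta
print(combined[0])     # ids 100 and 103 mark the active features
```

With this encoding the embedding layer would need `vocab_size + n_meta` entries (110 in the example), since the extra ids are looked up like ordinary words.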
Multiple input models
A much better option is a model that can handle continuous data and simply works as a classifier on NLP features and metadata. This is possible with multiple inputs in Keras. Example:
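from keras import regularizers
from keras.layers import Input, Embedding, LSTM, Bidirectional, Dense, concatenate
from keras.models import Model
# seq_length, embedding_size and classifier_neurons are hyperparameters that must be defined beforehand
# two inputs: padded word-index sequences and 10 metadata features
nlp_input = Input(shape=(seq_length,), name='nlp_input')
meta_input = Input(shape=(10,), name='meta_input')
# NLP branch: embedding lookup followed by a bidirectional LSTM
emb = Embedding(output_dim=embedding_size, input_dim=100, input_length=seq_length)(nlp_input)
nlp_out = Bidirectional(LSTM(128, dropout=0.3, recurrent_dropout=0.3, kernel_regularizer=regularizers.l2(0.01)))(emb)
# append the metadata to the LSTM output and classify the combined vector
x = concatenate([nlp_out, meta_input])
x = Dense(classifier_neurons, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[nlp_input, meta_input], outputs=[x])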
We use a bidirectional LSTM model and combine its output with the metadata. Therefore we define two input layers (nlp_input and meta_input) and process them in separate branches. Our NLP data goes through the embedding transformation and the LSTM layer. The metadata is used as it is, so we can simply concatenate it with the LSTM output (nlp_out). This combined vector is then passed through a dense layer and finally into a single sigmoid output neuron.
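To make the data flow concrete, here is a minimal sketch of how such a two-input model might be compiled and trained. Everything specific in it is an assumption made for illustration: the layer sizes are deliberately small, the optimizer and loss are just common choices for binary classification, and the arrays `X_text`, `X_meta` and `y` are random stand-ins for real chat data.

```python
import numpy as np
from keras.layers import Input, Embedding, LSTM, Bidirectional, Dense, concatenate
from keras.models import Model

# Hypothetical sizes, chosen only to keep the example fast.
vocab_size, seq_length, n_meta, n_samples = 100, 20, 10, 64

# Same two-branch structure as above, with smaller layers.
nlp_input = Input(shape=(seq_length,), name='nlp_input')
meta_input = Input(shape=(n_meta,), name='meta_input')
emb = Embedding(input_dim=vocab_size, output_dim=16)(nlp_input)
nlp_out = Bidirectional(LSTM(8))(emb)
x = concatenate([nlp_out, meta_input])
x = Dense(8, activation='relu')(x)
out = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[nlp_input, meta_input], outputs=out)

# Assumed training setup for a binary classifier.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Random stand-in data: word ids, continuous metadata, binary labels.
rng = np.random.default_rng(0)
X_text = rng.integers(0, vocab_size, size=(n_samples, seq_length))
X_meta = rng.random((n_samples, n_meta)).astype('float32')
y = rng.integers(0, 2, size=(n_samples, 1)).astype('float32')

# The inputs are passed as a list, in the same order as in Model(inputs=...).
model.fit([X_text, X_meta], y, epochs=1, batch_size=16, verbose=0)
probs = model.predict([X_text, X_meta], verbose=0)
print(probs.shape)  # one probability per sample
```

Both arrays must have the same number of rows, since row i of the metadata belongs to sequence i. Because the metadata enters the network unscaled, it is probably worth bringing continuous features to a comparable range before training.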