In this post I will show how to combine natural language processing features with traditional features (metadata) in a single Keras model that is trained end to end. This can be done with multiple-input models.
In applied machine learning, data is often more complex than in academia. Scientific data sets are usually limited to a single kind of data, for example text, images or numerical data. This makes a lot of sense, as the goal is to compare new models and approaches with existing ones. Solving a real-world problem, however, often involves more than one data source, so we must deal with different kinds of data. To use end-to-end learning with neural networks instead of manually stacking models, we have to combine these different feature spaces inside the neural network.
Let's assume we want to solve a text classification problem and we have additional metadata for each document in our corpus. In simple approaches, where a document is represented by a bag of words (BoW), we can just append the metadata to the BoW vector and we are done. With word embeddings it's a bit more complicated.
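To make the bag-of-words case concrete, here is a minimal sketch in plain Python. The vocabulary, the document and the two metadata values are made up for illustration and are not part of any real data set.

```python
from collections import Counter

def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

# Hypothetical toy vocabulary and document.
vocabulary = ["good", "bad", "movie", "plot"]
tokens = "good movie good plot".split()

# Hypothetical metadata: document length in tokens and a binary flag.
meta = [float(len(tokens)), 1.0]

# The combined feature vector is the BoW counts followed by the metadata.
features = bow_vector(tokens, vocabulary) + meta
print(features)  # [2, 0, 1, 1, 4.0, 1.0]
```

Any classifier that accepts a fixed-length vector can consume `features` directly, which is why this case needs no special architecture.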
The easiest solution would be to add our metadata as additional special embeddings. In this case we need to transform our data into categorical features, because an embedding can only be present or absent. This works if we increase our vocabulary size by the number of additional features and treat them as additional words.
Example: Our dictionary has 100 words and we have 10 additional features. In this case we add 10 additional words to the dictionary. The sequence of embeddings now always starts with the metadata features, so we must increase our sequence length by 10.
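The example above could be sketched as follows. This is only one possible encoding: I assume the metadata features are binary, that index 0 of the dictionary is reserved for padding, and that inactive features are filled with that padding index. The helper name `prepend_meta_tokens` and the sample values are hypothetical.

```python
VOCAB_SIZE = 100  # size of the word dictionary, as in the example above
NUM_META = 10     # number of binary metadata features
PAD_ID = 0        # assumption: index 0 of the dictionary is reserved for padding

def prepend_meta_tokens(word_ids, meta_flags):
    """Return the sequence with one leading slot per metadata feature.

    An active feature i becomes the extra token VOCAB_SIZE + i; inactive
    features are filled with PAD_ID, so every sequence grows by NUM_META.
    """
    if len(meta_flags) != NUM_META:
        raise ValueError("expected exactly NUM_META metadata flags")
    meta_ids = [VOCAB_SIZE + i if flag else PAD_ID
                for i, flag in enumerate(meta_flags)]
    return meta_ids + list(word_ids)

word_ids = [12, 47, 3]                       # hypothetical encoded document
meta_flags = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1]  # hypothetical binary metadata
sequence = prepend_meta_tokens(word_ids, meta_flags)
print(sequence)  # [100, 0, 0, 103, 0, 0, 0, 0, 0, 109, 12, 47, 3]
```

With this encoding the embedding layer needs `input_dim = VOCAB_SIZE + NUM_META = 110`, and the sequence length grows by 10.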
There are several drawbacks to this solution. We can only use categorical features, not continuous values. Even more importantly, the embeddings for the metadata must be learned by the model, and our vector space mixes NLP and metadata.
Multiple input models
Much better would be a model that can handle continuous data and simply works as a classifier on top of NLP features and metadata. This is possible with multiple inputs in Keras. Example:
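from keras import regularizers
from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense, concatenate
from keras.models import Model
# seq_length, embedding_size and classifier_neurons are hyperparameters that must be defined beforehand.
# Two inputs: token IDs for the text and a vector of 10 metadata values.
nlp_input = Input(shape=(seq_length,), name='nlp_input')
meta_input = Input(shape=(10,), name='meta_input')
# Text branch: embedding lookup followed by a bidirectional LSTM.
emb = Embedding(output_dim=embedding_size, input_dim=100, input_length=seq_length)(nlp_input)
nlp_out = Bidirectional(LSTM(128, dropout=0.3, recurrent_dropout=0.3, kernel_regularizer=regularizers.l2(0.01)))(emb)
# Join the LSTM output with the raw metadata and classify.
x = concatenate([nlp_out, meta_input])
x = Dense(classifier_neurons, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[nlp_input, meta_input], outputs=[x])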
We use a bidirectional LSTM model and combine its output with the metadata. To do so, we define two input layers (nlp_input and meta_input) and process them in separate branches. Our NLP data goes through the embedding transformation and the LSTM layer. The metadata is used as is, so we can simply concatenate it with the LSTM output (nlp_out). This combined vector is then passed through a dense layer and finally into a single sigmoid output neuron.
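To show how such a model is fed, here is a small, self-contained sketch that trains a scaled-down version of the architecture on random data. The layer sizes, sample count and random inputs are assumptions chosen only to keep the example fast; they are not tuned values. The important part is that `fit` and `predict` receive one array per input, in the same order as the `inputs` list of the model.

```python
import numpy as np
from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense, concatenate
from keras.models import Model

# Hypothetical sizes, chosen small so the sketch runs quickly.
seq_length, vocab_size, num_meta = 20, 100, 10

nlp_input = Input(shape=(seq_length,), name='nlp_input')
meta_input = Input(shape=(num_meta,), name='meta_input')
emb = Embedding(input_dim=vocab_size, output_dim=8)(nlp_input)
nlp_out = Bidirectional(LSTM(16))(emb)
x = concatenate([nlp_out, meta_input])
x = Dense(8, activation='relu')(x)
out = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[nlp_input, meta_input], outputs=[out])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Random stand-in data: token IDs, continuous metadata and binary labels.
rng = np.random.default_rng(0)
n_samples = 32
nlp_data = rng.integers(0, vocab_size, size=(n_samples, seq_length))
meta_data = rng.random((n_samples, num_meta)).astype('float32')
labels = rng.integers(0, 2, size=(n_samples, 1)).astype('float32')

# One array per input, in the same order as `inputs` in the Model call.
model.fit([nlp_data, meta_data], labels, epochs=1, batch_size=8, verbose=0)
preds = model.predict([nlp_data, meta_data], verbose=0)
print(preds.shape)  # (32, 1)
```

In practice the metadata columns are usually scaled to a comparable range before training, since they enter the dense layer next to the LSTM output without any transformation.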