deep learning language modeling seminar #natural language processing bert machine learning nlp paper ai2 xlnet transformer-xl multi-class classification llms efficient fine-tuning fast adaptation fine tuning emnlp conference long context encoder decoder model text generation acl alphacode code generation fine-tuning dimensionality language models language ai implicit temporal events temporal dataset temporal reasoning dense retrieval fever hotpotqa multi-hop qa #unified question answering #gpt3 #multi task #zero-shot learning gpt iclr 2020 iclr dataset abductive nlg nli abductive commonsense reasoning reasoning commonsense pretrained model replaced token detection electra pretrained mdoel réformer roberta relative position embeddings position embeddings input representations adaptive softmax transformer face verification face recognition image-to-image transformation gan pytorch classification reverse kl divergence probability parameter regularization parameter of distribution mode collapsing mle maximum entropy distribution map logistic sigmoid kullback-leibler divergence jensen-shannon divergence information theory forward kl divergence exponential family entropy curse of dimensionality cross entropy baysian inference bayes's theorem full connected layer neural network optimization gradient update local gradient learning rate back propagation
See more