
[Code night] Natural language processing and machine learning

Natural Language and Machine Learning



  1. Title slide (2020-11-30): Natural language processing and machine learning ~ BERT ~
  2. Agenda (Japanese bullet list; one item covers BERT with GPU). Copyright © 2020 Oracle and/or its affiliates.
  3. [Japanese-only slide]
  4. [Japanese-only slide]
  5. Morphological analysis — tokenizers such as MeCab, Juman, and Janome.
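Slide 5 is about morphological analyzers (MeCab, Juman, Janome), which are needed because Japanese text has no spaces between words. As a rough illustration of the segmentation problem only — not of how those dictionary-based tools actually work — character n-grams give a dictionary-free way to chop text into units. A minimal sketch:

```python
def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams.

    A crude, dictionary-free stand-in for illustration only; real
    morphological analyzers (MeCab, Juman, Janome) use dictionaries
    and statistical models to find actual word boundaries.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("自然言語処理", 2))  # ['自然', '然言', '言語', '語処', '処理']
```

Character bigrams over-generate (most of the pairs above are not words), which is exactly the gap a dictionary-backed analyzer closes.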
  6. Syntactic (dependency) parsing — tools such as GiNZA, CaboCha, and KNP.
  7. [Japanese-only slide]
  8. [Japanese bullet list comparing an A case and a B case]
  9. [Japanese-only slide]
  10. Evaluation metrics — AUC, ROC, PR, R-Square, RMSE, MAE.
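Slide 10 lists classification metrics (AUC, ROC, PR) alongside regression metrics (R-Square, RMSE, MAE). A plain-Python sketch of what the regression ones compute, with no library assumed:

```python
import math

def mae(y_true, y_pred):
    # Mean Absolute Error: average magnitude of the errors
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root Mean Squared Error: like MAE, but penalizes large errors more
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_square(y_true, y_pred):
    # R-Square: 1 minus (residual variance / total variance); 1.0 is a perfect fit
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.0, 4.0]
print(mae(y_true, y_pred))   # 0.8333...
print(rmse(y_true, y_pred))  # 1.1902...
```

Note how the single large error (2.0 vs. 4.0) pushes RMSE well above MAE — that asymmetry is the usual reason to report both.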
  11. Datasets — corpora such as Wikipedia, Yahoo Movie Review, Amazon Review, and Livedoor (ratings can serve as labels).
  12. Preprocessing / tokenization — MeCab, GiNZA, Janome.
  13. Modeling libraries — scikit-learn, PyTorch, TensorFlow, Keras, Transformers.
  14. Text vectorization — a token list is mapped to numeric vectors such as [2.531e-02, -5.941e-02, -2.143e-01]; options include count-based features (TF-IDF) and BERT embeddings.
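Slide 14 contrasts TF-IDF-style vectors with BERT embeddings. A self-contained sketch of the TF-IDF weighting itself (toy English tokens stand in for tokenizer output; real pipelines would use e.g. scikit-learn's TfidfVectorizer, which also applies smoothing):

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        # term frequency scaled down by how common the term is across documents
        out.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return out

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
vecs = tf_idf(docs)
print(vecs[0]["the"])  # 0.0 -- appears in every document, so it carries no signal
```

Unlike these sparse counts, BERT embeddings are dense and context-dependent: the same token gets different vectors in different sentences.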
  15. Bidirectional Encoder Representations from Transformers (BERT) — a Transformer-based model released in October 2018 that achieved state-of-the-art results on NLP benchmarks; usable from TensorFlow and PyTorch via the Transformers ML library; contrast with earlier approaches such as TF-IDF, BOW, and CNNs.
  16. BERT is pre-trained on two tasks: Next Sentence Prediction (NSP), which judges whether two sentences follow one another, and the Masked Language Model (MLM).
  17. MLM details — tokens replaced by [MASK] are predicted from context on both sides (bidirectional), alongside NSP over sentence pairs.
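Slides 16–17 describe the two pre-training tasks. The MLM side can be pictured as: hide some tokens, keep the originals as labels, and train the model to recover them from context on both sides. A toy sketch of just the masking step (the 15% rate matches the BERT paper; the paper's 80/10/10 mask/random/keep refinement is omitted here):

```python
import random

def mask_for_mlm(tokens, mask_prob=0.15, seed=0):
    """Replace ~mask_prob of the tokens with [MASK]; labels hold the originals."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")   # the model must predict this position
            labels.append(tok)        # training target for the MLM loss
        else:
            masked.append(tok)
            labels.append(None)       # position is ignored by the MLM loss
    return masked, labels

tokens = "the cat sat on the mat and the dog ran".split()
masked, labels = mask_for_mlm(tokens)
print(masked)
```

Because the loss only looks at masked positions, the model is free to read the unmasked context in both directions — the "bidirectional" in BERT's name.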
  18. Fill-mask example — BERT predicts the word hidden behind [MASK].
  19. [BERT section divider]
  20. BERT: [Japanese bullet list]
  21. Demo setup — binary sentiment classification (positive vs. negative) on Yahoo Movie Reviews, using the Tohoku University pre-trained Japanese BERT (https://github.com/cl-tohoku/bert-japanese, model bert-base-japanese-whole-word-masking); 10,000 reviews, split 5,038 / 4,962 between the two labels.
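Slide 21 carves the 10,000 labeled reviews into train/dev/test sets before fine-tuning (in the demo, toiro's `load_corpus` does this internally). A sketch of such a split in plain Python — the function name and fractions here are illustrative, not toiro's actual API:

```python
import random

def train_dev_test_split(rows, dev_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle once, then carve off dev and test; the rest is training data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_dev = int(len(rows) * dev_frac)
    n_test = int(len(rows) * test_frac)
    dev = rows[:n_dev]
    test = rows[n_dev:n_dev + n_test]
    train = rows[n_dev + n_test:]
    return train, dev, test

rows = [(f"review {i}", i % 2) for i in range(10000)]  # (text, label) pairs
train, dev, test = train_dev_test_split(rows)
print(len(train), len(dev), len(test))  # 8000 1000 1000
```

Shuffling before splitting matters here: reviews grouped by rating or date would otherwise give the dev/test sets a skewed label balance.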
  22. Fine-tuning demo code — with the toiro library, training and prediction take only a few lines:

```python
# imports
from toiro import classifiers
from toiro import datadownloader

# download the corpus
corpus = 'yahoo_movie_reviews'
datadownloader.download_corpus(corpus)

# load train/dev/test splits
train_df, dev_df, test_df = datadownloader.load_corpus(corpus, n=12500)

# instantiate the classifier
model = classifiers.BERTClassificationModel()

# fine-tune on the training data, validating on the dev set
model.fit(train_df, dev_df, verbose=True)

# classify one review (the Japanese example text was lost in this export)
text = " "
pred_y = model.predict(text)
print(pred_y)
```

  A single class wraps the BERT model, defaulting to the Tohoku Japanese BERT:

```python
class BERTClassificationModel:

    def __init__(self,
                 model_name="cl-tohoku/bert-base-japanese-whole-word-masking",
                 checkpoints_dir=None):
        ...
```
  23. [BERT demo]
  24. Blog: A practical guide to getting started with Natural Language Processing — the workload runs on Oracle Cloud (OCI) Compute shape BM.GPU4.8: 64 CPU cores and 8× NVIDIA A100 GPUs (the successor to the V100).
  25. Summary — [Japanese bullet list]; BERT is pre-trained with Next Sentence Prediction and the Masked Language Model.
  26. References:
     • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding — https://arxiv.org/pdf/1810.04805.pdf
     • Pretrained Japanese BERT models released (Tohoku University) — https://www.nlp.ecei.tohoku.ac.jp/news-release/3284/
     • Kyoto University pretrained Japanese BERT — http://nlp.ist.i.kyoto-u.ac.jp/index.php?ku_bert_japanese
     • Hugging Face Transformers — https://github.com/huggingface/transformers
     • toiro — https://github.com/taishi-i/toiro
     • A practical guide to getting started with Natural Language Processing — https://blogs.oracle.com/cloud-infrastructure/a-practical-guide-to-getting-started-with-natural-language-processing
     • Japanese-language text datasets — https://lionbridge.ai/ja/datasets/japanese-language-text-datasets/
  27. Q&A (1):
     • Q: Which GPUs are available in the cloud? A: NVIDIA V100 and NVIDIA A100.
     • Q: How long did training take? A: About 2 on CPU (Xeon, 24 cores) versus about 10 on GPU (one V100); the time units did not survive this export.
     • Q: Is the demo notebook available? A: On GitHub: https://github.com/oracle-japan/oci-datascience-nlp-demo01.git
     (The remaining questions and answers were Japanese-only and are not preserved.)
  28. Q&A (2):
     • Q: Can BERT be used for summarization? A: Yes — see BertSum (paper: https://arxiv.org/pdf/1903.10318.pdf, GitHub: https://github.com/nlpyang/BertSum).
     • Q: What models are there besides BERT? A: XLNet, RoBERTa (a BERT derivative), GPT-2, and ALBERT.
