More from Kenichi Sonoda (11)
[Code night] Natural language processing and machine learning
- 2. Copyright © 2020 Oracle and/or its affiliates.
• BERT (GPU)
- 5.
Morphological analyzers (MeCab/Juman/Janome/etc.)
- 6.
Dependency parsers (GiNZA/CaboCha/KNP/etc.)
- 10.
Evaluation metrics: AUC, ROC, RP, R-Squared, RMSE, MAE
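As an illustration of the metrics listed on this slide, here is a minimal sketch in plain Python of MAE, RMSE, R-squared, and AUC. The sample ratings, labels, and scores are made up for the example and do not come from the talk. In practice a library such as scikit-learn would normally be used instead.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def auc(labels, scores):
    """ROC AUC: probability that a random positive is scored above a random
    negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up data: predicted review ratings (regression) and
# positive/negative scores (classification).
ratings_true = [3.0, 4.0, 5.0, 2.0]
ratings_pred = [2.5, 4.0, 4.0, 3.0]
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1]

print("MAE      :", mae(ratings_true, ratings_pred))
print("RMSE     :", rmse(ratings_true, ratings_pred))
print("R-squared:", r_squared(ratings_true, ratings_pred))
print("AUC      :", auc(labels, scores))
```

MAE, RMSE, and R-squared apply to numeric targets such as a rating, while AUC applies to binary labels such as positive/negative.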
- 11.
• Wikipedia, Yahoo Movie Review, Amazon Review, Livedoor
• Rating
- 12.
• MeCab, GiNZA, Janome
- 13.
• scikit-learn, PyTorch, TensorFlow, Keras, Transformers
- 14.
Tokens → vector, e.g. [2.531e-02, -5.941e-02, -2.143e-01]
• TF-IDF
• BERT
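To make the TF-IDF option concrete, here is a minimal sketch of turning tokenized text into vectors. It assumes one common variant (tf = count / document length, idf = log(N / df)). Libraries such as scikit-learn add smoothing and normalization, so their numbers will differ. The sample documents are hypothetical and already tokenized; for Japanese the tokens would come from a morphological analyzer.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([
            (counts[tok] / len(doc)) * math.log(n_docs / df[tok])
            for tok in vocab
        ])
    return vocab, vectors


# Hypothetical, already-tokenized reviews
docs = [
    ["movie", "great", "great"],
    ["movie", "boring"],
    ["movie", "great", "story"],
]
vocab, vectors = tf_idf(docs)
print(vocab)
for vec in vectors:
    print([round(v, 3) for v in vec])
```

A token that appears in every document ("movie" here) gets weight 0, while tokens specific to one document get the largest weights. Unlike BERT embeddings, these vectors ignore word order and context.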
- 15.
Bidirectional Encoder Representations from Transformers (BERT)
• Supported by TensorFlow, PyTorch, Transformers, and other ML libraries
• Released in October 2018; SoTA on NLP tasks
[Figure: NLP methods before BERT (TF-IDF, CNN, BOW, etc.) and BERT (Transformers)]
- 16.
BERT pre-training: 2 tasks
• Masked Language Model (MLM)
• Next Sentence Prediction (NSP): takes 2 sentences as input
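As a sketch of how Next Sentence Prediction training examples can be built: for each sentence, the second sentence is the true next sentence half of the time and a random other sentence otherwise. The BERT paper draws the random sentence from the whole corpus. This simplified version samples within a single document, and the function name and sample sentences are hypothetical.

```python
import random


def make_nsp_pairs(sentences, rng):
    """Build (sentence_a, sentence_b, is_next) examples from an ordered document.

    Needs at least 3 sentences so that a negative candidate always exists.
    """
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # Positive example: the sentence that actually follows
            pairs.append((sentences[i], sentences[i + 1], 1))
        else:
            # Negative example: any sentence except the current and the next one
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((sentences[i], rng.choice(candidates), 0))
    return pairs


document = [
    "I watched the movie yesterday.",
    "The story was moving.",
    "The music was also good.",
    "I want to see it again.",
]
for a, b, is_next in make_nsp_pairs(document, random.Random(0)):
    print(is_next, "|", a, "|", b)
```

The model is then trained to predict is_next from the sentence pair.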
- 17.
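BERT pre-training: 2 tasks
• Masked Language Model (MLM): predict [Mask] tokens from the context on both sides (Bidirectional)
• Next Sentence Prediction (NSP): predict whether 2 sentences are consecutive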
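As a sketch of the Masked Language Model input preparation described in the BERT paper: 15% of tokens are selected. A selected token is replaced by the mask token 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time. The function name, the tiny vocabulary, and the sample tokens are hypothetical. The mask token is written "[MASK]" as in the released BERT vocabularies.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Return (masked_tokens, labels).

    labels holds the original token at selected positions (the prediction
    targets) and None everywhere else.
    """
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                masked.append(MASK)               # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(tok)                # 10%: keep the original token
        else:
            labels.append(None)
            masked.append(tok)
    return masked, labels


tokens = ["the", "movie", "was", "really", "great"]
vocab = ["the", "movie", "was", "really", "great", "boring", "story"]
# A higher mask_prob than 0.15 is used here only so the short demo shows an effect
masked, labels = mask_tokens(tokens, vocab, random.Random(1), mask_prob=0.5)
print(masked)
print(labels)
```

Because the model sees the tokens on both sides of each masked position, it learns bidirectional context rather than left-to-right prediction only.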
- 18.
BERT: predicting [Mask]
[Mask] = [ ] (BERT)
- 21.
BERT: classification of Yahoo Movie Reviews
• Pretrained Japanese BERT (https://github.com/cl-tohoku/bert-Japanese)
• BERT (bert-base-japanese-whole-word-masking)
• Yahoo Movie Reviews
• 10000
• 300 /
• 5038 / 4962
- 22.
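BERT: classification with toiro
# Import the toiro modules
from toiro import classifiers
from toiro import datadownloader

# Download the corpus
corpus = 'yahoo_movie_reviews'
datadownloader.download_corpus(corpus)

# Load the corpus as train / dev / test DataFrames
train_df, dev_df, test_df = datadownloader.load_corpus(corpus, n=12500)

# Create the BERT classification model
model = classifiers.BERTClassificationModel()

# Fine-tune on the training data, validating on the dev data
model.fit(train_df, dev_df, verbose=True)

# Predict the label of a new text
text = " "
pred_y = model.predict(text)
print(pred_y)
# Output shown on the slide: 1

Default BERT model (excerpt from BERTClassificationModel):
class BERTClassificationModel:
    def __init__(self,
                 model_name="cl-tohoku/bert-base-japanese-whole-word-masking",
                 checkpoints_dir=None):
        ……..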
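Once a classifier exposes predict(text), as in the code on this slide, it can be scored on held-out data. The sketch below assumes only that interface. KeywordModel and the sample texts are hypothetical stand-ins so the snippet runs without toiro or a GPU. The slides do not show how test_df stores texts and labels, so plain lists are used here.

```python
class KeywordModel:
    """Hypothetical stand-in that exposes predict(text) like the classifier above."""

    def predict(self, text):
        # Toy rule: label 1 (positive) if the text contains "good", else 0
        return 1 if "good" in text else 0


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the gold label."""
    correct = sum(1 for t, y in zip(texts, labels) if model.predict(t) == y)
    return correct / len(texts)


# Made-up held-out examples
texts = ["good story", "boring plot", "good cast", "not good"]
labels = [1, 0, 1, 0]
print(accuracy(KeywordModel(), texts, labels))
```

The toy rule gets "not good" wrong. Context-dependent cases like this are where a model such as BERT is expected to do better than keyword matching.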
- 24.
Blog: A practical guide to getting started with Natural Language Processing
• NVIDIA A100 GPU
• V100
• OCI Compute Service BM.GPU4.8
• CPU: 64 cores
• GPU: A100 x8
Oracle Cloud
- 25.
• BERT
- Next Sentence Prediction, Masked Language Model
- 26.
• BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
https://arxiv.org/pdf/1810.04805.pdf
• Pretrained Japanese BERT models released
https://www.nlp.ecei.tohoku.ac.jp/news-release/3284/
• Japanese BERT pretrained model
http://nlp.ist.i.kyoto-u.ac.jp/index.php?ku_bert_japanese
• Hugging Face Transformers
https://github.com/huggingface/transformers
• toiro
https://github.com/taishi-i/toiro
• A practical guide to getting started with Natural Language Processing
https://blogs.oracle.com/cloud-infrastructure/a-practical-guide-to-getting-started-with-natural-language-processing
• Japanese-language text datasets
https://lionbridge.ai/ja/datasets/japanese-language-text-datasets/
- 27.
Q&A (1)
Q: BERT
A:
Q: Cloud GPU ( GPU GPU CPU )
A: NVIDIA V100, NVIDIA A100
Q: BERT Q&A
A:
Q:
A: CPU (Xeon 24) 2, GPU (V100 x1) 10
Q: notebook
A: GitHub https://github.com/oracle-japan/oci-datascience-nlp-demo01.git
- 28.
Q&A (2)
Q:
A: BERT: BertSum
URL (BertSum): https://arxiv.org/pdf/1903.10318.pdf
GitHub: https://github.com/nlpyang/BertSum
Q: BERT
A: BERT, XLNet, RoBERTa (a BERT variant), GPT2, ALBERT