3.
8.

Example row from the training data:

qa_id               1, 2, 3 …
question_title      What am I losing when using extension …
question_body       After playing around with macro …
question_user_name  ysap
question_user_page  https://photo.stackexchange.com/users/1024
answer              I just got extension tubes, so here's the skinny. …
answer_user_name    rfusca
answer_user_page    https://photo.stackexchange.com/users/1917
url                 …
category            LIFE_ARTS
host                photo.stackexchange.com

train data: 6079 rows / public test data: 476 rows (13%) / private test data: 3186 rows (87%)
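The field list above can be sketched as a one-row dataframe schema; the example values are the ones shown on the slide (a minimal sketch only — the real train data also carries the 30 target columns after these fields).

```python
import pandas as pd

# Minimal sketch of the train-data schema shown above (one example row;
# the elided values are kept elided, exactly as on the slide).
row = {
    "qa_id": 1,
    "question_title": "What am I losing when using extension …",
    "question_body": "After playing around with macro …",
    "question_user_name": "ysap",
    "question_user_page": "https://photo.stackexchange.com/users/1024",
    "answer": "I just got extension tubes, so here's the skinny. …",
    "answer_user_name": "rfusca",
    "answer_user_page": "https://photo.stackexchange.com/users/1917",
    "url": "…",
    "category": "LIFE_ARTS",
    "host": "photo.stackexchange.com",
}
df = pd.DataFrame([row])
print(df.shape)  # (1, 11)
```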
12.
• Head and tail of tokens
  • Use the first and the last tokens when the text exceeds max seq length
  • https://arxiv.org/abs/1905.05583
• Averaging (base uncased models)
• Post-processing: fit the distribution of the train data for target columns
  • Detail is in the following page
• Concatenate pooled outputs
• Global Average Pooling
• MultilabelStratifiedKFold
https://www.kaggle.com/c/google-quest-challenge/discussion/129885
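The head-and-tail truncation described in the arXiv paper linked above (1905.05583) can be sketched as follows. This is a hedged sketch: the 128-token head length is an illustrative choice, not necessarily the split the author used.

```python
def head_tail_truncate(token_ids, max_len=512, head_len=128):
    """Keep the first `head_len` and the last `max_len - head_len` tokens
    when a sequence is longer than `max_len` (head+tail truncation)."""
    if len(token_ids) <= max_len:
        return token_ids
    tail_len = max_len - head_len
    return token_ids[:head_len] + token_ids[-tail_len:]

# Example: a 600-token input is cut to 128 head + 384 tail tokens.
tokens = list(range(600))
out = head_tail_truncate(tokens)
print(len(out))  # 512
```

The rationale from the paper is that the beginning and the end of a long text tend to carry the most signal, so dropping the middle loses less than plain truncation.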
13.
[figure: model architecture diagram]
14.
15.
[figure: model ensemble diagram]
16.

import numpy as np
import pandas as pd
from scipy.stats import rankdata

def rank_average(preds):
    # Convert predictions to ranks scaled into [0, 1]
    ranked_pred = rankdata(preds)
    return (ranked_pred - np.min(ranked_pred)) / (np.max(ranked_pred) - np.min(ranked_pred))

class OptimPreds(object):
    def __init__(self, df_train):
        # For each target column, store its discrete values and their train frequencies
        self.score_range_dict = {}
        for i, c in enumerate(df_train.columns[11:]):
            cnt = df_train[c].value_counts(normalize=True).sort_index()
            self.score_range_dict[i] = [cnt.index.values.tolist(), cnt.values.tolist()]

    def predict(self, preds, i):
        # Snap ranked predictions to the train value distribution of column i
        return pd.cut(rank_average(preds),
                      [-np.inf] + np.cumsum(self.score_range_dict[i][1])[:-1].tolist() + [np.inf],
                      labels=self.score_range_dict[i][0])

def optim_predict(pred):
    # Apply the post-processing only to the selected target columns
    for i in range(pred.shape[1]):
        if i in [2, 5, 12, 13, 14, 15, 19]:
            pred[:, i] = optim.predict(pred[:, i], i)
    return pred

optim = OptimPreds(df_train)
valid_pred = optim_predict(valid_pred_org.copy())
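A toy illustration of the idea behind the OptimPreds post-processing above (the column values and frequencies here are hypothetical): Spearman correlation depends only on ranks, so mapping rank-normalized predictions onto the discrete values seen in train cannot break the ordering, while matching the train value distribution can help on columns whose targets take only a few distinct values. The `np.searchsorted` call below mirrors what `pd.cut` does with the cumulative-frequency bin edges.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical toy target column: train values {0.0, 0.5, 1.0}
# occurring with frequencies 50% / 30% / 20%.
values = [0.0, 0.5, 1.0]
freqs = [0.5, 0.3, 0.2]

preds = np.array([0.11, 0.42, 0.38, 0.90, 0.07, 0.55])

# Rank-normalize into [0, 1], as rank_average() does on the slide.
r = rankdata(preds)
r = (r - r.min()) / (r.max() - r.min())

# Cut the ranks at the cumulative train frequencies and snap to train values.
edges = np.concatenate([[-np.inf], np.cumsum(freqs)[:-1], [np.inf]])
snapped = np.array(values)[np.searchsorted(edges, r, side="left") - 1]
print(snapped)  # [0.  0.5 0.  1.  0.  0.5]
```

The lowest-ranked 50% of predictions become 0.0, the next 30% become 0.5, and the top 20% become 1.0, reproducing the train distribution of the column.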
19. Didn’t work for me
✓ Pre-training with Stack Overflow data (150,000 sentences)
✓ Multi-sample dropout
✓ Other models
  ✓ RoBERTa
  ✓ ALBERT
  ✓ XLNet
✓ Concatenating the question-only output with an answer-only model
✓ Concatenating a category MLP with the BERT model
✓ LSTM head instead of a Dense head on the BERT model
✓ Freezing half of the BertLayers to reduce model complexity
✓ Skipping half of the BertLayers to reduce model complexity
✓ USE (Universal Sentence Encoder) + MLP
✓ LSTM model with gensim embeddings
✓ Custom losses
  ✓ BCE & MSE
  ✓ focal loss
✓ Word-count feature
✓ Concatenating title and question_body as one block (removing the [SEP] between them)
✓ Up-sampling for imbalanced target columns
https://www.kaggle.com/c/google-quest-challenge/discussion/129885
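For reference, the focal loss listed above can be sketched like this. This is a generic binary focal loss, not necessarily the exact form the author tried; with gamma = 0 it reduces to plain BCE.

```python
import numpy as np

def focal_loss(y_true, y_pred, gamma=2.0, eps=1e-7):
    """Binary focal loss: BCE down-weighted by (1 - p_t)**gamma,
    so easy, well-classified examples contribute less to the loss."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    p_t = np.where(y_true == 1, y_pred, 1 - y_pred)
    return float(np.mean(-((1 - p_t) ** gamma) * np.log(p_t)))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.1, 0.6, 0.4])
print(focal_loss(y_true, y_pred, gamma=2.0))
```

With gamma > 0 the confident examples (0.9/0.1) are down-weighted relative to the borderline ones (0.6/0.4), which is the intended behavior for imbalanced targets.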