
  1. 2020.04.06 :: [DACON] StarCraft II Game Data Analysis Competition
Data Analysis 2020. 4. 6. 21:49

 

https://dacon.io/competitions/official/235583/overview/

 

[Game] Monthly DACON 3: Behavioral Data Analysis Competition

Source: DACON - Data Science Competition

dacon.io

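A DACON behavioral data analysis competition that ran through the month of March.

I worked on it in spare moments, but I was short on time and ended up with a result that only barely beat the baseline.

Memory also blew up several times during the analysis -_- The train data is about 5 GB and my machine has 4 GB of RAM, so I struggled a lot. This competition made me think it is about time to replace my PC.

The competition's discussion board had posts sharing ways to handle large data, which helped a lot.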
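
One way to stay within a small RAM budget is to stream the file row by row and keep only running counts, instead of loading all of it at once. The following is a minimal sketch using only the standard library; it is not the method from the discussion board. The function name `stream_event_counts` is hypothetical, and the column names `game_id`, `player` and `event` are taken from the preprocessing code in this post.

```python
import csv
import io
from collections import Counter, defaultdict

def stream_event_counts(lines):
    """Count events per (game_id, player) while holding one row in memory at a time."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(lines):
        counts[(row['game_id'], row['player'])][row['event']] += 1
    return counts

# Tiny in-memory stand-in for train.csv (made-up rows, for illustration only).
sample = io.StringIO(
    "game_id,player,event\n"
    "0,0,Camera\n"
    "0,0,Ability\n"
    "0,1,Camera\n"
    "0,0,Camera\n"
)
counts = stream_event_counts(sample)
```

For the real file, the same function could be called as `stream_event_counts(open('train.csv', newline=''))`. Memory use then grows with the number of distinct (game, player, event) combinations, not with the number of rows.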

Finished with a score that barely(?) cleared the baseline.

 

Below is the preprocessing code.

1. Read the data

2. Group by game_id

3. Parse the string data in the events (the event_contents column) and create a column for each word (scv, build, and so on)
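
Step 3 can be illustrated in isolation. This is a minimal sketch, assuming `event_contents` values look like the made-up strings in `sample`; the names `tokenize`, `count_tokens` and `STOP_TOKENS` are hypothetical. It mirrors the idea of the helpers in this post (strip punctuation, drop tokens containing digits, count what remains) but is not the exact code used for the submission.

```python
import re
import string
from collections import Counter

_PUNCT_TABLE = str.maketrans('', '', string.punctuation)
# Tokens that carry no information and are skipped.
STOP_TOKENS = {'None', 'at', 'nan', 'Location'}

def tokenize(text):
    """Strip punctuation, split on whitespace, drop stop tokens and tokens with digits."""
    cleaned = str(text).translate(_PUNCT_TABLE)
    return [t for t in cleaned.split()
            if t not in STOP_TOKENS and not re.search(r'\d', t)]

def count_tokens(contents, player):
    """Return {token_player: count}, i.e. one future column per word and player."""
    counts = Counter()
    for c in contents:
        counts.update(tokenize(c))
    return {f'{tok}_{player}': n for tok, n in counts.items()}

# Made-up event_contents values, for illustration only.
sample = [
    "(1360) - BuildSupplyDepot; Location: (50.0, 100.0)",
    "(1360) - BuildSupplyDepot",
    "at (145.25, 21.5)",
    None,
]
features = count_tokens(sample, player=0)
```

Each key of `features` becomes one column name, and the value is that word's count for the given player in the game.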

Looking back, the time data probably had plenty of room for use, and the camera coordinates could also have been used to derive each player's starting position. It is a pity I did not get as far as applying either.
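
The starting-position idea could look roughly like this. It is a sketch under assumptions, not something that was tried in the competition: it assumes camera rows have `event == 'Camera'`, that their `event_contents` contain coordinates in the form `at (x, y)`, and that `time` is numeric. The names `parse_xy` and `starting_positions` are hypothetical.

```python
import re

# Matches the first "(x, y" pair of numbers in a string.
_COORD = re.compile(r'\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)')

def parse_xy(contents):
    """Extract (x, y) as floats, or None if no coordinate pair is found."""
    m = _COORD.search(str(contents))
    return (float(m.group(1)), float(m.group(2))) if m else None

def starting_positions(rows):
    """For each player, take the coordinates of the earliest camera event."""
    first = {}  # player -> (time, (x, y))
    for row in rows:
        if row['event'] != 'Camera':
            continue
        xy = parse_xy(row['event_contents'])
        if xy is None:
            continue
        t = float(row['time'])
        p = row['player']
        if p not in first or t < first[p][0]:
            first[p] = (t, xy)
    return {p: xy for p, (t, xy) in first.items()}

# Made-up rows for a single game, for illustration only.
rows = [
    {'player': 0, 'time': 0.02, 'event': 'Camera', 'event_contents': 'at (30.0, 40.0)'},
    {'player': 0, 'time': 0.00, 'event': 'Camera', 'event_contents': 'at (10.5, 20.0)'},
    {'player': 1, 'time': 0.00, 'event': 'Ability', 'event_contents': '(1360) - Build'},
    {'player': 1, 'time': 0.01, 'event': 'Camera', 'event_contents': 'at (150.0, 140.25)'},
]
positions = starting_positions(rows)
```

The resulting x and y values per player could then be added as extra feature columns next to the event counts.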

import pandas as pd
import numpy as np
import string
import re

def remove_punct(text):
    # Strip all ASCII punctuation, then drop the literal word 'Location'.
    table = str.maketrans('', '', string.punctuation)
    return text.translate(table).replace('Location', '')

def remove_include_number_string(target):
    # Split on spaces and keep only the tokens that contain no digit.
    return [s for s in remove_punct(str(target)).split(' ') if not re.search(r'\d', s)]

def create_events_dict(event_contents):
    # Count how often each remaining token occurs across all event_contents values.
    total_strings = []
    for eg in event_contents:
        for ts in remove_include_number_string(eg):
            if ts != '' and ts not in ['None', 'at', 'nan']:
                total_strings.append(ts)
    # Call np.unique once instead of three times per loop iteration.
    tokens, counts = np.unique(total_strings, return_counts=True)
    return dict(zip(tokens, counts))

train_sample = pd.read_csv('train.csv')
# The time column is not used, so drop it to save memory.
train_sample = train_sample.drop(['time'],axis=1)
train_sample[['winner','player','species','event']] = train_sample[['winner','player','species','event']].astype('category')
match_group = train_sample.groupby('game_id')
match_groups = [g for g in match_group]

df = pd.DataFrame()
for group in match_groups:
    cg = group[1]
    player_groups = [g for g in cg.groupby('player')]
    
    match_info = pd.DataFrame()
    for player_g in player_groups:
        player_info = player_g[1][['game_id','winner','player','species']].drop_duplicates()
        
        if player_info['species'].values == 'T':
            player_info['species'] = 1
        elif player_info['species'].values == 'P':
            player_info['species'] = 2
        else:
            player_info['species'] = 3
        
        player_info = player_info.rename(columns={'species':'species_'+str(player_g[0])})
        
        player_info = player_info.drop('player',axis=1)
        event_value_counts = player_g[1]['event'].value_counts()
        event_value_counts_df = pd.DataFrame(event_value_counts).T
        
        for col in event_value_counts_df.columns:
            player_info[col+'_'+str(player_g[0])] = event_value_counts_df[col].values[0]        
            #print(player_info)
        
        if 'game_id' in match_info.columns:
            match_info = pd.merge(match_info,player_info,on=['game_id','winner'],sort=True)
        else:
            match_info = player_info
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    df = pd.concat([df, match_info])

total_st_dict_keys = {}
i = 0
df_events = pd.DataFrame()

for match in match_groups:
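    event_group = match[1].groupby('event')
    event_groups = [g for g in event_group]
    # groupby orders its keys, so index 0 is only the first event type;
    # the other event types are not parsed here.
    event = event_groups[0]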

    total_events_dict_0 = create_events_dict(event[1][event[1]['player']== 0]['event_contents'])
    total_events_dict_1 = create_events_dict(event[1][event[1]['player']== 1]['event_contents'])    
    
    # Union of the tokens seen for either player.
    keys = list(total_events_dict_0.keys())
    keys_1 = [key for key in total_events_dict_1.keys() if key not in total_events_dict_0]
    keys.extend(keys_1)
    
    # Register a column per token and player the first time a token is seen.
    for key in keys:
        if key+'_0' not in total_st_dict_keys:
            total_st_dict_keys[key+'_0'] = 0 
            total_st_dict_keys[key+'_1'] = 0

    new_key_dicts_0 = {}
    for key in total_events_dict_0.keys():
        new_key_dicts_0[key+'_0'] = total_events_dict_0[key]
            
    new_key_dicts_1 = {}
    for key in total_events_dict_1.keys():
        new_key_dicts_1[key+'_1'] = total_events_dict_1[key]
        
    total_st_dict_keys =  { x:0 for x in total_st_dict_keys}
            
    for key in new_key_dicts_0.keys():
        if key in total_st_dict_keys.keys():
            total_st_dict_keys[key] = new_key_dicts_0[key]
            
    for key in new_key_dicts_1.keys():
        if key in total_st_dict_keys.keys():
            total_st_dict_keys[key] = new_key_dicts_1[key]
    
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    df_events = pd.concat([df_events, pd.DataFrame(total_st_dict_keys, index=[0])])
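
Growing a DataFrame one row at a time copies it on every iteration, which hurts with many games. A common alternative is to collect one dict per game in a list and build the frame once at the end. The sketch below is not the code used for the submission; it assumes each game yields a dict of `token_player` counts, and the helper name `rows_to_frame` is hypothetical.

```python
import pandas as pd

def rows_to_frame(rows):
    """Build one DataFrame from per-game dicts; tokens missing in a game become 0."""
    return pd.DataFrame(rows).fillna(0).astype(int)

# Made-up per-game token counts, for illustration only.
rows = [
    {'Build_0': 3, 'Train_1': 1},
    {'Build_0': 1, 'Attack_1': 2},
]
features = rows_to_frame(rows)
```

With this approach there is no need to track the full key set by hand, because pandas aligns the columns across dicts itself.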
posted by 초코렛과자