Handling Complex JSON Files Easily in Python (How to Flatten a Complex JSON File in Python)

꿈을 향해 on my way

박재성 2022. 4. 2. 00:55

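Problem

I received a JSON file as an API response. The end users of the data were the sales and marketing teams, so I had to flatten the JSON and convert it to a CSV file that they could work with easily. The problem was that the JSON format was very complex: a dictionary containing lists of dictionaries, which in turn contained more lists and dictionaries, a shape that is awkward to tidy up.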

{'contacts': [{'contact_type': 'individual',
   'is_green_match': True,
   'is_signatory': True,
   'match_score': 101,
   'persons': [{'addresses': [{'city': 'NEW YORK',
       'line1': '31 W 34TH ST',
       'postal_code': '10001',
       'state_code': 'NY'}],
     'display': 'Ishay Oved',
     'emails': [],
     'first_name': 'Ishay',
     'id': '8e6d391d-0fa8-5e9d-9880-fd1069c4d190',
     'jobs': [],
     'last_name': 'Oved',
     'phones': [],
     'urls': []}]},
  {'company': {'addresses': [{'city': 'NEW YORK',
      'country_code': 'USA',
      'line1': '1185 6TH AVE FL 10',
      'postal_code': '10036',
      'state_code': 'NY'}],
    'emails': [],
    'id': '2987b8f5-9d94-5858-b065-1fc62b315e80',
    'match_score': 53,
    'name': '31 WEST 34TH STREET LLC',
    'phones': [],
    'urls': []},
   'contact_type': 'company',
   'is_green_match': True,
   'is_signatory': False,
   'match_score': 53,
   'persons': []}],
 'owner_update_time': '2021-08-26',
 'property_id': '2c1820dc-2a57-5532-8022-0a8840e32da7'}

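Solutions

Solution 1) Use the Python library flatten_json

Installation

pip install flatten_json

Note: the similarly named json-flatten package (module json_flatten) is a different library. Its flatten() joins keys with dots and encodes value types in key suffixes, so it would not produce the underscore-joined output shown in this section.

Example

from flatten_json import flatten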
unflat_json = {
    'user': {
        'Rachel': {
            'UserID': 1717171717,
            'Email': 'rachel1999@gmail.com',
            'friends': ['John', 'Jeremy', 'Emily'],
        }
    }
}
  
flat_json = flatten(unflat_json)

Output

{'user_Rachel_UserID': 1717171717, 'user_Rachel_Email': 'rachel1999@gmail.com', 'user_Rachel_friends_0': 'John', 'user_Rachel_friends_1': 'Jeremy', 'user_Rachel_friends_2': 'Emily'}
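
For readers curious about what such a flatten function does internally, here is a minimal plain-Python sketch. flatten_dict is a hypothetical helper written for illustration, not the library's actual implementation. It joins keys with an underscore, uses list indices as keys, and (unlike a full library) simply drops empty lists and dictionaries.

```python
def flatten_dict(obj, parent_key="", sep="_"):
    """Flatten nested dicts and lists into a single dict with joined keys.

    Hypothetical helper for illustration only.
    """
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        # List positions become part of the key: friends_0, friends_1, ...
        children = enumerate(obj)
    else:
        # Scalar value: this is a leaf, so emit it under the accumulated key.
        return {parent_key: obj}

    items = {}
    for key, value in children:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        items.update(flatten_dict(value, new_key, sep))
    return items


nested = {'user': {'Rachel': {'UserID': 1717171717, 'friends': ['John', 'Jeremy']}}}
flat = flatten_dict(nested)
print(flat)
# {'user_Rachel_UserID': 1717171717, 'user_Rachel_friends_0': 'John', 'user_Rachel_friends_1': 'Jeremy'}
```

Because every leaf ends up under its own key, a result like this maps directly onto one CSV row, with the keys as column headers.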

Solution 2) Use pandas json_normalize

Installation: nothing extra to install (pandas is all you need)

Example 1)

import pandas as pd

a_dict = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
}
df = pd.json_normalize(a_dict)

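For a flat JSON object like this, json_normalize gives the desired result with no extra arguments: a one-row DataFrame with the columns school, location, and ranking. Very intuitive.

Example 2)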

json_list = [
    { 
        'class': 'Year 1', 
        'student count': 20, 
        'room': 'Yellow',
        'info': {
            'teachers': { 
                'math': 'Rick Scott', 
                'physics': 'Elon Mask' 
            }
        },
        'students': [
            { 
                'name': 'Tom', 
                'sex': 'M', 
                'grades': { 'math': 66, 'physics': 77 } 
            },
            { 
                'name': 'James', 
                'sex': 'M', 
                'grades': { 'math': 80, 'physics': 78 } 
            },
        ]
    },
    { 
        'class': 'Year 2', 
        'student count': 25, 
        'room': 'Blue',
        'info': {
            'teachers': { 
                'math': 'Alan Turing', 
                'physics': 'Albert Einstein' 
            }
        },
        'students': [
            { 'name': 'Tony', 'sex': 'M' },
            { 'name': 'Jacqueline', 'sex': 'F' },
        ]
    },
]

pd.json_normalize(json_list)

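Things get a bit more complicated when the data is a list of dictionaries. As in the example above, the value under 'students' is a list, and each element of that list is itself a dictionary. json_normalize flattens nested dictionaries into dotted column names such as info.teachers.math, but it leaves a list as a single column, so the students column still holds raw lists. To expand that data further, use the record_path argument.

Example 3)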

pd.json_normalize(json_list, record_path=['students'])
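
To make the effect of record_path concrete, it can help to compare the columns and row counts with and without it. The sketch below uses a trimmed copy of the data from Example 2 (the name json_list_small and the reduced set of fields are assumptions made for brevity).

```python
import pandas as pd

# Trimmed copy of json_list from Example 2 (fewer fields, for brevity).
json_list_small = [
    {
        'class': 'Year 1',
        'room': 'Yellow',
        'info': {'teachers': {'math': 'Rick Scott'}},
        'students': [
            {'name': 'Tom', 'grades': {'math': 66}},
            {'name': 'James', 'grades': {'math': 80}},
        ],
    },
    {
        'class': 'Year 2',
        'room': 'Blue',
        'info': {'teachers': {'math': 'Alan Turing'}},
        'students': [{'name': 'Tony'}, {'name': 'Jacqueline'}],
    },
]

# Default call: one row per class; the students column still holds lists.
by_class = pd.json_normalize(json_list_small)

# With record_path: one row per student; nested grades become dotted columns.
by_student = pd.json_normalize(json_list_small, record_path=['students'])

print(sorted(by_class.columns))    # ['class', 'info.teachers.math', 'room', 'students']
print(sorted(by_student.columns))  # ['grades.math', 'name']
print(len(by_class), len(by_student))  # 2 4
```

Students without a grades entry (the Year 2 rows here) get NaN in the grades.math column.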

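With record_path=['students'], json_normalize drops down to the student level and expands the data there, one row per student. But what if you also want the higher-level data (for example, 'class' and 'room') in the same table? Use the meta argument.

Example 4)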

pd.json_normalize(
    json_list, 
    record_path=['students'],
    meta=['class', 'room', ['info', 'teachers', 'math']]
)

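By default, the student-level data occupies the leftmost columns, and the fields chosen in the meta argument are appended to the right in the order given. The ['info', 'teachers', 'math'] entry in meta may look odd. In short, it reflects the nesting level of the data: 'class' and 'room' sit at the top level, whereas 'math' is only reached by going down through 'info' -> 'teachers', so the full path is written out as a list. The resulting column is named info.teachers.math.

Reference: All Pandas json_normalize() you should know for flattening JSON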
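
Applied to the API response from the problem statement, the same two arguments might look like the sketch below. It uses a trimmed copy of the response (address, email, phone, and URL fields are left out for brevity), treats each entry of contacts as one row, and carries property_id and owner_update_time along as meta columns. The choice of one row per contact is an assumption for illustration; the original post does not say which layout was finally delivered.

```python
import pandas as pd

# Trimmed copy of the API response shown in the problem statement.
response = {
    'contacts': [
        {
            'contact_type': 'individual',
            'is_signatory': True,
            'match_score': 101,
            'persons': [{'first_name': 'Ishay', 'last_name': 'Oved'}],
        },
        {
            'company': {'name': '31 WEST 34TH STREET LLC', 'match_score': 53},
            'contact_type': 'company',
            'is_signatory': False,
            'match_score': 53,
            'persons': [],
        },
    ],
    'owner_update_time': '2021-08-26',
    'property_id': '2c1820dc-2a57-5532-8022-0a8840e32da7',
}

# One row per contact; top-level fields are repeated on every row via meta.
contacts = pd.json_normalize(
    response,
    record_path=['contacts'],
    meta=['property_id', 'owner_update_time'],
)

# Nested dicts become dotted columns (company.name);
# lists such as persons remain a single column of lists.
print(sorted(contacts.columns))

# With no path, to_csv returns the CSV text; pass a file path to write a file.
csv_text = contacts.to_csv(index=False)
print(csv_text)
```

The persons column still contains lists at this point. It could be expanded with a second json_normalize call, or flattened with a helper, depending on how much detail the CSV needs.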
