
[Poster Presentation] Reinforcement Learning Aided Orthogonal Frequency Allocation in LoRaWAN

Naoki AIHARA†, Koichi ADACHI†, Osamu TAKYU*, Mai OHTA‡, and Takeo FUJII†

†Advanced Wireless and Communication Research Center, The University of Electro-Communications 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585

* Dept. of Electrical and Computer Engineering, Shinshu University

4-17-1, Wakasato, Nagano 380-8553

‡ School of Electronics and Computer Science, Fukuoka University

8-19-1, Nanakuma, Jonan, Fukuoka 814-0180

E-mail: †{aihara, adachi, fujii}@awcc.uec.ac.jp, *[email protected], ‡[email protected]

Abstract

Recently, a massive number of wireless terminals have been deployed to collect environmental information such as temperature and humidity. To support this massive connectivity, low power wide area (LPWA) systems, including LoRaWAN, have been developed. However, these protocols can only adopt simple functions because of the limited capability of the wireless terminals. For example, a distributed asynchronous medium access control (MAC) protocol (e.g., pure ALOHA or CSMA/CA) is implemented, so packet collisions occur due to mutual interference among wireless terminals. There has been extensive research on resource allocation to avoid or mitigate this mutual interference. However, conventional works consider only spreading factor (SF) allocation, which is a modulation parameter of the physical layer, while random hopping is applied so that each node changes its frequency channel at the start of every packet transmission. In this research, we propose a frequency channel allocation scheme based on reinforcement learning. In the proposed system, a fusion center (FC) calculates a Q-reward for each wireless terminal from the number of successfully received packets, which the FC can observe. The FC then allocates different frequency channels to wireless terminals that are likely to collide with each other. Computer simulation results under a LoRaWAN environment demonstrate the effectiveness of the proposed method.

Keywords: Frequency Sharing, Machine Learning, Reinforcement Learning, LoRaWAN, Wireless Resource Allocation

Acknowledgement: This research and development work was supported by the MIC/SCOPE #175104004.

The Institute of Electronics, Information and Communication Engineers (IEICE) Technical Report SR2019-81 (2019-11)

Copyright © 2019 by IEICE. This article is a technical report without peer review, and its polished and/or extended version may be published elsewhere.



Numerical Simulation
• Channel model: path loss + spatially correlated shadowing [4]
• Two types of packet traffic are assumed [5]: regular + event detection (see the sketch after this list)
  • Regular traffic: 2 application types; packets are generated at intervals of {0.5, 5} [min/packet]
  • Event traffic: an event occurs every 5 minutes
• Compared methods
  • Conventional: each node hops to a random frequency channel at the start of each transmission
  • Upper bound: no mutual interference, i.e., K → ∞
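For illustration only, the following is a minimal sketch of how the two traffic types could be generated. The poster does not specify the exact arrival process, so the periodic spacing and the random per-terminal start offset are assumptions.

```python
import numpy as np

def regular_arrivals(interval_min, sim_time_min, rng):
    """Regular traffic: one packet every interval_min minutes ({0.5, 5} on the poster),
    starting at a random per-terminal offset (the offset is an assumption)."""
    start = rng.uniform(0.0, interval_min)
    return np.arange(start, sim_time_min, interval_min)

def event_arrivals(event_period_min, sim_time_min):
    """Event-detection traffic: one report per terminal each time an event occurs
    (every 5 minutes, as stated in the simulation setup)."""
    return np.arange(event_period_min, sim_time_min, event_period_min)

rng = np.random.default_rng(1)
print(regular_arrivals(5.0, 60.0, rng))   # packet times [min] for one regular terminal
print(event_arrivals(5.0, 60.0))          # packet times [min] for one event terminal
```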

Conclusion
• Reinforcement learning based resource allocation
  ✓ Q-reward calculation without additional overhead
  ✓ The proposed method improves the average PDR by 13%
  ✓ The lower-PDR region is also improved

Acknowledgement: This research and development work was supported by the MIC/SCOPE #175104004.

Proposed Method: Resource allocation using DQN [3]
• Learning model
  • State: resource indices of each terminal
  • Action: resource index allocated to terminal n at epoch t+1
  • Reward Q_{n,k_n,t+1}: weighted sum of the numbers of received packets
• Orthogonal resource allocation at the Fusion Center (FC) (a sketch of this loop follows the list)
  1. Observe the resource indices of each terminal
  2. Estimate the Q-reward of each resource index of terminal n
  3. Allocate resources to each terminal based on ε-greedy selection
  4. Observe the number of received packets during one epoch
  5. Calculate the Q-reward for each terminal from the number of received packets
  6. Execute back-propagation of the FC's neural network (NN)
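A minimal sketch of steps 1–6 is given below. It is not the authors' implementation: N, K, and the exploration rate are toy or assumed values, observe_received_packets is a hypothetical stand-in for the LoRaWAN environment, and a simple tabular update with the Q-learning rate from Table 2 stands in for the 4-layer neural network and its back-propagation step. The poster's full Q-reward weighting is sketched after its equation further below.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 30, 4            # toy numbers of terminals and frequency channels
EPSILON = 0.1           # epsilon-greedy exploration rate (assumed value)
Q_LR = 0.4              # Q-learning rate, taken from Table 2

# Per-terminal, per-channel reward estimates (tabular stand-in for the FC's NN).
Q = np.zeros((N, K))

def observe_received_packets(alloc):
    """Hypothetical environment: terminals on a crowded channel lose more packets."""
    load = np.bincount(alloc, minlength=K)
    return rng.poisson(10.0 / load[alloc])

for epoch in range(200):
    # Steps 1-3: observe the current allocation and pick channels epsilon-greedily.
    greedy = Q.argmax(axis=1)
    explore = rng.random(N) < EPSILON
    alloc = np.where(explore, rng.integers(0, K, size=N), greedy)

    # Step 4: number of packets the FC received from each terminal during the epoch.
    D = observe_received_packets(alloc)

    # Steps 5-6: compute a reward per terminal and update the estimates
    # (tabular update in place of back-propagation through the NN).
    for n in range(N):
        reward = D[n]  # placeholder reward: own received-packet count only
        Q[n, alloc[n]] += Q_LR * (reward - Q[n, alloc[n]])
```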


Background
• Dense deployment of wireless terminals in IoT networks
• LoRaWAN [1] is one of the communication protocols that can accommodate a large number of terminals
  ✓ In LoRaWAN, packet collisions and packet delivery rate (PDR) degradation frequently occur
• Related research
  • Spreading factor (SF) allocation [2]: optimal modulation parameter allocation can improve packet delivery performance under massive connectivity
  • There is no research on frequency channel allocation in the LoRaWAN environment
→ Effective utilization of frequency resources is necessary to further improve massive connectivity in LoRaWAN

References
[1] U. Raza et al., "Low Power Wide Area Networks: An Overview," IEEE Commun. Surveys Tuts., vol. 19, pp. 855-873, Jan. 2017.
[2] J. Lim and Y. Han, "Spreading Factor Allocation for Massive Connectivity in LoRa Systems," IEEE Commun. Lett., vol. 22, pp. 800-803, Jan. 2018.
[3] L.-J. Lin, "Self-improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching," Machine Learning, vol. 8, pp. 293-321, May 1992.
[4] H. Claussen, "Efficient Modelling of Channel Maps with Correlated Shadow Fading in Mobile Radio Systems," in Proc. IEEE PIMRC, pp. 512-516, Jul. 2005.
[5] V. Gupta et al., "Modelling of IoT Traffic and its Impact on LoRaWAN," in Proc. IEEE GLOBECOM, pp. 1-6, Dec. 2017.

✓ 13% average PDR improvement
✓ PDR's 10% point is improved by 40%

Fig. 4. Performance: (a) learning process (K = 4), (b) PDR performance

Table 1. Wireless parameters
  Communication area: 3000 × 3000 [m²]
  Carrier frequency: 923 [MHz]
  No. of terminals N: 3000
  Shadowing deviation: 3.48 [dB]
  Transmit power: 13 [dBm]
  CS threshold: -80.0 [dBm]
  Noise power density: -174 [dBm/Hz]
  No. of resources K: {4, 8}
  Bandwidth: 125 [kHz]

Fig. 1. LoRaWAN system model
Fig. 2. Proposed Model

(Fig. 1: nodes 1–6 contend for frequency channels f1, f2, f3, …, fK; carrier sense (CS) succeeds for some nodes and fails for others.)

Table 2. Learning parameters
  Epoch length: 600 [sec]
  Number of epochs: 1000
  Q-learning rate: 0.4
  Number of NN layers: 4
  Number of neurons in hidden layers: (10, 5)
  NN learning rate: 10^-3
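For reference, a hypothetical network matching the layer counts in Table 2 could look as follows; the state encoding, activation function, and optimizer are not specified on the poster and are assumptions (Fig. 2 suggests one such network per terminal).

```python
import torch
import torch.nn as nn

K = 4             # number of frequency channels (resources)
STATE_DIM = K     # assumed encoding of the observed state; not specified on the poster

# Four layers with hidden sizes (10, 5), per Table 2; ReLU activations are an assumption.
q_network = nn.Sequential(
    nn.Linear(STATE_DIM, 10),
    nn.ReLU(),
    nn.Linear(10, 5),
    nn.ReLU(),
    nn.Linear(5, K),  # one Q-reward estimate per candidate resource index
)

# NN learning rate 10^-3 from Table 2; the optimizer (Adam) is an assumption.
optimizer = torch.optim.Adam(q_network.parameters(), lr=1e-3)
```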

Q-reward calculation (D_{n,t+1}: number of packets received from node n during epoch t+1):

  \nu = \tanh\left( \frac{D_{n,t+1}}{\min_{n' \in \mathcal{N} \setminus \{n\}} D_{n',t+1}} \right)

  Q_{n,k_n,t+1} = D_{n,t+1} + \nu \times \frac{\sum_{n' \in \mathcal{N} \setminus \{n\}} D_{n',t+1}}{N - 1}

where the first term comes from the interested node n and the second term from all nodes excluding n.
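A worked sketch of this reward calculation, assuming D is an array holding the number of packets the FC received from each node during the epoch; the guard against division by zero is an added assumption.

```python
import numpy as np

def q_reward(D, n):
    """Q_{n,k_n,t+1}: node n's own packet count plus the nu-weighted average of the others'."""
    others = np.delete(D, n)                    # all nodes excluding n
    nu = np.tanh(D[n] / max(others.min(), 1))   # guard when some node received zero packets
    return D[n] + nu * others.sum() / (len(D) - 1)

D = np.array([12, 9, 15, 7])   # toy per-node packet counts
print(q_reward(D, n=0))        # Q-reward for node 0
```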

(Fig. 2 depicts the learning loop at the FC: state observation, resource selection by the neural networks NN_1, …, NN_N, resource allocation, and the learning environment.)

(Fig. 4 shows results for K = 4 and K = 8; performance improves as learning proceeds.)
