Environment Quick Summary#

[Animation: environment render (_images/render.gif)]

TradingEnv is a Gymnasium environment designed for trading on a single pair.

Action Space: Discrete(number_of_positions)

Observation Space: Box(-np.inf, +np.inf, shape=...)

Import: gymnasium.make("TradingEnv", df=df)

Important Parameters#

  • df (required): A pandas.DataFrame with a close column and a DatetimeIndex as its index. To perform a render, your DataFrame also needs to contain open, high, and low columns.

  • positions (optional, default: [-1, 0, 1]): The list of positions that your agent can take. Each position is represented by a number (as described in the Action Space section).

Documentation of all the parameters
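For illustration, a minimal DataFrame meeting these requirements could be built as follows (synthetic data; the values are arbitrary):

import numpy as np
import pandas as pd

# Synthetic OHLC data with a DatetimeIndex (illustrative only).
index = pd.date_range("2024-01-01", periods=100, freq="1h")
close = 100 + np.cumsum(np.random.randn(100))
open_ = close + np.random.randn(100) * 0.1
df = pd.DataFrame({
    "open": open_,                                                    # needed for rendering
    "high": np.maximum(open_, close) + np.abs(np.random.randn(100)),  # needed for rendering
    "low": np.minimum(open_, close) - np.abs(np.random.randn(100)),   # needed for rendering
    "close": close,                                                   # required
}, index=index)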

Action Space#

The action space is the list of positions given by the user. Each position is a number (anywhere from -inf to +inf) and corresponds to the ratio of the portfolio valuation engaged in that position (> 0 to bet on a rise, < 0 to bet on a fall).

Example with BTC/USDT pair (%pv means “Percent of the Portfolio Valuation”)#

Position | BTC (%pv) | USDT (%pv) | Borrowed BTC (%pv) | Borrowed USDT (%pv)
-------- | --------- | ---------- | ------------------ | -------------------
0        |           | 100        |                    |
1        | 100       |            |                    |
0.5      | 50        | 50         |                    |
2        | 200       |            |                    | 100
-1       |           | 200        | 100                |

If position < 0: the environment performs a SHORT (by borrowing BTC and selling it to get USDT), as the -1 row of the table shows.

If position > 1: the environment uses MARGIN trading (by borrowing USDT and buying BTC with it), as the 2 row of the table shows.
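The table follows directly from the position ratio. A hypothetical helper (not part of the library) that reproduces it:

def position_breakdown(position: float) -> dict:
    # Illustrative only: percent-of-portfolio-valuation breakdown
    # implied by a target position ratio.
    return {
        "BTC (%pv)": max(position, 0.0) * 100,
        "USDT (%pv)": max(1.0 - position, 0.0) * 100,
        "Borrowed BTC (%pv)": max(-position, 0.0) * 100,
        "Borrowed USDT (%pv)": max(position - 1.0, 0.0) * 100,
    }

for p in [0, 1, 0.5, 2, -1]:
    print(p, position_breakdown(p))  # reproduces the rows of the table above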

Observation Space#

The observation space is an np.array containing:

  • The current row of every DataFrame column whose name contains "feature", at the given step: the static features.

  • The dynamic features (by default, the last position taken by the agent and the current real position).

>>> df["feature_pct_change"] = df["close"].pct_change()
>>> df["feature_high"] = df["high"] / df["close"] - 1
>>> df["feature_low"] = df["low"] / df["close"] - 1
>>> df.dropna(inplace= True)
>>> env = gymnasium.make("TradingEnv", df = df, positions = [-1, 0, 1], initial_position= 1)
>>> observation, info = env.reset()
>>> observation
array([-2.2766300e-04,  1.0030895e+00,  9.9795288e-01,  1.0000000e+00], dtype=float32)

If the windows parameter is set to an integer W > 1, the observation is a stack of the last W states.

>>> env = gymnasium.make("TradingEnv", df=df, positions=[-1, 0, 1], initial_position=1, windows=3)
>>> observation, info = env.reset()
>>> observation
array([[-0.00231082,  1.0052915 ,  0.9991996 ,  1.        ],
       [ 0.01005705,  1.0078559 ,  0.98854125,  1.        ],
       [-0.00408145,  1.0069852 ,  0.99777853,  1.        ]],
       dtype=float32)

Reward#

The reward is given by the formula \(r_t = \ln\left(\frac{p_t}{p_{t-1}}\right)\), where \(p_t\) is the portfolio valuation at timestep \(t\). It is highly recommended to customize the reward function to your needs, for example as sketched below.
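Here is a sketch of a custom reward passed through the reward_function parameter. It assumes the History object supports history["portfolio_valuation", t] indexing, which is how the library's default basic_reward_function reads valuations; the asymmetric penalty is purely illustrative:

import numpy as np

def custom_reward(history):
    # Default log-return, penalized twice as hard on losses (illustrative choice).
    log_return = np.log(
        history["portfolio_valuation", -1] / history["portfolio_valuation", -2]
    )
    return log_return if log_return > 0 else 2 * log_return

env = gymnasium.make("TradingEnv", df=df, reward_function=custom_reward)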

Starting State#

The environment explores the given DataFrame and starts at its beginning (unless max_episode_duration is set to an integer, in which case each episode starts at a random point; see Arguments).

Episode Termination#

The episode finishes if:

1. The environment reaches the end of the DataFrame: truncated is returned as True.

2. The portfolio valuation reaches 0 (or below): done is returned as True. This can happen when taking margin positions (> 1 or < 0).
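A minimal interaction loop that handles both flags might look like this (random actions stand in for a real policy):

observation, info = env.reset()
done, truncated = False, False
while not (done or truncated):
    action = env.action_space.sample()  # replace with your agent's policy
    observation, reward, done, truncated, info = env.step(action)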

Arguments#

class gym_trading_env.environments.TradingEnv(df: ~pandas.core.frame.DataFrame, positions: list = [0, 1], dynamic_feature_functions=[<function dynamic_feature_last_position_taken>, <function dynamic_feature_real_position>], reward_function=<function basic_reward_function>, windows=None, trading_fees=0, borrow_interest_rate=0, portfolio_initial_value=1000, initial_position='random', max_episode_duration='max', verbose=1, name='Stock', render_mode='logs')#

An easy trading environment for OpenAI Gym. It is recommended to use it this way:

import gymnasium as gym
import gym_trading_env
env = gym.make('TradingEnv', ...)
Parameters
  • df (pandas.DataFrame) – The market DataFrame. It must contain 'open', 'high', 'low', and 'close' columns, and its index must be a DatetimeIndex. Any column whose name contains 'feature' will be returned as part of the observation at each step.

  • positions (optional - list[int or float]) – List of the positions allowed by the environment.

  • dynamic_feature_functions (optional - list) –

    The list of the dynamic features functions. By default, two dynamic features are added :

    • the last position taken by the agent.

    • the real position of the portfolio (that varies according to the price fluctuations)

  • reward_function (optional - function<History->float>) – Takes the History object of the environment and must return a float.

  • windows (optional - None or int) – Default is None. If set to an integer N, every step observation will return the past N observations. Recommended for Recurrent Neural Network based agents.

  • trading_fees (optional - float) – Transaction trading fees (applied to buy and sell operations). e.g. 0.01 corresponds to 1% fees.

  • borrow_interest_rate (optional - float) – Borrow interest rate per step (only applied when position < 0 or position > 1). e.g. 0.01 corresponds to a 1% borrow interest rate per STEP; if you know that your borrow interest rate is 0.05% per day and your timestep is 1 hour, divide it by 24 -> 0.05/100/24 (see the example after this parameter list).

  • portfolio_initial_value (float or int) – Initial valuation of the portfolio.

  • initial_position (optional - float or int) – You can specify the initial position of the environment or set it to 'random'. It must be contained in the 'positions' list parameter.

  • max_episode_duration (optional - int or 'max') – If an integer value is used, each episode will be truncated after reaching the desired max duration in steps (by returning truncated as True). When using a max duration, each episode starts at a random point.

  • verbose (optional - int) – If 0, no logs are output. If 1, the environment sends episode result logs.

  • name (optional - str) – The name of the environment (e.g. 'BTC/USDT').
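Putting the main arguments together, a typical instantiation might look as follows (the fee and interest values are illustrative, not recommendations):

import gymnasium as gym
import gym_trading_env

env = gym.make(
    "TradingEnv",
    df=df,                                 # prepared as described above
    positions=[-1, 0, 0.5, 1, 2],          # short, flat, half-long, long, 2x margin
    trading_fees=0.01 / 100,               # 0.01% per trade
    borrow_interest_rate=0.05 / 100 / 24,  # 0.05% per day on hourly timesteps
    portfolio_initial_value=1000,
    initial_position="random",
    max_episode_duration=500,
    name="BTC/USDT",
)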