{ "cells": [ { "cell_type": "markdown", "id": "723c394d-987d-4f2d-8bd6-378a23b1218e", "metadata": { "tags": [] }, "source": [ "# 3. Process Brown Dwarf Atmospheric Parameters \n", "\n", "In the following steps, you will:\n", "\n", "- Load the brown dwarf dataset used to train the ML models.\n", "- Prepare the X and y variables to deploy the trained ML models.\n", "- Visualize them for a few cases.\n", "\n", "We will need the following modules from `TelescopeML`:\n", "\n", "- **DataMaster**: to prepare the synthetic brown dwarf dataset and load the trained machine learning (ML) models.\n", "- **StatVisAnalyzer**: to provide statistical tests and plotting functions.\n", "- **IO_utils**: to provide functions to load the trained ML models.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "dab6ba1b-87cf-482b-9bad-7ae019708a27", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "No Bottleneck unit testing available.\n" ] }, { "data": { "text/html": [ "\n", "
\n", " Loading BokehJS ...\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from TelescopeML.DataMaster import *\n", "from TelescopeML.Predictor import *\n", "from TelescopeML.StatVisAnalyzer import *" ] }, { "cell_type": "markdown", "id": "2052a7c9-c4c0-44fb-8456-b3b785ea70c4", "metadata": {}, "source": [ "\"ML\n" ] }, { "cell_type": "markdown", "id": "98309fb6-0ed0-4f20-a65e-45b54910a8c8", "metadata": { "tags": [] }, "source": [ "## 2.1 Load the Synthetic Spectra - Training Dataset\n", "\n", "We computed low-resolution spectra ($R$=200) using the brown dwarf atmospheric grid model [*Sonora-Bobcat*](https://arxiv.org/pdf/2107.07434.pdf) over the spectral range $\\sim$0.9-2.4 $\\mu m$. The open-source atmospheric radiative transfer Python package [*PICASO*](https://natashabatalha.github.io/picaso/) was used to generate these datasets. The dataset encompasses 30,888 synthetic spectra (i.e., instances or rows).
\n", "\n", "\n", "Each spectrum has 104 wavelengths (i.e., 0.897, 0.906, ..., 2.512 μm) and 4 output atmospheric parameters:\n", "\n", "- gravity (log *g*)\n", "- temperature (*T*<sub>eff</sub>)\n", "- carbon-to-oxygen ratio (C/O)\n", "- metallicity ([M/H])" ] }, { "cell_type": "code", "execution_count": 2, "id": "94ab3753-16a7-40be-8832-06a8837649db", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'/Users/egharibn/RESEARCH/ml/projects/TelescopeML_project/reference_data/'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os \n", "\n", "__reference_data_path__ = os.getenv(\"TelescopeML_reference_data\")\n", "__reference_data_path__ \n", "\n", "\n", "# Note: if reading the reference data fails, set the reference_data directory manually:\n", "# __reference_data_path__ = 'INSERT_DIRECTORY_OF_reference_data'\n" ] }, { "cell_type": "markdown", "id": "440fbe88-e7b4-496f-81ab-7c7fa9db75d3", "metadata": {}, "source": [ "Load the dataset and check a few instances." ] }, { "cell_type": "code", "execution_count": 3, "id": "5332f129-5253-4067-ba57-165f5249e252", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
gravitytemperaturec_o_ratiometallicity2.5122.4872.4622.4382.4132.389...0.9810.9710.9620.9520.9430.9330.9240.9150.9060.897
05.011000.25-1.09.103045e-081.181658e-071.307868e-071.269229e-071.159179e-078.925110e-08...1.257751e-079.640859e-087.612550e-086.901364e-086.247359e-084.112384e-085.127995e-084.897355e-084.087795e-082.791689e-08
15.011000.25-0.79.103045e-081.181658e-071.307868e-071.269229e-071.159179e-078.925110e-08...1.257751e-079.640859e-087.612550e-086.901364e-086.247359e-084.112384e-085.127995e-084.897355e-084.087795e-082.791689e-08
25.011000.25-0.59.103045e-081.181658e-071.307868e-071.269229e-071.159179e-078.925110e-08...1.257751e-079.640859e-087.612550e-086.901364e-086.247359e-084.112384e-085.127995e-084.897355e-084.087795e-082.791689e-08
35.011000.25-0.39.103045e-081.181658e-071.307868e-071.269229e-071.159179e-078.925110e-08...1.257751e-079.640859e-087.612550e-086.901364e-086.247359e-084.112384e-085.127995e-084.897355e-084.087795e-082.791689e-08
45.011000.250.09.103045e-081.181658e-071.307868e-071.269229e-071.159179e-078.925110e-08...1.257751e-079.640859e-087.612550e-086.901364e-086.247359e-084.112384e-085.127995e-084.897355e-084.087795e-082.791689e-08
\n", "

5 rows × 108 columns

\n", "
" ], "text/plain": [ " gravity temperature c_o_ratio metallicity 2.512 2.487 \\\n", "0 5.0 1100 0.25 -1.0 9.103045e-08 1.181658e-07 \n", "1 5.0 1100 0.25 -0.7 9.103045e-08 1.181658e-07 \n", "2 5.0 1100 0.25 -0.5 9.103045e-08 1.181658e-07 \n", "3 5.0 1100 0.25 -0.3 9.103045e-08 1.181658e-07 \n", "4 5.0 1100 0.25 0.0 9.103045e-08 1.181658e-07 \n", "\n", " 2.462 2.438 2.413 2.389 ... 0.981 \\\n", "0 1.307868e-07 1.269229e-07 1.159179e-07 8.925110e-08 ... 1.257751e-07 \n", "1 1.307868e-07 1.269229e-07 1.159179e-07 8.925110e-08 ... 1.257751e-07 \n", "2 1.307868e-07 1.269229e-07 1.159179e-07 8.925110e-08 ... 1.257751e-07 \n", "3 1.307868e-07 1.269229e-07 1.159179e-07 8.925110e-08 ... 1.257751e-07 \n", "4 1.307868e-07 1.269229e-07 1.159179e-07 8.925110e-08 ... 1.257751e-07 \n", "\n", " 0.971 0.962 0.952 0.943 0.933 \\\n", "0 9.640859e-08 7.612550e-08 6.901364e-08 6.247359e-08 4.112384e-08 \n", "1 9.640859e-08 7.612550e-08 6.901364e-08 6.247359e-08 4.112384e-08 \n", "2 9.640859e-08 7.612550e-08 6.901364e-08 6.247359e-08 4.112384e-08 \n", "3 9.640859e-08 7.612550e-08 6.901364e-08 6.247359e-08 4.112384e-08 \n", "4 9.640859e-08 7.612550e-08 6.901364e-08 6.247359e-08 4.112384e-08 \n", "\n", " 0.924 0.915 0.906 0.897 \n", "0 5.127995e-08 4.897355e-08 4.087795e-08 2.791689e-08 \n", "1 5.127995e-08 4.897355e-08 4.087795e-08 2.791689e-08 \n", "2 5.127995e-08 4.897355e-08 4.087795e-08 2.791689e-08 \n", "3 5.127995e-08 4.897355e-08 4.087795e-08 2.791689e-08 \n", "4 5.127995e-08 4.897355e-08 4.087795e-08 2.791689e-08 \n", "\n", "[5 rows x 108 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_BD = pd.read_csv(os.path.join(__reference_data_path__, \n", " 'training_datasets', \n", " 'browndwarf_R100_v4_newWL_v3.csv.bz2'), compression='bz2')\n", "train_BD.head(5)" ] }, { "cell_type": "markdown", "id": "e1568f0a-dff8-41ad-a418-a5b8dcb205b5", "metadata": {}, "source": [ "### 2.1.2 Check atmospheric parameters\n", "\n", "\n", "- 
gravity (log *g*)\n", "- temperature (*T*<sub>eff</sub>)\n", "- carbon-to-oxygen ratio (C/O)\n", "- metallicity ([M/H])" ] }, { "cell_type": "code", "execution_count": 4, "id": "9a30da21-ab88-406e-ad1f-e54c6291684c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
gravitytemperaturec_o_ratiometallicity
05.011000.25-1.0
15.011000.25-0.7
25.011000.25-0.5
35.011000.25-0.3
45.011000.250.0
\n", "
" ], "text/plain": [ " gravity temperature c_o_ratio metallicity\n", "0 5.0 1100 0.25 -1.0\n", "1 5.0 1100 0.25 -0.7\n", "2 5.0 1100 0.25 -0.5\n", "3 5.0 1100 0.25 -0.3\n", "4 5.0 1100 0.25 0.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "output_names = ['gravity', 'temperature', 'c_o_ratio', 'metallicity']\n", "train_BD[output_names].head()" ] }, { "cell_type": "code", "execution_count": 5, "id": "1591d643-1052-47ab-af83-737069a88edd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['2.512', '2.487', '2.462', '2.438', '2.413']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# training_features_labels: they are Wavelengths variables in string format\n", "\n", "wavelength_names = [item for item in train_BD.columns.to_list() if item not in output_names]\n", "wavelength_names[:5]" ] }, { "cell_type": "code", "execution_count": 6, "id": "6aab2914-88d3-4180-8dce-32db38905e07", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2.512, 2.487, 2.462, 2.438, 2.413, 2.389, 2.366, 2.342, 2.319, 2.296]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# training_features_wl: they are Wavelengths variables \n", "\n", "wavelength_values = [float(item) for item in wavelength_names]\n", "wavelength_values[:10]" ] }, { "cell_type": "markdown", "id": "8bb511a7-5a5f-4b52-a6d9-e8533ece247e", "metadata": { "tags": [] }, "source": [ "### 2.1.3 Prepare Inputs and outputs for ML models (X,y)\n", "- X: 104 column variables or fluxes\n", "- y: output variables: 'gravity', 'temperature', 'c_o_ratio', 'metallicity'" ] }, { "cell_type": "code", "execution_count": 7, "id": "e90e78b1-4f64-40b6-b92e-6027c8ace483", "metadata": {}, "outputs": [], "source": [ "# Training feature variables\n", "X = train_BD.drop(\n", " columns=['gravity', \n", " 'temperature', \n", " 'c_o_ratio', \n", " 'metallicity'])\n", "\n", "\n", "# Target/Output feature 
variables\n", "# .copy() makes y an independent DataFrame, so the log-transform below does not act on a slice of train_BD\n", "y = train_BD[['gravity', 'c_o_ratio', 'metallicity', 'temperature']].copy()\n" ] }, { "cell_type": "markdown", "id": "6932555b-26d2-4140-b5d9-f03b7ba86337", "metadata": {}, "source": [ "Log-transform the 'temperature' variable to reduce the skewness of the data, making its distribution more symmetric and closer to normal for the ML model.\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "07998d41-e101-410c-9a35-e5c6da1800b0", "metadata": {}, "outputs": [], "source": [ "# Replace the temperature column with its base-10 logarithm\n", "y['temperature'] = np.log10(y['temperature'])" ] }, { "cell_type": "code", "execution_count": 9, "id": "37b9edab-f6dd-406e-b4c4-aba7ef45634c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
gravityc_o_ratiometallicitytemperature
05.00.25-1.03.041393
15.00.25-0.73.041393
25.00.25-0.53.041393
35.00.25-0.33.041393
45.00.250.03.041393
\n", "
" ], "text/plain": [ " gravity c_o_ratio metallicity temperature\n", "0 5.0 0.25 -1.0 3.041393\n", "1 5.0 0.25 -0.7 3.041393\n", "2 5.0 0.25 -0.5 3.041393\n", "3 5.0 0.25 -0.3 3.041393\n", "4 5.0 0.25 0.0 3.041393" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check the output variables\n", "y.head()" ] }, { "cell_type": "markdown", "id": "6108f702-bb40-49f4-9c38-306e0e27ad4f", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "id": "c9cc267d-85a9-44c6-9081-5067d0bdd8e6", "metadata": { "tags": [] }, "source": [ "## 2.2 Processing the Data\n", "\n", "Here we instintiate BuildRegressorCNN class from DeepBuilder module to prepare the datasets and take the trained CNN (Convolutional Neural Networks) for us:\n", "\n", "- Take the synthetic spectra\n", "- Process them, e.g.\n", " - Divide them to three sets: train, validation, and test sets\n", " - Scale y variables\n", " - Scale X variables\n", " - Create new features \n", " " ] }, { "cell_type": "markdown", "id": "60932dc5-bd2b-4479-9126-da6f7b13735b", "metadata": { "tags": [] }, "source": [ "### 2.2.1 Instintiate DataProcessor class from DeepBuilder module" ] }, { "cell_type": "code", "execution_count": 10, "id": "bc38d0e4-a4c2-4288-9d33-ab15fb25093b", "metadata": {}, "outputs": [], "source": [ "data_processor = DataProcessor( \n", " flux_values=X.to_numpy(),\n", " wavelength_names=X.columns,\n", " wavelength_values=wavelength_values,\n", " output_values=y.to_numpy(),\n", " output_names=output_names,\n", " spectral_resolution=200,\n", " trained_ML_model=None,\n", " trained_ML_model_name='CNN',\n", " )" ] }, { "cell_type": "markdown", "id": "ffbeb92b-46ad-4ebb-8edc-f29d5d56e7e2", "metadata": {}, "source": [ "### 2.2.2 Split the dataset into train, validate and test sets" ] }, { "cell_type": "code", "execution_count": 11, "id": "3a00b374-192b-4a7f-9606-a9015f6db53d", "metadata": {}, "outputs": [], "source": [ 
"data_processor.split_train_validation_test(test_size=0.1, \n", " val_size=0.1, \n", " random_state_=42,)" ] }, { "cell_type": "markdown", "id": "bf565480-2818-48ad-859d-dd5153250007", "metadata": {}, "source": [ "### 2.2.3 Standardize X Variables Row-wise " ] }, { "cell_type": "code", "execution_count": 12, "id": "a5401e9b-1a1f-4914-838f-cac6fa26ee1b", "metadata": {}, "outputs": [], "source": [ "# Scale the X features using MinMax Scaler\n", "data_processor.standardize_X_row_wise(output_indicator='Trained_StandardScaler_X_RowWise')" ] }, { "cell_type": "code", "execution_count": 13, "id": "41659bce-458d-4a04-9f76-97c1de4c2336", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_boxplot( \n", " data = data_processor.X_train_standardized_rowwise[:, ::-1],\n", " title='Scaled main 104 Features',\n", " xlabel='Wavelength [$\\mu$m]',\n", " ylabel='Scaled Values',\n", " xticks_list=wavelength_names[::-1],\n", " fig_size=(18, 5),\n", " saved_file_name = 'Scaled_input_fluxes',\n", " __reference_data__ = __reference_data_path__,\n", " __save_plots__=True\n", " )" ] }, { "cell_type": "markdown", "id": "99348432-ec3b-445d-922f-dea2eef0a605", "metadata": {}, "source": [ "### 2.2.4 Standardize y Variables Column-wise " ] }, { "cell_type": "code", "execution_count": 14, "id": "ba7f068d-9503-4b77-b516-03d432309423", "metadata": {}, "outputs": [], "source": [ "# Standardize the y features using Standard Scaler\n", "data_processor.standardize_y_column_wise(output_indicator='Trained_StandardScaler_y_ColWise')" ] }, { "cell_type": "code", "execution_count": 15, "id": "1d90e605-c450-40cc-8cb7-57c8fe5bc6b9", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_boxplot( \n", " data = data_processor.y_train_standardized_columnwise,\n", " title='Scaled main 104 Features',\n", " xlabel='Wavelength',\n", " ylabel='Scaled Output Values',\n", " xticks_list=['','$\\log g$', 'T$_{eff}$', 'C/O ratio', '[M/H]'],\n", " fig_size=(5, 5),\n", " saved_file_name = 'Scaled_output_parameters',\n", " __reference_data__ = __reference_data_path__,\n", " __save_plots__=True\n", " )" ] }, { "cell_type": "markdown", "id": "9a4e7224-b203-448e-b531-78925f44e645", "metadata": {}, "source": [ "### 2.2.5 Feature engeenering: Take Min and Max of each row (BD spectra) " ] }, { "cell_type": "code", "execution_count": 16, "id": "b4dec506-00b2-478e-b7d5-3733f1e4efcf", "metadata": {}, "outputs": [], "source": [ "# train\n", "data_processor.X_train_min = data_processor.X_train.min(axis=1)\n", "data_processor.X_train_max = data_processor.X_train.max(axis=1)\n", "\n", "# validation\n", "data_processor.X_val_min = data_processor.X_val.min(axis=1)\n", "data_processor.X_val_max = data_processor.X_val.max(axis=1)\n", "\n", "# test\n", "data_processor.X_test_min = data_processor.X_test.min(axis=1)\n", "data_processor.X_test_max = data_processor.X_test.max(axis=1)" ] }, { "cell_type": "code", "execution_count": 17, "id": "ab93721b-c0d0-4d4d-969a-d36bed701a67", "metadata": {}, "outputs": [], "source": [ "df_MinMax_train = pd.DataFrame((data_processor.X_train_min, data_processor.X_train_max)).T\n", "df_MinMax_val = pd.DataFrame((data_processor.X_val_min, data_processor.X_val_max)).T\n", "df_MinMax_test = pd.DataFrame((data_processor.X_test_min, data_processor.X_test_max)).T" ] }, { "cell_type": "code", "execution_count": 18, "id": "915c7c1f-2e5e-46ea-bcb9-29c6b4203f8c", "metadata": {}, "outputs": [], "source": [ "df_MinMax_train.rename(columns={0:'min', 1:'max'}, inplace=True)\n", "df_MinMax_val.rename(columns={0:'min', 1:'max'}, inplace=True)\n", 
"df_MinMax_test.rename(columns={0:'min', 1:'max'}, inplace=True)" ] }, { "cell_type": "code", "execution_count": 19, "id": "1aad7765-9e8e-436a-9760-a3b2210c4c7a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
minmax
08.265340e-123.445259e-08
18.080712e-228.397132e-14
22.734403e-078.632182e-06
34.414951e-163.373262e-10
43.722576e-076.859888e-06
\n", "
" ], "text/plain": [ " min max\n", "0 8.265340e-12 3.445259e-08\n", "1 8.080712e-22 8.397132e-14\n", "2 2.734403e-07 8.632182e-06\n", "3 4.414951e-16 3.373262e-10\n", "4 3.722576e-07 6.859888e-06" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_MinMax_train.head()" ] }, { "cell_type": "markdown", "id": "8fb8ee2c-2127-4ca7-b1b1-619f6b5a37bc", "metadata": {}, "source": [ "### 2.2.6 Scale Min Max features - ColumnWise" ] }, { "cell_type": "code", "execution_count": 20, "id": "1ba420c3-13b0-418b-ab72-3d0dc45e156b", "metadata": {}, "outputs": [], "source": [ "data_processor.standardize_X_column_wise(\n", " output_indicator='Trained_StandardScaler_X_ColWise_MinMax',\n", " X_train = df_MinMax_train.to_numpy(),\n", " X_val = df_MinMax_val.to_numpy(),\n", " X_test = df_MinMax_test.to_numpy(),\n", " )" ] }, { "cell_type": "code", "execution_count": 21, "id": "f00b13c5-d42b-442c-b9c2-690cace08a0d", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_boxplot( \n", " data = data_processor.X_test_standardized_columnwise,\n", " title='Scaled Min Max Inputs - ColumnWise',\n", " xlabel='Wavelength',\n", " ylabel='Scaled Output Values',\n", " xticks_list= ['','Min','Max'],\n", " fig_size=(5, 5),\n", " saved_file_name = 'Scaled_input_Min_Max_fluxes',\n", " __reference_data__ = __reference_data_path__,\n", " __save_plots__=True\n", " )" ] }, { "cell_type": "markdown", "id": "ea261364-8478-4ffd-b3c8-2e51a464f911", "metadata": {}, "source": [ "---" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }