{"page":"<link rel=\"stylesheet\" href=\"https://lessonplanet.com/assets/packs/css/resources-572d6a42.css\" />\n<link rel=\"stylesheet\" href=\"https://lessonplanet.com/assets/packs/css/lp_boclips_stylesheets-f4d0de30.css\" media=\"all\" />\n<div data-title='Data Engineering Overview' data-url='/boclips/videos/627db53016c64d62644df195' data-video-url='/boclips/videos/627db53016c64d62644df195' id='bo_player_modal'>\n<div class='boclips-resource-page modal-dialog panel-container'>\n<div class='react-notifications-root'></div>\n<div class='rp-header'>\n<div class='rp-type'>\n<i aria-hidden='true' class='fai fa-regular fa-circle-play'></i>\nVideo\n</div>\n<h1 class='rp-title' id='video-title'>\nData Engineering Overview\n</h1>\n<div class='rp-actions'>\n<div class='mr-1'>\n<a class=\"btn btn-success\" data-posthog-event=\"Signup: LP Signup Activity\" data-posthog-location=\"body_link_boclips\" data-remote=\"true\" href=\"/subscription/new\"><span><span>Get Free Access</span><span class=\"\"> for 10 Days</span><span>!</span></span></a>\n</div>\n</div>\n</div>\n<div class='rp-body'>\n<div class='rp-info'>\n<div aria-label='Hide resource details' class='rp-hide-info' role='button' tabindex='0'>&times;</div>\n<i aria-label='Expand resource details' class='rp-expand-info fai fa-solid fa-up-right-and-down-left-from-center' role='button' tabindex='0'></i>\n<i aria-label='Compress resource details' class='rp-compress-info fai fa-solid fa-down-left-and-up-right-to-center' role='button' tabindex='0'></i>\n<div class='rp-rating'>\n<span class='resource-pool'>\n<span class='pool-label'>Publisher:</span>\n<span class='pool-name'>\n<span class='text'><a data-publisher-id=\"30355135\" href=\"/search?publisher_ids%5B%5D=30355135\">APMonitor</a></span>\n</span>\n</span>\n</div>\n<div class='rp-description'>\n<span class='short-description'>Data engineers consolidate and prepare data for visualization, cleansing, scaling, and data division for training, validation, and testing. Data Engineering: Part 1Gathering data is the process of consolidating disparate data (Excel...</span>\n<span class='full-description hide'>Data engineers consolidate and prepare data for visualization, cleansing, scaling, and data division for training, validation, and testing. <br/><br/>Data Engineering: Part 1<br/><br/>Gathering data is the process of consolidating disparate data (Excel spreadsheet, CSV file, PDF report, database, cloud storage) into a single repository. For time series data, the tables are joined to match features and labels at particular time points.<br/><br/>Statistics provide a compact summary of the data such as number of data sets, mean, standard deviation, and quartile information. A statistical profile of the data shows how much data has been collected and the quality of that data.<br/><br/>Visualization is the graphical representation of data. Visualization is important to have a first look at the data to analyze data diversity, relationships, missing data, bad data, or other factors that may influence decisions to exclude or include an appropriate subset for training.<br/><br/>Data Cleansing is the process of removing bad data that may include outliers, missing entries, failed sensors, or other types of missing or corrupted information.<br/><br/>Data Engineering: Part 2<br/><br/>Feature Engineering is the process of selecting and creating the input descriptors for machine learning. Categorical data is converted to numeric values such as True=1 and False=0. Feature engineering creates indicators from images, words, numbers, or discrete categories. Features are ranked in order of significance. Unimportant features are identified and removed to improve training time, reduce storage cost, and minimize deployment resources.<br/><br/>Imbalanced Data is a problem for classification accuracy because the majority class is favored in the predictions. Oversampling the minority class or undersampling the majority class are two methods to restore balance.<br/><br/>Scaling data (inputs and outputs) to a range of 0 to 1 or -1 to 1 can improve the training process. There are different methods for scaling that are important based on the presence of outliers or statistical properties of the data that may suggest a larger total range so that most of the data is between 0 and 1 or -1 and 1.<br/><br/>Splitting data ensures that there are independent sets for training, testing, and validation. The test set is to evaluate the model fit independently of the training and to improve the hyper-parameters without overfitting on the training. The validation may come with a third split to evaluate the hyperparameter optimization. Cross-validation is an alternative approach to divide the training data into multiple sets that are fit separately and tested on the other set. The parameter consistency is compared between the multiple models. The inputs (features) and outputs (labels) are separated into separate data structures. A loss function such as the squared difference between the predicted label and the measured label is a typical loss (objective) function. A confusion matrix is a graphical representation of misclassification errors.<br/><br/>Data Engineering: Part 3<br/><br/>Deploying machine learning is the process of making the machine learning solution available to produce results for people or computers to access the service remotely. It involves managing data flow to the application for training or prediction. Data flow can be in batches or a continuous stream. The machine learning application may be deployed in many ways such as through an API (Application Programming Interface), a web-interface, an APP for mobile devices, or through a Jupyter Notebook or Python script. If the machine learned model is trained offline, the model is encapsulated and transferred to the hosting target. A scalable deployment means that the computing architecture can handle increased demand such as through a cloud computing host with on-demand scale-up.</span>\n</div>\n<div class='action-container flex justify-between'>\n<button aria-expanded='false' aria-label='Read more description' class='rp-full-description' type='button'>\n<i class='fai fa-solid fa-align-left'></i>\n<span id='read_more'>Read More</span>\n</button>\n<div class='rp-report'>\n</div>\n</div>\n<div aria-labelledby='resource-details-heading' class='rp-info-section'>\n<h2 class='title' id='resource-details-heading'>Resource Details</h2>\n<div class='rp-resource-details clearfix'>\n<div class='detail'>\n<dl>\n<dt>Curator Rating</dt>\n<dd><span class=\"star-rating\" aria-label=\"4.0 out of 5 stars\" role=\"img\"><i class=\"fa-solid fa-star text-action\" aria-hidden=\"true\"></i><i class=\"fa-solid fa-star text-action\" aria-hidden=\"true\"></i><i class=\"fa-solid fa-star text-action\" aria-hidden=\"true\"></i><i class=\"fa-solid fa-star text-action\" aria-hidden=\"true\"></i><i class=\"fa-regular fa-star text-action\" aria-hidden=\"true\"></i></span></dd>\n</dl>\n</div>\n<div class='detail'>\n<dl>\n<dt class=\"educator-rating-title\">Educator Rating</dt><dd><div class=\"educator-rating-details\" data-path=\"/educator_ratings/rrp_data?resourceable_id=200960&amp;resourceable_type=Boclips%3A%3AVideoMetadata\"><span class=\"not-yet-rated\">Not yet Rated</span></div></dd>\n</dl>\n</div>\n<div class='detail'>\n<dl>\n<dt>Media Length</dt>\n<dd>5:39</dd>\n</dl>\n</div>\n<div class='detail'>\n<dl>\n<dt>Grade</dt><dd title=\"Grade\">10th - Higher Ed</dd>\n</dl>\n</div>\n<div class='detail'>\n<dl>\n<dt>Subjects</dt><dd><span><a href=\"/search?grade_ids%5B%5D=256&amp;grade_ids%5B%5D=257&amp;grade_ids%5B%5D=258&amp;grade_ids%5B%5D=259&amp;search_tab_id=1&amp;subject_ids%5B%5D=358379\">Social Studies &amp; History</a></span></dd><dd class=\"text-muted\"><i class=\"fa-solid fa-lock mr5\"></i>5 more...</dd>\n</dl>\n</div>\n<div class='detail'>\n<dl>\n<dt>Media Type</dt><dd><span><a href=\"/search?grade_ids%5B%5D=256&amp;grade_ids%5B%5D=257&amp;grade_ids%5B%5D=258&amp;grade_ids%5B%5D=259&amp;search_tab_id=2&amp;type_ids%5B%5D=4543647\">Instructional Videos</a></span></dd>\n</dl>\n</div>\n<div class='detail'>\n<dl>\n<dt>Source:</dt>\n<div class='preview-source' data-animation='true' data-boundary='.rp-info' data-container='.rp-resource-details' data-html='false' data-title='APMonitor is software for solving large-scale and complex problems with advanced simulation and optimization.' data-trigger='hover focus'>\n<span>APMonitor.com</span>\n<i aria-hidden='true' class='fa-solid fa-circle-info channel-tooltip-icon' id='channel-tooltip'></i>\n</div>\n</dl>\n</div>\n<div class='detail'>\n<dl>\n<dt>Date</dt>\n<dd>2021</dd>\n</dl>\n</div>\n<div class='detail'>\n<dl>\n<i aria-hidden='true' class='fai fa-solid fa-language'></i>\n<dt>Language</dt><dd>English</dd>\n</dl>\n</div>\n<div class='detail'>\n<dl>\n<dt>Audiences</dt><dd><span><a href=\"/search?audience_ids%5B%5D=371079&amp;grade_ids%5B%5D=256&amp;grade_ids%5B%5D=257&amp;grade_ids%5B%5D=258&amp;grade_ids%5B%5D=259&amp;search_tab_id=1\">For Teacher Use</a></span></dd><dd class=\"text-muted\"><i class=\"fa-solid fa-lock mr5\"></i>2 more...</dd>\n</dl>\n</div>\n<div class='detail'>\n<dl>\n<dt>Usage Permissions</dt><dd>Fine Print: Educational Use</dd>\n</dl>\n</div>\n</div>\n</div>\n<div aria-labelledby='additional-materials-heading' class='rp-info-section'>\n<h2 class='title' id='additional-materials-heading'>Additional Materials</h2>\n<div class='additional-material'>\n<i aria-hidden='true' class='fai fa-solid fa-lock'></i>\n<a class=\"text-muted\" title=\"Video Transcript\" data-html=\"true\" data-placement=\"bottom\" data-trigger=\"click\" data-content=\"<div class=&quot;text-center py-2&quot;><a class=&quot;bold&quot; href=&quot;/auth/users/sign_in&quot;>Sign in</a> or <a class=&quot;bold text-danger&quot; data-posthog-event=&quot;Signup: LP Signup Activity&quot; data-posthog-location=&quot;body_link_boclips&quot; data-remote=&quot;true&quot; href=&quot;/subscription/new&quot;>Join Now</a></div>\" data-title=\"Get Full Access\" data-container=\"body\" rel=\"popover\" tabindex=\"0\" href=\"/subscription/new\">Video Transcript</a>\n</div>\n<div class='additional-material'>\n<i aria-hidden='true' class='fai fa-solid fa-lock'></i>\n<a class=\"text-muted\" title=\"Video Preview\" data-html=\"true\" data-placement=\"bottom\" data-trigger=\"click\" data-content=\"<div class=&quot;text-center py-2&quot;><a class=&quot;bold&quot; href=&quot;/auth/users/sign_in&quot;>Sign in</a> or <a class=&quot;bold text-danger&quot; data-posthog-event=&quot;Signup: LP Signup Activity&quot; data-posthog-location=&quot;body_link_boclips&quot; data-remote=&quot;true&quot; href=&quot;/subscription/new&quot;>Join Now</a></div>\" data-title=\"Get Full Access\" data-container=\"body\" rel=\"popover\" tabindex=\"0\" href=\"/subscription/new\">Video Preview</a>\n</div>\n</div>\n<div aria-labelledby='concepts-heading' class='rp-info-section'>\n<h2 class='title' id='concepts-heading'>Concepts</h2>\n<div class='clearfix'>\n<div class='details-list concepts' data-identifier='Boclips::VideoDecorator627db53016c64d62644df195' data-type='concepts'>engineering, data, scale, outliers, statistics</div>\n<div class='concepts-toggle-buttons' data-identifier='Boclips::VideoDecorator627db53016c64d62644df195'>\n<button aria-expanded='false' class='more btn-link' type='button'>\n<span>Show More</span>\n<i aria-hidden='true' class='fa-solid fa-caret-down ml5'></i>\n</button>\n<button aria-expanded='true' class='less btn-link' style='display: none;' type='button'>\n<span>Show Less</span>\n<i aria-hidden='true' class='fa-solid fa-caret-up ml5'></i>\n</button>\n</div>\n</div>\n</div>\n<div aria-labelledby='additional-tags-heading' class='rp-info-section'>\n<h2 class='title' id='additional-tags-heading'>Additional Tags</h2>\n<div class='clearfix'>\n<div class='details-list keyterms' data-identifier='Boclips::VideoDecorator627db53016c64d62644df195' data-type='keyterms'>apmonitor, data engineer, data engineering, training, course, short, overview, learn, graphical representation, machine learning, feature engineering, gathering data, data cleansing, imbalanced data, bad data, features, infrastructure, datasets, include, scaling, labels, missing, validation, process, parts, part, visualization, splitting, testing, talk, table</div>\n<div class='keyterms-toggle-buttons' data-identifier='Boclips::VideoDecorator627db53016c64d62644df195'>\n<button aria-expanded='false' class='more btn-link' type='button'>\n<span>Show More</span>\n<i aria-hidden='true' class='fa-solid fa-caret-down ml5'></i>\n</button>\n<button aria-expanded='true' class='less btn-link' style='display: none;' type='button'>\n<span>Show Less</span>\n<i aria-hidden='true' class='fa-solid fa-caret-up ml5'></i>\n</button>\n</div>\n</div>\n</div>\n<div aria-labelledby='classroom-considerations-heading' class='rp-info-section'>\n<h2 class='title' id='classroom-considerations-heading'>Classroom Considerations</h2>\n<div class='classroom-considerations'><div class='fai fa-solid fa-bell'></div>Best For: Explaining a topic</div><div class='classroom-considerations'><div class='fai fa-solid fa-bell'></div>Video is ad-free</div> \n</div>\n<div aria-labelledby='educator-ratings-heading' class='rp-info-section'>\n<h2 class='title sr-only' id='educator-ratings-heading'>Educator Ratings</h2>\n<div id=\"educator-ratings-root\"></div><div id=\"all-educator-ratings-root\"></div><div id=\"educator-rating-form-root\"></div>\n</div>\n</div>\n<div class='rp-resource'>\n<div aria-label='Show resource details' class='rp-show-info' role='button' tabindex='0'>\n<i class='fai fa-solid fa-align-left'></i>\nShow resource details\n</div>\n<div aria-label='Video player' class='player ie' id='player-wrapper' role='region'>\n<div class='relative container mx-auto' id='lp-boclips-visitor-thumbnail'>\n<a class=\"block\" data-html=\"true\" data-placement=\"bottom\" data-trigger=\"click\" data-content=\"<div class=&quot;text-center py-2&quot;><a class=&quot;bold&quot; href=&quot;/auth/users/sign_in&quot;>Sign in</a> or <a class=&quot;bold text-danger&quot; data-posthog-event=&quot;Signup: LP Signup Activity&quot; data-posthog-location=&quot;body_link_boclips&quot; data-remote=&quot;true&quot; href=&quot;/subscription/new&quot;>Join Now</a></div>\" data-title=\"Get Full Access\" data-container=\"body\" rel=\"popover\" tabindex=\"0\" aria-label=\"Play video: Data Engineering Overview\" href=\"/subscription/new\"><img class=\"resource-img img-thumbnail img-responsive z-10 lp-boclips-thumbnail w-full h-full lozad\" alt=\"Data Engineering Overview\" title=\"Data Engineering Overview\" onError=\"handleImageNotLoadedError(this)\" data-default-image=\"https://static.lp.lexp.cloud/images/attachment_defaults/resource/large/missing.png\" data-src=\"https://cdnapisec.kaltura.com/p/1776261/thumbnail/entry_id/1_lbhc4tci/width/250/vid_slices/3/vid_slice/1\" width=\"315\" height=\"220\" src=\"data:image/png;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs\" />\n<span aria-hidden='true' class='flex justify-center items-center bg-white rounded-full w-16 h-16 absolute top-1/2 left-1/2 -mt-8 -ml-8 cursor-pointer z-0 border-2 border-primary drop-shadow-md lp-boclips-thumbnail-playBtn'>\n<i class='fa-solid fa-play text-primary text-3xl ml-1 drop-shadow-xl'></i>\n</span>\n</a></div>\n</div>\n</div>\n</div>\n</div>\n</div>\n"}